2022, Vol. 41, Issue 3

Research Article

31 May 2022, pp. 335-341
References

B. Juang and L. Rabiner, "Hidden Markov models for speech recognition," Technometrics, 33, 251-272 (1991). 10.1080/00401706.1991.10484833
A. Senior, H. Sak, and I. Shafran, "Context dependent phone models for LSTM RNN acoustic modelling," Proc. IEEE ICASSP, 4585-4589 (2015). 10.1109/ICASSP.2015.7178839
J. Li, V. Lavrukhin, B. Ginsburg, and R. Leary, "Jasper: An end-to-end convolutional neural acoustic model," arXiv preprint arXiv:1904.03288 (2019). 10.21437/Interspeech.2019-1819
K. Chen and Q. Huo, "Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (2016). 10.1109/TASLP.2016.2539499
L. Bahl, P. Brown, P. de Souza, and R. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," Proc. ICASSP, 49-52 (1986).
D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah, "Boosted MMI for model and feature-space discriminative training," Proc. IEEE ICASSP, 4057-4060 (2008). 10.1109/ICASSP.2008.4518545
M. Gibson and T. Hain, "Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition," Proc. Interspeech, 2406-2409 (2006). 10.21437/Interspeech.2006-603
D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, and V. Manohar, "Purely sequence-trained neural networks for ASR based on lattice-free MMI," Proc. Interspeech, 2751-2755 (2016). 10.21437/Interspeech.2016-595
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, 30 (2017).
K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," Proc. Interspeech, 2345-2349 (2013). 10.21437/Interspeech.2013-548
Y. Wang, A. Mohamed, D. Le, C. Liu, and A. Xiao, "Transformer-based acoustic modeling for hybrid speech recognition," Proc. IEEE ICASSP, 6874-6878 (2020). 10.1109/ICASSP40776.2020.9054345
V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," Proc. IEEE ICASSP, 5206-5210 (2015). 10.1109/ICASSP.2015.7178964
S. Watanabe, T. Hori, S. Karita, and T. Hayashi, "ESPnet: End-to-end speech processing toolkit," arXiv preprint arXiv:1804.00015 (2018). 10.21437/Interspeech.2018-1456
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," Proc. ASRU (2011).
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, and Z. DeVito, "PyTorch: An imperative style, high-performance deep learning library," Advances in Neural Information Processing Systems, 32 (2019).
L. Lu, X. Xiao, Z. Chen, and Y. Gong, "PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch," arXiv preprint arXiv:1907.05955 (2019).
Y. Shao and Y. Wang, "PyChain: A fully parallelized PyTorch implementation of LF-MMI for end-to-end ASR," arXiv preprint arXiv:2005.09824 (2020). 10.21437/Interspeech.2020-3053
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 41
  • No.: 3
  • Pages: 335-341
  • Received Date: 2022. 03. 21
  • Revised Date: 2022. 05. 09
  • Accepted Date: 2022. 05. 09