
2022, Vol. 41, Issue 3

Research Article

31 May 2022. pp. 335-341
References
1. B. Juang and L. Rabiner, "Hidden Markov models for speech recognition," Technometrics, 33, 251-272 (1991). 10.1080/00401706.1991.10484833
2. A. Senior, H. Sak, and I. Shafran, "Context dependent phone models for LSTM RNN acoustic modelling," Proc. IEEE ICASSP, 4585-4589 (2015). 10.1109/ICASSP.2015.7178839
3. J. Li, V. Lavrukhin, B. Ginsburg, and R. Leary, "Jasper: An end-to-end convolutional neural acoustic model," arXiv preprint arXiv:1904.03288 (2019). 10.21437/Interspeech.2019-1819
4. K. Chen and Q. Huo, "Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (2016). 10.1109/TASLP.2016.2539499
5. L. Bahl, P. Brown, P. de Souza, and R. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," Proc. ICASSP, 49-52 (1986).
6. D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah, "Boosted MMI for model and feature-space discriminative training," Proc. IEEE ICASSP, 4057-4060 (2008). 10.1109/ICASSP.2008.4518545
7. M. Gibson and T. Hain, "Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition," Proc. Interspeech, 2406-2409 (2006). 10.21437/Interspeech.2006-603
8. D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, and V. Manohar, "Purely sequence-trained neural networks for ASR based on lattice-free MMI," Proc. Interspeech, 2751-2755 (2016). 10.21437/Interspeech.2016-595
9. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, 30 (2017).
10. K. Vesely, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," Proc. Interspeech, 2345-2349 (2013). 10.21437/Interspeech.2013-548
11. Y. Wang, A. Mohamed, D. Le, C. Liu, and A. Xiao, "Transformer-based acoustic modeling for hybrid speech recognition," Proc. IEEE ICASSP, 6874-6878 (2020). 10.1109/ICASSP40776.2020.9054345
12. V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," Proc. IEEE ICASSP, 5206-5210 (2015). 10.1109/ICASSP.2015.7178964
13. S. Watanabe, T. Hori, S. Karita, and T. Hayashi, "ESPnet: End-to-end speech processing toolkit," arXiv preprint arXiv:1804.00015 (2018). 10.21437/Interspeech.2018-1456
14. D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," Proc. ASRU (2011).
15. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, and Z. DeVito, "PyTorch: An imperative style, high-performance deep learning library," Advances in Neural Information Processing Systems, 32 (2019).
16. L. Lu, X. Xiao, Z. Chen, and Y. Gong, "PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch," arXiv preprint arXiv:1907.05955 (2019).
17. Y. Shao and Y. Wang, "PyChain: A fully parallelized PyTorch implementation of LF-MMI for end-to-end ASR," arXiv preprint arXiv:2005.09824 (2020). 10.21437/Interspeech.2020-3053
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Korean): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Korean): 한국음향학회지
  • Volume: 41
  • No.: 3
  • Pages: 335-341
  • Received Date: 2022. 03. 21
  • Revised Date: 2022. 05. 09
  • Accepted Date: 2022. 05. 09