
2021 Vol. 40, Issue 5

Research Article

30 September 2021. pp. 488-495
S. Zhao, X. Xiao, Z. Zhang, T. N. T. Nguyen, X. Zhong, B. Ren, L. Wang, D. L. Jones, E. S. Chng, and H. Li, "Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction," Proc. IEEE ASRU, 460-467 (2015). 10.1109/ASRU.2015.7404831
Y. Tachioka, T. Narita, L. Miura, T. Uramoto, N. Monta, S. Uenohara, K. Furuya, S. Watanabe, and J. Le Roux, "Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information," Proc. Interspeech, 2461-2465 (2017). 10.21437/Interspeech.2017-61
D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu, "Deep Speech 2: End-to-end speech recognition in English and Mandarin," arXiv:1512.02595v1 (2015).
A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," Proc. ICML, 1764-1772 (2014).
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. ICASSP, 4960-4964 (2016). 10.1109/ICASSP.2016.7472621
A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," arXiv:1303.5778 (2013). 10.1109/ICASSP.2013.6638947
L. Dong, S. Xu, and B. Xu, "Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition," Proc. ICASSP, 5884-5888 (2018). 10.1109/ICASSP.2018.8462506
A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv:1609.03499 (2016).
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 1-11 (2017).
Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," arXiv:1612.08083v3 (2017).
P. Ramachandran, B. Zoph, and Q. V. Le, "Swish: A self-gated activation function," arXiv:1710.05941v1 (2017).
S. Kim, S. Bae, and C. Won, "Open-source toolkit for end-to-end Korean speech recognition," Software Impacts, 7, 1-4 (2021). 10.1016/j.simpa.2021.100054
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 40
  • No.: 5
  • Pages: 488-495
  • Received Date: 2021. 08. 02
  • Accepted Date: 2021. 09. 14