
2021, Vol. 40, Issue 5

Research Article

30 September 2021, pp. 488-495
References
1. S. Zhao, X. Xiao, Z. Zhang, T. N. T. Nguyen, X. Zhong, B. Ren, L. Wang, D. L. Jones, E. S. Chng, and H. Li, "Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction," Proc. IEEE ASRU, 460-467 (2015). 10.1109/ASRU.2015.7404831
2. Y. Tachioka, T. Narita, I. Miura, T. Uramoto, N. Monta, S. Uenohara, K. Furuya, S. Watanabe, and J. Le Roux, "Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information," Proc. Interspeech, 2461-2465 (2017). 10.21437/Interspeech.2017-61
3. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu, "Deep Speech 2: End-to-end speech recognition in English and Mandarin," arXiv:1512.02595v1 (2015).
4. A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," Proc. ICML, 1764-1772 (2014).
5. W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. ICASSP, 4960-4964 (2016). 10.1109/ICASSP.2016.7472621
6. A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," arXiv:1303.5778 (2013). 10.1109/ICASSP.2013.6638947
7. L. Dong, S. Xu, and B. Xu, "Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition," Proc. ICASSP, 5884-5888 (2018). 10.1109/ICASSP.2018.8462506
8. A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv:1609.03499 (2016).
9. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 1-11 (2017).
10. Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," arXiv:1612.08083v3 (2017).
11. P. Ramachandran, B. Zoph, and Q. V. Le, "Swish: A self-gated activation function," arXiv:1710.05941v1 (2017).
12. S. Kim, S. Bae, and C. Won, "Open-source toolkit for end-to-end Korean speech recognition," Software Impacts, 7, 1-4 (2021). 10.1016/j.simpa.2021.100054
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Korean): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Korean): 한국음향학회지
  • Volume: 40
  • No.: 5
  • Pages: 488-495
  • Received Date: 2021-08-02
  • Accepted Date: 2021-09-14