All Issue

2020 Vol.39, Issue 5 Preview Page

Research Article

30 September 2020. pp. 476-482
Abstract
References
1
I. Sutskever, O. Vinyals, and Q. Le, "Sequence to sequence learning with neural networks," Proc. Int. Conf. NIPS. 3104-3112 (2014).
2
D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473 (2014).
3
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: neural image caption generation with visual attention," Proc. ICML. 2048-2057 (2015).
4
S. Watanabe, T. Hori, S. Kim, J. Hershey, and T. Hayashi, "Hybrid CTC/attention architecture for end- to-end speech recognition," IEEE J. Selected Topics in Signal Processing, 11, 1240-1253 (2017).
10.1109/JSTSP.2017.2763455
5
H. Soltau, H. Liao, and H. Sak, "Neural speech recognizer: acoustic-to-word LSTM model for large vocabulary speech recognition," Proc. Interspeech, 3707- 3711 (2017).
10.21437/Interspeech.2017-1566
6
K. Audhkhasi, B. Kingsbury, B. Ramabhadran, G. Saon, and M. Picheny, "Building competitive direct acoustics-to-word models for English conversational speech recognition," Proc. IEEE ICASSP. 4759-4763 (2018).
10.1109/ICASSP.2018.8461935
7
C. Chiu, T. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. Weiss, K. Rao, E. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani, "State-of-the-art speech recognition with sequence-to-sequence models," Proc. IEEE ICASSP. 4774-4778 (2018).
10.1109/ICASSP.2018.8462105
8
J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," Proc. Int. Conf. NIPS. 577-585 (2015).
10.1109/ICASSP.2016.7472618
9
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: a neural network for large vocabulary conversational speech recognition," Proc. IEEE ICASSP. 4960-4964 (2016).
10.1109/ICASSP.2016.7472621
10
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaizer, and I. Polosukhin, "Attention is all you need," Proc. Int. Conf. NIPS. 5998-6008 (2017).
11
S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 9, 1735-1780 (1997).
10.1162/neco.1997.9.8.17359377276
12
K. Greff, R. Srivastava, J. Koutnik, B. Steunebrink, and J. Schmidhuber, "LSTM: a search space odyssey," IEEE Trans. on Neural Networks and Learning Systems, 28, 2222-2232 (2017).
10.1109/TNNLS.2016.258292427411231
13
Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time-series," in Handbook of Brain Theory and Neural Networks, edited by M. A. Arbib (MIT Press, 1995).
14
O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Trans. on Audio, Speech, and Language Processing, 22, 1533-1545 (2014).
10.1109/TASLP.2014.2339736
15
D. Lim, Improving seq2seq by revising attention mechanism for speech recognition, (Dissertation, Korea University, 2018).
16
Y. Zhang, W. Chan, and N. Jaitly, "Very deep convolutional networks for end-to-end speech recognition," Proc. IEEE ICASSP. 4845-4849 (2017).
10.1109/ICASSP.2017.7953077PMC5568090
Information
  • Publisher :The Acoustical Society of Korea
  • Publisher(Ko) :한국음향학회
  • Journal Title :The Journal of the Acoustical Society of Korea
  • Journal Title(Ko) :한국음향학회지
  • Volume : 39
  • No :5
  • Pages :476-482
  • Received Date : 2020-07-31
  • Accepted Date : 2020-09-14