Research Article

30 November 2021, pp. 578-586
References
1. F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," Proc. INTERSPEECH, 437-440 (2011). 10.21437/Interspeech.2011-169
2. W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. ICASSP, 4960-4964 (2016). 10.1109/ICASSP.2016.7472621
3. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 5998-6008 (2017).
4. T. Hori, R. Astudillo, T. Hayashi, Y. Zhang, S. Watanabe, and J. L. Roux, "Cycle-consistency training for end-to-end speech recognition," Proc. ICASSP, 6271-6275 (2019). 10.1109/ICASSP.2019.8683307
5. M.-K. Baskar, S. Watanabe, R. Astudillo, T. Hori, L. Burget, and J. Cernocky, "Semi-supervised sequence-to-sequence ASR using unpaired speech and text," Proc. INTERSPEECH, 3790-3794 (2019). 10.21437/Interspeech.2019-3167
6. Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. V. Le, "Unsupervised data augmentation for consistency training," arXiv:1904.12848 (2019).
7. J. Li, M. L. Seltzer, X. Wang, R. Zhao, and Y. Gong, "Large-scale domain adaptation via teacher-student learning," Proc. INTERSPEECH, 2386-2390 (2017). 10.21437/Interspeech.2017-519
8. Q. Xie, M.-T. Luong, E. Hovy, and Q. V. Le, "Self-training with noisy student improves ImageNet classification," Proc. CVPR, 10687-10698 (2020). 10.1109/CVPR42600.2020.01070
9. N. Jaitly and G. E. Hinton, "Vocal tract length perturbation (VTLP) improves speech recognition," Proc. ICML, 625-660 (2013).
10. D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, "SpecAugment: A simple data augmentation method for automatic speech recognition," Proc. INTERSPEECH, 2613-2617 (2019). 10.21437/Interspeech.2019-2680
11. X. Song, Z. Wu, Y. Huang, D. Su, and H. Meng, "SpecSwap: A simple data augmentation method for end-to-end speech recognition," Proc. INTERSPEECH, 581-585 (2020). 10.21437/Interspeech.2020-2275
12. D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," Proc. ICLR, 1-14 (2014).
13. D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," Proc. ACL, 357-362 (1992). 10.3115/1075527.1075614
14. V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "LibriSpeech: An ASR corpus based on public domain audio books," Proc. ICASSP, 5206-5210 (2015). 10.1109/ICASSP.2015.7178964
15. S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, and T. Ochiai, "ESPnet: End-to-end speech processing toolkit," Proc. INTERSPEECH, 2207-2211 (2018). 10.21437/Interspeech.2018-1456
16. L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008).
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 40
  • No.: 6
  • Pages: 578-586
  • Received Date: 2021-09-28
  • Accepted Date: 2021-11-04