
2022, Vol. 41, Issue 6

Research Article

30 November 2022, pp. 647-654
References
1. H. Liao, G. Pundak, O. Siohan, M. Carroll, N. Coccaro, Q.-M. Jiang, T. N. Sainath, A. Senior, F. Beaufays, and M. Bacchiani, "Large vocabulary automatic speech recognition for children," Proc. Interspeech, 1611-1615 (2015). 10.21437/Interspeech.2015-373
2. P. G. Shivakumar and P. Georgiou, "Transfer learning from adult to children for speech recognition: Evaluation, analysis and recommendations," Computer Speech and Language, arXiv:1805.03322 (2020).
3. L. Rumberg, H. Ehlert, U. Lüdtke, and J. Ostermann, "Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning," Proc. Interspeech, 3850-3854 (2021). 10.21437/Interspeech.2021-1241
4. V. Kadyan, S. Shahnawazuddin, and A. Singh, "Developing children's speech recognition system for low resource Punjabi language," Applied Acoustics, 178, 108002 (2021). 10.1016/j.apacoust.2021.108002
5. R. Serizel and D. Giuliani, "Vocal tract length normalisation approaches to DNN-based children's and adults' speech recognition," Proc. IEEE SLT, 135-140 (2014). 10.1109/SLT.2014.7078563
6. P. G. Shivakumar, A. Potamianos, S. Lee, and S. S. Narayanan, "Improving speech recognition for children using acoustic adaptation and pronunciation modeling," Proc. WOCCI, 15-19 (2014).
7. S. S. Gray, D. Willett, J. Lu, J. Pinto, P. Maergner, and N. Bodenstab, "Child automatic speech recognition for US English: child interaction with living-room-electronic-devices," Proc. WOCCI, 21-26 (2014).
8. R. Duan and N. F. Chen, "Unsupervised feature adaptation using adversarial multi-task training for automatic evaluation of children's speech," Proc. Interspeech, 3037-3041 (2020). 10.21437/Interspeech.2020-1657
9. Y. Cui, M. Jia, T. Y. Lin, Y. Song, and S. Belongie, "Class-balanced loss based on effective number of samples," Proc. IEEE CVPR, 9268-9277 (2019). 10.1109/CVPR.2019.00949
10. A. Sellami and H. Hwang, "A robust deep convolutional neural network with batch-weighted loss for heartbeat classification," Expert Systems with Applications, 122, 75-84 (2019). 10.1016/j.eswa.2018.12.037
11. K. R. M. Fernando and C. P. Tsokos, "Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks," IEEE Trans. Neural Netw. Learn. Syst. 33, 2940-2951 (2021). 10.1109/TNNLS.2020.3047335
12. S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, "Analysis of representations for domain adaptation," Proc. NIPS, 137-144 (2006).
13. Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," Proc. ICML, 1180-1189 (2015).
14. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS, 5998-6008 (2017).
15. L. Dong, S. Xu, and B. Xu, "Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition," Proc. IEEE ICASSP, 5884-5888 (2018). 10.1109/ICASSP.2018.8462506
16. H. Miao, G. Cheng, C. Gao, P. Zhang, and Y. Yan, "Transformer-based online CTC/attention end-to-end speech recognition architecture," Proc. IEEE ICASSP, 6084-6088 (2020). 10.1109/ICASSP40776.2020.9053165
17. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, "Domain-adversarial training of neural networks," J. Mach. Learn. Res. 17, 2096-2030 (2016). 10.1007/978-3-319-58347-1_10
18. S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat. 22, 79-86 (1951). 10.1214/aoms/1177729694
19. M. Chen, S. Zhao, H. Liu, and D. Cai, "Adversarial-learned loss for domain adaptation," Proc. AAAI, 3521-3528 (2020). 10.1609/aaai.v34i04.5757
20. PyTorch 1.12 documentation, https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html (Last viewed September 16, 2022).
21. AI Hub Free Conversation (General Men and Women) Dataset, https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=109 (Last viewed July 27, 2022).
22. AI Hub Free Conversation (Children, Infants) Dataset, https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=108 (Last viewed July 27, 2022).
23. S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, and T. Ochiai, "ESPnet: end-to-end speech processing toolkit," arXiv:1804.00015 (2018). 10.21437/Interspeech.2018-1456
24. T. Kudo and J. Richardson, "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing," Proc. EMNLP, 66-71 (2018). 10.18653/v1/D18-2012
25. A. Tripathi, A. Mohan, S. Anand, and M. Singh, "Adversarial learning of raw speech features for domain invariant speech recognition," Proc. IEEE ICASSP, 5959-5963 (2018). 10.1109/ICASSP.2018.8462452
26. S. Sun, C. F. Yeh, M. Y. Hwang, M. Ostendorf, and L. Xie, "Domain adversarial training for accented speech recognition," Proc. IEEE ICASSP, 4854-4858 (2018). 10.1109/ICASSP.2018.8462663
27. L. Van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res. 9, 2579-2605 (2008).
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 41
  • No.: 6
  • Pages: 647-654
  • Received Date: 2022-07-29
  • Revised Date: 2022-09-16
  • Accepted Date: 2022-09-30