
2025, Vol. 44, Issue 2

Research Article

31 March 2025. pp. 132-143
References
1. A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, "Robust speech recognition via large-scale weak supervision," Proc. 40th ICML, 28492-28518 (2023).

2. Y. Gong, S. Khurana, L. Karlinsky, and J. Glass, "Whisper-AT: Noise-robust automatic speech recognizers are also strong general audio event taggers," Proc. Interspeech, 2798-2802 (2023). doi:10.21437/Interspeech.2023-2193

3. A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, "wav2vec 2.0: A framework for self-supervised learning of speech representations," Adv. Neural Inf. Process. Syst. 33, 12449-12460 (2020).

4. W. N. Hsu, B. Bolte, Y. H. H. Tsai, K. Lakhotia, R. Salakhutdinov, and A. Mohamed, "HuBERT: Self-supervised speech representation learning by masked prediction of hidden units," IEEE/ACM Trans. Audio Speech Lang. Process. 29, 3451-3460 (2021). doi:10.1109/TASLP.2021.3122291

5. S. Hershey, S. Chaudhuri, D. P. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney, R. J. Weiss, and K. Wilson, "CNN architectures for large-scale audio classification," Proc. IEEE ICASSP, 131-135 (2017). doi:10.1109/ICASSP.2017.7952132

6. Y. Gong, Y. A. Chung, and J. Glass, "AST: Audio spectrogram transformer," Proc. Interspeech, 571-575 (2021). doi:10.21437/Interspeech.2021-698

7. P. Y. Huang, H. Xu, J. Li, A. Baevski, M. Auli, W. Galuba, F. Metze, and C. Feichtenhofer, "Masked autoencoders that listen," Adv. Neural Inf. Process. Syst. 35, 28708-28720 (2022).

8. J. Kim, K. Min, M. Jung, and S. Chi, "Occupant behavior monitoring and emergency event detection in single-person households using deep learning-based sound recognition," Build. Environ. 181, 107092 (2020). doi:10.1016/j.buildenv.2020.107092

9. J. Sharma, O. C. Granmo, and M. Goodwin, "Emergency detection with environment sound using deep convolutional neural networks," Proc. 5th ICICT, 144-154 (2020). doi:10.1007/978-981-15-5859-7_14

10. Y. J. Jeong, Y. A. Jung, S. H. Kim, and D. H. Kim, "Implementation of integrated platform of risk prevention and STT service for the deaf using deep learning," J. Digit. Contents Soc. 23, 1459-1467 (2022). doi:10.9728/dcs.2022.23.8.1459

11. D. Macháček, R. Dabre, and O. Bojar, "Turning Whisper into real-time transcription system," Proc. IJCNLP-AACL Demos, 17-24 (2023). doi:10.18653/v1/2023.ijcnlp-demo.3

12. D. Liu, G. Spanakis, and J. Niehues, "Low-latency sequence-to-sequence speech recognition and translation by partial hypothesis selection," Proc. Interspeech, 3620-3624 (2020). doi:10.21437/Interspeech.2020-2897

13. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Adv. Neural Inf. Process. Syst. 31, 6000-6010 (2017).

Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 44
  • No.: 2
  • Pages: 132-143
  • Received Date: 2025-01-03
  • Accepted Date: 2025-02-27