Research Article
- Publisher :The Acoustical Society of Korea
- Publisher(Ko) :한국음향학회
- Journal Title :The Journal of the Acoustical Society of Korea
- Journal Title(Ko) :한국음향학회지
- Volume : 44
- No :2
- Pages :132-143
- Received Date : 2025-01-03
- Accepted Date : 2025-02-27
- DOI :https://doi.org/10.7776/ASK.2025.44.2.132