Research Article
- Publisher: The Acoustical Society of Korea
- Publisher (Ko): 한국음향학회
- Journal Title: The Journal of the Acoustical Society of Korea
- Journal Title (Ko): 한국음향학회지
- Volume: 44
- No: 1
- Pages: 58-65
- Received Date: 2024-11-14
- Revised Date: 2024-12-27
- Accepted Date: 2024-12-31
- DOI: https://doi.org/10.7776/ASK.2025.44.1.058