

