Research Article

30 September 2025. pp. 548-555
References
1. J. Hansen and T. Hasan, “Speaker recognition by machines and humans: A tutorial review,” IEEE Signal Process. Mag. 32, 74-99 (2015). doi:10.1109/MSP.2015.2462851
2. M. MohammadAmini, D. Matrouf, J.-F. Bonastre, S. Dowerah, R. Serizel, and D. Jouvet, “Learning noise robust ResNet-based speaker embedding for speaker recognition,” Proc. Odyssey 2022: The Speaker and Language Recognition Workshop, 41-46 (2022). doi:10.21437/Odyssey.2022-6
3. C. Lim, H. Shin, J. Kim, J. Heo, K. Koo, S. Kim, and H. Yu, “Improving noise robustness in self-supervised pre-trained model for speaker verification,” Proc. Interspeech, 2665-2669 (2024). doi:10.21437/Interspeech.2024-1630
4. K. Zhang, Z. Hua, R. Lan, Y. Guo, Y. Zhang, and G. Xu, “Multi-view collaborative learning network for speech deepfake detection,” Proc. AAAI Conf. Artif. Intell., 1075-1083 (2025). doi:10.1609/aaai.v39i1.32094
5. J. Chung, A. Nagrani, and A. Zisserman, “VoxCeleb2: Deep speaker recognition,” Proc. Interspeech, 1086-1090 (2018). doi:10.21437/Interspeech.2018-1929
6. D. Snyder, G. Chen, and D. Povey, “MUSAN: A music, speech, and noise corpus,” arXiv:1510.08484 (2015).
7. G. Hu and D. Wang, “A tandem algorithm for pitch estimation and voiced speech segregation,” IEEE Trans. Audio, Speech, Lang. Process. 18, 2067-2079 (2010). doi:10.1109/TASL.2010.2041110
8. J. Huh, J. S. Chung, A. Nagrani, A. Brown, J.-W. Jung, D. Garcia-Romero, and A. Zisserman, “The VoxCeleb speaker recognition challenge: A retrospective,” IEEE/ACM Trans. Audio, Speech, Lang. Process. 32, 3850-3866 (2024). doi:10.1109/TASLP.2024.3444456
9. H. S. Heo, K. Nam, B. J. Lee, Y. Kwon, M. Lee, Y. J. Kim, and J. S. Chung, “Rethinking session variability: Leveraging session embeddings for session robustness in speaker verification,” Proc. ICASSP, 12321-12325 (2024). doi:10.1109/ICASSP48485.2024.10445987
10. C. Richey, M. A. Barrios, Z. Armstrong, C. Bartels, H. Franco, M. Graciarena, A. Lawson, M. K. Nandwana, A. Stauffer, J. van Hout, P. Gamble, J. Hetherly, C. Stephenson, and K. Ni, “Voices Obscured in Complex Environmental Settings (VOiCES) corpus,” Proc. Interspeech, 1566-1570 (2018). doi:10.21437/Interspeech.2018-1454
11. S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y. Qian, Y. Qian, M. Zeng, X. Yu, and F. Wei, “WavLM: Large-scale self-supervised pre-training for full stack speech processing,” IEEE J. Sel. Top. Signal Process. 16, 1505-1518 (2022). doi:10.1109/JSTSP.2022.3188113
12. B. Desplanques, J. Thienpondt, and K. Demuynck, “ECAPA-TDNN: Emphasized channel attention, propagation and aggregation in TDNN based speaker verification,” Proc. Interspeech, 3830-3834 (2020). doi:10.21437/Interspeech.2020-2650
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 44
  • No.: 5
  • Pages: 548-555
  • Received Date: 2025-08-08
  • Accepted Date: 2025-09-09