Research Article
- Publisher: The Acoustical Society of Korea
- Publisher (Ko): 한국음향학회
- Journal Title: The Journal of the Acoustical Society of Korea
- Journal Title (Ko): 한국음향학회지
- Volume: 44
- No: 5
- Pages: 525-532
- Received Date: 2025-08-05
- Accepted Date: 2025-09-04
- DOI: https://doi.org/10.7776/ASK.2025.44.5.525












