2021 Vol.40, Issue 5

Research Article

30 September 2021. pp. 496-502
Abstract
References
1
J. H. L. Hansen and T. Hasan, "Speaker recognition by machines and humans: A tutorial review," IEEE Signal Processing Magazine, 32, 74-99 (2015). 10.1109/MSP.2015.2462851
2
N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Trans. on Audio, Speech, and Lang. Process. 19, 788-798 (2011). 10.1109/TASL.2010.2064307
3
S. Ioffe, "Probabilistic linear discriminant analysis," Proc. ECCV. 531-542 (2006). 10.1007/11744085_41
4
A. Kanagasundaram, R. Vogt, D. Dean, S. Sridharan, and M. Mason, "I-vector based speaker recognition on short utterances," Proc. Interspeech, 2341-2344 (2011). 10.21437/Interspeech.2011-58
5
A. Hajavi and A. Etemad, "A deep neural network for short-segment speaker recognition," Proc. Interspeech, 2878-2882 (2019). 10.21437/Interspeech.2019-2240
6
Y. Jung, S. M. Kye, Y. Choi, M. Jung, and H. Kim, "Improving multi-scale aggregation using feature pyramid module for robust speaker verification of variable-duration utterances," Proc. Interspeech, 1501-1505 (2020). 10.21437/Interspeech.2020-1025
7
Y. Jung, Y. Choi, H. Lim, and H. Kim, "A unified deep learning framework for short-duration speaker verification in adverse environments," IEEE Access, 8, 175448-175466 (2020). 10.1109/ACCESS.2020.3025941
8
V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," Proc. Interspeech, 3214-3218 (2015). 10.21437/Interspeech.2015-647
9
K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Proc. ICLR. 1-14 (2015).
10
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE CVPR. 770-778 (2016). 10.1109/CVPR.2016.90
11
A. Nagrani, J. S. Chung, and A. Zisserman, "VoxCeleb: A large-scale speaker identification dataset," Proc. Interspeech, 2616-2620 (2017). 10.21437/Interspeech.2017-950
12
W. Cai, J. Chen, and M. Li, "Exploring the encoding layer and loss function in end-to-end speaker and language recognition system," Proc. Odyssey, 74-81 (2018). 10.21437/Odyssey.2018-11
13
Y. Jung, Y. Kim, H. Lim, Y. Choi, and H. Kim, "Spatial pyramid encoding with convex length normalization for text-independent speaker verification," Proc. Interspeech, 4030-4034 (2019). 10.21437/Interspeech.2019-2177
14
E. Variani, X. Lei, E. McDermott, I. L. Moreno, and J. Gonzalez-Dominguez, "Deep neural networks for small footprint text-dependent speaker verification," Proc. IEEE ICASSP. 4052-4056 (2014). 10.1109/ICASSP.2014.6854363
15
Z. Huang, S. Wang, and K. Yu, "Angular softmax for short-duration text-independent speaker verification," Proc. Interspeech, 3623-3627 (2018). 10.21437/Interspeech.2018-1545
16
Y. Liu, L. He, and J. Liu, "Large margin softmax loss for speaker verification," Proc. Interspeech, 2873-2877 (2019). 10.21437/Interspeech.2019-2357
17
Y. Kim, W. Park, M.-C. Roh, and J. Shin, "GroupFace: Learning latent groups and constructing group-based representations for face recognition," Proc. IEEE CVPR. 5621-5630 (2020). 10.1109/CVPR42600.2020.00566
18
K. Okabe, T. Koshinaka, and K. Shinoda, "Attentive statistics pooling for deep speaker embedding," Proc. Interspeech, 2252-2256 (2018). 10.21437/Interspeech.2018-993
Information
  • Publisher : The Acoustical Society of Korea
  • Publisher(Ko) : 한국음향학회
  • Journal Title : The Journal of the Acoustical Society of Korea
  • Journal Title(Ko) : 한국음향학회지
  • Volume : 40
  • No : 5
  • Pages : 496-502
  • Received Date : 2021-07-16
  • Accepted Date : 2021-08-23