2021, Vol. 40, Issue 5

Research Article

30 September 2021, pp. 515-522

References
H. Hu, M. Xu, and W. Wu, "GMM supervector based SVM with spectral features for speech emotion recognition," Proc. ICASSP. 413-416 (2007). 10.1109/ICASSP.2007.366937
A. Stuhlsatz, C. Meyer, F. Eyben, T. Zielke, G. Meier, and B. Schuller, "Deep neural networks for acoustic emotion recognition: Raising the benchmarks," Proc. ICASSP. 5688-5691 (2011). 10.1109/ICASSP.2011.5947651
G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network," Proc. ICASSP. 5200-5204 (2016). 10.1109/ICASSP.2016.7472669
S. Mirsamadi, E. Barsoum, and C. Zhang, "Automatic speech emotion recognition using recurrent neural networks with local attention," Proc. ICASSP. 2227-2231 (2017). 10.1109/ICASSP.2017.7952552
J. Kim, G. Englebienne, K. P. Truong, and V. Evers, "Towards speech emotion recognition 'in the wild' using aggregated corpora and deep multi-task learning," Proc. Interspeech, 1113-1117 (2017). 10.21437/Interspeech.2017-736
S. Yoon, S. Byun, and K. Jung, "Multimodal speech emotion recognition using audio and text," Proc. SLT. 112-118 (2018). 10.1109/SLT.2018.8639583
Z. Lu, L. Cao, Y. Zhang, C. Chiu, and J. Fan, "Speech sentiment analysis via pre-trained features from end-to-end ASR models," Proc. ICASSP. 7149-7153 (2020). 10.1109/ICASSP40776.2020.9052937
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NIPS. 6000-6010 (2017).
J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," Proc. NAACL-HLT. 4171-4186 (2019).
A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, "Wav2vec 2.0: A framework for self-supervised learning of speech representations," Proc. NeurIPS. 12449-12460 (2020).
C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan, "IEMOCAP: Interactive emotional dyadic motion capture database," Language Resources and Evaluation, 42, 335-359 (2008). 10.1007/s10579-008-9076-6
V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," Proc. ICASSP. 5206-5210 (2015). 10.1109/ICASSP.2015.7178964
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. ICASSP. 4960-4964 (2016). 10.1109/ICASSP.2016.7472621
A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," Proc. ICASSP. 6645-6649 (2013). 10.1109/ICASSP.2013.6638947
A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," Proc. ICML. 369-376 (2006). 10.1145/1143844.1143891
S. Watanabe, T. Hori, S. Kim, J. R. Hershey, and T. Hayashi, "Hybrid CTC/attention architecture for end-to-end speech recognition," IEEE JSTSP. 11, 1240-1253 (2017). 10.1109/JSTSP.2017.2763455
T. Kudo and J. Richardson, "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing," Proc. EMNLP. 66-71 (2018). 10.18653/v1/D18-2012
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," Proc. NeurIPS. 8024-8035 (2019).
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 40
  • No.: 5
  • Pages: 515-522
  • Received Date: 2021. 07. 16
  • Accepted Date: 2021. 08. 25