
Research Article

30 September 2021. pp. 515-522
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 40
  • No.: 5
  • Pages: 515-522
  • Received Date: 2021-07-16
  • Accepted Date: 2021-08-25