2022 Vol.41, Issue 3

Research Article

31 May 2022. pp. 359-366
A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," Proc. IEEE ICASSP, 373-376 (1996).
T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," Proc. Eurospeech, 2347-2350 (1999).
J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, R. A. Saurous, Y. Agiomyrgiannakis, and Y. Wu, "Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions," Proc. IEEE ICASSP, 4779-4783 (2018). 10.1109/ICASSP.2018.8461368
Y. Ren, C. Hu, X. Tan, T. Qin, S. Zhao, Z. Zhao, and T.-Y. Liu, "FastSpeech 2: Fast and high-quality end-to-end text to speech," arXiv:2006.04558 (2021).
A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv:1609.03499 (2016).
R. Yamamoto, E. Song, and J. Kim, "Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram," Proc. IEEE ICASSP, 6199-6203 (2020). 10.1109/ICASSP40776.2020.9053795
Y. Ren, Y. Ruan, X. Tan, T. Qin, S. Zhao, Z. Zhao, and T.-Y. Liu, "FastSpeech: Fast, robust and controllable text to speech," Proc. NIPS, 3165-3174 (2019).
A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, "Conformer: Convolution-augmented transformer for speech recognition," Proc. Interspeech, 5036-5040 (2020). 10.21437/Interspeech.2020-3015
M. Koo, "A Korean speech recognition based on conformer" (in Korean), J. Acoust. Soc. Kr. 40, 488-495 (2021).
P. Guo, F. Boyer, X. Chang, T. Hayashi, Y. Higuchi, H. Inaguma, N. Kamo, C. Li, D. Garcia-Romero, J. Shi, J. Shi, S. Watanabe, K. Wei, W. Zhang, and Y. Zhang, "Recent developments on ESPnet toolkit boosted by Conformer," Proc. IEEE ICASSP, 5874-5878 (2021). 10.1109/ICASSP39728.2021.9414858
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NeurIPS, 1-11 (2017).
P. Ramachandran, B. Zoph, and Q. V. Le, "Swish: A self-gated activation function," arXiv:1710.05941v1 (2017).
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 41
  • No: 3
  • Pages: 359-366
  • Received Date: 2022. 03. 21
  • Revised Date: 2022. 04. 29
  • Accepted Date: 2022. 05. 23