2024 Vol.43, Issue 2

Research Article

31 March 2024. pp. 243-252
A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-shot text-to-image generation," Proc. ICML, 8821-8831 (2021).
OpenAI, "GPT-4 technical report," arXiv preprint, arXiv:2303.08774 (2023).
M. Pasini and J. Schlüter, "Musika! fast infinite waveform music generation," arXiv preprint, arXiv:2208.08706 (2022).
Z. Borsos, R. Marinier, D. Vincent, E. Kharitonov, O. Pietquin, M. Sharifi, D. Roblek, O. Teboul, D. Grangier, M. Tagliasacchi, and N. Zeghidour, "AudioLM: a language modeling approach to audio generation," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 31, 2523-2533 (2023). 10.1109/TASLP.2023.3288409
J. Kong, J. Kim, and J. Bae, "HiFi-GAN: generative adversarial networks for efficient and high fidelity speech synthesis," Proc. NeurIPS, 33, 17022-17033 (2020).
J. Engel, L. Hantrakul, C. Gu, and A. Roberts, "DDSP: differentiable digital signal processing," arXiv preprint, arXiv:2001.04643 (2020).
K. Choi, J. Im, L. M. Heller, B. McFee, K. Imoto, Y. Okamoto, M. Lagrange, and S. Takamichi, "Foley sound synthesis at the DCASE 2023 challenge," arXiv preprint, arXiv:2304.12521 (2023).
H. C. Chung, "Foley sound synthesis based on GAN using contrastive learning without label information," DCASE2023, Tech. Rep., 2023.
Y. Yuan, H. Liu, X. Liu, X. Kang, M. D. Plumbley, and W. Wang, "Latent diffusion model based Foley sound generation system for DCASE challenge 2023 task 7," arXiv preprint, arXiv:2305.15905 (2023).
X. Chen, N. Mishra, M. Rohaninejad, and P. Abbeel, "PixelSNAIL: an improved autoregressive generative model," Proc. ICML, 864-872 (2018).
N. Zeghidour, A. Luebs, A. Omran, J. Skoglund, and M. Tagliasacchi, "SoundStream: an end-to-end neural audio codec," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 30, 495-507 (2021). 10.1109/TASLP.2021.3129994
A. Défossez, J. Copet, G. Synnaeve, and Y. Adi, "High fidelity neural audio compression," arXiv preprint, arXiv:2210.13438 (2022).
D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint, arXiv:1312.6114 (2013).
A. Caillon and P. Esling, "RAVE: a variational autoencoder for fast and high-quality neural audio synthesis," arXiv preprint, arXiv:2111.05011 (2021).
A. van den Oord and O. Vinyals, "Neural discrete representation learning," Proc. NeurIPS, 1-10 (2017).
A. Razavi, A. van den Oord, and O. Vinyals, "Generating diverse high-fidelity images with VQ-VAE-2," Proc. NeurIPS, 1-11 (2019).
D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," arXiv preprint, arXiv:1412.6980 (2014).
K. Kilgour, M. Zuluaga, D. Roblek, and M. Sharifi, "Fréchet audio distance: a metric for evaluating music enhancement algorithms," arXiv preprint, arXiv:1812.08466 (2018). 10.21437/Interspeech.2019-2219
S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney, R. J. Weiss, and K. Wilson, "CNN architectures for large-scale audio classification," Proc. IEEE ICASSP, 131-135 (2017). 10.1109/ICASSP.2017.7952132
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 43
  • No: 2
  • Pages: 243-252
  • Received Date: 2024-01-23
  • Accepted Date: 2024-02-15