Study on data augmentation methods for deep neural network-based audio tagging

Bum-Jun Kim; Hyeongi Moon; Sung-Wook Park; Young cheol Park

doi:10.7776/ASK.2018.37.6.475

2018 Vol.37, Issue 6

Research Article


30 November 2018. pp. 475-482


Abstract

This paper presents a study of data augmentation methods for DNN (Deep Neural Network)-based audio tagging. In the system considered, an audio signal is converted into a mel-spectrogram, which serves as the input to the DNN. To cope with the small amount of available training data, we augment the training samples using time stretching, pitch shifting, dynamic range compression, and block mixing. We derive optimal parameters and combinations for these augmentation methods through audio tagging simulations.
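Two of the augmentation methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameter values, so the threshold, ratio, and mixing gain below are assumptions, as are the function names and the example tags. Dynamic range compression is shown as a static, sample-wise compressor without attack or release smoothing, and block mixing is assumed to mean summing two clips and taking the union of their tags.

```python
import math


def compress_dynamic_range(samples, threshold_db=-20.0, ratio=4.0):
    """Static compressor sketch: the part of each sample's level (in dB)
    that exceeds threshold_db is divided by ratio. Parameter values are
    illustrative assumptions, not taken from the paper."""
    out = []
    for x in samples:
        mag = abs(x)
        if mag == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(mag)
        if level_db <= threshold_db:
            # Below the threshold the sample passes through unchanged.
            out.append(x)
            continue
        compressed_db = threshold_db + (level_db - threshold_db) / ratio
        out.append(math.copysign(10.0 ** (compressed_db / 20.0), x))
    return out


def block_mix(samples_a, tags_a, samples_b, tags_b, gain_b=1.0):
    """Block mixing sketch (assumed definition): add two clips sample by
    sample over their common length and merge their tag sets."""
    n = min(len(samples_a), len(samples_b))
    mixed = [samples_a[i] + gain_b * samples_b[i] for i in range(n)]
    return mixed, set(tags_a) | set(tags_b)


# Hypothetical clips and tags, for illustration only.
clip_a, tags_a = [0.1, 0.2, 0.3], {"speech"}
clip_b, tags_b = [0.5, 0.5], {"tv"}
mixed, mixed_tags = block_mix(clip_a, tags_a, clip_b, tags_b, gain_b=0.5)
compressed = compress_dynamic_range([1.0, 0.01, 0.0, -1.0])
```

In a full pipeline the augmented waveforms would then be converted to mel-spectrograms before being passed to the network. Time stretching and pitch shifting need a phase vocoder or a resampling step, so they are left out of this sketch.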

Keywords

Audio tagging

DNN (Deep Neural Network)

Data augmentation

Parameter tuning

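This paper studies data augmentation methods for DNN (Deep Neural Network)-based audio tagging. In this system, the audio signal is converted into a mel-spectrogram and used as the input to a deep neural network for audio tagging. To address the problems that arise when only a small amount of training data is available, the training data were augmented using methods such as time stretching, pitch shifting, dynamic range compression, and block mixing. The optimal parameters and optimal combinations of the augmentation techniques used were confirmed through audio tagging simulations.

Keywords

Audio tagging

Artificial neural network

Data augmentation

Parameter tuning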


Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 37
No: 6
Pages: 475-482
Received Date: 2018-09-14
Accepted Date: 2018-11-21
DOI: https://doi.org/10.7776/ASK.2018.37.6.475

The Journal of the Acoustical Society of Korea, ISSN: 1225-4428 (Print), 2287-3775 (Online), 한국음향학회
