
2019, Vol. 38, Issue 1

Research Article

31 January 2019. pp. 39-46
References
1. R. Radhakrishnan, A. Divakaran, and A. Smaragdis, "Audio analysis for surveillance applications," Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., 158-161 (2005). doi:10.1109/ASPAA.2005.1540194
2. J. Salamon and J. P. Bello, "Deep convolutional neural networks and data augmentation for environmental sound classification," IEEE Signal Process. Lett., 24, 279-283 (2017). doi:10.1109/LSP.2017.2657381
3. F. R. González-Hernández, L. P. Sánchez-Fernández, S. Suárez-Guerra, and L. A. Sánchez-Pérez, "Marine mammal sound classification based on a parallel recognition model and octave analysis," Applied Acoustics, 119, 17-28 (2017). doi:10.1016/j.apacoust.2016.11.016
4. M. Malfante, J. I. Mars, M. D. Mura, and C. Gervaise, "Automatic fish sounds classification," J. Acoust. Soc. Am., 143, 2834-2846 (2018). doi:10.1121/1.5036628
5. O. M. Aodha, R. Gibb, K. E. Barlow, E. Browning, M. Firman, R. Freeman, B. Harder, L. Kinsey, G. R. Mead, S. E. Newson, I. Pandourski, S. Parsons, J. Russ, A. Szodoray-Paradi, F. Szodoray-Paradi, E. Tilova, M. Girolami, G. Brostow, and K. E. Jones, "Bat detective-Deep learning tools for bat acoustic signal detection," PLoS Comput. Biol., 14, e1005995 (2018). doi:10.1371/journal.pcbi.1005995
6. F. Briggs, B. Lakshminarayanan, L. Neal, X. Z. Fern, R. Raich, S. J. K. Hadley, A. S. Hadley, and M. G. Betts, "Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach," J. Acoust. Soc. Am., 131, 4640-4650 (2012). doi:10.1121/1.4707424
7. K. Ko, S. Park, and H. Ko, "Convolutional feature vectors and support vector machine for animal sound classification," Proc. IEEE Eng. Med. Biol. Soc. (EMBC), 376-379 (2018). doi:10.1109/EMBC.2018.8512408
8. R. Lu and Z. Duan, "Bidirectional GRU for sound event detection," Detection and Classification of Acoustic Scenes and Events (DCASE) (2017).
9. T. H. Vu and J.-C. Wang, "Acoustic scene and event recognition using recurrent neural networks," Detection and Classification of Acoustic Scenes and Events (DCASE) (2016).
10. Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," Proc. IEEE Workshop Autom. Speech Recognit. Understanding (ASRU), 167-174 (2016).
11. D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, "End-to-end attention-based large vocabulary speech recognition," Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 4945-4949 (2016). doi:10.1109/ICASSP.2016.7472618
12. A. Ahmed, Y. Hifny, K. Shaalan, and S. Toral, "Lexicon free Arabic speech recognition recipe," Advances in Intelligent Systems and Computing, 533, 147-159 (2017). doi:10.1007/978-3-319-48308-5_15
13. C. Kim and R. M. Stern, "Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction," Proc. 10th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 28-31 (2009).
14. M. J. Alam, P. Kenny, and D. O'Shaughnessy, "Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique," Digit. Signal Process., 29, 147-157 (2014). doi:10.1016/j.dsp.2014.03.001
15. M. T. S. Al-Kaltakchi, W. L. Woo, S. S. Dlay, and J. A. Chambers, "Study of fusion strategies and exploiting the combination of MFCC and PNCC features for robust biometric speaker identification," Proc. 4th Int. Workshop Biometrics and Forensics (IWBF), 1-6 (2016).
16. S. Park, S. Mun, Y. Lee, D. K. Han, and H. Ko, "Analysis acoustic features for acoustic scene classification and score fusion of multi-classification systems applied to DCASE 2016 challenge," arXiv preprint arXiv:1807.04970 (2018).
17. N. Upadhyay and R. K. Jaiswal, "Single channel speech enhancement: Using Wiener filtering with recursive noise estimation," Procedia Comput. Sci., 84, 22-30 (2016). doi:10.1016/j.procs.2016.04.061
18. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 1097-1105 (2012).
19. P. M. Chauhan and N. P. Desai, "Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using Wiener filter," Proc. Int. Conf. Green Computing Communication and Electrical Engineering (ICGCCEE), 1-5 (2014).
20. S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory (PTR Prentice-Hall, Englewood Cliffs, 1993), pp. 400-409.
21. T. Gerkmann and R. C. Hendriks, "Noise power estimation based on the probability of speech presence," Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), 145-148 (2011). doi:10.1109/ASPAA.2011.6082266
22. S. S. Stevens, "On the psychophysical law," Psychological Review, 64, 153 (1957). doi:10.1037/h0046162
23. L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geosci. Remote Sens. Mag., 4, 22-40 (2016). doi:10.1109/MGRS.2016.2540798
24. K. Ko, S. Park, and H. Ko, "Convolutional neural network based amphibian sound classification using covariance and modulogram" (in Korean), J. Acoust. Soc. Kr., 37, 60-65 (2018).
25. J. Park, W. Kim, D. K. Han, and H. Ko, "Voice activity detection in noisy environments based on double-combined Fourier transform and line fitting," Sci. World J., 2014, e146040 (2014). doi:10.1155/2014/146040
26. ITU-T Recommendation P.56, Objective Measurement of Active Speech Level (2011).
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 38
  • No.: 1
  • Pages: 39-46
  • Received Date: 2018-11-07
  • Accepted Date: 2019-01-25