A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy

Boneung Koo

doi:10.7776/ASK.2014.33.2.139

2014 Vol. 33, Issue 2

31 March 2014. pp. 139-145

Abstract

This paper proposes a noise-robust, single-channel VAD (Voice Activity Detection) algorithm. The feature parameter is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients, and it is compared against a threshold derived from the means and standard deviations of nonspeech frames. Experiments using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm is superior to typical conventional VAD algorithms. Performance is compared using ROC (Receiver Operating Characteristic) curves for SNR values ranging from 10 dB to -10 dB.
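The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the wavelet, decomposition depth, frame length, or exact feature and threshold formulas, so everything below is an assumption made for illustration: a Haar wavelet packet tree, a feature formed by summing the mean absolute Teager energy over subbands, a threshold of mean plus `k` standard deviations estimated from the first `n_init` frames (assumed to be nonspeech), and hypothetical function names. It is not the paper's implementation.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]  # end samples stay at zero
    return psi

def haar_wpd(frame, levels):
    """Full Haar wavelet packet tree (assumed wavelet).

    Returns 2**levels subband arrays; len(frame) must be divisible
    by 2**levels.
    """
    bands = [np.asarray(frame, dtype=float)]
    for _ in range(levels):
        split = []
        for b in bands:
            even, odd = b[0::2], b[1::2]
            split.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            split.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        bands = split
    return bands

def frame_feature(frame, levels=3):
    """Assumed feature: mean |Teager energy| summed over all subbands."""
    return sum(np.mean(np.abs(teager_energy(b))) for b in haar_wpd(frame, levels))

def wpd_teager_vad(signal, frame_len=256, levels=3, n_init=10, k=3.0):
    """Threshold VAD; the first n_init frames are assumed to be nonspeech."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    feats = np.array([frame_feature(f, levels) for f in frames])
    mu, sigma = feats[:n_init].mean(), feats[:n_init].std()
    threshold = mu + k * sigma  # k is a hypothetical tuning constant
    return feats > threshold, feats, threshold

# Synthetic check: 20 noise-only frames followed by 20 frames of tone plus noise.
rng = np.random.default_rng(0)
fs, frame_len = 8000, 256
signal = 0.01 * rng.standard_normal(40 * frame_len)
t = np.arange(20 * frame_len) / fs
signal[20 * frame_len:] += 0.5 * np.sin(2 * np.pi * 440.0 * t)
decisions, feats, threshold = wpd_teager_vad(signal, frame_len=frame_len)
print(decisions.astype(int))
```

A fixed initial nonspeech segment keeps the sketch short; a practical detector would more likely update the noise statistics as new nonspeech frames are found.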

Keywords

Voice activity detection

Speech pause detection

Teager energy

Wavelet packet decomposition

Noise-robustness

Single channel

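This paper proposes a noise-robust VAD algorithm in which a feature parameter, obtained by applying the Teager energy to WPD (Wavelet Packet Decomposition) coefficients, is fed to a thresholding algorithm. The threshold is set by estimating the mean and standard deviation of the nonspeech segments. Experiments using the TIMIT speech and NOISEX noise databases show that the proposed algorithm outperforms the representative existing algorithms used for comparison. Accuracy is compared using ROC (Receiver Operating Characteristics) curves for SNRs from 10 dB down to -10 dB.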
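The evaluation setup described here (noise added at a chosen SNR, then ROC curves) can be sketched as follows. The helper names `mix_at_snr` and `roc_points` are hypothetical, and the paper's exact scoring procedure is not given in the abstract. The sketch assumes frame-level scores with binary ground-truth labels, and an SNR defined over the whole utterance as 10 log10(P_speech / P_noise).

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the mixture has the requested overall SNR in dB.

    Assumes len(noise) >= len(speech); extra noise samples are dropped.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def roc_points(scores, labels, thresholds):
    """Return (false-alarm rate, hit rate) pairs, one per threshold.

    scores: per-frame feature values; labels: True where the frame is speech.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = []
    for thr in thresholds:
        decided_speech = scores > thr
        hit_rate = np.mean(decided_speech[labels])       # speech frames detected
        false_alarm = np.mean(decided_speech[~labels])   # nonspeech flagged as speech
        points.append((false_alarm, hit_rate))
    return np.array(points)

# Toy example with six frames: three nonspeech, then three speech.
scores = [0.10, 0.20, 0.15, 0.90, 0.80, 0.30]
labels = [False, False, False, True, True, True]
points = roc_points(scores, labels, thresholds=[0.0, 0.25, 1.0])
print(points)
```

Sweeping the threshold over a fine grid and repeating for each SNR from 10 dB to -10 dB would give one curve per condition.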

Keywords

Speech detection

Nonspeech detection

Teager energy

Wavelet packet transform

Noise robustness

Single channel

Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 33
No: 2
Pages: 139-145
Received Date: 2013-12-06
Accepted Date: 2014-01-24
DOI: https://doi.org/10.7776/ASK.2014.33.2.139

The Journal of the Acoustical Society of Korea, ISSN: 1225-4428 (Print), 2287-3775 (Online), The Acoustical Society of Korea (한국음향학회)