A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises

Boneung Koo

doi:10.7776/ASK.2015.34.4.310


2015 Vol.34, Issue 4

비정체성 잡음을 위한 SPD-TE 기반 계수형 음성 활동 탐지

31 July 2015. pp. 310-315


Abstract

This paper proposes a single-channel voice activity detection (VAD) algorithm for nonstationary noise environments. The thresholds on the feature parameter used for the VAD decision are updated adaptively from estimates of the means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to WPD (Wavelet Packet Decomposition) coefficients, and was previously reported to be a noise-robust feature for VAD. Experiments with the TIMIT speech and NOISEX-92 noise databases show that the decision accuracy of the proposed algorithm is comparable to that of several typical VAD algorithms, including standards, for SNR values ranging from 10 to -10 dB.
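The two ingredients named in the abstract, the Teager energy and a threshold adapted from the statistics of past non-speech frames, can be illustrated with a minimal sketch. This is not the paper's implementation: the WPD stage and the exact SPD-TE definition are omitted, each frame is assumed to already be reduced to one scalar feature value, and the class name `AdaptiveThresholdVAD`, the margin factor `k`, the history length, and the noise-only start-up period `init_frames` are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def teager_energy(x):
    """Discrete Teager energy, psi[n] = x[n]^2 - x[n-1] * x[n+1], for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


class AdaptiveThresholdVAD:
    """Marks a frame as speech when its feature exceeds mean + k * std of the
    features of recent frames that were classified as non-speech."""

    def __init__(self, k=3.0, history=50, init_frames=5):
        self.k = k                          # assumed margin factor, not from the paper
        self.init_frames = init_frames      # leading frames assumed to be noise only
        self.noise = deque(maxlen=history)  # features of past non-speech frames

    def threshold(self):
        return mean(self.noise) + self.k * pstdev(self.noise)

    def step(self, feature):
        # Start-up: collect noise statistics before making any decision.
        if len(self.noise) < self.init_frames:
            self.noise.append(feature)
            return False
        is_speech = feature > self.threshold()
        if not is_speech:
            # Only non-speech frames update the noise statistics.
            self.noise.append(feature)
        return is_speech


# Toy per-frame feature values: five noise-like frames, one burst, one noise-like frame.
features = [1.0, 1.1, 0.9, 1.0, 1.0, 5.0, 1.05]
vad = AdaptiveThresholdVAD(k=3.0, history=50, init_frames=5)
decisions = [vad.step(f) for f in features]
```

Because only frames judged to be non-speech enter the history, the threshold can follow a slowly changing noise level without being pulled upward by speech frames; the cost is that a misclassified speech frame contaminates the noise statistics until it leaves the history.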

Keywords

Voice activity detection

Speech pause detection

Nonstationary noise

Noise-robustness

Single channel

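This paper proposes a single-channel VAD (Voice Activity Detection) algorithm for nonstationary noise environments. The threshold of the feature parameter used for the VAD decision is updated adaptively by estimating the mean and standard deviation of past non-speech frames. The feature parameter is the SPD-TE (Spectral Power Difference-Teager Energy), which applies the Teager energy to WPD (Wavelet Packet Decomposition) coefficients and has been reported to be robust to noise. In experiments using TIMIT speech and NOISEX-92 noise at SNRs from 10 dB to -10 dB, the proposed algorithm showed accuracy similar to that of existing algorithms, including standards.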

Keywords

Voice detection

Non-speech detection

Nonstationary noise

Noise robustness

Single channel


Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 34
No: 4
Pages: 310-315
Received Date: 2015-01-29
Accepted Date: 2015-04-07
DOI: https://doi.org/10.7776/ASK.2015.34.4.310

The Journal of the Acoustical Society of Korea. ISSN: 1225-4428 (Print), 2287-3775 (Online). 한국음향학회
