Feature Selection-based Voice Transformation

Ki-Seung Lee

2012 Vol. 31, Issue 1

31 January 2012. pp. 39-50

Abstract

This paper describes a voice transformation (VT) method that makes the utterance of a source speaker mimic that of a target speaker. Speaker individuality is transformed by altering three feature parameters: the LPC cepstrum, the pitch period, and the gain. The main objective of this study is to construct an optimal sequence of features selected from a target speaker’s database, so as to maximize both the correlation probabilities between the transformed and the source features and the likelihood of the transformed features with respect to the target model. A set of two-pass conversion rules is proposed: in the first pass, the feature parameters are selected from a database, and in the second pass, the optimal sequence of the feature parameters is constructed. The conversion rules were developed using a statistical approach that employed a maximum likelihood criterion. In constructing the optimal sequence of features, a hidden Markov model (HMM) was employed to find the most likely combination of the features with respect to the target speaker’s model. The effectiveness of the proposed transformation method was evaluated using objective tests and informal listening tests. The results confirmed that the proposed method yields perceptually preferred results compared with conventional methods.
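
The two-pass idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the probability models, so the sketch assumes plain feature vectors and substitutes squared Euclidean distance for both the source-to-candidate match (standing in for the correlation probability) and the frame-to-frame transition cost (standing in for the HMM likelihood). The function names `select_candidates` and `best_sequence`, and the `weight` parameter, are hypothetical.

```python
import numpy as np

def select_candidates(source, database, k):
    """Pass 1 (assumed form): pick the k database frames closest to each source frame."""
    # Squared Euclidean distance between every source frame and every database frame.
    dists = ((source[:, None, :] - database[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(dists, axis=1)[:, :k]          # candidate indices per frame
    cost = np.take_along_axis(dists, idx, axis=1)   # matching cost per candidate
    return idx, cost

def best_sequence(database, idx, target_cost, weight=1.0):
    """Pass 2 (assumed form): Viterbi search for the lowest-cost candidate path."""
    n_frames, k = idx.shape
    total = target_cost[0].copy()                   # best cost ending at each candidate
    back = np.zeros((n_frames, k), dtype=int)       # back-pointers for the traceback
    for t in range(1, n_frames):
        prev = database[idx[t - 1]]
        cur = database[idx[t]]
        # trans[i, j]: cost of moving from previous candidate i to current candidate j.
        trans = ((prev[:, None, :] - cur[None, :, :]) ** 2).sum(axis=2)
        scores = total[:, None] + weight * trans
        back[t] = scores.argmin(axis=0)
        total = scores.min(axis=0) + target_cost[t]
    # Trace back from the cheapest final candidate.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = total.argmin()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return idx[np.arange(n_frames), path]           # database index chosen per frame

# Toy one-dimensional data, invented for illustration only.
database = np.array([[0.0], [2.0], [9.0]])
source = np.array([[0.1], [1.2], [0.0]])
idx, cost = select_candidates(source, database, k=3)
chosen = best_sequence(database, idx, cost, weight=1.0)
```

With `weight=0.0` the search reduces to picking the nearest candidate for each frame independently; a larger `weight` favors sequences whose consecutive frames are close to each other, which is the role the sequence model plays in the second pass.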

Keywords

Voice conversion

Unit selection

Hidden Markov model

Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 31
No: 1
Pages: 39-50
Received Date: 2011-11-29
Accepted Date: 2011-12-23

The Journal of the Acoustical Society of Korea, ISSN: 1225-4428 (Print), 2287-3775 (Online), 한국음향학회
