
2022, Vol. 41, Issue 1

Research Article

January 2022, pp. 38-44
References
1. J. Lim and A. Oppenheim, "All-pole modeling of degraded speech," IEEE Trans. on Acoustics, Speech, and Signal Process. 26, 197-210 (1978). 10.1109/TASSP.1978.1163086
2. R. Martin, "Spectral subtraction based on minimum statistics," Proc. EUSIPCO. 1182-1185 (1994).
3. Y. H. Tu, J. Du, and C. H. Lee, "2D-to-2D mask estimation for speech enhancement based on fully convolutional neural network," Proc. IEEE ICASSP. 6664-6668 (2020).
4. Y. Xu, J. Du, and C. H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 23, 7-19 (2014). 10.1109/TASLP.2014.2364452
5. D. L. Wang and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 26, 1702-1726 (2018). 10.1109/TASLP.2018.2842159
6. Z. Xu, S. Elshamy, and T. Fingscheidt, "Using separate losses for speech and noise in mask-based speech enhancement," Proc. IEEE ICASSP. 7519-7523 (2020). 10.1109/ICASSP40776.2020.9052968
7. K. Paliwal, K. Wójcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Communication, 53, 465-494 (2011). 10.1016/j.specom.2010.12.003
8. Y. Wang and D. L. Wang, "A deep neural network for time-domain signal reconstruction," Proc. IEEE ICASSP. 4390-4394 (2015). 10.1109/ICASSP.2015.7178800
9. Y. Hu, Y. Liu, S. Lv, M. Xing, S. Zhang, Y. Fu, J. Wu, B. Zhang, and L. Xie, "DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement," arXiv preprint arXiv:2008.00264 (2020). 10.21437/Interspeech.2020-2537
10. H. S. Choi, J. H. Kim, J. Huh, A. Kim, J. W. Ha, and K. Lee, "Phase-aware speech enhancement with deep complex U-Net," Proc. ICLR. 1-20 (2019).
11. O. Oktay, J. Schlemper, L. Le Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, B. Glocker, and D. Rueckert, "Attention U-Net: Learning where to look for the pancreas," arXiv preprint arXiv:1804.03999 (2018).
12. Y. Luo and N. Mesgarani, "Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation," IEEE/ACM Trans. on Audio, Speech, and Lang. Process. 27, 1256-1266 (2019). 10.1109/TASLP.2019.2915167
13. J. Zhang, M. D. Plumbley, and W. Wang, "Weighted magnitude-phase loss for speech dereverberation," Proc. IEEE ICASSP. 5794-5798 (2021). 10.1109/ICASSP39728.2021.9414929
14. C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, "Deep complex networks," arXiv preprint arXiv:1705.09792 (2017).
15. J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, "DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1," NASA STI/Recon Tech. Rep. 93, 27403 (1993). 10.6028/NIST.IR.4930
16. A. Varga, "The NOISEX-92 study on the effect of additive noise on automatic speech recognition," Technical Report, DRA Speech Research Unit (1992).
17. E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. on Audio, Speech, and Lang. Process. 14, 1462-1469 (2006). 10.1109/TSA.2005.858005
18. A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," Proc. IEEE ICASSP. 749-752 (2001).
19. C. H. Taal, R. C. Hendriks, and R. Heusdens, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," Proc. IEEE ICASSP. 4214-4217 (2010). 10.1109/ICASSP.2010.5495701
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 41
  • No.: 1
  • Pages: 38-44
  • Received Date: 2021-11-26
  • Revised Date: 2021-12-28
  • Accepted Date: 2022-01-10