2023 Vol. 42, Issue 4

Research Article

31 July 2023. pp. 357-363
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 42
  • No.: 4
  • Pages: 357-363
  • Received Date: 2023-05-09
  • Accepted Date: 2023-07-18