Improved CycleGAN for underwater ship engine audio translation

Hina Ashraf; Yoon-Sang Jeong; Chong Hyun Lee

doi:10.7776/ASK.2020.39.4.292

2020 Vol. 39, Issue 4

Research Article

Korean title: 수중 선박엔진 음향 변환을 위한 향상된 CycleGAN 알고리즘 (Improved CycleGAN algorithm for underwater ship engine audio translation)

31 July 2020. pp. 292-302

Abstract

Machine learning algorithms have contributed to many fields, including sonar and radar applications. The recently developed Cycle-Consistency Generative Adversarial Network (CycleGAN), a variant of the GAN, has been used successfully for unpaired image-to-image translation. We present a modified CycleGAN for translating underwater ship engine sounds with high perceptual quality. The proposed network comprises an improved generator trained to translate underwater audio from one vessel type to another, an improved discriminator that classifies data as real or fake, and a modified cycle-consistency loss function. Quantitative and qualitative analyses of the proposed CycleGAN are performed on the publicly available underwater dataset ShipsEar by evaluating Mel-cepstral distortion, pitch contour matching, nearest-neighbor comparison, and mean opinion score, and comparing them with those of existing algorithms. The results demonstrate the effectiveness of the proposed network.
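The abstract does not spell out the modified cycle-consistency loss, so the following is only a minimal sketch of the standard cycle-consistency term from the original CycleGAN formulation, not the authors' modified version. The function name, the toy generators in the usage lines, and the default weight of 10 are assumptions chosen for illustration.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx, weight=10.0):
    """Standard (unmodified) CycleGAN cycle loss.

    x, y  : arrays of features from domain X and domain Y (e.g. two vessel types)
    g_xy  : callable mapping X-domain features to the Y domain (hypothetical generator)
    g_yx  : callable mapping Y-domain features to the X domain (hypothetical generator)
    weight: scalar multiplier on the cycle term (illustrative default)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Round trips: X -> Y -> X and Y -> X -> Y.
    x_cycled = g_yx(g_xy(x))
    y_cycled = g_xy(g_yx(y))

    # Mean absolute (L1) reconstruction error in each direction.
    forward = np.mean(np.abs(x_cycled - x))
    backward = np.mean(np.abs(y_cycled - y))
    return float(weight * (forward + backward))


# Toy usage: generators that are exact inverses give zero cycle loss.
x_demo = np.linspace(-1.0, 1.0, 8).reshape(2, 4)
y_demo = 2.0 * x_demo
perfect_loss = cycle_consistency_loss(
    x_demo, y_demo, g_xy=lambda a: 2.0 * a, g_yx=lambda b: b / 2.0
)
```

In a real training loop the two callables would be neural generators and this term would be added to the adversarial losses; the toy lambdas here only show that the loss vanishes when the two mappings invert each other.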

Keywords

Generative Adversarial Networks (GANs)

Style transfer

Cycle-Consistency GAN (CycleGAN)

Mel-Cepstrum (MCEP)

Mean Opinion Score (MOS)
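Mel-cepstral distortion (MCD) is one of the evaluation metrics named in the abstract. A minimal sketch of a commonly used definition follows, assuming the two Mel-cepstrum (MCEP) sequences are already time-aligned, have shape (frames, coefficients), and that the 0th (energy) coefficient is excluded. This page does not state which exact variant the authors use, so the function name and these conventions are assumptions.

```python
import numpy as np


def mel_cepstral_distortion(mcep_ref, mcep_conv):
    """Average frame-wise Mel-cepstral distortion in dB.

    mcep_ref, mcep_conv: arrays of shape (frames, coefficients), assumed
    time-aligned. Column 0 (energy) is skipped by convention.
    """
    ref = np.asarray(mcep_ref, dtype=float)
    conv = np.asarray(mcep_conv, dtype=float)
    if ref.shape != conv.shape or ref.ndim != 2:
        raise ValueError("inputs must be 2-D arrays with identical shapes")

    # Difference over coefficients 1..D for every frame.
    diff = ref[:, 1:] - conv[:, 1:]

    # MCD per frame: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2).
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


# Toy usage: identical sequences give zero distortion.
frames = np.random.default_rng(0).normal(size=(5, 13))
mcd_same = mel_cepstral_distortion(frames, frames)
```

Lower values indicate that the translated audio is spectrally closer to the reference; in practice the MCEP features would come from a vocoder or feature extractor, which is outside the scope of this sketch.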

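Machine learning algorithms are used in a variety of fields, including sonar and radar. The recently developed Cycle-Consistency Generative Adversarial Network (CycleGAN), a variant of the Generative Adversarial Network (GAN), is a proven network for unpaired image-to-image translation. This paper proposes a modified CycleGAN that can translate underwater ship engine sounds with high quality. The proposed network consists of a generator model that translates underwater audio from a source domain to a target domain, an improved discriminator that classifies data as real or fake, and a modified cycle-consistency loss function. Quantitative and qualitative analyses of the proposed CycleGAN were carried out on the publicly available underwater dataset ShipsEar by evaluating Mel-cepstral distortion, structural similarity index, nearest-neighbor comparison, and mean opinion score and comparing them with those of existing algorithms. The results demonstrate the effectiveness of the proposed network.

Keywords

Generative Adversarial Network (GAN)

Data augmentation technique

Cycle-Consistency GAN (CycleGAN)

Mel-Cepstrum (MCEP)

Mean Opinion Score (MOS)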

Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 39
No: 4
Pages: 292-302
Received Date: 2020-06-09
Accepted Date: 2020-06-30
DOI: https://doi.org/10.7776/ASK.2020.39.4.292

The Journal of the Acoustical Society of Korea (한국음향학회지), ISSN: 1225-4428 (Print), 2287-3775 (Online)
