A music similarity function based on probabilistic linear discriminant analysis for cover song identification

Jin Soo Seo; Junghyun Kim; Hyemi Kim

doi:10.7776/ASK.2022.41.6.662

2022 Vol.41, Issue 6

Research Article

30 November 2022. pp. 662-667

Abstract

Computing music similarity is an indispensable component of any music search service. This paper focuses on learning a music similarity function that improves cover song identification performance. Using probabilistic linear discriminant analysis (PLDA), we construct a latent music space in which the distances between cover song pairs decrease while the distances between non-cover song pairs increase. Assuming that the observed music features are generated from the learned latent music space, we derive the similarity function from a hypothesis test, under the resulting probabilistic models, of whether two songs share the same latent variable. Experiments on two cover song datasets show that the proposed music similarity improves cover song identification performance.
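The hypothesis test described above can be illustrated with a minimal sketch of a PLDA log-likelihood-ratio score. It is not the authors' implementation. It assumes the common simplified PLDA parametrization in which features have already been centered and projected so that the within-class (same-song) covariance is the identity and the between-class covariance is diagonal. The function name `plda_llr` and the parameter `psi` (the per-dimension between-class variances) are hypothetical names chosen for illustration.

```python
import math


def plda_llr(x1, x2, psi):
    """Log-likelihood ratio of 'same latent variable' vs. 'different latent variables'.

    Assumes (illustrative, not taken from the paper) that each dimension follows
    x = y + e with y ~ N(0, psi[d]) shared by versions of one song and e ~ N(0, 1).
    """
    if not (len(x1) == len(x2) == len(psi)):
        raise ValueError("x1, x2 and psi must have the same length")
    score = 0.0
    for a, b, p in zip(x1, x2, psi):
        total = p + 1.0            # marginal variance of a single observation
        det_same = 2.0 * p + 1.0   # determinant of [[p + 1, p], [p, p + 1]]
        # Quadratic form under the shared-latent hypothesis (correlated pair).
        quad_same = (total * (a * a + b * b) - 2.0 * p * a * b) / det_same
        # Quadratic form under the independent hypothesis (uncorrelated pair).
        quad_diff = (a * a + b * b) / total
        # The 2*pi normalizers are identical in both hypotheses and cancel.
        score += 0.5 * (quad_diff + 2.0 * math.log(total))
        score -= 0.5 * (quad_same + math.log(det_same))
    return score


# Example: a pair pointing the same way scores higher than an opposed pair.
psi = [4.0, 1.0]
same_like = plda_llr([1.0, 0.5], [1.0, 0.5], psi)
diff_like = plda_llr([1.0, 0.5], [-1.0, -0.5], psi)
```

A positive score favors the hypothesis that the two songs share a latent variable, so ranking candidates by this score yields a similarity function. When a dimension has `psi` equal to zero it carries no song identity information and contributes nothing to the score.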

Keywords

Cover song identification

Music similarity

Probabilistic Linear Discriminant Analysis (PLDA)

Latent variable

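Abstract (translated from Korean)

Computing music similarity is one of the most important elements in implementing a music search service. This paper addresses learning a music similarity measure to improve cover song search performance. Probabilistic linear discriminant analysis is used to obtain a latent music space from which the music similarity function is derived. The latent music space is learned so that the distance between versions of the same song shrinks and the distance between different songs grows. Under the assumption that the extracted music features are generated from latent music variables, a probabilistic model is obtained, and the music similarity function is derived by hypothesis testing of whether two pieces of music are the same. Performance comparisons on two cover song test datasets show that the proposed music similarity function can improve cover song search performance.

Keywords (translated from Korean)

Cover song search

Music similarity

Probabilistic linear discriminant analysis

Latent variable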

Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 41
No: 6
Pages: 662-667
Received Date: 2022-09-30
Accepted Date: 2022-10-27
DOI: https://doi.org/10.7776/ASK.2022.41.6.662

The Journal of the Acoustical Society of Korea, ISSN: 1225-4428 (Print), 2287-3775 (Online), 한국음향학회
