Neural network based office noise classification via sub frame division and majority voting

Sanghyeok Park; Minhan Kim; Seunghyeon Shin; Naheun Song; Seokjin Lee

doi:10.7776/ASK.2026.45.3.313

2026 Vol.45, Issue 3

Research Article

31 May 2026. pp. 313-322

Abstract

Accurate noise classification is essential for the effective operation of sound masking systems. However, noise signals recorded in real office environments are easily distorted by reverberation, background noise, and overlapping acoustic events, which can degrade the performance of conventional whole-clip classification methods. In addition, office noise often contains temporally correlated or repetitive patterns rather than isolated events, so both local acoustic cues and global temporal context should be considered. To address these issues, this study proposes an office noise classification framework that combines a Conformer-based classifier, which models temporal dependencies, with sub-frame segmentation and majority-voting post-processing, which alleviate the influence of distortions in real-world recordings. Experimental results show that the proposed method provides more accurate and stable classification performance than both a Convolutional Neural Network (CNN)-based baseline model and a single-clip Conformer model. These results indicate that the Conformer effectively captures temporal relations in office noise signals, while the sub-frame-based inference strategy mitigates the influence of distortion in real-world data.
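The sub-frame inference strategy described above (split a clip into sub-frames, classify each one, then take a majority vote over the per-frame labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sub-frame length, hop size, tie-breaking rule, and the names `split_into_subframes`, `majority_vote`, `classify_clip`, and `toy_classifier` are all hypothetical, and `toy_classifier` merely stands in for the trained Conformer.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical classifier type: maps one sub-frame (a sequence of samples) to a
# list of per-class probabilities. In the paper this role is played by the Conformer.
Classifier = Callable[[Sequence[float]], List[float]]


def split_into_subframes(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut a clip into fixed-length sub-frames (assumed: a short tail is dropped)."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    if len(signal) < frame_len:
        return [signal]  # assumed: a clip shorter than one sub-frame is used whole
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def majority_vote(probs: List[List[float]]) -> int:
    """Return the class predicted (by argmax) for the most sub-frames.

    Ties are broken by the larger summed probability; this rule is an
    assumption, since the abstract does not specify one.
    """
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in probs)
    top = max(votes.values())
    tied = [c for c, n in votes.items() if n == top]
    return max(tied, key=lambda c: sum(p[c] for p in probs))


def classify_clip(signal: Sequence[float], classifier: Classifier,
                  frame_len: int, hop: int) -> int:
    """Classify every sub-frame independently, then aggregate by majority vote."""
    frames = split_into_subframes(signal, frame_len, hop)
    return majority_vote([classifier(f) for f in frames])


def toy_classifier(frame: Sequence[float]) -> List[float]:
    """Hypothetical stand-in: class 1 ("loud") if mean |amplitude| exceeds 0.5."""
    loud = sum(abs(x) for x in frame) / len(frame) > 0.5
    return [0.2, 0.8] if loud else [0.9, 0.1]


# A quiet onset followed by sustained loud noise: one sub-frame votes 0, three vote 1.
clip = [0.1] * 4 + [0.9] * 12
print(classify_clip(clip, toy_classifier, frame_len=4, hop=4))  # → 1
```

The point of the vote is that a single distorted sub-frame (here, the quiet onset) is outvoted by the remaining sub-frames, whereas a whole-clip classifier sees the distortion mixed into its one input.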

Keywords

Environment classification system

Conformer

Deep learning

Real world data

Sub-frame division

Majority voting

Information

Publisher: The Acoustical Society of Korea
Publisher (Korean): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Korean): 한국음향학회지
Volume: 45
No.: 3
Pages: 313-322
Received Date: 2026-03-23
Accepted Date: 2026-05-06
DOI: https://doi.org/10.7776/ASK.2026.45.3.313

The Journal of the Acoustical Society of Korea, ISSN: 1225-4428 (Print), 2287-3775 (Online), 한국음향학회