Research Article

The Journal of the Acoustical Society of Korea. 31 July 2024. 422-435
https://doi.org/10.7776/ASK.2024.43.4.422

ABSTRACT


MAIN

  • I. Introduction

  • II. Methods

  •   2.1 Two speakers and face masks

  •   2.2 Speech recording under mask-wearing conditions

  •   2.3 Data extraction

  • III. Results

  •   3.1 Vowel analysis

  •   3.2 Spectral and cepstral analysis

  • IV. Discussion

  •   4.1 Face masks like low-pass filters

  •   4.2 Speech levels and CPPs due to face masks

  •   4.3 Limitations and future work

  • V. Conclusions

I. Introduction

Wearing a mask became the new normal during the coronavirus disease 2019 (COVID-19) outbreak. Since the declaration of the end of the COVID-19 pandemic, people in Korea have significantly reduced their use of masks, but they still wear masks selectively because of concerns about yellow dust, fine dust, influenza, and coronavirus. People choose masks according to their individual preferences regarding shape, thickness, and color.[1] Various studies have been conducted on mask wearing during the COVID-19 pandemic. Wearing a mask for a long time not only causes discomfort, such as difficulty breathing, difficulty speaking, sweating, and moisture buildup,[1] but also blocks visual access to the speaker's lips and makes listening difficult in noisy environments.[2] Furthermore, wearing a mask has been shown to attenuate a speaker's speech signal at 1 kHz – 2 kHz[3,4,5,6] and to reduce intelligibility.[7,8] Table 1 summarizes recent research on the acoustic characteristics of vowels and sentences produced while wearing various types of masks. One group of studies found that face masks did not produce significant acoustic changes or affected only a few acoustic parameters. Fiorella et al.[9] found no statistically significant difference between the masked and unmasked conditions in any acoustic parameter (fundamental frequency (F0), vocal intensity, jitter, shimmer, and Harmonics-to-Noise Ratio (HNR)) of sustained /a/ samples from 60 Italian participants; however, 65 % of the subjects showed a non-significant decrease in vocal intensity after putting on a surgical mask. A reduction in intensity can also affect social interactions and speech audibility, particularly for listeners with hearing loss. They concluded that wearing a mask was likely to induce an unconscious need to increase vocal effort, resulting in a greater risk of developing functional dysphonia.
Joshi et al.[10] collected sustained vowel samples from 19 adults (10 women and 9 men) wearing no mask, a cloth mask, a surgical mask, a KN95 mask, or a surgical mask over a KN95 mask, with and without a face shield. Compared with voice output without a mask, the masks tested did not significantly affect the sound pressure level, F0, Cepstral Peak Prominence (CPP), or the first (F1) and second (F2) formant frequencies. McKenna et al.[11] evaluated voice acoustics and self-perceptual ratings in US healthcare workers who were required to wear face masks throughout their workdays. Healthcare workers who wore masks reported more negative vocal symptoms after the workday. These symptoms appeared to be related to an increase in vocal intensity and HNR and a decrease in F0. Mask type (simple vs. N95) had a significant effect only on the standard deviation of the Low-to-High spectral ratio (L/H ratio). Magee et al.[7] investigated the effects of N95, surgical, and cloth masks on acoustic analysis and perceived intelligibility in four Australian participants and found effects at frequencies above 3 kHz for the N95 mask and above 5 kHz for the surgical and cloth masks. Measures of timing and spectral tilt differed mainly with N95 mask use, whereas cepstral measures and HNR remained unchanged across mask types. Thus, face masks changed the speech signal, but some acoustic features, such as measures of voice quality, remained largely unaffected irrespective of mask type. Other studies showed significant changes in acoustic parameters owing to face masks. Gojayev et al.[12] measured F0, shimmer, jitter, s/z ratio, Maximum Phonation Time (MPT), and HNR in 204 Turkish patients under three masking conditions: no mask, surgical mask, and valved Filtering Face Piece 3 (FFP3) respirator. No significant differences in F0, jitter, shimmer, HNR, s/z, or MPT were found between the no-mask and surgical-mask conditions.
However, significant differences in shimmer and HNR were observed when the FFP3 was worn. Lin et al.[13] examined whether medical masks affected acoustic, aerodynamic, and formant parameters in 53 Chinese participants (25 males and 28 females). The Sound Pressure Level (SPL) increased significantly when medical masks were worn, whereas jitter, shimmer, and the third formant frequency (F3) decreased significantly.

Table 1.

Acoustic parameters affected by face masks in previous studies and in the present study.

| Study | Speakers (gender, n) | Speaker group / language | Speech material | Face masks | Acoustic parameters analyzed | Parameters affected by mask (p < 0.05) |
|---|---|---|---|---|---|---|
| Fiorella et al. (2021)[9] | M24, F36 | Hospital workers, Bari, Italy | Sustained /a/ | No mask, Surgical | Maximum Phonation Time (MPT), median pitch, mean pitch, min pitch, max pitch, intensity, number of pulses, number of periods, HNR, jitter, shimmer | None |
| Joshi et al. (2021)[10] | M9, F10 | Native speakers of Standard American English | Sustained /a/, /i/ | No mask, Cloth, Surgical, KN95, Surgical + KN95, Surgical + KN95 + shield | SPL at 1 ft and 6 ft, F0, CPP, F1, F2 | No significant impact |
| McKenna et al. (2021)[11] | M7, F11 | English-speaking healthcare workers | Sustained vowels /i/, /a/, /u/; first paragraph of the Rainbow Passage; single words and sentences | Simple, N95 | Spectral and cepstral: CPP, CPP SD, L/H ratio, L/H SD, CPP F0, CPP F0 SD; VCV: Relative Fundamental Frequency (RFF) offset 10, onset 1; Vowel: intensity, HNR | L/H SD |
| Magee et al. (2020)[7] | M2, F2 | Native English speakers | Phonetically balanced text (the Grandfather Passage) | No mask, Surgical, N95, Cloth | Mean pause length, variability of pause length, percent of pauses, spectral tilt, mean intensity, intensity prominence, p95 intensity, CPPS, HNR, F0 mean, F0 CoV, jitter, shimmer | Head-mounted mic: mean pause length, percent of pauses (%), spectral tilt; Tabletop mic: percent of pauses, spectral tilt |
| Gojayev et al. (2021)[12] | M77, F127 | Turkish patients | Sustained /a/ | No mask, Surgical, FFP3 | F0, jitter, shimmer, MPT, HNR, s/z | No significant impact with surgical mask; Overall with FFP3: shimmer, HNR; Female: HNR; Male: jitter, shimmer, MPT, HNR |
| Lin et al. (2021)[13] | M25, F28 | Chinese-speaking participants | Sustained /a/ | No mask, Medical | F0, SPL, jitter, shimmer, NHR, CPP, MPT, F1, F2, F3 | SPL, jitter, shimmer, F3 |
| Nguyen et al. (2021)[14] | M4, F12 | English speakers | Sustained /a/; CAPE-V phrases; Rainbow Passage | No mask, Surgical, KN95 | Mean spectral level in 0 kHz – 1 kHz and 1 kHz – 8 kHz, L/H (1 k), HNR, CPP, intensity | Mean spectral level in 1 kHz – 8 kHz, HNR |
| McKenna et al. (2022)[15] | M8, F13 | Standard American English-speaking healthcare professionals | Sustained /i/, /u/, /a/; Rainbow Passage; VCV | Simple, N95, N95 + Simple | Vowel: F0, F0 SD, jitter, shimmer, HNR, intensity, F1, F2, Vowel Articulation Index (VAI); Spectral and cepstral: L/H (4 k), L/H SD, CPP, CPP SD; VCV: RFF | HNR, CPP, L/H, L/H SD, RFF offset 10, VAI |
| Zhang et al. (2022)[8] | F3 | Native Hong Kong Cantonese speakers | 30 trisyllabic words embedded in a carrier structure | No mask, Surgical, KF94, Face shield, Surgical + shield | Speaking rate, intensity, acoustic attenuation in 0 kHz – 1 kHz and 1 kHz – 8 kHz, Vowel Space Area (VSA), tone duration, F0, F1, F2 | Speaking rate, intensity, acoustic attenuation in 0 kHz – 1 kHz and 1 kHz – 8 kHz, VSA, /aa/ F1, /aa/ F2, tone duration, F0 |
| Cala et al. (2023)[16] | M5, F5 | Italian-speaking otolaryngologists working in a hospital | Sustained /a/, /i/, /u/; a sentence | No mask, Surgical, Surgical + shield, FFP2 (N95), FFP2 + shield, FFP3, FFP3 + shield | F0 mean, jitter, NNE, F1, F2, VSA, FCR (1/VAI) | Female: jitter /u/, NNE /i/, NNE /u/, F1a/F1i, F1a/F1u, F2i/F2u, VSA, FCR; Male: NNE /a/, F1a/F1i, F1a/F1u, F2i/F2u, VSA, FCR |
| Geng et al. (2023)[17] | M15, F15 | Native Mandarin Chinese speakers (fluent in English as a second language) | Phonetically balanced texts in Chinese and English versions | No mask, Surgical | F0, speech rate, intensity, HNR, jitter, shimmer, H1-H2 | F0, intensity, HNR, jitter, shimmer |
| Yang and Kwon (present study) | M1, F1 | Native Korean voice actors | Sustained /a/, /i/, /u/; phonetically balanced monosyllabic word lists | No mask, Surgical, KF-AD vertically folded, KF-AD horizontally folded, KF-80 vertically folded, KF-80 horizontally folded, KF-94 vertically folded, KF-94 horizontally folded, N95 | Vowel: F0 mean, F1, F2, VSA, VAI, HNR; Lists: F0 mean, speech level, L/H (4 k), CPP | F0 mean (Hz), speech level (dBA), L/H (4 k) (dB), CPP (dB) |

These changes may result from adjustment of the vocal tract and the filtration function of medical masks, which may lead to an overestimation of voice stability. Nguyen et al.[14] conducted an acoustic analysis of F0, jitter, shimmer, MPT, HNR, and s/z in no-mask, surgical-mask, and KN95 conditions while 16 Australian participants performed standardized speech tasks. In connected speech, the mean spectral level in the 1 kHz – 8 kHz region was significantly attenuated, and the L/H ratio increased significantly while either a surgical or a KN95 mask was worn; however, no significant change in this measure was found for vowels. HNR was higher with a mask than without. CPP and voice intensity did not change with mask wearing. Overall, surgical masks affected the speech spectrum less than KN95 masks. McKenna et al.[15] examined spectral energy and vocal effort during speech while participants wore simple, N95, and N95 + simple masks. They found significant decreases in the Vowel Articulation Index (VAI), high-frequency energy (> 4 kHz), and Relative Fundamental Frequency (RFF) offset 10 when a mask was worn. CPP and perceived vocal effort increased with the N95 mask, and high-frequency attenuation was more noticeable than with the simple mask. Zhang et al.[8] found significant changes in all acoustic correlates of Cantonese speech produced through Protective Facial Coverings (PFCs). Sound pressure levels were attenuated more strongly at higher frequencies in speech through face masks, whereas sound transmission was more affected at lower frequencies. Vowel spaces derived from formant frequencies shrank under all PFCs, with the vowel /aa/ showing the largest changes in the first two formants. All tone-bearing parts were shortened and showed increased F0 in speech through PFCs; the decrease in tone duration was statistically significant only for the high-level tones.
They concluded that the general filtering effect of PFCs on Cantonese speech confirmed language-universal patterns of acoustic attenuation by PFCs. Cala et al.[16] reported significant differences between the mask + shield and no-mask conditions, and between the mask and mask + shield conditions, in 10 Italian participants. Power spectral density decreased significantly above 1.5 kHz when masks were worn. Subjective ratings confirmed that discomfort increased from the no-mask condition to the protective mask + shield condition. Geng et al.[17] conducted a cross-linguistic study of masked speech in Mandarin Chinese and English. Continuous speech of a phonetically balanced text in Chinese and English versions was recorded from 30 native speakers of Mandarin Chinese, with and without a surgical mask. The acoustic analysis showed that, for Mandarin Chinese, masked speech exhibited higher F0, intensity, and HNR and lower jitter and shimmer than unmasked speech, whereas for English, masked speech showed higher HNR and lower jitter and shimmer. They concluded that wearing a surgical mask affected both acoustic-phonetic and automatic speaker recognition approaches to some extent and advised particular caution in real-case forensic speaker identification.

This study explored the effects of face masks on Korean speech in terms of acoustic, spectral, cepstral, and formant parameters. We selected the face mask types available in Korea, covering different filter performance levels and folding types. Two professional voice actors, native speakers of standard Korean with more than 20 years of experience, provided the voice data. We hypothesized that (1) face masks would affect speech acoustic parameters and (2) changes in these acoustic measures would be more pronounced with increasing face mask thickness.

II. Methods

2.1 Two speakers and face masks

Two native Korean voice actors (male, 51 years old; female, 47 years old), each with more than 20 years of professional experience, participated as speakers. Neither speaker had dysphonia or any other voice problem.

Table 2 and Fig. 1 show the eight face masks used in this study: a surgical mask, vertically folded KF-AD (anti-droplet) mask, horizontally folded KF-AD mask, vertically folded KF-80 mask, horizontally folded KF-80 mask, vertically folded KF-94 mask, horizontally folded KF-94 mask, and N95 mask. A surgical mask is a loose-fitting disposable device that creates a physical barrier between the wearer's mouth and nose and potential contaminants in the immediate environment.[18] Although a surgical mask made of non-woven fabric does not provide complete protection, many people prefer it because of its breathability.[19] KF-80 and KF-94 masks are certified under the Korean Filter (KF) standard, in which 80 and 94 denote the filtration efficiency (80 % and 94 %).[20] The KF-94 mask is considered equivalent to the N95. Horizontally and vertically folded KF-94 masks were the most widely used masks in Korea during the COVID-19 pandemic.[1] The eight masks were worn in random order during the experiment. Table 2 lists the specifications of the masks.

Table 2.

Specifications of the face masks used in this study.

| Mask | Size | Filter / Thickness | Notes |
|---|---|---|---|
| Surgical | 175 mm × 90 mm – 155 mm | 3 layers of non-woven fabric / 0.40 mm (±0.02 mm) | |
| KFADv | 145 mm × 130 mm, arc length of 176 mm | 3 layers of non-woven fabric / 0.48 mm (±0.02 mm) | |
| KFADh | 208 mm × 46 mm + 76 mm + 50 mm | 3 layers of non-woven fabric / 0.44 mm (±0.02 mm) | BFE 99 % |
| KF80v | 153 mm × 95 mm, arc length of 155 mm | 3 layers of non-woven fabric / 0.71 mm (±0.02 mm) | PM2.5 |
| KF80h | 206 mm × 50 mm + 76 mm + 48 mm | 4 layers of non-woven fabric / 0.59 mm (±0.02 mm) | |
| KF94v | 150 mm × 115 mm, arc length of 170 mm | 3 layers of non-woven fabric / 0.61 mm (±0.02 mm) | |
| KF94h | 210 mm × 48 mm + 80 mm + 48 mm | 4 layers of non-woven fabric / 0.71 mm (±0.02 mm) | PM2.5 |
| N95 | 206 mm × 64 mm + 85 mm + 77 mm | Non-woven fabric / 1.12 mm (±0.02 mm) | 3M™ Aura™ Particulate Respirator 9205+, N95 |

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F1.jpg
Fig. 1.

(Color available online) Eight types of face masks.

2.2 Speech recording under mask-wearing conditions

Recordings were made in a flat-walled, fully anechoic chamber[21] (8.2 m × 7.0 m × 7.5 m, f_cutoff = 50 Hz) using a class 1 sound level meter (RION NL-52) placed 1 m from the speaker's mouth, which both recorded the speech signal and measured speech levels. Because room conditions affect speech acoustics, an anechoic chamber was chosen to eliminate room effects from the recordings.

Three sustained vowels, /a/, /i/, and /u/, were each recorded twice per speaker for 5 s. The speakers then read the Korean Standard Monosyllabic Word List (KS-MWL),[22] which was developed based on the international standard for speech audiometry.[23] The speaking rate was set to one word every 2 s. Because the purpose of the acoustic measurements was to test the effects of face masks on human speech, monosyllabic word lists were considered useful: they provide phonetically balanced content and help discriminate between mask-wearing conditions. Four 50-word lists for adults and four 25-word lists for preschoolers from the KS-MWL were used.

Three vowels and eight phonetically balanced KS-MWL lists were recorded under nine different conditions: one non-mask-wearing and eight mask-wearing conditions.

2.3 Data extraction

2.3.1 Vowel analysis

Acoustic measurements were extracted manually from the vowel segments using Praat version 6.3.10 with Praat's standard pitch range settings (75 Hz – 500 Hz). The middle portion of each sustained vowel was extracted for analysis. Mean F0, HNR, and formant frequencies (F1 and F2) were obtained using Praat, and the Vowel Space Area (VSA) and Vowel Articulation Index (VAI)[15] were calculated from F1 and F2.
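The text does not spell out the VSA and VAI formulas. A minimal sketch, assuming the standard definitions for the /a/-/i/-/u/ triangle (VSA as the triangle's area in the F1-F2 plane via the shoelace formula, and VAI = (F1a + F2i) / (F1i + F1u + F2u + F2a)), is shown below. The function names are hypothetical. Applied to the female no-mask formants in Table 3, it agrees with the tabulated VSA and VAI to within the rounding of the listed formant values, which supports the assumption.

```python
# Hypothetical helpers; formulas are the standard triangular VSA and the VAI,
# assumed (not stated in the text) to be those used for Table 3.

def vowel_space_area(f1, f2):
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula).

    f1, f2: dicts mapping 'a', 'i', 'u' to formant frequencies in Hz.
    """
    twice_area = (
        f1["i"] * (f2["a"] - f2["u"])
        + f1["a"] * (f2["u"] - f2["i"])
        + f1["u"] * (f2["i"] - f2["a"])
    )
    return abs(twice_area) / 2.0


def vowel_articulation_index(f1, f2):
    """VAI = (F1a + F2i) / (F1i + F1u + F2u + F2a); lower values mean more centralized vowels."""
    return (f1["a"] + f2["i"]) / (f1["i"] + f1["u"] + f2["u"] + f2["a"])


# Female speaker, no-mask condition (formants in Hz, taken from Table 3)
f1_female = {"a": 1054.2, "i": 345.1, "u": 308.3}
f2_female = {"a": 1692.2, "i": 2877.7, "u": 925.0}

print(f"VSA = {vowel_space_area(f1_female, f2_female):.1f} Hz^2")
print(f"VAI = {vowel_articulation_index(f1_female, f2_female):.2f}")
```

Because VAI puts F2 of /i/ in the numerator, a mask-related drop in that formant lowers VAI directly.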

2.3.2 Spectral and cepstral analysis

The mean F0, Speech Level (SL) in dBA, L/H ratio with a cut-off frequency of 4 kHz, and CPP were analyzed for each KS-MWL list. F0 and the L/H ratio were calculated using R version 4.2.3. The L/H ratio with a 4 kHz cut-off is known to decrease in speakers with dysphonia[24,25] and with increased vocal effort.[15,26]
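The L/H ratio compares spectral energy below and above the 4 kHz cut-off. The study computed it in R but does not describe the implementation, so the following is only an illustrative sketch in Python with NumPy. The function name `lh_ratio_db` is hypothetical, and three details are our assumptions: a single FFT over the whole signal, exclusion of the DC bin, and an upper band running to the Nyquist frequency.

```python
import numpy as np


def lh_ratio_db(signal, fs, cutoff_hz=4000.0):
    """Low-to-high spectral ratio in dB: power below cutoff_hz over power at/above it.

    Sketch only: one FFT of the whole signal, DC bin excluded, upper band up to Nyquist.
    """
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    low = power[(freqs > 0) & (freqs < cutoff_hz)].sum()
    high = power[freqs >= cutoff_hz].sum()
    return 10.0 * np.log10(low / high)


# Synthetic check: a 1 kHz tone plus a 6 kHz tone with one tenth of its amplitude
fs = 16000
t = np.arange(fs) / fs  # 1 s, so both tones fall exactly on FFT bins
x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 6000 * t)
print(f"L/H = {lh_ratio_db(x, fs):.1f} dB")  # amplitude ratio 10:1 gives about 20 dB
```

Halving the 6 kHz component raises the ratio by about 6 dB. This is the direction reported for masked speech, where high-frequency energy is attenuated.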

SL was measured using the RION NL-52 sound level meter. CPP was obtained using a Praat CPP plugin,[27] with the cepstral peak search range set to the standard 60 Hz – 330 Hz.
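CPP measures how far the cepstral peak, within a quefrency range corresponding to 60 Hz – 330 Hz, rises above a regression line fitted to the cepstrum. The sketch below is a simplified, hypothetical single-frame reimplementation in Python with NumPy, shown only to illustrate the idea. It is not the Praat plugin used in the study, and its details are our assumptions: the function name `cpp_db`, a Hann window, a power cepstrum in dB, and a regression line fitted only over the search range. Its absolute values are therefore not comparable to Table 5.

```python
import numpy as np


def cpp_db(signal, fs, f0_min=60.0, f0_max=330.0):
    """Simplified cepstral peak prominence for one frame; returns (cpp_db, f0_estimate_hz)."""
    x = np.asarray(signal, dtype=float) * np.hanning(len(signal))
    # Log-magnitude spectrum in dB (small offset avoids log of zero)
    log_spec = 20.0 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)
    # Real cepstrum, then expressed as power in dB
    ceps = np.fft.irfft(log_spec, n=len(x))
    ceps_db = 10.0 * np.log10(ceps ** 2 + 1e-12)
    quefrency = np.arange(len(ceps)) / fs
    # Search only quefrencies that correspond to F0 between f0_min and f0_max
    search = (quefrency >= 1.0 / f0_max) & (quefrency <= 1.0 / f0_min)
    q, c = quefrency[search], ceps_db[search]
    peak = np.argmax(c)
    slope, intercept = np.polyfit(q, c, 1)  # linear trend over the search range
    cpp = c[peak] - (slope * q[peak] + intercept)
    return cpp, 1.0 / q[peak]


fs = 16000
n = int(0.1 * fs)  # one 100 ms frame
pulses = np.zeros(n)
pulses[::160] = 1.0  # impulse train with a 10 ms period (100 Hz)
noise = np.random.default_rng(0).standard_normal(n)

print(cpp_db(pulses, fs))  # strongly periodic: large CPP, F0 estimate near 100 Hz
print(cpp_db(noise, fs))   # aperiodic: much smaller CPP
```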

A factorial ANalysis Of VAriance (ANOVA) was used to test the effects of face masks and speakers on the spectral and cepstral measures. Tukey's HSD test was used for multiple comparisons, the significance level was set at α = 0.05, and statistical analyses were conducted using Minitab® 21.1 (Minitab, State College, PA, US).
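The analysis itself was run in Minitab. For readers who want to see what a balanced two-factor ANOVA computes, the following is a minimal NumPy sketch. Two points are our assumptions. First, that each KS-MWL list counts as one replicate per mask × voice cell (9 × 2 cells, 8 lists each). Second, that the "Effect size (%)" column in Table 4 is the percent contribution of each source to the total sum of squares. The second point is our inference from the fact that the reported F values and percentages are proportional for sources with equal degrees of freedom; the paper does not state it. The name `two_way_anova` is hypothetical.

```python
import numpy as np


def two_way_anova(y):
    """Balanced two-way ANOVA with interaction.

    y: array of shape (a, b, n) = (levels of factor A, levels of factor B, replicates).
    Returns F ratios and percent contributions (100 * SS_effect / SS_total).
    """
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))
    mean_b = y.mean(axis=(0, 2))
    cell = y.mean(axis=2)
    ss = {
        "A": b * n * np.sum((mean_a - grand) ** 2),
        "B": a * n * np.sum((mean_b - grand) ** 2),
        "AxB": n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1)}
    ss_error = np.sum((y - cell[:, :, None]) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    ms_error = ss_error / (a * b * (n - 1))
    return {
        k: {"F": (ss[k] / df[k]) / ms_error, "percent": 100.0 * ss[k] / ss_total}
        for k in ss
    }


# Toy 2 x 2 design with 2 replicates per cell (illustrative numbers only)
toy = [[[1, 3], [5, 7]],
       [[2, 4], [6, 8]]]
for source, stats in two_way_anova(toy).items():
    print(source, round(stats["F"], 2), round(stats["percent"], 1))
```

If SciPy is available, p-values can be obtained from the F distribution with `scipy.stats.f.sf(F, df_effect, df_error)`.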

III. Results

3.1 Vowel analysis

Table 3 summarizes the vowel analysis results for the two speakers. Mean F0 was approximately 200 Hz for the female speaker and 100 Hz for the male speaker. HNR tended to increase for both speakers as mask thickness increased. The first two formants of the three corner vowels varied under the mask-wearing conditions, as shown in Fig. 2, and VSA and VAI tended to decrease with increasing mask thickness. For the male speaker, the first two formants of /i/ changed less than those of the other two vowels, whereas for the female speaker, the first two formants of all three vowels changed under the mask-wearing conditions.

Table 3.

F0, HNR, formant frequency (F1 and F2), VSA, and VAI for three vowels (/a/, /i/, and /u/).

| Speaker | Mask | F0 mean (Hz) | F0 SD (Hz) | HNR mean (dB) | HNR SD (dB) | /a/ F1 (Hz) | /a/ F2 (Hz) | /i/ F1 (Hz) | /i/ F2 (Hz) | /u/ F1 (Hz) | /u/ F2 (Hz) | VSA (Hz²) | VAI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | No mask | 199.2 | 6.64 | 16.4 | 0.31 | 1054.2 | 1692.2 | 345.1 | 2877.7 | 308.3 | 925.0 | 714132.9 | 1.20 |
| Female | Surgical | 209.8 | 1.24 | 16.5 | 0.32 | 1034.9 | 1460.1 | 318.4 | 2883.7 | 366.4 | 994.3 | 641979.4 | 1.25 |
| Female | KFADv | 196.7 | 6.73 | 17.3 | 0.84 | 1041.0 | 1747.7 | 367.1 | 2360.1 | 392.1 | 1178.8 | 390399.1 | 0.92 |
| Female | KFADh | 198.0 | 14.99 | 18.7 | 2.61 | 1096.8 | 1628.6 | 345.6 | 2808.1 | 413.1 | 1037.6 | 625151.8 | 1.14 |
| Female | KF80v | 208.1 | 1.63 | 16.8 | 0.80 | 1051.0 | 1590.2 | 366.7 | 2900.1 | 365.3 | 866.5 | 696738.0 | 1.24 |
| Female | KF80h | 201.0 | 5.08 | 17.9 | 0.49 | 988.9 | 1486.8 | 333.9 | 2891.2 | 403.9 | 1256.7 | 486165.0 | 1.11 |
| Female | KF94v | 208.1 | 8.65 | 18.4 | 0.60 | 1021.1 | 1487.6 | 350.5 | 1959.3 | 410.4 | 1138.2 | 261196.7 | 0.88 |
| Female | KF94h | 208.7 | 6.24 | 18.0 | 0.96 | 963.1 | 1535.9 | 353.1 | 1974.0 | 396.6 | 942.6 | 305038.1 | 0.91 |
| Female | N95 | 204.2 | 7.33 | 18.2 | 0.04 | 1018.9 | 1388.8 | 369.8 | 2038.5 | 386.4 | 1087.0 | 303443.5 | 0.95 |
| Male | No mask | 96.2 | 2.43 | 14.0 | 0.05 | 665.9 | 1290.3 | 258.3 | 2090.3 | 240.5 | 568.9 | 317191.3 | 1.17 |
| Male | Surgical | 99.7 | 6.12 | 12.8 | 0.09 | 686.3 | 1338.7 | 278.7 | 1978.5 | 331.3 | 663.9 | 251115.8 | 1.02 |
| Male | KFADv | 104.8 | 25.31 | 14.9 | 0.91 | 666.8 | 1301.9 | 267.3 | 2122.8 | 318.2 | 659.1 | 271469.0 | 1.10 |
| Male | KFADh | 91.5 | 7.70 | 15.6 | 0.33 | 738.3 | 1362.1 | 265.1 | 2080.5 | 313.9 | 859.9 | 271245.2 | 1.01 |
| Male | KF80v | 93.1 | 7.63 | 16.1 | 3.05 | 779.4 | 1398.4 | 244.7 | 2081.4 | 320.7 | 757.5 | 327999.7 | 1.05 |
| Male | KF80h | 94.2 | 5.32 | 16.5 | 2.01 | 696.9 | 1327.1 | 236.7 | 2151.2 | 321.6 | 842.1 | 266284.9 | 1.04 |
| Male | KF94v | 96.2 | 8.17 | 15.8 | 0.35 | 678.0 | 1281.1 | 252.2 | 2087.1 | 295.6 | 931.2 | 227978.0 | 1.00 |
| Male | KF94h | 91.2 | 4.79 | 16.5 | 0.26 | 684.8 | 1296.6 | 252.2 | 2018.2 | 347.4 | 1028.1 | 179803.1 | 0.92 |
| Male | N95 | 90.9 | 3.86 | 16.9 | 0.10 | 756.5 | 1382.1 | 263.7 | 2049.9 | 379.3 | 1073.5 | 201974.8 | 0.91 |

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F2.jpg
Fig. 2.

(Color available online) VSA for each mask-wearing condition. (a) Female, and (b) male.

Figs. 3 and 4 show the trends of F0, HNR, VSA, and VAI as functions of mask filter performance and mask thickness, respectively. Mean F0 did not depend on mask filter performance. For HNR and VAI as functions of filter performance, the male speaker tended to have higher R² values than the female speaker. All R² values in Fig. 3 were higher than those in Fig. 4, suggesting that the filtration properties of the mask had a stronger effect than mask thickness for both the male and female speakers.
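The R² values in Figs. 3 and 4 come from linear trend lines fitted to the scatter plots. The sketch below shows how such a fit and its coefficient of determination can be computed, using the mask thicknesses from Table 2 and the male HNR values from Table 3 as input. Two limitations apply. The authors' exact x-axis coding is not given, and the no-mask condition is simply left out here. It is therefore not guaranteed to reproduce the R² shown in Fig. 4. The helper name `linear_fit_r2` is hypothetical.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line y = intercept + slope * x and its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Eight masks in the order Surgical, KFADv, KFADh, KF80v, KF80h, KF94v, KF94h, N95
thickness_mm = [0.40, 0.48, 0.44, 0.71, 0.59, 0.61, 0.71, 1.12]  # Table 2
hnr_male_db = [12.8, 14.9, 15.6, 16.1, 16.5, 15.8, 16.5, 16.9]   # Table 3

slope, intercept, r2 = linear_fit_r2(thickness_mm, hnr_male_db)
print(f"slope = {slope:.2f} dB/mm, R^2 = {r2:.2f}")
```

With only eight points per speaker, R² values like these are descriptive. They do not test whether filter performance matters more than thickness.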

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F3.jpg
Fig. 3.

(Color available online) Scatter plots for vowel analysis according to mask filter performance.

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F4.jpg
Fig. 4.

(Color available online) Scatter plots for vowel analysis according to mask thickness.

3.2 Spectral and cepstral analysis

As shown in Table 4, face masks significantly affected the F0, SL, L/H ratio, and CPP of the eight phonetically balanced word lists. By effect size, the mask effect was largest for the L/H ratio (15.9 %), followed by SL (9.2 %) and CPP (5.0 %), and was negligible for F0 (0.3 %). Table 5 lists the mean values, standard deviations, and Tukey's post hoc results of the spectral and cepstral analyses. The mean F0 of the female speaker decreased with face masks, whereas that of the male speaker showed no statistically significant difference. The L/H ratio was higher with face masks than without for both speakers. For the female speaker, SL and CPP increased with increasing face mask thickness; for the male speaker, by contrast, these two parameters decreased. Without a face mask, the mean speech level of the male speaker was 4.2 dBA higher than that of the female speaker.

Table 4.

ANOVA on the spectral and cepstral data.

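| Source | F0: F | F0: p | F0: effect size (%) | Speech level: F | Speech level: p | Speech level: effect size (%) | L/H ratio: F | L/H ratio: p | L/H ratio: effect size (%) | CPP: F | CPP: p | CPP: effect size (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mask | 3.53 | 0.001 | 0.3 | 12.53 | < 0.0005 | 9.2 | 23.11 | < 0.0005 | 15.9 | 3.97 | < 0.0005 | 5.0 |
| Voice | 10410.6 | < 0.0005 | 98.4 | 127.46 | < 0.0005 | 11.6 | 812.81 | < 0.0005 | 69.8 | 231.46 | < 0.0005 | 36.8 |
| Mask × Voice | 1.88 | 0.069 | 0.1 | 92.63 | < 0.0005 | 67.7 | 5.17 | < 0.0005 | 3.6 | 30.06 | < 0.0005 | 38.2 |
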
Table 5.

Mean F0, SL, L/H ratio (cut-off frequency 4 kHz), and CPP for the KS-MWL lists, with Tukey HSD groupings (means sharing a letter are not significantly different).

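| Speaker | Mask | F0 mean (Hz) | F0 SD | F0 Tukey HSD | SL mean (dBA) | SL SD | SL Tukey HSD | L/H mean (dB) | L/H SD | L/H Tukey HSD | CPP mean (dB) | CPP SD | CPP Tukey HSD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | No mask | 230.8 | 9.39 | A | 59.3 | 0.52 | EFG | 20.0 | 1.16 | I | 10.7 | 0.27 | CDE |
| Female | Surgical | 217.9 | 10.39 | B | 58.1 | 0.52 | GHI | 24.5 | 0.33 | GH | 10.5 | 0.41 | DE |
| Female | KFADv | 216.9 | 6.94 | B | 60.1 | 0.64 | DE | 23.4 | 0.76 | H | 10.5 | 0.48 | CDE |
| Female | KFADh | 219.1 | 9.57 | AB | 60.0 | 1.08 | DEF | 23.8 | 0.80 | H | 10.7 | 0.65 | CD |
| Female | KF80v | 215.8 | 8.38 | B | 61.7 | 1.14 | C | 25.1 | 1.04 | EFG | 11.3 | 0.32 | ABC |
| Female | KF80h | 216.5 | 8.59 | B | 61.4 | 0.59 | C | 26.2 | 0.72 | FGH | 11.3 | 0.47 | ABC |
| Female | KF94v | 209.9 | 7.34 | B | 63.1 | 0.54 | AB | 26.9 | 0.54 | DEF | 11.8 | 0.34 | A |
| Female | KF94h | 213.3 | 11.98 | B | 62.1 | 0.79 | BC | 24.5 | 0.52 | GH | 11.6 | 0.44 | AB |
| Female | N95 | 215.1 | 8.74 | B | 63.6 | 0.76 | A | 24.3 | 1.05 | GH | 11.5 | 0.40 | AB |
| Male | No mask | 97.4 | 2.78 | C | 63.5 | 0.75 | A | 27.3 | 0.41 | CDE | 11.3 | 0.46 | ABCD |
| Male | Surgical | 97.0 | 1.50 | C | 62.2 | 0.42 | ABC | 31.4 | 0.76 | A | 10.9 | 0.21 | BCD |
| Male | KFADv | 95.5 | 2.12 | C | 61.2 | 0.53 | CD | 30.0 | 0.51 | AB | 10.6 | 0.25 | CDE |
| Male | KFADh | 95.5 | 2.12 | C | 59.8 | 0.49 | DEF | 28.7 | 0.23 | BCD | 9.9 | 0.46 | EF |
| Male | KF80v | 95.5 | 1.50 | C | 58.4 | 0.42 | FGH | 30.2 | 0.63 | AB | 9.9 | 0.32 | EF |
| Male | KF80h | 93.3 | 2.90 | C | 57.2 | 1.06 | I | 30.0 | 0.63 | ABC | 9.2 | 0.52 | F |
| Male | KF94v | 94.4 | 3.81 | C | 58.6 | 0.67 | FGH | 30.5 | 0.45 | AB | 9.4 | 0.47 | F |
| Male | KF94h | 94.8 | 2.90 | C | 57.8 | 1.23 | HI | 28.6 | 0.55 | BCD | 9.2 | 0.42 | F |
| Male | N95 | 94.0 | 2.12 | C | 58.3 | 1.18 | HI | 30.5 | 0.77 | AB | 9.4 | 0.39 | F |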

Fig. 5 shows the one-third octave band spectra of the speech levels. For the female speaker, the spectra with face masks deviated from the no-mask spectrum in opposite directions below and above the 3,150 Hz band. Below 3,150 Hz, the spectral energy with face masks was greater than that without face masks.

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F5.jpg
Fig. 5.

(Color available online) 1/3 octave band speech levels.

Above 3,150 Hz, however, the spectral energy with face masks was lower than that without face masks, except for the surgical mask. The male speaker's spectra showed no such reversal. The female voice had two adjacent fundamental peaks, at approximately 200 Hz and 220 Hz; therefore, the decrease in her mean F0 may not have been caused by the face masks.
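Band labels such as 3,150 Hz are nominal one-third octave mid-band frequencies. A minimal sketch, assuming the base-10 band definition of the IEC 61260 / ANSI S1.11 standards, computes the exact mid-band and edge frequencies and sums a spectrum into band levels. The sound level meter's own filter implementation is not described in the text. The helper names and the arbitrary dB reference are hypothetical.

```python
import math


def third_octave_band(n):
    """Exact base-10 (lower edge, mid-band, upper edge) in Hz of band n, counted from 1 kHz."""
    fm = 1000.0 * 10 ** (n / 10)  # mid-band: 1 kHz times 10^(n/10)
    return fm * 10 ** (-1 / 20), fm, fm * 10 ** (1 / 20)


def band_levels_db(freqs, powers, bands):
    """Sum spectral power within each band's edges and convert to dB (arbitrary reference)."""
    levels = {}
    for n in bands:
        lo, _, hi = third_octave_band(n)
        p = sum(pw for f, pw in zip(freqs, powers) if lo <= f < hi)
        levels[n] = 10 * math.log10(p) if p > 0 else float("-inf")
    return levels


lo, fm, hi = third_octave_band(5)
print(f"nominal 3,150 Hz band: {lo:.1f} Hz to {hi:.1f} Hz (exact mid-band {fm:.1f} Hz)")
```

The exact mid-band of the "3,150 Hz" band is about 3,162 Hz, and the band spans roughly 2.8 kHz to 3.5 kHz.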

Figs. 6 and 7 show the corresponding trends of F0, speech level, L/H ratio, and CPP for the word lists as functions of mask filter performance and mask thickness, respectively. Differences between the male and female speakers were clearly observed in SL and CPP. For both speakers, mask filter performance appeared to have a more pronounced impact than mask thickness, consistent with the vowel analysis in Figs. 3 and 4.

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F6.jpg
Fig. 6.

(Color available online) Scatter plots for mean values of the spectral and cepstral analysis according to mask filter performance.

https://cdn.apub.kr/journalsite/sites/ask/2024-043-04/N0660430407/images/ASK_43_04_07_F7.jpg
Fig. 7.

(Color available online) Scatter plots for mean values of the spectral and cepstral analysis according to mask thickness.

IV. Discussion

4.1 Face masks like low-pass filters

In the vowel analysis, F0 was not affected by mask thickness, but HNR increased with increasing face mask filter performance and mask thickness. This suggests that the mask attenuates the noise component produced by aperiodic vocal fold vibration, thereby raising the HNR. These results are consistent with those reported by Gojayev et al.,[12] Nguyen et al.,[14] McKenna et al.,[15] and Geng et al.[17] VSA and VAI decreased as mask thickness increased, consistent with the findings of McKenna et al.[15] Because VSA tends to be larger in clear speech, the larger VSA of the female speaker suggests that her speech was clearer than the male speaker's, even under the no-mask condition. VAI is an index related to speech intelligibility that decreases as vowels become centralized. Thus, the decreases in VSA and VAI with mask wearing may be explained by the mask passing mainly low-frequency sound while attenuating higher frequencies.

In the word list analysis, the L/H ratio (cut-off frequency 4 kHz) increased for both the female and male speakers because the face masks reduced high-frequency energy.
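The link between high-frequency attenuation and a higher L/H ratio can be made explicit with a short derivation. It assumes that the mask acts only as a fixed power transmission factor on each band (g_L below 4 kHz and g_H above it, with 0 < g ≤ 1) and that the speaker's voice is otherwise unchanged. The changes in SL reported in Table 5 show that the latter assumption does not hold exactly. Under these assumptions, the increase in L/H equals the difference between the high-band and low-band attenuations in dB:

```latex
\mathrm{L/H} = 10\log_{10}\frac{E_L}{E_H}, \qquad
E_L = \sum_{f < 4\,\mathrm{kHz}} P(f), \qquad
E_H = \sum_{f \ge 4\,\mathrm{kHz}} P(f)

\mathrm{L/H}_{\mathrm{mask}}
  = 10\log_{10}\frac{g_L E_L}{g_H E_H}
  = \mathrm{L/H}_{\mathrm{no\,mask}} + A_H - A_L,
\qquad A_H = -10\log_{10} g_H,\; A_L = -10\log_{10} g_L
```

For example, the female speaker's L/H rose from 20.0 dB without a mask to 26.9 dB with the KF94v (Table 5). Under this simplification alone, that rise would correspond to the high band being attenuated about 6.9 dB more than the low band.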

4.2 Speech levels and CPPs due to face masks

Opposite trends were observed between the male and female speakers in SL and CPP. For the male voice, SL and CPP decreased with increasing face mask filter performance and mask thickness, whereas for the female voice, SL and CPP increased.

The results for the female voice are consistent with those of McKenna et al.;[15] previous studies have shown that CPP depends on voice intensity[28,29] and can predict vocal effort.[30] The reverberation time of a room also influences vocal load.[31] Because the recordings in this study were made in an anechoic chamber, which provides no reverberant support, vocal effort may have been greater for the female speaker, whose speech level was lower than the male speaker's.

By contrast, no evidence of increased vocal effort was found for the male speaker, regardless of the presence or thickness of the mask. His speech level without a mask was 63.5 dBA, and wearing a face mask did not increase it.

These results suggest that (1) speakers with lower vocal intensity, such as many women and children, may increase their vocal effort when wearing a face mask, and (2) reverberation time can be considered a risk factor for speakers, particularly occupational voice users, in masked speech situations.

4.3 Limitations and future work

First, we tested only two voices, both from professional voice actors. Although their professional experience ensures reliable standard Korean pronunciation, future studies should include more male and female speakers to generalize the effects of face masks on speech.

Second, vocal effort differed between the male and female voices in this study. To fully understand vocal effort, a cross-check using self-evaluations of vocal load, fatigue, and effort before and after voice sample collection is necessary.

Third, only the corner vowels /a/, /i/, and /u/ of the vowel triangle were analyzed; /e/ and /o/ could be added for a vowel pentagon analysis to determine the acoustic properties of vowels under face masks. In future research, we would like to examine these changes in detail and to add an auditory perception evaluation of consonant and vowel distortion in masked speech, using a speech intelligibility or speech acceptability test from the listener's perspective.

Fourth, the KS-MWL was used as the speech sample in this study. Connected speech, such as everyday sentences produced with natural breathing, is recommended for further studies.

Fifth, this study did not account for potential variations in vocalization caused by restricted mouth movement due to tension from the mask's ear loops. Future research should consider additional mask-related variables that could affect speech production in its acoustic analyses.

V. Conclusions

In this study, the presence and filter performance of a face mask affected speech acoustic parameters in ways that depended on the speaker's voice characteristics. Face masks attenuated the high-frequency range, resulting in decreased VSA and VAI and an increased L/H ratio in all voice samples, which may reduce speech intelligibility. However, the degree of these increases and decreases depended on the voice characteristics. For the female speaker, SL and CPP increased with increasing face mask thickness, whereas for the male speaker, these two parameters decreased. Face masks appeared to increase vocal effort when vocal intensity was not sufficiently strong or when the environment was less reverberant. Further research is needed on mask-induced vocal effort and on ways to compensate for the acoustic modifications caused by masks.

Acknowledgements

We gratefully acknowledge the contributions of two voice actors who participated in our experiments.

This study was supported by the Basic Science Research Program of the National Research Foundation (NRF) [grant no. 2018R1D1A1B07048157] funded by the Ministry of Education, Republic of Korea. This study was also supported by research funds provided by Gwangju University in 2024.

References

1

M. Kwon and W. Yang, "Mask-wearing behaviors after two years of wearing masks due to COVID-19 in Korea: a cross-sectional study," Int. J. Environ. Res. Public Health 19, 14940 (2022).

10.3390/ijerph192214940; PMID: 36429657; PMCID: PMC9691200
2

T. Hampton, R. Crunkhorn, N. Lowe, J. Bhat, E. Hogg, W. Afifi, M. Krishnan, I. Street, S. De, R. Sharma, R. Clarke, S. Ratnayake, S. Dasgupta, and S. Sharma, "Speech discrimination challenges of healthcare professionals whilst wearing Personal Protective Equipment (PPE) during the coronavirus disease 2019 (COVID-19) pandemic," Authorea (2020).

10.22541/au.159050338.83886289
3

R. M. Corey, U. Jones, and A. C. Singer, "Acoustic effects of medical, cloth, and transparent face masks on speech signals," J. Acoust. Soc. Am. 148, 2371-2375 (2020).

10.1121/10.0002279; PMID: 33138498; PMCID: PMC7857499
4

J. Jeong, M. Kim, and Y. Kim, "Changes on speech transmission characteristics by types of mask" (in Korean), Audiol. Speech Res. 16, 295-304 (2020).

10.21848/asr.200053
5

T. Rahne, L. Fröhlich, S. Plontke, and L. Wagner, "Influence of surgical and N95 face masks on speech perception and listening effort in noise," J. PloS one, 16, e0253874 (2021).

10.1371/journal.pone.0253874; PMID: 34197513; PMCID: PMC8248731
6

J. C. Toscano and C. M. Toscano, "Effects of face masks on speech recognition in multi-talker babble noise," PLoS One, 16, e0246842 (2021).

10.1371/journal.pone.0246842; PMID: 33626073; PMCID: PMC7904190
7

M. Magee, C. Lewis, G. Noffs, H. Reece, J. C. S. Chan, C. J. Zaga, C. Paynter, O. Birchall, S. R. Azocar, A. Ediriweera, K. Kenyon, M. W. Caverlé, B. G. Schultz, and A. P. Vogel, "Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols," J. Acoust. Soc. Am. 148, 3562-3568 (2020).

10.1121/10.0002873; PMID: 33379897; PMCID: PMC7857500
8

T. Zhang, M. He, B. Li, C. Zhang, and J. Hu, "Acoustic characteristics of Cantonese speech through protective facial coverings," J. Voice, 22, 00269-7 (2022).

10.1016/j.jvoice.2022.08.029
9

M. L. Fiorella, G. Cavallaro, V. Di Nicola, and N. Quaranta, "Voice differences when wearing and not wearing a surgical mask," J. Voice, 37, 467.e1-467.e7 (2021).

10.1016/j.jvoice.2021.01.026; PMID: 33712355
10

A. Joshi, T. Procter, and P. A. Kulesz, "COVID-19: acoustic measures of voice in individuals wearing different facemasks," J. Voice, 37, 971.e1-971.e8 (2021).

10.1016/j.jvoice.2021.06.015; PMID: 34261582; PMCID: PMC8214155
11

V. S. McKenna, T. H. Patel, C. L. Kendall, R. J. Howell, and R. L. Gustin, "Voice acoustics and vocal effort in mask-wearing healthcare professionals: A comparison pre- and post-workday," J. Voice, 37, 802.e15-802.e23 (2021).

10.1016/j.jvoice.2021.04.016; PMID: 34112547
12

E. K. Gojayev, Z. Ç. Büyükatalay, T. Akyüz, M. Rehan, and G. Dursun, "The effect of masks and respirators on acoustic voice analysis during the COVID-19 pandemic," J. Voice, 38, 798.e1-798.e6 (2021).

10.1016/j.jvoice.2021.11.014; PMID: 34961655; PMCID: PMC8627850
13

Y. Lin, L. Cheng, Q. Wang, and W. Xu, "Effects of medical masks on voice assessment during the COVID-19 pandemic," J. Voice, 37, 802.e25-802.e29 (2021).

10.1016/j.jvoice.2021.04.028; PMID: 34116888
14

D. D. Nguyen, P. McCabe, D. Thomas, A. Purcell, M. Doble, D. Novakovic, A. Chacon, and C. Madill, "Acoustic voice characteristics with and without wearing a facemask," Sci. Rep. (2021).

10.1038/s41598-021-85130-8; PMID: 33707509; PMCID: PMC7970997
15

V. S. McKenna, C. L. Kendall, T. H. Patel, R. J. Howell, and R. L. Gustin, "Impact of face masks on speech acoustics and vocal effort in healthcare professionals," Laryngoscope, 132, 391-397 (2022).

10.1002/lary.29763; PMID: 34287933; PMCID: PMC8742743
16

F. Calà, C. Manfredi, L. Battilocchi, L. Frassineti, and G. Cantarella, "Speaking with mask in the COVID-19 era: Multiclass machine learning classification of acoustic and perceptual parameters," J. Acoust. Soc. Am. 153, 1204-1218 (2023).

10.1121/10.0017244; PMID: 36859154
17

P. Geng, Q. Lu, H. Guo, and J. Zeng, "The effects of face mask on speech production and its implication for forensic speaker identification-A cross-linguistic study," PLoS One, 18, e0283724 (2023).

10.1371/journal.pone.0283724; PMID: 36996037; PMCID: PMC10062611
19

L. Ma and M.-S. Kim, "A study of the purchasing tendency of healthcare masks based on the user-centered design concept-centered on the form and color of the mask" (in Korean), J. Korea Convergence Soc. 11, 143-154 (2022).

20

M.-C. Kim, S. M. Bae, J. Y. Kim, S. Y. Park, J. S. Lim, M. K. Sung, and S. H. Kim, "Effectiveness of surgical, KF94, and N95 respirator masks in blocking SARS-CoV-2: a controlled comparison in 7 patients," Infect. Dis. (Lond.) 52, 908-912 (2020).

10.1080/23744235.2020.1810858; PMID: 32845196
21

S. J. Doo, S. W. Oh, P. Brandstatt, and H. V. Fuchs, "Anechoic chamber design using broadband compact absorber" (in Korean), Proc. Trans. Korean Soc. Noise Vib. Eng. 393-396 (2003).

22

W. Han and J. Bahng, "A review of development and standardization on Korean speech audiometry," Audiology, 9, 113-126 (2013).

10.21848/audiol.2013.9.2.113
23

ISO, https://www.iso.org/standard/74049.html (Last viewed January 2022).

24

S. N. Awan, N. Roy, M. E. Jetté, G. S. Meltzner, and R. E. Hillman, "Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: Comparisons with auditory-perceptual judgements from the CAPE-V," Clin. Linguist. Phon. 24, 742-758 (2010).

10.3109/02699206.2010.492446; PMID: 20687828
25

S. Y. Lowell, R. H. Colton, R. T. Kelley, and S. A. Mizia, "Predictive value and discriminant capacity of cepstral- and spectral-based measures during continuous speech," J. Voice, 27, 393-400 (2013).

10.1016/j.jvoice.2013.02.005; PMID: 23684735
26

V. S. McKenna and C. E. Stepp, "The relationship between acoustical and perceptual measures of vocal effort," J. Acoust. Soc. Am. 144, 1643-1658 (2018).

10.1121/1.5055234; PMID: 30424674; PMCID: PMC6167228
27

E. S. H. Murray, A. Chao, and L. Colletti, "A practical guide to calculating cepstral peak prominence in Praat," J. Voice, published online (2022).

28

S. N. Awan, A. Giovinco, and J. Owens, "Effects of vocal intensity and vowel type on cepstral analysis of voice," J. Voice, 26, 670.e15-670.e20 (2012).

10.1016/j.jvoice.2011.12.001; PMID: 22480754
29

M. Brockmann-Bauser, J. E. Bohlender, and D. D. Mehta, "Acoustic perturbation measures improve with increasing vocal intensity in individuals with and without voice disorders," J. Voice, 32, 162-168 (2018).

10.1016/j.jvoice.2017.04.008; PMID: 28528786; PMCID: PMC7053781
30

P. Bottalico, "Speech adjustments for room acoustics and their effects on vocal effort," J. Voice, 31, 392.e1-392.e12 (2017).

10.1016/j.jvoice.2016.10.001; PMID: 28029555; PMCID: PMC5409880
31

P. Bottalico, A. Astolfi, and E. J. Hunter, "Teachers' voicing and silence periods during continuous speech in classrooms with different reverberation times," J. Acoust. Soc. Am. 141, EL26-EL31 (2017).

10.1121/1.4973312; PMID: 28147593; PMCID: PMC5392096