On the Acoustical and Perceptual Features of Vowel Nasality
Vowel nasality is, simply put, the difference in the vowel sound between the English words "pat" and "pant", or between the French "beau" and "bon". Languages around the world use this phenomenon, but it is relatively poorly understood from an acoustical standpoint: although human listeners can easily hear whether or not a vowel is nasalized, it is quite difficult to measure or identify that nasality in a laboratory context.
The goal of my dissertation is to better understand vowel nasality by discovering not just which parts of the sound signal change between oral and nasal vowels, but which parts of the signal listeners actually use to perceive differences in nasality.
I've written up a summary of the process for a more general audience on my blog, or you can read the abstract below.
Although much is known about the linguistic function of vowel nasality, whether contrastive (as in French) or coarticulatory (as in English), less is known about its perception. This study combines careful examination of production patterns with data from both machine learning and human listeners to establish which acoustical features are useful (and used) for identifying vowel nasality.
A corpus of 4,778 oral and nasal or nasalized vowels in English and French was collected, and measurements of 29 potential perceptual features were extracted. A series of Linear Mixed-Effects Regressions identified 7 promising features with large oral-to-nasal differences and highlighted some cross-linguistic differences in the relative importance of these features.
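The screening step asks, feature by feature, how far apart oral and nasal tokens are. A mixed-effects regression needs a statistics package, so the sketch below swaps in a simpler, dependency-free stand-in: it ranks features by Cohen's d (the nasal-minus-oral mean difference divided by a pooled standard deviation). This is not the study's method. Unlike a mixed-effects model, it ignores grouping structure such as repeated measurements from the same speaker. The feature names (`a1_p0`, `duration_ms`) and the token values are hypothetical toy data.

```python
from statistics import mean, stdev


def cohens_d(oral, nasal):
    """Standardized nasal-minus-oral difference, using a pooled standard deviation."""
    n1, n2 = len(oral), len(nasal)
    s1, s2 = stdev(oral), stdev(nasal)
    pooled = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(nasal) - mean(oral)) / pooled


def rank_features(tokens, feature_names):
    """Return (feature, d) pairs sorted by absolute effect size, largest first."""
    scores = {}
    for name in feature_names:
        oral = [t[name] for t in tokens if not t["nasal"]]
        nasal = [t[name] for t in tokens if t["nasal"]]
        scores[name] = cohens_d(oral, nasal)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up toy tokens for illustration only; these are NOT data from the study.
TOY_TOKENS = [
    {"nasal": False, "a1_p0": 8.0, "duration_ms": 110.0},
    {"nasal": False, "a1_p0": 9.0, "duration_ms": 120.0},
    {"nasal": False, "a1_p0": 10.0, "duration_ms": 130.0},
    {"nasal": True, "a1_p0": 2.0, "duration_ms": 115.0},
    {"nasal": True, "a1_p0": 3.0, "duration_ms": 125.0},
    {"nasal": True, "a1_p0": 4.0, "duration_ms": 135.0},
]

ranking = rank_features(TOY_TOKENS, ["duration_ms", "a1_p0"])
```

With these toy values, `ranking[0]` is `("a1_p0", -6.0)` and `ranking[1]` is `("duration_ms", 0.5)`: the made-up A1-P0 values separate the two classes far more than the made-up durations do.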
Two machine learning algorithms, Support Vector Machines and Random Forests, were trained on this data to identify the features or feature groupings that were most effective at predicting nasality token-by-token in each language. The list of promising features was thus narrowed to four: A1-P0, Vowel Duration, Spectral Tilt, and Formant Frequency/Bandwidth.
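A1-P0 compares the amplitude of the first-formant peak (A1) with the amplitude of a low-frequency nasal peak (P0); nasalization tends to raise P0 and lower A1, so A1-P0 shrinks. The sketch below computes it from per-harmonic amplitudes in dB. The measurement windows are assumptions for illustration, not the dissertation's exact procedure: P0 is taken as the stronger of the first two harmonics, and A1 as the strongest harmonic within one f0 of the measured F1. The function name and toy spectra are hypothetical.

```python
def a1_minus_p0(harmonic_db, f0, f1):
    """A1-P0 in dB from per-harmonic amplitudes.

    harmonic_db[k] is the amplitude (dB) of harmonic k+1, located at (k+1)*f0 Hz.
    Window choices below are illustrative assumptions.
    """
    if len(harmonic_db) < 2:
        raise ValueError("need at least two harmonics")
    # Assumed P0: the stronger of H1 and H2, approximating the low-frequency nasal peak.
    p0 = max(harmonic_db[0], harmonic_db[1])
    # Assumed A1: the strongest harmonic within one f0 of the measured F1.
    near_f1 = [amp for k, amp in enumerate(harmonic_db)
               if abs((k + 1) * f0 - f1) <= f0]
    if not near_f1:
        raise ValueError("no harmonic near F1")
    a1 = max(near_f1)
    return a1 - p0


# Made-up harmonic amplitudes (dB) for f0 = 100 Hz and F1 = 600 Hz.
TOY_ORAL = [50, 55, 48, 52, 58, 62, 57, 45]
TOY_NASALIZED = [56, 60, 50, 50, 54, 55, 52, 44]

oral_value = a1_minus_p0(TOY_ORAL, f0=100, f1=600)            # 62 - 55 = 7
nasalized_value = a1_minus_p0(TOY_NASALIZED, f0=100, f1=600)  # 55 - 60 = -5
```

For high vowels with a low F1, the A1 and P0 regions can overlap, so a real measurement needs more care than this fixed-window version.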
These four features were manipulated in English vowels from oral and nasal contexts, adding nasal features to oral vowels and reducing nasal features in nasalized vowels, in an attempt to influence oral/nasal classification. These stimuli were presented to native English listeners in a lexical choice task with phoneme masking, measuring oral/nasal classification accuracy and reaction time. Only modifications to vowel formant structure caused any perceptual change for listeners, resulting in increased reaction times, as well as increased oral/nasal confusion in the oral-to-nasal (feature addition) stimuli. Classification of already-nasal vowels was not affected by any modifications, suggesting a perceptual role for other acoustical characteristics alongside nasality-specific cues. A Support Vector Machine trained on the same stimuli showed a similar pattern of sensitivity to the experimental modifications.
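The abstract does not say how formant structure was modified. To show why bandwidth matters, the sketch below uses the second-order digital resonator familiar from Klatt-style formant synthesis (y[n] = A·x[n] + B·y[n-1] + C·y[n-2]), which is not necessarily the tool used to build the stimuli. Widening a resonator's bandwidth lowers and flattens its peak, which is the kind of broader F1 region associated with nasal coupling. Function names and parameter values are hypothetical.

```python
import cmath
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of a second-order resonator with unity gain at 0 Hz."""
    t = 1.0 / fs_hz
    c = -math.exp(-2 * math.pi * bw_hz * t)
    b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1 - b - c
    return a, b, c


def gain_db(freq_hz, bw_hz, fs_hz, at_hz):
    """Magnitude response (dB) of the resonator evaluated at at_hz."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    z_inv = cmath.exp(-2j * math.pi * at_hz / fs_hz)
    h = a / (1 - b * z_inv - c * z_inv ** 2)
    return 20 * math.log10(abs(h))


def filter_signal(x, freq_hz, bw_hz, fs_hz):
    """Apply the resonator sample by sample: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Peak gain of a 500 Hz "F1" at 16 kHz with a narrow vs. a widened bandwidth.
narrow_peak = gain_db(500, 80, 16000, 500)
wide_peak = gain_db(500, 200, 16000, 500)
```

Here `narrow_peak` exceeds `wide_peak`: the same formant frequency with a wider bandwidth produces a lower, flatter spectral peak, while the gain at 0 Hz stays at 0 dB by construction.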
Thus, based on both the machine learning and human perception results, formant structure, particularly F1 bandwidth, appears to be the primary cue to the perception of nasality in English. This close relationship between nasal- and oral-cavity-derived acoustical cues leads to a strong perceptual role for both the oral and nasal aspects of nasal vowels.
Title: "On the Acoustical and Perceptual Features of Vowel Nasality"
Advisor: Dr. Rebecca Scarborough
Defense Date: March 18th, 2015
- Surveying the nasal peak: A1 and P0 in nasal and nasalized vowels - Will Styler and Rebecca Scarborough, presented as a poster at the 2014 Acoustical Society of America meeting in Indianapolis.