SRPP: Talker identity from acoustic voice variability

Cynthia Yoonjeong Lee (Department of Head & Neck Surgery, UCLA)
17 December 2021, 17h00-18h30

What makes your voice yours? Human voices, our “auditory faces,” are inherently social, involving a speaker, a signal, a listener, and their interaction. Neither voice perception nor production can be understood without consideration of the dynamic, variable signals that shape utterances. Our team has identified a suite of measures that constitute a psychoacoustic model of voice quality, paving the way for a long-overdue refinement in characterizing talker voice variation. In this talk, I introduce a series of interdisciplinary studies of voice quality that tackle the challenge of identifying which of the model’s indices account for perceptually relevant acoustic variance within and among speakers. These studies investigate the vocal and perceptual behaviors of many individuals from various backgrounds, employing computational tools to analyze large arrays of high-dimensional data and to characterize voice variation within and across speakers and across voice qualities, speaking styles, emotions, and dialects or languages. The overarching hypothesis is that the same small set of acoustic variables characterizes acoustic variability across voices, but that much of what characterizes individual speakers is idiosyncratic. Our investigation covers a broad range of language and communicative contexts: it identifies acoustic spaces for speech produced in different speaking styles and under different emotions, for languages with and without tone or phonation contrasts, and for severely pathologic voices, and it examines their perceptual consequences. Our findings serve as a basis for research on voice production and recognition, and for clinical diagnosis and treatment of deviations in voice quality.

