Moghiseh, E., Sonderegger, M., and Wagner, M. (2023). The iambic-trochaic law without iambs or trochees: Parsing speech for grouping and prominence. Journal of the Acoustical Society of America, 153(2):1108–1129. [doi][osf]
Listeners parse the speech signal effortlessly into words and phrases, but many questions remain about how. One classic idea is that rhythm-related auditory principles play a role, in particular, that a psycho-acoustic “iambic-trochaic law” (ITL) ensures that alternating sounds varying in intensity are perceived as recurrent binary groups with initial prominence (trochees), while alternating sounds varying in duration are perceived as binary groups with final prominence (iambs). We test the hypothesis that the ITL is in fact an indirect consequence of the parsing of speech along two in-principle orthogonal dimensions: prominence and grouping. Results from several perception experiments show that the two dimensions, prominence and grouping, are each reliably cued by both intensity and duration, while foot type is not associated with consistent cues. The ITL emerges only when one manipulates either intensity or duration in an extreme way. Overall, the results suggest that foot perception is derivative of the cognitively more basic decisions of grouping and prominence, and the notions of trochee and iamb may not play any direct role in speech parsing. A task manipulation furthermore gives new insight into how these decisions mutually inform each other.
Wagner, Michael (2021). Toward an alternative(s) syntax: Projecting and operating over syntactic alternatives. Colloquium Talk at Michigan State University, October 14 2021 [handout]
Abstract: Many grammatical phenomena have been analyzed based on the assumption that constituents can introduce semantic alternatives, and that these alternatives can project by point-wise semantic composition, following Hamblin’s 1973 analysis of questions. This talk presents arguments that linguistic expressions can also introduce syntactic alternatives, that these alternatives can “project” in a point-wise fashion to create larger linguistic expressions, and that grammar can operate over sets of linguistic expressions. This syntactic view of alternatives is compatible with Katzir’s 2007 independent arguments that alternatives are, at least sometimes, structural. The evidence comes from data involving prosodic focus, association with focus, disjunction, and coordination.
Wagner, Michael (2021). Why predictability is not predictive without a linguistic theory and a theory of processing. The case of external sandhi. Talk presented at Universität des Saarlandes, July 15 2021. Reporting on joint work with Oriana Kilbourn-Ceron and others [slides]
(I updated the title after the talk to add ‘and a theory of processing’ to better reflect the content)
In a sequence of otherwise equal sounds, listeners tend to hear a series of trochees (groups of two sounds with an initial beat) when every other sound is louder; they tend to hear a series of iambs (groups of two sounds with a final beat) when every other sound is longer. The article presents evidence that this so-called “Iambic–Trochaic Law” (ITL) is a consequence of the way listeners parse the signal along two orthogonal dimensions, grouping (Which tone is first/last?) and prominence (Which tone is prominent?). A production experiment shows that in speech, intensity and duration correlate when encoding prominence, but anticorrelate when encoding grouping. A model of the production data shows that the ITL emerges from the cue distribution based on a listener’s predicted decisions about prominence and grouping respectively. This, and further predictions derived from the model, are then tested in speech and tone perception. The perception results provide evidence that intensity and duration are excellent cues for grouping and prominence, but poor cues for the distinction between iamb and trochee per se. Overall, the findings illustrate how the ITL derives from the way listeners recover two orthogonal perceptual dimensions, grouping and prominence, from a single acoustic stream.
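The two-dimensional idea in this abstract can be illustrated with a toy listener. This is a minimal sketch, not the model from the article: it assumes that prominence raises both intensity and duration, and that group-final position raises duration while lowering intensity. The function names, the labels, and the unit-free cue differences are all hypothetical.

```python
def parse_pair(d_intensity, d_duration):
    """Toy listener for an alternating sequence A B A B ...

    Arguments say how much louder / longer sound A is than sound B
    (arbitrary, hypothetical units). Returns a (prominence, grouping) pair.
    """
    # Assumption: prominence raises both cues, so the cues add up.
    prominence_score = d_intensity + d_duration
    # Assumption: group-final sounds are longer but softer, so the cues oppose.
    finality_score = d_duration - d_intensity

    def label(score, if_positive, if_negative):
        if score > 0:
            return if_positive
        if score < 0:
            return if_negative
        return "ambiguous"

    prominence = label(prominence_score, "A prominent", "B prominent")
    grouping = label(finality_score, "A final", "A initial")
    return prominence, grouping


def foot_label(prominence, grouping):
    """Derive a foot type from the two more basic decisions."""
    if "ambiguous" in (prominence, grouping):
        return "no clear foot"
    # The foot is an iamb exactly when the prominent sound is the final one.
    prominent_is_final = (prominence == "A prominent") == (grouping == "A final")
    return "iamb" if prominent_is_final else "trochee"


# Manipulating a single cue at a time, as in classic ITL stimuli.
for cues in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]:
    decisions = parse_pair(*cues)
    print(cues, decisions, foot_label(*decisions))
```

Under these assumptions, an intensity-only difference yields a trochee and a duration-only difference yields an iamb, which is the ITL pattern. When both cues are raised on the same sound, prominence is clear but grouping is ambiguous; when they are opposed, grouping is clear but prominence is ambiguous. The foot label is computed only from the two underlying decisions and never cued directly.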
Humans appear to be wired to perceive acoustic events rhythmically.
English speakers, for example, tend to perceive alternating short and long sounds as a series of binary groups with a final beat (iambs), and alternating soft and loud sounds as a series of trochees.
This generalization, often called the ‘Iambic-trochaic Law’ (ITL), although viewed as an auditory universal by some, has been argued to be shaped by language experience.
Earlier work on the ITL had a crucial limitation, in that it did not tease apart the percepts of grouping and prominence, which the notions of iamb and trochee inherently confound.
We explore how intensity and duration relate to percepts of prominence and grouping in six languages (English, French, German, Japanese, Mandarin, and Spanish).
The results show that the ITL is not universal, and that cue interpretation is shaped by language experience.
However, there are also invariances:
Duration appears relatively robust across languages as a cue to prominence (longer syllables are perceived as stressed), and intensity as a cue to grouping (louder syllables are perceived as initial).
The results show the beginnings of a rhythmic typology based on how the dimensions of grouping and prominence are cued.
A 3 min video presentation of the paper is available here:
At this fall’s Interspeech, we’ll be presenting a paper on ProsoBeast, an annotation tool for looking at intonation:
Gerazov, B. and Wagner, M. (2021). ProsoBeast Prosody Annotation Tool. Proceedings of Interspeech, 2621–2625. [doi][archive][git][video]
The labelling of speech corpora is a laborious and time-consuming process.
The ProsoBeast Annotation Tool seeks to ease and accelerate this process by providing an interactive 2D representation of the prosodic landscape of the data, in which contours are distributed based on their similarity.
This interactive map allows the user to inspect and label the utterances.
The tool integrates several state-of-the-art methods for dimensionality reduction and feature embedding, including variational autoencoders.
The user can use these to find a good representation for their data.
In addition, as most of these methods are stochastic, each can be used to generate an unlimited number of different prosodic maps.
The web app then allows the user to seamlessly switch between these alternative representations in the annotation process.
Experiments with a sample prosodically rich dataset have shown that the tool manages to find good representations of varied data and is helpful both for annotation and label correction.
The tool is released as free software for use by the community.
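To make the idea of alternative stochastic prosodic maps concrete, here is a minimal sketch that places pitch contours on a 2D map. It does not use ProsoBeast's code or any of the methods the tool integrates: it swaps in a plain seeded Gaussian random projection, chosen only because it needs nothing beyond the standard library. The function name and the contour values are hypothetical.

```python
import random


def random_projection_map(contours, seed):
    """Project fixed-length contours to 2D with a seeded random projection.

    This is a stand-in technique for illustration only; a different seed
    draws different projection axes and therefore gives a different map.
    """
    rng = random.Random(seed)
    n_dims = len(contours[0])
    # Draw two random axes in contour space.
    axes = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(2)]
    # Each contour's 2D position is its dot product with each axis.
    return [
        tuple(sum(a * x for a, x in zip(axis, contour)) for axis in axes)
        for contour in contours
    ]


# Hypothetical f0 contours in Hz, each resampled to five points.
contours = [
    [200, 210, 220, 230, 240],  # rise
    [201, 211, 221, 231, 241],  # near-identical rise
    [240, 230, 220, 210, 200],  # fall
]

map_a = random_projection_map(contours, seed=1)
map_b = random_projection_map(contours, seed=2)

for label, point_a, point_b in zip(["rise", "rise2", "fall"], map_a, map_b):
    print(label, point_a, point_b)
```

Re-running with the same seed reproduces the same map, while a new seed gives an alternative layout of the same data. In an annotation setting, the user would switch between such layouts and keep the one that separates the contours of interest most clearly.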
A 3 min video presentation of the paper is available here: