Jesse Zymet, "Lexical propensities in phonology: Corpus and experimental evidence, grammar, and learning"

Colloquium | September 17 | 3:10-5 p.m. | 370 Dwinelle Hall

 Jesse Zymet, UC Berkeley

 Department of Linguistics

Traditional theories of phonological variation propose that morphemes be encoded with descriptors such as [+/- Rule X] to capture which of them participate in a variable process. More recent theories posit that individual morphemes can have lexical propensities: idiosyncratic, gradient rates at which they participate in a process, e.g., [0.7 Rule X]. In this talk, I argue that such propensities exist and that a binary distinction is not rich enough to characterize participation in variable processes. Corpus investigations into Slovenian palatalization and French liaison reveal that individual morphemes pattern across the entire propensity spectrum. Furthermore, an experimental investigation into French speakers' intuitions suggests that speakers internalize word-specific propensities to undergo liaison.
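To make the contrast between binary marking and gradient propensities concrete, here is a minimal Python sketch. It computes each morpheme's observed rate of undergoing a process from token records, then collapses those rates into a [+/- Rule X]-style diacritic. The token list, the morpheme labels, the function names, and the 0.5 threshold are all hypothetical illustrations, not corpus data or methods from the talk.

```python
from collections import Counter

# Hypothetical toy tokens: (morpheme, underwent_process). Not corpus data.
tokens = [
    ("m1", True), ("m1", True), ("m1", True), ("m1", False),
    ("m2", False), ("m2", False), ("m2", True), ("m2", False),
    ("m3", True), ("m3", True),
]

def propensities(tokens):
    """Return each morpheme's observed rate of undergoing the process."""
    applied = Counter()
    total = Counter()
    for morpheme, underwent in tokens:
        total[morpheme] += 1
        if underwent:
            applied[morpheme] += 1
    return {m: applied[m] / total[m] for m in total}

def binary_marking(tokens, threshold=0.5):
    """Collapse each rate to a +/- diacritic (the threshold is an assumption)."""
    return {m: rate >= threshold for m, rate in propensities(tokens).items()}

if __name__ == "__main__":
    print(propensities(tokens))    # {'m1': 0.75, 'm2': 0.25, 'm3': 1.0}
    print(binary_marking(tokens))  # {'m1': True, 'm2': False, 'm3': True}
```

Under the binary encoding, m1 and m3 receive the same marking even though one undergoes the process three times out of four and the other every time; a gradient value such as [0.75 Rule X] retains that difference.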

A spate of experimental research has uncovered language learners' ability to acquire the idiosyncratic behavior of individual attested words while frequency matching to statistical generalizations across the lexicon (e.g., how regularly a variable process applies overall across eligible words). How can we model the learning of lexical propensities together with a frequency-matching grammar? A recent approach based on Maximum Entropy Harmonic Grammar (MaxEnt) uses general constraints that putatively capture statistical generalizations across the lexicon, as well as lexical constraints governing the behavior of individual words. With a series of learning simulations, I show that this approach fails to learn statistical generalizations across the lexicon: lexical constraints are so powerful that the learner acquires the behavior of each attested form using only these constraints, at which point the general constraint is rendered superfluous and ineffective. I therefore attribute to learners a generality bias, whereby they privilege general constraints over lexical ones. I argue that MaxEnt, which is essentially a canonical logistic regression model, fails to represent this property and should be replaced with the hierarchical mixed-effects logistic regression model (Mixed-Effects MaxEnt). By encoding general constraints as fixed effects and lexical constraints as a random effect, this model is shown to learn both a frequency-matching grammar and lexical propensities. The learner thus treats the grammar and the lexicon differently: in the learning process, vocabulary effects are subordinated to broad grammatical effects.
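As a rough illustration of the logistic-regression view of a general weight plus word-specific weights, the Python sketch below reduces a variable process to a two-candidate choice (apply vs. not apply), so that the probability of application for word w is sigmoid(g + lex[w]). Here g is a single unpenalized general weight, playing the role of a fixed effect, and each lex[w] is a lexical weight. The Gaussian prior on lexical weights with a fixed variance sigma2 is an assumption standing in for a random effect; a genuine mixed-effects fit would also estimate that variance. The function names, learning rate, step count, and counts are hypothetical, and this is not the model or learner used in the talk.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_general_plus_lexical(counts, sigma2=1.0, lr=0.02, steps=5000):
    """Fit P(apply | w) = sigmoid(g + lex[w]) by gradient ascent.

    counts maps each word to (times_process_applied, times_eligible).
    g is unpenalized (like a fixed effect). Each lex[w] has a Gaussian
    prior with mean 0 and fixed variance sigma2, which shrinks the
    word-specific weights toward 0 (a simplified stand-in for a random
    effect whose variance would normally be estimated).
    """
    g = 0.0
    lex = {w: 0.0 for w in counts}
    for _ in range(steps):
        grad_g = 0.0
        grad_lex = {}
        for w, (k, n) in counts.items():
            p = sigmoid(g + lex[w])
            resid = k - n * p                      # d log-likelihood / d logit
            grad_g += resid
            grad_lex[w] = resid - lex[w] / sigma2  # likelihood + prior gradient
        g += lr * grad_g
        for w in lex:
            lex[w] += lr * grad_lex[w]
    return g, lex

# Hypothetical toy counts, not data from the talk.
counts = {"w1": (18, 20), "w2": (15, 20), "w3": (5, 20), "w4": (2, 20)}

if __name__ == "__main__":
    g, lex = fit_general_plus_lexical(counts)
    for w, (k, n) in counts.items():
        print(w, "observed", k / n, "fitted", round(sigmoid(g + lex[w]), 3))
    # An unseen word has no lexical weight, so only g determines its rate.
    print("novel word", round(sigmoid(g), 3))
```

Because each lexical weight is pulled toward 0, the fitted rate for w1 falls below its observed 0.9, and an unseen word's predicted rate depends on g alone. These toy counts are symmetric around 0.5, so g ends up near 0 and a novel word is predicted to undergo the process about half the time, matching the pooled rate of 40/80; with asymmetric counts, sigmoid(g) need not equal the pooled rate exactly.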

 hyman@berkeley.edu