
Wednesday, December 2, 2009




Supervised topic models

Seminar: Neyman Seminar | December 2 | 4-5 p.m. | 1011 Evans Hall


Jon McAuliffe, Adjunct Assistant Professor, Department of Statistics, UC Berkeley

Department of Statistics


The scale of contemporary electronic text collections has led to growing interest in statistical models based on so-called topics. Formally, a topic is a probability distribution over a vocabulary. Informally, a topic is intended to capture an underlying semantic theme. Most topic models are unsupervised: only the words in the documents are modelled. I will describe supervised latent Dirichlet allocation, a model in which each document is paired with a response variable. The goal is to infer latent topics predictive of the response, then use the fitted model to predict response values for previously unobserved documents. Since exact maximum likelihood is intractable, I will present an approximate EM algorithm which uses variational inference. I'll also discuss results on example document prediction problems, with comparisons to other approaches. Some background on text modeling and variational methods will be provided.
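For readers new to the model, the generative process behind supervised LDA can be sketched in a few lines. This is a minimal illustration and not the speakers' code. It assumes a Gaussian response whose mean is a linear function (coefficients `eta`) of the document's empirical topic frequencies. The helper names `sample_dirichlet` and `generate_document` and the toy parameters are hypothetical. The sketch only samples from the model. Fitting it with the approximate variational EM algorithm described in the abstract is not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    # Draw from a Dirichlet by normalizing independent Gamma(alpha_k, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, eta, sigma, n_words, rng):
    """Sample one (words, response) pair from a supervised LDA model.

    Hypothetical sketch assuming a Gaussian response.
    alpha: Dirichlet parameter over the K topics.
    beta:  K topics, each a probability distribution over the vocabulary.
    eta:   K regression coefficients for the response mean.
    sigma: standard deviation of the response noise.
    """
    k = len(alpha)
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    words, topic_counts = [], [0] * k
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]  # topic assignment for this word
        w = rng.choices(range(len(beta[z])), weights=beta[z])[0]  # word drawn from topic z
        words.append(w)
        topic_counts[z] += 1
    # The response depends on the empirical topic frequencies z_bar.
    z_bar = [c / n_words for c in topic_counts]
    mean = sum(e * zb for e, zb in zip(eta, z_bar))
    y = rng.gauss(mean, sigma)
    return words, z_bar, y


# Usage: two toy topics over a four-word vocabulary.
rng = random.Random(0)
beta = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
words, z_bar, y = generate_document([1.0, 1.0], beta, [-2.0, 2.0], 0.5, 50, rng)
```

Because the response is tied to the same topic assignments that generate the words, topics that help predict `y` are favored when the model is fit. This is the sense in which the inferred topics are "predictive of the response."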

Joint work with David Blei, Princeton University.

Refreshments will be served from about 3:45 p.m. in 1011 Evans Hall.


510-642-2781