
Friday, May 10, 2013




[Dissertation Talk] Minimax Optimality in Online Learning under Logarithmic Loss

Presentation: Departmental | May 10 | 2-3 p.m. | 310 Soda Hall


Fares Hedayati

Electrical Engineering and Computer Sciences (EECS)


We study online prediction of individual sequences under logarithmic loss with parametric experts. The goal is to predict a sequence of outcomes x_t, revealed one at a time, almost as well as a set of experts. At round t, the forecaster's prediction takes the form of a conditional probability density p(X | x_1, ..., x_{t-1}). The loss the forecaster suffers at that round is the negative logarithm of the probability it assigned to the outcome that was actually revealed. The performance of the prediction strategy is measured relative to the best expert in a reference set, a parametric class of i.i.d. distributions. The difference between the accumulated loss of the prediction strategy and that of the best expert in the reference set is called the regret.
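In symbols (a standard way of writing the setup above; the notation is ours, not quoted from the talk), the per-round loss and the regret after n rounds against the parametric class {p_\theta} are

  \ell_t = -\log p(x_t \mid x_1, \dots, x_{t-1}),
  \qquad
  R_n = \sum_{t=1}^{n} -\log p(x_t \mid x_1, \dots, x_{t-1}) \;-\; \inf_{\theta} \sum_{t=1}^{n} -\log p_\theta(x_t).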

The minimax regret is achieved by the normalized maximum likelihood (NML) strategy. The NML strategy is naturally defined via a joint distribution rather than via conditionals; the conditionals needed at each round are computed by marginalization, which is very costly for NML. Due to this drawback, much attention has been given to alternative strategies such as sequential normalized maximum likelihood (SNML) and Bayesian strategies. We investigate conditions under which SNML and Bayesian strategies are optimal. We show that optimality of SNML makes it equivalent to a particular Bayesian strategy, namely the Bayesian strategy under the Jeffreys prior, and vice versa: if the Bayesian strategy under the Jeffreys prior is optimal, then that strategy is exactly SNML. In both cases they are equivalent to NML. Furthermore, SNML is optimal if and only if the joint distribution on sequences defined by SNML is exchangeable. These results are proven for exponential families and for any parametric family whose maximum likelihood estimator is asymptotically normal. The most important implication of these results is that when SNML-exchangeability holds, NML becomes horizon-independent, and it can be calculated either through a Bayesian update with the Jeffreys prior or through a one-step-ahead maximum likelihood calculation as in SNML. Interestingly, this phenomenon holds for a large class of one-dimensional exponential family distributions, namely the Gaussian, the gamma, and the Tweedie exponential family of order 3/2, together with any one-to-one transformation of them; it cannot hold for other one-dimensional exponential family distributions.
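As a rough, self-contained illustration of why NML conditionals are expensive (a sketch added for concreteness, not material from the talk; the Bernoulli model and all names below are our own choices), one can build the NML joint by brute force and obtain a conditional by marginalizing over every continuation up to the horizon:

from itertools import product
from math import prod

def ml_prob(seq):
    # sup_theta p_theta(seq) for a Bernoulli model: theta-hat = (# of ones) / n
    n, k = len(seq), sum(seq)
    theta = k / n
    return prod(theta if x == 1 else 1.0 - theta for x in seq)

def nml_joint(n):
    # NML joint over all binary sequences of length n:
    # p_NML(x^n) = sup_theta p_theta(x^n) / (Shtarkov sum)
    seqs = list(product((0, 1), repeat=n))
    shtarkov = sum(ml_prob(s) for s in seqs)
    return {s: ml_prob(s) / shtarkov for s in seqs}

def nml_conditional_of_one(prefix, horizon):
    # P_NML(next symbol = 1 | prefix), obtained by marginalizing the joint
    # over all continuations out to the horizon -- the cost grows as 2^horizon,
    # which is the drawback mentioned above.
    joint = nml_joint(horizon)
    t = len(prefix)
    mass = [0.0, 0.0]
    for s, p in joint.items():
        if s[:t] == tuple(prefix):
            mass[s[t]] += p
    return mass[1] / (mass[0] + mass[1])

print(nml_conditional_of_one((1, 1, 0), horizon=8))

For the families named in the abstract (Gaussian, gamma, Tweedie of order 3/2), the results above say this marginalization can be replaced by a Bayesian update under the Jeffreys prior or by SNML's one-step-ahead maximum likelihood computation; for the Bernoulli model used in this sketch the NML conditional genuinely depends on the chosen horizon.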


fares.hedayati@gmail.com, 510-621-8396