BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//University of California\, Berkeley//UCB Events Calendar//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20180410T213930Z
DTSTART;TZID=America/Los_Angeles:20180416T153000
DTEND;TZID=America/Los_Angeles:20180416T163000
TRANSP:OPAQUE
SUMMARY:Rahul Jain: Reinforcement Learning without Reinforcement
UID:116947-ucb-events-calendar@berkeley.edu
ORGANIZER;CN="UC Berkeley Calendar Network":
LOCATION:3108 Etcheverry Hall
DESCRIPTION:Rahul Jain\, University of Southern California\n\nAbstract: Reinforcement Learning (RL) is concerned with solving sequential decision-making problems in the presence of uncertainty. RL is really two problems in one. The first is the 'Bellman problem': finding the optimal policy given the model\, which may involve large state spaces. Various approximate dynamic programming and RL schemes have been developed\, but they either come without guarantees\, are not universal\, or are rather slow. In fact\, most RL algorithms have become synonymous with stochastic approximation schemes that are known to be rather slow. The problem is even more difficult for MDPs with continuous state (and action) spaces. We present a class of RL algorithms for the continuous-state-space problem based on 'empirical' ideas. These algorithms are simple and effective\, yet universal\, with probabilistic guarantees. The idea combines randomized kernel-based function fitting with 'empirical' updates. The key is a "probabilistic contraction analysis" method we have developed for analyzing stochastic iterative algorithms\, in which we show convergence to a probabilistic fixed point of a sequence of random operators via a stochastic dominance argument.\n\nThe second RL problem is the 'online learning (or Lai-Robbins) problem'\, which arises when the model itself is unknown. We propose a simple posterior-sampling-based regret-minimization reinforcement learning algorithm for MDPs. It achieves O(sqrt(T)) regret\, which is order-optimal. It not only optimally manages the "exploration versus exploitation tradeoff" but also obviates the need for expensive computation for exploration. The algorithm differs from classical adaptive control in its focus on non-asymptotic regret optimality as opposed to asymptotic stability. This seems to resolve a long-standing open problem in Reinforcement Learning.\n\nBiography: Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical Engineering\, Computer Science* and ISE* (*by courtesy) at the University of Southern California (USC). He received a B.Tech from IIT Kanpur\, an MA in Statistics\, and a PhD in EECS from the University of California\, Berkeley. Prior to joining USC\, he was at the IBM T. J. Watson Research Center in Yorktown Heights\, NY. He has received numerous awards\, including the NSF CAREER award\, the ONR Young Investigator award\, an IBM Faculty award\, and the James H. Zumberge Faculty Research and Innovation Award\, and is currently a US Fulbright Specialist Scholar. His interests span reinforcement learning\, stochastic control\, statistical learning\, stochastic networks\, and game theory\, with applications to power systems and healthcare. The talk is based on work with a number of outstanding students and postdocs who are now faculty members themselves at top institutions.
URL:http://events.berkeley.edu/index.php/calendar/sn/pubaff.html?event_ID=116947&view=preview
SEQUENCE:0
CLASS:PUBLIC
CREATED:20180410T213930Z
LAST-MODIFIED:20180410T213930Z
X-MICROSOFT-CDO-BUSYSTATUS:BUSY
X-MICROSOFT-CDO-INSTTYPE:0
X-MICROSOFT-CDO-IMPORTANCE:1
X-MICROSOFT-CDO-OWNERAPPTID:-1
END:VEVENT
END:VCALENDAR