Statistics
http://events.berkeley.edu/index.php/calendar/sn/stat.html
Upcoming Events

Nick Sahinidis — ALAMO: Machine learning from data and first principles, Oct 2
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=111022&date=2017-10-02
We have developed the ALAMO methodology with the aim of producing a tool capable of using data to learn algebraic models that are accurate and as simple as possible. ALAMO relies on (a) integer nonlinear optimization to build low-complexity models from input-output data, (b) derivative-free optimization to collect additional data points that can be used to improve tentative models, and (c) global optimization to enforce physical constraints on the mathematical structure of the model. We present computational results and comparisons between ALAMO and a variety of learning techniques, including Latin hypercube sampling, simple least-squares regression, and the lasso. We also describe results from applications in CO2 capture that motivated the development of ALAMO.
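The model-building step described in (a) can be illustrated with a toy sketch. This is not the ALAMO implementation: it uses exhaustive best-subset search over a small, made-up basis of candidate algebraic terms as a stand-in for the integer-programming step, preferring models with fewer terms.

```python
# Toy stand-in for ALAMO's model-building step (not the actual ALAMO code):
# pick the smallest subset of candidate basis functions that fits
# input-output data well, via exhaustive search over subsets.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=40)
y = 3.0 * x**2 + 0.5 / x          # hidden "true" algebraic model

# Candidate basis functions a tool like ALAMO might consider.
basis = {"x": x, "x^2": x**2, "1/x": 1.0 / x, "log x": np.log(x)}
names = list(basis)

best = None
for k in range(1, len(names) + 1):          # fewer terms = simpler model
    for subset in combinations(names, k):
        A = np.column_stack([basis[n] for n in subset])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = np.sum((A @ coef - y) ** 2)
        if best is None or err < best[0] - 1e-9:
            best = (err, subset, coef)
    if best[0] < 1e-12:                     # stop at the simplest good model
        break

err, subset, coef = best
print(subset, np.round(coef, 3))
```

On this synthetic data the search recovers the two-term model exactly; real tools replace the exhaustive loop with integer optimization to scale to large bases.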

Nick Sahinidis
Department of Chemical Engineering
Carnegie Mellon University
http://archimedes.cheme.cmu.edu
Sahinidis@cmu.edu

Seminar 217, Risk Management: Nonparametric Risk Attribution for Factor Models of Portfolios, Oct 3
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112150&date=2017-10-03
Factor models are used to predict the future returns of a portfolio with known positions in many assets. These simulations yield a distribution of future returns and various measures of the risk of the portfolio. Clients would often like to identify sources of risk in their portfolios, but this is difficult when factors influence the portfolio in nonlinear ways, such as when returns are measured on a log scale and when the portfolio contains nonlinear instruments. We develop a two-step method to partition risk among factors in a portfolio which accounts for these nonlinearities: first, model the relationship between factors and portfolio returns, and second, estimate the risk contribution of each factor as the increase in portfolio risk due to increasing the factor's weight. Both of these steps can be done using nonparametric regressions, which make no assumptions about the distribution of factors or their functional relationship with the portfolio returns. This research was done at State Street GX Labs.

Chaining, interpolation, and convexity, Oct 4
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112141&date=2017-10-04
Classical estimates on the suprema of random processes in terms of metric entropy have found widespread use in probability theory, statistics, computer science, and other areas. Such estimates are powerful and easy to use, but often fail to be sharp. To obtain sharp bounds, one must replace these methods by a multiscale analogue known as the generic chaining that was developed by Talagrand. Unfortunately, the latter is notoriously difficult to use in any given situation. In this talk, I will exhibit a surprisingly simple construction, inspired by real interpolation of Banach spaces, that is readily amenable to explicit computations. Despite its simplicity, the method proves to be sufficiently powerful to recover the central results in Talagrand's theory in a very simple way. The talk will focus on some basic ideas and will be illustrated by specific examples; I will not assume prior familiarity with the topic. If time permits, I will briefly outline applications to the majorizing measure theorem and random matrices, as well as some embarrassing open problems.

Heterogeneity: opportunities for causal inference and prediction, Oct 4
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112005&date=2017-10-04
Heterogeneity in potentially large-scale data can be beneficially exploited for causal inference and more robust prediction. The key idea relies on invariance and stability across different heterogeneous regimes or sub-populations. What we term "anchor regression" opens some novel insights and connections between causality and protection (robustness) against worst-case interventions in prediction problems. The resulting new procedures offer (possibly conservative) confidence guarantees. We will discuss the methodology as well as some applications.

BLISS Seminar: Optimal Experimental Design via Regret Minimization, Oct 9
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112326&date=2017-10-09
The experimental design problem concerns the selection of k points from a potentially very large design pool of p-dimensional vectors, so as to maximize the statistical efficiency of regression on the selected k design points.

We achieve 1+ε approximations to all popular optimality criteria for this problem, including A-optimality, D-optimality, T-optimality, E-optimality, V-optimality and G-optimality, with only k = O(p/ε^2) design points.
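To make the A-optimality criterion concrete, here is a minimal sketch. It uses plain greedy forward selection to minimize trace((X_S^T X_S)^{-1}), the average variance of the least-squares coefficient estimates; this heuristic is an assumption of the sketch and is not the regret-minimization algorithm of the talk, which carries the 1+ε guarantee.

```python
# A-optimality sketch: greedily pick k design points from a pool to
# minimize trace((X_S^T X_S)^{-1}). Greedy selection is a simple heuristic,
# not the talk's regret-minimization algorithm.
import numpy as np

rng = np.random.default_rng(1)
p, n_pool, k = 4, 200, 12
pool = rng.normal(size=(n_pool, p))        # candidate design points

def a_value(rows):
    """A-optimality objective for a set of selected pool rows."""
    X = pool[list(rows)]
    M = X.T @ X
    if np.linalg.matrix_rank(M) < p:       # singular -> infinite variance
        return np.inf
    return np.trace(np.linalg.inv(M))

chosen = []
for _ in range(k):
    cands = [i for i in range(n_pool) if i not in chosen]
    scores = [a_value(chosen + [i]) for i in cands]
    chosen.append(cands[int(np.argmin(scores))])

print(len(chosen), a_value(chosen))
```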

In contrast, to the best of our knowledge, no polynomial-time algorithm was previously known to achieve a 1+ε approximation for D/T/E/G-optimality, and the algorithms achieving a 1+ε approximation for A/V-optimality required k ≥ Ω(p^2/ε).

Seminar 217, Risk Management: Advances in Basketball Analytics Using Player Tracking Data, Oct 10
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=110436&date=2017-10-10
In the 2013-2014 season, the National Basketball Association, in conjunction with STATS LLC, implemented a league-wide program to collect player-tracking data for all NBA games. The data feed now provides 25-FPS records of all players' XY coordinates on the court, as well as XYZ coordinates for the ball. This data source has opened up new lines of inquiry into the quantitative analysis of basketball that have previously been hamstrung by a reliance on spatially naive box-score and play-by-play statistics. In this talk I will discuss several projects undertaken by myself and the XY Research group that use newly-available spatial data to work toward answering fundamental questions about basketball. Topics covered will include expected possession value (EPV, a stock ticker for a possession), defensive shot charts, the impact of ball movement, and play detection.

Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment, Oct 11
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112319&date=2017-10-11
Many problems in signal/image processing and computer vision amount to estimating a signal, image, or tri-dimensional structure/scene from corrupted measurements. A particularly challenging form of measurement corruption is latent transformations of the underlying signal to be recovered. Many such transformations can be described as a group acting on the object to be recovered. Examples include the Simultaneous Localization and Mapping (SLAM) problem in Robotics and Computer Vision, where pictures of a scene are obtained from different positions and orientations; Cryo-Electron Microscopy (Cryo-EM) imaging, where projections of a molecule's density are taken from unknown rotations; and several others.
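One of the simplest such group actions is a cyclic shift of a discrete signal. The following toy sketch (my own illustration, not code from the talk) shows the kind of shift-invariant feature that invariant-based methods for multi-reference alignment exploit: the power spectrum is unchanged by the shift, so it can be averaged across noisy shifted copies.

```python
# Toy illustration of a latent group action: cyclic shifts of a signal.
# The power spectrum |FFT|^2 is invariant to the shift, so averaging it
# over noisy shifted copies estimates a shift-invariant of the signal.
import numpy as np

rng = np.random.default_rng(2)
n = 16
signal = rng.normal(size=n)
shifted = np.roll(signal, 5)               # unknown cyclic shift

def ps(x):
    """Power spectrum of a 1-D signal."""
    return np.abs(np.fft.fft(x)) ** 2

assert np.allclose(ps(signal), ps(shifted))   # invariance under shifts

# Noisy shifted copies; averaging power spectra and subtracting the known
# noise floor (N * sigma^2) recovers the signal's power spectrum.
sigma = 0.1
copies = [np.roll(signal, rng.integers(n)) + sigma * rng.normal(size=n)
          for _ in range(2000)]
est = np.mean([ps(c) for c in copies], axis=0) - n * sigma**2
```

Recovering the full orbit (not just an invariant) generally needs higher-order invariants such as the bispectrum; the sketch only shows the first step.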

One fundamental example of this type of problem is Multi-Reference Alignment: given a group acting on a space, the goal is to estimate an orbit of the group action from noisy samples. For example, in one of its simplest forms, one is tasked with estimating a signal from noisy cyclically shifted copies. We will show that the number of observations needed by any method has a surprising dependency on the signal-to-noise ratio (SNR) and algebraic properties of the underlying group action. Remarkably, in some important cases, this sample complexity is achieved with computationally efficient methods based on computing invariants under the group of transformations.

Split-Sample Strategies for Avoiding False Discoveries, Oct 11
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=110881&date=2017-10-11
Preanalysis plans (PAPs) have become an important tool for limiting false discoveries in field experiments. We evaluate the properties of an alternate approach which splits the data into two samples: an exploratory sample and a confirmation sample. When hypotheses are homogeneous, we describe an improved split-sample approach that achieves 90% of the rejections of the optimal PAP without requiring preregistration or constraints on specification search in the exploratory sample. When hypotheses are heterogeneous in priors or intrinsic interest, we find that a hybrid approach which prespecifies hypotheses with high weights and priors and uses a split-sample approach to test additional hypotheses can have power gains over any pure PAP. We assess this approach using the community-driven development (CDD) application from Casey et al. (2012) and find that the use of a hybrid split-sample approach would have generated qualitatively different conclusions.

Philip Protter - Issues of Incomplete Markets, Oct 16
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112204&date=2017-10-16
Abstract: In a complete market, there is a unique risk neutral measure, unique prices, and all contingent claims can (at least theoretically) be perfectly hedged. In an incomplete market, in contrast, there are infinitely many risk neutral measures, a continuum of “fair” prices, and contingent claims cannot, in general, be perfectly hedged, even theoretically. Unfortunately, there seems to be plenty of evidence that actual markets are incomplete.
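The "continuum of fair prices" can be written down explicitly. The following is the standard textbook formulation (not taken from the talk): with $\mathcal{M}$ the set of equivalent risk-neutral measures for the discounted market, an arbitrage-free price $\pi(H)$ of a claim $H$ is only pinned down to an interval.

```latex
% Standard no-arbitrage price bounds for a (discounted) claim H:
\[
  \pi_{\min}(H) \;=\; \inf_{Q \in \mathcal{M}} \mathbb{E}_Q[H],
  \qquad
  \pi_{\max}(H) \;=\; \sup_{Q \in \mathcal{M}} \mathbb{E}_Q[H],
  \qquad
  \pi(H) \in [\pi_{\min}(H), \pi_{\max}(H)].
\]
% In a complete market M is a singleton, so the interval collapses
% to the unique price; in an incomplete market it is a genuine range.
```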

We are interested in trying to understand this a priori confusing situation. To make things concrete, we address the following question: suppose a sequence of incomplete markets “converges” to a complete market in an appropriate sense (to be defined); do the major objects also converge? Mostly, this is false: the ranges of option prices do not converge, for example. We work out some simple examples that illustrate some of the issues, and we indicate when one might have some kind of reasonable convergence of the markets, and what such a convergence might be.

The talk is based on joint work with Jean Jacod.

Light refreshments will be served.

Seminar 217, Risk Management: Backtest overfitting, stock fund design and forecast performance, Oct 17
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=110437&date=2017-10-17
Backtest overfitting refers to the use of backtests (historical market data) to construct an investment strategy, fund or portfolio when the number of variations explored exceeds the limits of statistical reliability. We show that backtest overfitting is inevitable when computer programs are employed to explore millions or even billions of parameter variations (as is typical) to select an optimal variant. We illustrate this by showing that while it is a simple matter to design a stock fund, based only on a weighted collection of S&P 500 stocks, that achieves any desired performance profile, these funds typically perform erratically at best when confronted with new, out-of-sample data. Similarly, we present results of a recent study of market forecasters, most of whom employ some sort of historical market data analysis, and show that few, if any, have a positive long-term record.

Humans Enter the Robot Equation, Oct 18
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112449&date=2017-10-18
Robots are becoming increasingly capable of optimizing objective functions for physical tasks, from navigation, to dexterous manipulation, to flight. The ultimate goal is to perform these tasks for us, in our environments. We want cars driving on our roads, or personal robots assisting us with activities of daily living as we age in our own homes. Right now, we tend to be merely obstacles to these robots. I believe we need to be more: humans need to enter the robot equation in two fundamental ways. First, we are agents who take actions in the same spaces, putting a burden on robots that their actions are well-coordinated with ours. Second, and perhaps more importantly, we hold the key to what the robot's objective function should be in the first place: robots need to optimize for what we want, for our values, for what helps us. In this talk, I will summarize my lab's journey into making robots formally reason about people in these two ways.

Large deviation and entropic optimality in sparse settings, Oct 18
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112417&date=2017-10-18
The so-called upper tail problem in the Erdos-Renyi random graph comes up naturally in the study of exponential random graph models and related graph ensembles with prescribed subgraph densities. The problem is broadly twofold:

(1) To estimate the probability that the number of copies of a graph H in a random graph is atypically large.

(2) To describe the structure of the random graph, in particular its clustering properties, in this situation.
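For reference, problem (1) is commonly written as follows (a standard formulation, not quoted from the talk): with $G \sim G(n,p)$ and $N_H(G)$ the number of copies of a fixed graph $H$ in $G$, one seeks, for fixed $\delta > 0$,

```latex
% Upper tail of subgraph counts in G(n,p):
\[
  -\log \Pr\!\left[\, N_H(G) \;\ge\; (1+\delta)\, \mathbb{E}\, N_H(G) \,\right],
\]
% with the sparse setting corresponding to p = p(n) -> 0 as n -> infinity.
```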

This is a canonical example of a non-linear large deviation problem where the statistic of interest is a polynomial of independent bits. In this talk, I will describe some recent progress regarding the upper tail problem in the sparse setting, i.e., when the edge density decays to zero, involving solutions of certain entropic optimization problems. Results in other related settings and open problems will also be presented.

Theoretically Speaking Series — Black Holes, Firewalls, and the Limits of Quantum Computers, Oct 18
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112043&date=2017-10-18
Quantum computers are proposed devices that would exploit quantum mechanics to solve certain specific problems dramatically faster than we know how to solve them with today's computers. In the popular press, quantum computers are often presented not just as an exciting frontier of science and technology (which they are), but as magic devices that would work by simply trying every possible solution in parallel. However, research over the past 25 years has revealed that the truth is much more subtle and problem-dependent: for some types of problems, quantum computers would offer only modest speedups or no speedups at all. These limitations are entirely separate from the practical difficulties of building quantum computers (such as "decoherence"), and apply even to the fully error-corrected quantum computers we hope will be built in the future. In this talk, I'll give a crash course on what computer science has learned about both the capabilities and the limitations of quantum computers. Then, in a final section, I'll describe a remarkable and unexpected connection, made just within the last five years, where the conjectured limitations of quantum computers have been applied to issues in fundamental physics. These include Hawking's black-hole information puzzle (in its modern incarnation as the "firewall paradox"), and understanding the growth of wormholes in the so-called gauge/gravity duality that emerged from string theory.

Theoretically Speaking is a lecture series highlighting exciting advances in theoretical computer science for a broad general audience. Events are held at the David Brower Center in Downtown Berkeley, and are free and open to the public. No special background is assumed.

Light refreshments will be served before the lecture, at 5:30 p.m.

GraphXD Seminar, Oct 19
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=111915&date=2017-10-19
Many important properties of an undirected graph manifest themselves spectrally in the eigenvalues or quadratic forms of matrices related to the graph. For instance, the connectivity structure, electrical properties, and random walk behavior of a graph are determined by its Laplacian matrix. A spectral sparsifier of a graph G is a sparse graph H on the same set of vertices such that the Laplacians of H and G are close, so that H captures the spectral behavior of G while being much cheaper to store and perform computations on. We survey a line of work showing that spectral sparsifiers with constant degree exist for every graph and can be computed efficiently.

Science at Cal Lecture - Leave election integrity to chance, Oct 21
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=112163&date=2017-10-21
There’s no perfect way to count votes. To paraphrase Ulysses S. Grant and Richard M. Nixon, “Mistakes will be made.” Voters don’t always follow instructions. Voting systems can be mis-programmed. Ballots can be misplaced. Election fraud is not entirely unknown in the U.S. And the more elections depend on technology, the more vulnerable they are to failures, bugs, and hacking, domestic and foreign.

How can we protect elections against honest mistakes and nation states that want to influence our political system? If we vote on paper ballots and keep track of them well, we can double-check election results by inspecting a random sample of ballots. If the results are right, a very small random sample can suffice to confirm the results; if the results are wrong, a full manual count may be required to set the record straight.
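The "small random sample suffices" idea can be illustrated with a toy simulation. This is my own sketch, not a certified risk-limiting audit procedure (real RLAs such as ballot-polling audits use carefully calibrated statistical tests): it simply draws a random sample of paper ballots and checks that the sampled winner share is consistent with a reported win.

```python
# Toy audit sketch (NOT a real risk-limiting audit): sample a few hundred
# paper ballots from a large election and check the winner's sampled share
# is well above 50%. A wide true margin makes a small sample conclusive;
# a discrepancy would trigger escalation toward a full manual count.
import numpy as np

rng = np.random.default_rng(3)
n_ballots = 100_000
true_share = 0.70                          # winner's true share of ballots
ballots = rng.random(n_ballots) < true_share

sample = rng.choice(ballots, size=500, replace=False)
share = sample.mean()
se = np.sqrt(share * (1 - share) / sample.size)

# Require the sampled share to exceed 50% by more than ~3 standard errors.
confirmed = (share - 3 * se) > 0.5
print(round(share, 3), bool(confirmed))
```

With a narrower true margin the same sample size would often fail to confirm, which is exactly when the audit escalates to counting more ballots.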

“Risk-limiting audits” (RLAs), developed in 2007, guarantee that if the outcome is wrong, there is a large chance that the audit will correct the record before the results are official. They have been tested in California, Colorado, Ohio, and Denmark. Colorado will be the first state to routinely conduct RLAs, starting in November 2017, and Rhode Island just passed a law requiring RLAs starting in 2018. An immediate national push for RLAs could give the public justified confidence in the 2018 midterm elections and the 2020 presidential election. But time is short.

Seminar 217, Risk Management: Submodular Risk Allocation, Oct 24
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=110438&date=2017-10-24
We analyze the optimal allocation of trades to portfolios when the cost associated with an allocation is proportional to each portfolio's risk. Our investigation is motivated by changes in the over-the-counter derivatives markets, under which some contracts may be traded bilaterally or through central counterparties, splitting a set of trades into two or more portfolios. A derivatives dealer faces risk-based collateral and capital costs for each portfolio, and it seeks to minimize total margin requirements through its allocation of trades to portfolios. When margin requirements are submodular, the problem becomes a submodular intersection problem. Its dual provides per-trade margin attributions, and assigning trades to portfolios based on the lowest attributed costs yields an optimal allocation. As part of this investigation, we derive conditions under which standard deviation and other risk measures are submodular functions of sets of trades. We compare systemwide optimality with individually optimal allocations in a market with multiple dealers.

Seminar 217, Risk Management, Oct 31
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=110439&date=2017-10-31