Statistics
http://events.berkeley.edu/index.php/calendar/sn/stat.html
Upcoming Events

Seminar 217, Risk Management: The role of dynamic and static volatility interruptions: Evidence from the Korean stock markets, Mar 1
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=114353&date=2018-03-01
We conduct a comprehensive analysis of the sequential introductions of dynamic and static volatility interruptions (VIs) in the Korean stock markets. The Korea Exchange introduced VIs for individual stocks to improve price formation and to limit losses to investors from brief periods of abnormal volatility. We find that the dynamic VI is effective in stabilizing markets and aiding price discovery, while the effect of the static VI is limited. The static VI functions similarly to the pre-existing price-limit system, which accounts for its limited incremental benefit.

Dr. Julia Fukuyama, Fred Hutchinson Cancer Research Center, Mar 1
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=115319&date=2018-03-01
Abstract:
Studies of the microbiome, the complex communities of bacteria that live in and around us, present interesting statistical problems. In particular, bacteria are best understood as the result of a continuous evolutionary process, and methods for analyzing microbiome data should use this evolutionary history. Motivated by this example, I describe adaptive gPCA, a method for dimensionality reduction that uses the evolutionary structure as a regularizer and improves the interpretability of the low-dimensional space. I also discuss implications for interpretable supervised learning that incorporates both the phylogeny and variable selection.

Bio:
Julia Fukuyama is currently a postdoctoral research fellow in Computational Biology at the Fred Hutchinson Cancer Research Center. She obtained her PhD in Statistics at Stanford University, where she developed a set of multivariate methods for integrative analysis of abundance and phylogenetic data for the microbiome. Her postdoctoral work has been in computational immunology, focusing in particular on B cell repertoire sequencing. She also holds a BS in Biology from Yale University, which informs her interest in methods that help us make sense of complex, high-dimensional biological data.

Website: jfukuyama.github.io

WiDS Berkeley, Mar 5
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=114272&date=2018-03-05
The UC Berkeley School of Information is excited to partner with Stanford University to bring the Women in Data Science (WiDS) conference to Berkeley, California.

The Global Women in Data Science (WiDS) Conference is an annual one-day technical conference based at Stanford that brings together data scientists and professionals in adjacent fields from around the globe to discuss the latest research and applications of data science across a broad set of domains. Participants learn how leading-edge companies are leveraging data science for success and connect with potential mentors, collaborators, and others in the field.

WiDS Berkeley is a regional event in association with WiDS that will feature live-streamed keynotes and technical talks from the WiDS conference at Stanford. Interspersed with the main conference live stream, WiDS Berkeley will feature unique on-location technical vision talks by distinguished speakers from the Bay Area in academia, industry, government, and non-profits; panel discussions with female data scientists and researchers in the space of artificial intelligence and deep learning; a student poster session; and networking opportunities throughout the day.

All genders are invited to participate in the conference, which features exclusively female speakers.

Statistics and Data Science: the Prediction and Modeling Cultures, Mar 5
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=115731&date=2018-03-05
I recently taught a course entitled "Seminal Papers and Controversies in Statistics", and Leo Breiman's (2001) article "Statistical Modeling: The Two Cultures" was a very popular paper with students. The paper contrasts the machine learning culture, with its focus on prediction, with the more classical parametric modeling approach to statistics. I am more in the parametric modeling camp, but I appreciate the prediction perspective as yielding a simple and unified approach to problems in statistics, the overarching objective being to predict the things you don't know, with appropriate measures of uncertainty. Philosophically, I try to follow the "calibrated Bayes" perspective of Box and Rubin. I discuss this viewpoint, tying it to other seminal papers in my course and to two recent applications to missing data and causal inference.

An almost-linear time algorithm for uniform random spanning tree generation, Mar 7
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=115510&date=2018-03-07
We give an m^{1+o(1)} beta^{o(1)}-time algorithm for generating uniformly random spanning trees in weighted graphs with max-to-min weight ratio beta. In the process, we illustrate how fundamental tradeoffs in graph partitioning can be overcome by eliminating vertices from a graph using Schur complements of the associated Laplacian matrix.
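For reference, the basic (un-shortcut) Aldous-Broder walk that this work starts from can be sketched in a few lines. This is my own minimal, unweighted version, not the paper's algorithm: its running time is governed by the cover time of the walk rather than the almost-linear bound above.

```python
import random

def aldous_broder_spanning_tree(adj, start=0):
    """Sample a uniformly random spanning tree of a connected graph.

    adj: dict mapping each vertex to a list of its neighbors.
    Run a simple random walk from `start`; the edge by which each
    vertex is first entered joins the tree (Aldous-Broder).
    """
    visited = {start}
    tree = []                     # list of (parent, child) edges
    v = start
    while len(visited) < len(adj):
        w = random.choice(adj[v])
        if w not in visited:      # first entry into w: keep this edge
            visited.add(w)
            tree.append((v, w))
        v = w
    return tree

# 4-cycle: each of its 4 spanning trees omits exactly one edge
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = aldous_broder_spanning_tree(adj)
assert len(tree) == 3             # a spanning tree has n - 1 edges
```

The shortcutting machinery in the talk replaces the expensive portions of exactly this walk with Laplacian-solver computations.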
Our starting point is the Aldous-Broder algorithm, which samples a random spanning tree using a random walk. As in prior work, we use fast Laplacian linear-system solvers to shortcut the random walk from a vertex v to the boundary of a set of vertices assigned to v, called a "shortcutter." We depart from prior work by introducing a new way of employing Laplacian solvers to shortcut the walk. To bound the amount of shortcutting work, we show that most random walk steps occur far away from an unvisited vertex. We apply this observation by charging uses of a shortcutter S to random walk steps in the Schur complement obtained by eliminating all vertices in S that are not assigned to it.

Center for Computational Biology Seminar, Mar 7
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=115796&date=2018-03-07
Title: Quantifying the evolutionary dynamics of tumor progression and metastasis

Abstract: Cancer results from the acquisition of somatic alterations in an evolutionary process that typically occurs over many years, much of which is occult. Understanding the evolutionary dynamics that are operative at different stages of progression in individual tumors might inform the earlier detection, diagnosis, and treatment of cancer. Although these processes cannot be directly observed, the resultant spatiotemporal patterns of genetic variation amongst tumor cells encode their evolutionary histories. Whereas it has traditionally been assumed that tumor progression results from ongoing sequential selection for driver mutations that confer a stringent fitness advantage, we recently described a Big Bang model of tumor evolution, wherein after transformation the tumor grows as a terminal expansion populated by numerous heterogeneous and effectively equally fit subclones. This new model is compatible with effectively neutral tumor evolution and explains the origins of intra-tumor heterogeneity and the dynamics of tumor growth, with implications for earlier detection, treatment resistance, and metastasis. Building on these findings, I will discuss the importance of accounting for tumor spatial structure when inferring clonal dynamics and describe an extensible framework to simulate spatial tumor growth under varied levels of selection, with implications for defining the mode of evolution in diverse solid tumors. Lastly, I will describe a quantitative framework to infer the timing of metastatic dissemination, revealing fundamentally new insights into this lethal process.

Bio:
Christina Curtis, PhD, MSc is an Assistant Professor of Medicine and Genetics in the School of Medicine at Stanford University and Co-Director of the Molecular Tumor Board at the Stanford Cancer Institute. Trained in molecular and computational biology, Dr. Curtis obtained her PhD from USC with Simon Tavaré and completed postdoctoral training at the University of Cambridge. Her laboratory leverages genome-scale data, coupled with computational modeling and iterative experimentation, to define the molecular determinants and evolutionary dynamics of tumor progression toward the development of robust biomarkers. For example, through spatial computational modeling of tumor growth and inference of patient-specific parameters, she and her team have described a Big Bang model of colorectal tumor growth that challenges the de facto model of sequential clonal evolution, with attendant clinical implications. Her research also aims to develop a systematic interpretation of genotype/phenotype associations in cancer and has helped to redefine the molecular map of breast cancer, revealing novel subgroups with distinct clinical outcomes.

Seminar 217, Risk Management: Factor Strategies: Crowding, Capacity and Sources of Active Returns, Mar 8
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=116065&date=2018-03-08
We develop a methodology to estimate dynamic factor loadings using cross-sectional risk characteristics, which is especially useful when factor loadings vary significantly over time. In comparison, standard regression approaches assume the factor loadings are constant over a particular window. Applying the methodology to a dataset of U.S.-domiciled mutual funds, we distinguish the components of active returns attributable to (1) constant factor exposures, for example, a tilt to value stocks; (2) time-varying factor exposures; and (3) security selection. We find that large-cap growth funds tend to be concentrated in two factors, momentum and quality, whereas large-cap blend funds have the most factor diversity. With our approach, we find that common measures used to gauge manager skill may be misleading. For example, we find no evidence that active share is associated with larger active returns; rather, the opposite is true across the whole sample when controlling for factors such as fund size and fees. We also examine factor crowding in common strategies.

Dr. Tal Korem, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Mar 8
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=115104&date=2018-03-08
Abstract:
The gut microbiome is an immense microbial ecosystem with unique and diverse metabolic capabilities. In the past decade, it has been associated with multiple chronic and complex diseases, raising great hopes for novel medical advances. But are contemporary microbiome analysis methods useful in a clinical setting? I will present new tools that we developed for the analysis of the gut microbiome that utilize genomic sequencing coverage to yield biological and mechanistic insights about the microbiome in the context of health and disease. I will also present our own clinical research, in which we used microbiome analysis tools along with clinical, lifestyle, and nutritional data to tackle post-meal blood glucose levels, an important risk factor for metabolic diseases such as obesity and diabetes. Our research suggests that general dietary recommendations have limited efficacy against these diseases due to high variability in the responses of different people to identical foods. Demonstrating an approach to solving this problem, we devised a machine learning algorithm based on microbiome and clinical data that accurately predicts personalized blood glucose responses to real-life, complex meals, and we demonstrated that personally tailored diets based on these predictions can successfully reduce hyperglycemia.

Bio:
Tal Korem is a postdoctoral fellow in the group of Prof. Eran Segal at the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science. His research focuses on designing tools for the analysis of the vast microbial ecosystem of the gut microbiome and on applying these tools in clinical settings in order to understand the relationship between nutrition, health, and gut microbes in humans. This is done by analyzing data collected on large human cohorts, with the aim of developing personalized nutrition and precision medicine.

Tal has coauthored several publications in the fields of microbiome and nutritional research, linking the microbiome to the effects of artificial sweeteners (Suez et al., Nature, 2014) and host circadian rhythm (Thaiss et al., Cell, 2016), inferring bacterial growth dynamics (Korem et al., Science, 2015), and predicting the glycemic responses of individuals to complex meals (Zeevi et al., Cell, 2015; Korem et al., Cell Metab., 2017).

Random walk on the Heisenberg group, Mar 14
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=115433&date=2018-03-14
The Heisenberg group (3 by 3 upper-triangular matrices with entries in a ring) is a venerable mathematical object. Simple random walk picks one of the bottom two rows at random and adds or subtracts it from the row above. I will use Fourier analysis to get sharp results about the long-term behavior. For matrix entries in the integers mod n, the walk converges to uniform after order n^2 steps. The Fourier arguments connect to Harper's operator, Hofstadter's butterfly, and the Fast Fourier Transform. For integer entries there is a kind of local CLT, with connections to Gowers' higher Fourier analysis and Lévy's Brownian area.
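The row-operation walk just described is easy to write down concretely. A minimal sketch, using my own encoding of the unitriangular matrix [[1, x, z], [0, 1, y], [0, 0, 1]] as a triple (x, y, z); the function names are mine, not from the talk:

```python
import random

def heisenberg_step(state, n):
    """One step of the random walk on 3x3 upper-unitriangular
    matrices with entries in the integers mod n.

    state = (x, y, z) encodes [[1, x, z], [0, 1, y], [0, 0, 1]].
    Pick one of the bottom two rows at random and add or subtract
    it from the row above; all arithmetic is mod n.
    """
    x, y, z = state
    s = random.choice((1, -1))          # add or subtract
    if random.random() < 0.5:
        # row 2 acts on row 1: x -> x + s, z -> z + s*y
        x, z = (x + s) % n, (z + s * y) % n
    else:
        # row 3 acts on row 2: y -> y + s
        y = (y + s) % n
    return (x, y, z)

# run the walk mod 7 from the identity for a while
random.seed(0)
state = (0, 0, 0)
for _ in range(10_000):
    state = heisenberg_step(state, 7)
```

Per the abstract, after order n^2 such steps the distribution of the triple is close to uniform over all n^3 states.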
I will explain all this for a non-specialist audience. This is joint work with Dan Bump, Bob Hough, and Harold Widom.

Mediation analysis for count and zero-inflated count data, Mar 14
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=116067&date=2018-03-14
In health studies, the outcome is often a count or zero-inflated (ZI) count, such as the number of decayed, missing, and filled teeth (dmft) or surfaces (dmfs); many subjects have zeros because they have not had any cavities. To aid in understanding the underlying mechanisms of diseases and treatments, we developed a series of statistical methods for mediation analyses specifically for count or ZI count outcomes. Existing mediation analysis approaches for count and ZI count data often assume sequential ignorability of the mediator, which is often not plausible in health research because the mediator is not randomized by researchers. We defined relevant direct and mediation effects for count and ZI count data, and we developed causal mediation methods based on an instrumental variable (IV) approach. The new method does not require a parametric distribution assumption on the outcome variable or ignorability of the mediator. Sensitivity analyses were developed to assess how the results change if the method's assumptions are violated. Our method was applied to a randomized dental caries prevention trial.

Seminar 217, Risk Management: A Credit Risk Framework With Jumps and Stochastic Volatility, Mar 15
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=114354&date=2018-03-15
The jump threshold perspective is a view of credit risk in which the event of default corresponds to the first time a stock's log price experiences a downward jump exceeding a certain threshold size. We will describe and motivate this perspective and show that we may obtain explicit formulas for default probabilities and credit default swaps, even when the stock has stochastic volatility, the interest rate is stochastic, and the default threshold is a non-constant stochastic process. This talk is based on joint work with Pierre Garreau and Chun-Yuan Chiu.

BLISS Seminar: Queues, Balls and Bins, and Association, Mar 19
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=116306&date=2018-03-19
We consider a problem motivated by file retrieval in cloud computing systems where storage coding is used. In such problems, each file-retrieval job consists of multiple tasks (each corresponding to the retrieval of a coded chunk of a file), and the job is completed only when all of its tasks are completed. The goal is to compute the tail probability of the job completion time. However, this is a difficult problem, whereas computing tail probabilities of task completion times is relatively easy. We will show that, by assuming that the task completion times are independent, one can compute an upper bound on the tail probability of the job completion time. The result is obtained by proving that the task completion times at the various servers in the cloud system are associated. The key step in the proof can be easily understood by considering a corresponding balls-and-bins problem, as we will illustrate in the talk. Joint work with Weina Wang, Mor Harchol-Balter, Haotian Jiang, and Alan Scheller-Wolf.

Formation of large-scale random structure by competitive erosion, Mar 21
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=116303&date=2018-03-21
Competitive erosion models a random interface sustained in equilibrium by equal and opposite pressures on each side of the interface. Here we study the following one-dimensional version. Begin with all sites of Z uncolored. A blue particle performs simple random walk from 0 until it reaches a nonzero red or uncolored site, and turns that site blue; then a red particle performs simple random walk from 0 until it reaches a nonzero blue or uncolored site, and turns that site red. We prove that after n blue and n red particles alternately perform such walks, the total number of colored sites is of order n^{1/4}. The resulting random color configuration has a certain fractal nature which, after scaling by n^{1/4} and taking a limit, has an explicit description in terms of alternating extrema of Brownian motions.
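The dynamics above are simple enough to simulate directly. A sketch of the one-dimensional model as stated, in my own code (not from the talk):

```python
import random

def competitive_erosion(n, seed=None):
    """Simulate n rounds of the one-dimensional competitive erosion
    model: a blue particle, then a red particle, each walks from 0.

    color[site] is 'b' or 'r'; absent keys are uncolored. A particle
    performs simple random walk on Z until it reaches a nonzero site
    that is uncolored or of the opposite color, and claims that site.
    """
    rng = random.Random(seed)
    color = {}
    for _ in range(n):
        for mine in ('b', 'r'):
            site = 0
            while True:
                site += rng.choice((-1, 1))
                # stop at a nonzero site not already of our color
                if site != 0 and color.get(site) != mine:
                    color[site] = mine
                    break
    return color

cfg = competitive_erosion(100, seed=1)
print(len(cfg))   # colored sites; of order n^{1/4} per the theorem
```

Because recoloring of opposite-color sites dominates, the colored cluster stays tiny relative to n, which is what the n^{1/4} theorem quantifies.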
This is joint work with Shirshendu Ganguly and Lionel Levine.

A Unified Theory of Regression Adjustment for Design-based Inference, Mar 21
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=116354&date=2018-03-21
Under the Neyman causal model, a well-known result is that OLS with treatment-by-covariate interactions cannot harm the asymptotic precision of estimated treatment effects in completely randomized experiments. But do such guarantees extend to experiments with more complex designs? This paper proposes a general framework for addressing this question and defines a class of generalized regression estimators that are applicable to experiments of any design. The class subsumes common estimators (e.g., OLS). Within that class, two novel estimators are proposed that are applicable to arbitrary designs and asymptotically optimal. The first is composed of three Horvitz-Thompson estimators. The second recursively applies the principle of generalized regression estimation to obtain regression-adjusted regression adjustment. A simulation study illustrates that the latter can be superior to alternatives in finite samples.
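The interacted OLS estimator referenced in the opening sentence (often attributed to Lin, 2013) can be sketched with numpy. This is a minimal version for a completely randomized experiment, with simulated data of my own choosing; it is not the paper's generalized estimator:

```python
import numpy as np

def interacted_ols_ate(y, t, x):
    """ATE estimate from OLS with treatment-by-covariate interactions
    in a completely randomized experiment (Lin-style adjustment).

    Regress y on [1, t, x - xbar, t*(x - xbar)]; because the
    covariates are centered, the coefficient on t is the adjusted
    average-treatment-effect estimate.
    """
    xc = x - x.mean(axis=0)              # center covariates
    X = np.column_stack([np.ones_like(t), t, xc, t[:, None] * xc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# simulated completely randomized experiment, true effect = 2
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
t = np.zeros(n)
t[rng.permutation(n)[: n // 2]] = 1.0    # randomize half to treatment
y = x @ np.array([1.0, -0.5]) + 2.0 * t + rng.normal(size=n)
print(interacted_ols_ate(y, t, x))       # close to the true effect, 2
```

The paper asks whether such precision guarantees survive under more complex designs, which this completely randomized sketch does not address.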