Statistics
http://events.berkeley.edu/index.php/calendar/sn/stat.html
Upcoming Events
Seminar 217, Risk Management: Robust Experimentation in the Continuous Time Bandit Problem, Apr 2
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=122093&date=2019-04-02
We consider the experimentation dynamics of a decision maker (DM) in a two-armed bandit setup, where the agent holds ambiguous beliefs regarding the distribution of the return process of one arm and is certain about the other one. The DM entertains multiplier preferences à la Hansen and Sargent [2001]; thus we frame the decision-making environment as a two-player differential game against nature in continuous time. We characterize the DM's value function and her optimal experimentation strategy, which turns out to follow a cut-off rule with respect to her belief process. The belief threshold for exploring the ambiguous arm is found in closed form and is shown to be increasing with respect to the ambiguity aversion index. We then study the effect of provision of an unambiguous information source about the ambiguous arm. Interestingly, we show that the exploration threshold rises unambiguously as a result of this new information source, thereby leading to more conservatism. This analysis also sheds light on the efficient time to reach for an expert opinion.
Tajima coalescent, Apr 2
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=124950&date=2019-04-02
In this talk I will present the Tajima coalescent, a model on the ancestral relationships of molecular samples. This model is then used as a prior model on unlabeled genealogies to infer evolutionary parameters with a Bayesian nonparametric method. I will then show that conditionally on observed data and a particular mutation model, the cardinality of the hidden state space of Tajima’s genealogies is exponentially smaller than the cardinality of the hidden state space of Kingman’s genealogies. We estimate the corresponding cardinalities with sequential importance sampling. Finally, I will propose a new distance on unlabeled genealogies that allows us to compare different distributions on unlabeled genealogies to Tajima’s coalescent.
Grace-like polynomials and related questions, Apr 3
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=124979&date=2019-04-03
We say that the multi-affine polynomial P(z1, ..., zm, w1, ..., wn) is Grace-like if it does not vanish when {z1, ..., zm} is separated from {w1, ..., wn} by a circle in the complex plane. Such polynomials have many unexpected probabilistic properties related to the work of Borcea-Brändén.
Statistical and Computational Challenges in Conformational Biology, Apr 4
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125018&date=2019-04-04
Chromatin architecture is critical to numerous cellular processes including gene regulation, while conformational disruption can be oncogenic. Accordingly, discerning chromatin configuration is of basic importance; however, this task is complicated by a number of factors including scale, compaction, dynamics, and inter-cellular variation.

The recent emergence of a suite of proximity ligation-based assays, notably Hi-C, has transformed conformational biology with, for example, the elicitation of topological and contact domains providing a high-resolution view of genome organization. Such conformation capture assays provide proxies for pairwise distances between genomic loci which can be used to infer 3D coordinates, although much downstream analysis bypasses this reconstruction step.

After demonstrating advantages deriving from obtaining 3D genome reconstructions, in particular from superposing genomic attributes on a reconstruction and identifying extrema ('3D hotspots') thereof, we showcase methodological challenges surrounding such analyses and advance a novel reconstruction approach based on principal curves. Open issues highlighted include (i) performing and synthesizing reconstructions from single-cell assays, (ii) devising rotation-invariant methods for 3D hotspot detection, (iii) assessing genome reconstruction accuracy, and (iv) averting reconstruction uncertainty by direct integration of Hi-C data and genomic features. By using p-values from (epi)genome-wide association studies as the feature, the latter approach provides a conformational lens for viewing GWAS findings.
CANCELED: Seminar 217, Risk Management: No Seminar, Apr 9
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=122094&date=2019-04-09
Bigeodesics in first and last passage percolation, Apr 10
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125095&date=2019-04-10
First and last passage percolation are statistical physics models of random growth. These models are widely believed to belong to the Kardar-Parisi-Zhang universality class. I will define these two models and talk about what it means to be in this universality class. A longstanding question about these models is whether they have bi-infinite geodesics. This question is of interest to physicists due to its connections to the Ising model. I will discuss the recent progress on this question. This talk is based on joint work with Daniel Ahlberg, Riddhipratim Basu and Allan Sly.
Number Theory Seminar, Apr 10
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125100&date=2019-04-10
The Farey fractions of level $n$ are the set of rationals in $[0,1]$ in lowest terms having denominator at most $n$. It is known that a measure of equally weighted point masses (of total mass 1) on the points of the Farey sequence $F_n$ converges to the uniform distribution on $[0,1]$ as $n$ goes to infinity. The Riemann hypothesis is equivalent to suitably fast rates of convergence (to zero) of certain statistics measuring distance to uniform distribution, given by theorems of Franel (1924) and Landau (1924). This talk addresses a toy model consisting of unreduced Farey fractions (allowing fractions not in lowest terms) and studies similar statistics.
Renewable Estimation and Incremental Inference in Generalized Linear Models with Streaming Data, Apr 10
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=124960&date=2019-04-10
I will present a new statistical paradigm for the analysis of streaming data based on renewable estimation and incremental inference in the context of generalized linear models. Our proposed renewable estimation enables us to sequentially update the maximum likelihood estimation and inference with current data and summary statistics of historic data, but without using any of the historic raw data themselves. In the implementation, we design a new data flow, called the Rho architecture, to accommodate the data storage of current and historic data, as well as to communicate with the computing layer of the Spark system in order to facilitate sequential learning. We establish both estimation consistency and asymptotic normality for the renewable estimation and incremental inference for regression parameters. We illustrate our methods by numerical examples from both simulation experiments and real-world analysis. This is joint work with Lan Luo.
Seminar 217, Risk Management: Linking 10-K and the GICS - through Experiments of Text Classification and Clustering, Apr 16
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=122096&date=2019-04-16
A 10-K is an annual report filed by a publicly traded company about its financial performance and is required by the U.S. Securities and Exchange Commission (SEC). 10-Ks are fairly long and tend to be complicated, but they are among the most comprehensive and most important documents a public company can publish on a yearly basis. The Global Industry Classification Standard (GICS) is an industry taxonomy developed in 1999 by MSCI and S&P Dow Jones Indices and is designed to classify a company according to its principal business activity. The GICS hierarchy begins with 11 sectors and is followed by 24 industry groups, 68 industries, and 157 sub-industries. We ask two questions: First, can a classifier be trained to recognize a firm's GICS sector based on the textual information in its 10-K? Second, can we extract, from the classifier, embeddings (low-dimensional vectors) for 10-Ks that respect their GICS sectors, so firms within the same sector would have embeddings that are close (measured by cosine similarity)? We report on a series of experiments with a Convolutional Neural Network (CNN) for text classification, trained on two variants of document representations: one uses pre-trained word vectors; the other is based on the simple bag-of-words model.
Conformal embedding and percolation on the uniform triangulation, Apr 17
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125260&date=2019-04-17
Following Smirnov’s proof of Cardy’s formula and Schramm’s discovery of SLE, a thorough understanding of the scaling limit of critical percolation on the regular triangular lattice has been achieved. Smirnov’s proof in fact gives a discrete approximation of the conformal embedding which we call the Cardy embedding. In this talk I will present a joint project with Nina Holden where we show that the uniform triangulation under the Cardy embedding converges to the Brownian disk under the conformal embedding. Moreover, we prove a quenched scaling limit result for the critical percolation on uniform triangulations. Time permitting, I will also explain how this result fits into the larger picture of random planar maps and Liouville quantum gravity.
From correlation to causation — measuring ad effectiveness at scale, Apr 17
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125235&date=2019-04-17
Everyone has had that one ad for that one pair of shoes seem to follow them everywhere they go on the internet. Why does that happen? Especially if you already bought the shoes? To make sense of this, it's worth understanding how marketers have historically measured ad effectiveness -- and why the problem is harder than it seems. Beyond improvements in measuring ad effectiveness, this talk will dive into the uniquely statistical problems we face in ad tech and some of the ways we are approaching them.
BIDS Data Science Lecture: Astrophysical Machine Learning, Apr 18
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=124964&date=2019-04-18
From streaming, repeated, noisy, and distorted images of the sky, time-domain astronomers are tasked with extracting novel science as quickly as possible with limited and imperfect information. Employing algorithms developed in other fields, we have already reached important milestones demonstrating the speed and efficacy of using ML in data and inference workflows. Now we look to innovations in learning architectures and computational approaches that are purpose-built alongside the specific domain questions. I will describe such efforts (developed in the search for Planet 9, new classes of variable sources, and for data-driven emulators) and discuss ongoing efforts to imbue physical understanding into the learning process itself.
Seminar 217, Risk Management: CANCELLED, Apr 23
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=122095&date=2019-04-23
The topologies of random real algebraic hypersurfaces, Apr 24
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125369&date=2019-04-24
The topology of a hypersurface in P^n(R) of high degree can be very complicated. However, if we choose the surface at random there is a universal law. Little is known about this law and it appears to be dramatically different for n=2 and n>2. There is a similar theory for zero sets of monochromatic waves which model nodal sets. Joint work with Y. Canzani and I. Wigman.
Cooperating with the Curse of Dimensionality, Apr 25
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=125386&date=2019-04-25
The curse of dimensionality arises when analyzing high-dimensional data and non-Euclidean data, such as network data, which are ubiquitous nowadays. It causes counter-intuitive phenomena and makes traditional statistical tools less effective or inapplicable. On the other hand, some counter-intuitive phenomena might be explained by some universal patterns, which could be used to form new effective tools in dealing with high-dimensional/non-Euclidean data. In this talk, one such unique pattern is explored and applied to fundamental statistical tasks, including hypothesis testing and cluster analysis, leading to substantial improvements in conducting these tasks for high-dimensional/non-Euclidean data. Some other related topics will also be briefly discussed.
Seminar 217, Risk Management: The Implication of Information Network in Market Quality and Market Reaction to Public Announcements, Apr 30
http://events.berkeley.edu/index.php/calendar/sn/stat.html?event_ID=122097&date=2019-04-30
This research studies the role of information networks in market quality and market reaction to public announcements. We propose a three-period rational noisy expected equilibrium model that takes both public and private information into account, with an information network structure embedded among market traders. Closed-form expressions for market reaction and market quality are derived as functions of the topological structure of the network, and several novel results are revealed. Trading volume and price change respond differently to network connectedness. As network connectedness increases, price change trends downward. This downward trend diminishes, which reflects that market efficiency cannot increase without bound in reality. However, the change in trading volume is uncertain because it depends on two attributes of the network, uniformity and connectedness, and it is hard to tell which one dominates. As for market quality, information precision can increase market liquidity and market efficiency and decrease the cost of capital. Network connectedness plays the same role for market efficiency and cost of capital, while it has a non-monotone influence on market liquidity. The network also suppresses the effect caused by disclosure.