Dissertation Talk: Optimal Tradeoffs in Modern Hypothesis Testing
Presentation: Dissertation Talk: CS | November 11 | 511 Soda Hall
Maxim Rabinovich, UC Berkeley
Many applications of statistics both in science and in the technology industry come in the form of hypothesis testing, whether executing an A/B test on a product or determining which genes are associated with disease risk. Unfortunately, classical statistical algorithms do not address the requirements of modern applications, where true effects may be very rare and the number of tests in the tens of thousands or more. The field of multiple hypothesis testing has developed to fill this gap, introducing algorithms better suited to large-scale testing scenarios. But these algorithms are challenging to design, and it is not clear how to determine whether those we know can be improved or not. In this work, we place constraints on the performance of any multiple testing algorithm. The constraints come in the form of an optimal tradeoff between the two kinds of errors such an algorithm can make (i.e. false positives and false negatives). In more precise terms, this work provides a framework for establishing the tradeoff between the False Discovery Rate (FDR), a measure of the fraction of effects discovered by the algorithm that turn out not to be real effects, and the False Non-discovery Rate (FNR), a measure of the fraction of true effects the algorithm misses. This framework applies to a wide array of testing models, yields predictions that can be numerically simulated in virtually any model, and can be analytically instantiated in a number of models previously studied in the context of multiple hypothesis testing. In cases where the framework yields analytically tractable bounds, they match the best previous known results established by the speaker and others. The work in this talk is joint with Michael I. Jordan and Martin J. Wainwright.