Modern data science is often exploratory in nature, with hundreds or thousands of hypotheses being regularly tested on scientific datasets. The false discovery rate (FDR) has emerged as a dominant error metric in multiple hypothesis testing over the last two decades. I will argue that both (a) the FDR error metric and (b) the current framework of multiple testing, in which the scientist picks an arbitrary target error level (such as 0.05) and the algorithm returns a set of rejected null hypotheses, may be rather inappropriate for exploratory data analysis.
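To fix ideas, the Benjamini-Hochberg (BH) procedure is the canonical algorithm in this framework: the scientist fixes a target level alpha, and BH returns a rejection set whose FDR is controlled at alpha. The sketch below is a minimal, standard implementation for illustration (the function name and interface are my own choices, not anything from the talk):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up procedure: with p-values sorted as p_(1) <= ... <= p_(m),
    reject the k smallest, where k is the largest index satisfying
    p_(k) <= alpha * k / m. Controls FDR at level alpha (under independence)."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    below = sorted_p <= alpha * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True  # reject the k hypotheses with smallest p-values
    return rejected
```

Note that alpha must be chosen before looking at the data; it is exactly this rigidity that the talk takes issue with.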

I will show that, luckily, most existing FDR algorithms (BH, STAR, LORD, AdaPT, Knockoffs, and several others) naturally satisfy a more uniform notion of error, yielding simultaneous confidence bands for the false discovery proportion through the entire path of the algorithm. This makes it possible to flip the traditional roles of the algorithm and the scientist, allowing the scientist to make post-hoc decisions after seeing the realization of an algorithm on the data. For example, the scientist can instead achieve an error guarantee for all target error levels simultaneously (and hence for any data-dependent error level). Remarkably, the price for this added flexibility is relatively small: the analogous guarantees are less than a factor of 2 looser than if the error level had been prespecified. The theoretical basis for this advance is founded in the theory of martingales: we move from optional stopping (used in FDR proofs) to optional spotting by proving uniform concentration bounds on relevant exponential supermartingales.
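The flipped-roles idea can be made concrete with a small sketch (again my own illustration, not code from the talk): instead of committing to one level, compute the BH rejection count across a whole grid of target levels and let the scientist inspect the entire path before choosing. Under the standard framework such a post-hoc choice would invalidate the FDR guarantee; it is precisely the simultaneous confidence bands described above that make it legitimate, at the stated factor-of-2 cost.

```python
import numpy as np

def bh_rejection_path(pvals, alphas):
    """Number of BH rejections at each target level in `alphas`.
    The scientist can view this entire path and then pick a
    (data-dependent) level, which uniform FDP bounds can justify."""
    sorted_p = np.sort(np.asarray(pvals))
    m = len(sorted_p)
    ranks = np.arange(1, m + 1)
    counts = []
    for alpha in alphas:
        below = sorted_p <= alpha * ranks / m  # BH step-up criterion
        counts.append(int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0)
    return counts
```

For example, on p-values (0.001, 0.002, 0.9, 0.8, 0.01) the path over levels (0.01, 0.05, 0.2) is (2, 3, 3), and the scientist could decide after the fact that 3 discoveries at level 0.05 is the trade-off they want.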

This is joint work with Eugene Katsevich, but this talk will also cover some work with (alphabetically) Rina Barber, Jianbo Chen, Will Fithian, Kevin Jamieson, Michael Jordan, Lihua Lei, Max Rabinovich, Martin Wainwright, Fanny Yang, and Tijana Zrnic.