The Implications of Sampling-based and Design-based Uncertainty in Regression Analysis

Seminar | March 22 | 4-5 p.m. | 1011 Evans Hall

 Guido Imbens, Stanford Business School

 Department of Statistics

When a researcher estimates the parameters of a regression function using information on all 50 states in the United States, or information on all visits to a website, what is being estimated, and what is the interpretation of the standard errors? Researchers typically assume the sample is a random sample from a large population of interest, and report standard errors that are designed to capture sampling variation. This is common practice, even in applications where it is difficult to articulate what that population of interest is, and how it differs from the sample. In this article we explore an alternative approach where the estimand and the uncertainty are partly design-based, in the sense that some of the regressors can be manipulated so they could have taken on different values from the ones actually observed. We derive standard errors that account for design-based uncertainty instead of, or in addition to, sampling-based uncertainty. We show that our standard errors in general are smaller than the infinite-population sampling-based standard errors, and provide conditions under which they coincide.