Cross-validation with Confidence
Seminar | April 26 | 4-5 p.m. | 1011 Evans Hall
Jing Lei, Department of Statistics, CMU
Cross-validation is one of the most popular model selection methods
in statistics and machine learning. Despite its wide applicability,
traditional cross-validation methods tend to overfit, unless the ratio
between the training and testing sample sizes is very small.
We argue that this overfitting tendency of cross-validation
arises from ignoring the uncertainty in the testing sample.
We develop a new, statistically principled
inference tool based on cross-validation that takes into account
the uncertainty in the testing sample. Our method
outputs a small set of highly competitive candidate models that, with a
probabilistic guarantee, contains the best one. In particular, our method
leads to consistent model selection in a classical linear regression setting,
for which existing methods require unconventional split ratios.
We demonstrate the performance of the proposed method in simulated and
real data examples.
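
The idea of returning a set of competitive models, rather than the single
model with the smallest validation error, can be illustrated with a rough
sketch. This is not the speaker's procedure: it is a simplified stand-in that
assumes a single held-out sample, compares per-observation losses with
pairwise one-sided z-tests, and applies a Bonferroni correction. The function
name confidence_set and the parameter alpha are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def confidence_set(losses, alpha=0.05):
    """Return indices of models not rejected as worse than a competitor.

    losses: array of shape (n_test, n_models) holding the held-out loss of
    each candidate model on each test observation.
    Simplifying assumption: pairwise one-sided z-tests with a Bonferroni
    correction, not the procedure presented in the talk.
    """
    losses = np.asarray(losses, dtype=float)
    n, m = losses.shape
    # Critical value adjusted for the m - 1 comparisons made per model.
    z_crit = NormalDist().inv_cdf(1 - alpha / max(m - 1, 1))
    kept = []
    for j in range(m):
        rejected = False
        for k in range(m):
            if k == j:
                continue
            # Positive mean difference: model j loses more than model k.
            diff = losses[:, j] - losses[:, k]
            se = diff.std(ddof=1) / np.sqrt(n)
            if se > 0:
                z = diff.mean() / se
            else:
                z = np.inf if diff.mean() > 0 else -np.inf
            if z > z_crit:
                rejected = True
                break
        if not rejected:
            kept.append(j)
    return kept
```

A model stays in the returned set unless the test sample gives significant
evidence that some competitor has a smaller expected loss, so two models whose
held-out errors differ only by noise are both retained.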