Eric J. Tchetgen Tchetgen
Eric J. Tchetgen Tchetgen, Wharton, University of Pennsylvania
While model selection is a well-studied topic in parametric and nonparametric regression and density estimation, model selection for possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this talk, we propose a new model selection framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function. The class of such doubly robust functionals is quite large, and includes estimation of pathwise differentiable functionals when data are missing at random and in causal inference problems under unconfoundedness conditions. Under double robustness, the estimated functional should incur no bias if either of two nuisance parameters is evaluated at the truth while the other spans a large collection of possibly incorrect candidate models. We introduce a new minimax criterion based on a certain pseudo-risk for the functional of primary interest that embodies this double robustness property and thus may be used to select the candidate model that is nearest to fulfilling this property even when all models are wrong. We establish an oracle property for a multi-fold cross-validation version of the new model selection criterion: the empirical criterion performs nearly as well as an oracle with a priori knowledge of the pseudo-risk for each candidate model. We also describe a smooth approximation to the selection criterion which allows for valid post-selection inference. Finally, we apply the approach to model selection for a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learning methods to account for confounding, in a study of right heart catheterization in the intensive care unit of critically ill patients.
This is joint work with Yifan Cui.
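As a small illustration of the double robustness property described in the abstract, the sketch below evaluates the standard augmented inverse probability weighted (AIPW) estimator of the average treatment effect on simulated data. It is not the selection criterion proposed in the talk. The data-generating process, the function name `aipw_ate`, and the deliberately misspecified nuisance values (`pi_wrong`, `mu_wrong`) are all assumptions made for illustration, and the true nuisance functions are plugged in directly rather than estimated.

```python
import numpy as np


def aipw_ate(y, a, pi_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the average treatment effect E[Y(1) - Y(0)].

    pi_hat  : propensity score values P(A = 1 | X)
    mu1_hat : outcome regression values E[Y | A = 1, X]
    mu0_hat : outcome regression values E[Y | A = 0, X]
    """
    treated = mu1_hat + a * (y - mu1_hat) / pi_hat
    control = mu0_hat + (1 - a) * (y - mu0_hat) / (1 - pi_hat)
    return float(np.mean(treated - control))


# Hypothetical simulation: one confounder X, true ATE equal to 2.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-x))      # true propensity score
a = rng.binomial(1, pi_true)            # treatment indicator
mu0_true = x                            # true E[Y | A = 0, X]
mu1_true = x + 2.0                      # true E[Y | A = 1, X]
y = np.where(a == 1, mu1_true, mu0_true) + rng.normal(size=n)

# Deliberately wrong nuisance values that ignore the confounder.
pi_wrong = np.full(n, 0.5)
mu_wrong = np.zeros(n)

est_both = aipw_ate(y, a, pi_true, mu1_true, mu0_true)
est_outcome_only = aipw_ate(y, a, pi_wrong, mu1_true, mu0_true)
est_propensity_only = aipw_ate(y, a, pi_true, mu_wrong, mu_wrong)
est_neither = aipw_ate(y, a, pi_wrong, mu_wrong, mu_wrong)

print("both nuisances correct: ", round(est_both, 3))
print("outcome model correct:  ", round(est_outcome_only, 3))
print("propensity correct:     ", round(est_propensity_only, 3))
print("neither correct:        ", round(est_neither, 3))
```

In this setup the first three estimates should land close to the true value of 2, while the last one, where both nuisance parameters are wrong, is visibly biased. That gap is what a selection criterion built around double robustness would aim to minimize across candidate models.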