Working papers
Potential weights and implicit causal designs in linear regression
I introduce a simple and generic diagnostic for design-based causal interpretation of regression estimands.
Abstract
When do linear regressions estimate causal effects in quasi-experiments? This paper provides a generic diagnostic that assesses whether a given linear regression specification on a given dataset admits a design-based interpretation. To do so, we define a notion of potential weights, which encode counterfactual decisions a given regression makes to unobserved potential outcomes. If the specification does admit such an interpretation, this diagnostic can find a vector of unit-level treatment assignment probabilities, which we call an implicit design, under which the regression estimates a causal effect. This diagnostic also finds the implicit causal effect estimand. Knowing the implicit design and estimand adds transparency, leads to further sanity checks, and opens the door to design-based statistical inference. When applied to regression specifications studied in the causal inference literature, our framework recovers and extends existing theoretical results. When applied to widely used specifications not covered by existing causal inference literature, our framework generates new theoretical insights.
Empirical Bayes When Estimation Precision Predicts Parameters
[Revision requested by Econometrica] [Latest version] [arXiv] [R package] [Slides]
I introduce a new empirical Bayes shrinkage method in the normal location model $Y_i \mid \theta_i, \sigma_i \sim N(\theta_i, \sigma_i^2)$. I do not assume $\theta_i$ is independent from $\sigma_i$. I find that my proposed method selects much higher mobility tracts on average than the standard empirical Bayes shrinkage procedure.
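For reference, one common version of the baseline shrinkage procedure can be sketched as follows. This is a minimal illustration, not the proposed method: it assumes a Gaussian prior $\theta_i \sim N(m, s^2)$ that is independent of $\sigma_i$, fit by method of moments. The function name `independent_gaussian_shrinkage` and the moment-based fitting step are assumptions made for this sketch.

```python
import numpy as np

def independent_gaussian_shrinkage(y, sigma):
    """Posterior means under an assumed prior theta_i ~ N(m, s2), independent of sigma_i.

    y     : noisy estimates Y_i
    sigma : known standard errors sigma_i (assumed strictly positive)
    """
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Method-of-moments prior mean: E[Y_i] = m
    m = y.mean()
    # Under independence, Var(Y_i) = s2 + E[sigma_i^2]; truncate at zero
    s2 = max(y.var() - np.mean(sigma**2), 0.0)
    # Shrinkage weight on the raw estimate; noisier estimates get less weight
    weight = s2 / (s2 + sigma**2)
    return m + weight * (y - m)

# Two estimates, equally far from the mean, with different standard errors
shrunk = independent_gaussian_shrinkage([-2.0, 2.0], [1.0, 2.0])
```

In this sketch the amount of shrinkage depends on $\sigma_i$ only through the weight $s^2 / (s^2 + \sigma_i^2)$; the prior mean and variance are the same for every unit, which is the independence assumption in question.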
Abstract
Empirical Bayes methods usually maintain a prior independence assumption: The unknown parameters of interest are independent from the known standard errors of the estimates. This assumption is often theoretically questionable and empirically rejected. This paper instead models the conditional distribution of the parameter given the standard errors as a flexibly parametrized family of distributions, leading to a family of methods that we call CLOSE. This paper establishes that (i) CLOSE is rate-optimal for squared error Bayes regret, (ii) squared error regret control is sufficient for an important class of economic decision problems, and (iii) CLOSE is worst-case robust when our assumption on the conditional distribution is misspecified. Empirically, using CLOSE leads to sizable gains for selecting high-mobility Census tracts. Census tracts selected by CLOSE are substantially more mobile on average than those selected by the standard shrinkage method.
Optimal Conditional Inference in Adaptive Experiments, with Isaiah Andrews
We study batched bandit experiments and identify a small free lunch for adaptive inference. For $\varepsilon$-greedy-type experiments, we characterize the optimal conditional inference procedure given the history of bandit assignment probabilities.
Abstract
We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
Mean-variance constrained priors have finite maximum Bayes risk in the normal location model
I partially answer my own question on StackExchange. I show that the maximum squared error Bayes risk of a misspecified prior (with correctly specified mean and variance, normalized to zero and one) is at most 535. I think the correct value is 2.
Abstract
Consider a normal location model $X \mid \theta \sim N(\theta, \sigma^2)$ with known $\sigma^2$. Suppose $\theta \sim G_0$, where the prior $G_0$ has zero mean and unit variance. Let $G_1$ be a possibly misspecified prior with zero mean and unit variance. We show that the squared error Bayes risk of the posterior mean under $G_1$ is bounded, uniformly over $G_0, G_1, \sigma^2 > 0$.
Nonparametric Treatment Effect Identification in School Choice
[Revision requested by Journal of Econometrics] [arXiv] [Twitter TL;DR]
I characterize the consequences of treatment effect heterogeneity in settings where school choice mechanisms are used to estimate the causal effects of schools.
Abstract
This paper studies nonparametric identification and estimation of causal effects in centralized school assignment. In many centralized assignment settings, students are subjected to both lottery-driven variation and regression discontinuity (RD) driven variation. We characterize the full set of identified atomic treatment effects (aTEs), defined as the conditional average treatment effect between a pair of schools, given student characteristics. Atomic treatment effects are the building blocks of more aggregated notions of treatment contrasts, and common approaches estimating aggregations of aTEs can mask important heterogeneity. In particular, many aggregations of aTEs put zero weight on aTEs driven by RD variation, and estimators of such aggregations put asymptotically vanishing weight on the RD-driven aTEs. We develop a diagnostic tool for empirically assessing the weight put on aTEs driven by RD variation. Lastly, we provide estimators and accompanying asymptotic results for inference on aggregations of RD-driven aTEs.
Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models, with Daniel Chen and Greg Lewis
[Accepted at NeurIPS 2020 Workshop on Machine Learning for Economic Policy] [arXiv] [Twitter TL;DR]
We consider using machine learning to estimate the first stage in linear instrumental variable models.
Abstract
We provide some simple theoretical results that justify incorporating machine learning in a standard linear instrumental variable setting, prevalent in empirical research in economics. Machine learning techniques, combined with sample-splitting, extract nonlinear variation in the instrument that may dramatically improve estimation precision and robustness by boosting instrument strength. The analysis is straightforward in the absence of covariates. The presence of linearly included exogenous covariates complicates identification, as the researcher would like to prevent nonlinearities in the covariates from providing the identifying variation. Our procedure can be effectively adapted to account for this complication, based on an argument by Chamberlain (1992). Our method preserves standard intuitions and interpretations of linear instrumental variable methods and provides a simple, user-friendly upgrade to the applied economics toolbox. We illustrate our method with an example in law and criminal justice, examining the causal effect of appellate court reversals on district court sentencing decisions.
GitHub Gist
Causal Inference and Matching Markets
Undergraduate thesis advised by Scott Duke Kominers and David C. Parkes. Awarded the Thomas T. Hoopes Prize at Harvard College.
[Simulable Mechanisms] [Cutoff Mechanisms] [Regression discontinuity with endogenous cutoff]