Jiafeng (Kevin) Chen

I am an Assistant Professor of Economics at Stanford University. My research interests lie broadly in theoretical and applied econometrics, with a focus on causal inference, misspecification, and statistical decision theory.

I look forward to working with Stanford students and postdocs. Please don’t hesitate to reach out: My email is jiafeng [atsign] stanford dot edu

Previously, I was a postdoc at SIEPR. I obtained my Ph.D. in Business Economics in 2024 at Harvard University. I obtained my A.B. and S.M. degrees summa cum laude in Applied Mathematics from Harvard College in 2019. My Ph.D. research was supervised by Isaiah Andrews, Elie Tamer, Jesse Shapiro, and Edward Glaeser. I have also worked at Microsoft Research New England with Greg Lewis and at QuantCo.

I publish under my Chinese name, Jiafeng Chen (pronunciation); I go by Kevin.

Working Papers

Nonparametric Identification of Demand without Exogenous Product Characteristics, with Kirill Borusyak, Peter Hull, and Lihua Lei

[Latest version] [arXiv]

We show that $J$-dimensional exogenous variation in prices is sufficient for identifying counterfactuals in prices nonparametrically with recentered instruments.

Abstract

We study the identification of differentiated product demand with exogenous supply-side instruments, allowing product characteristics to be endogenous. Past analyses have argued that exogenous characteristic-based instruments are essentially necessary given a sufficiently flexible demand model with a suitable index restriction. We show, however, that price counterfactuals are nonparametrically identified by recentered instruments---which combine exogenous shocks to prices with endogenous product characteristics---under a weaker index restriction and a new condition we term faithfulness. We argue that faithfulness, like the usual completeness condition for nonparametric identification with instruments, can be viewed as a technical requirement on the richness of identifying variation rather than a substantive economic restriction, and we show that it holds under a variety of non-nested conditions on either price-setting or the index.

Compound Selection Decisions: An Almost SURE Approach, with Lihua Lei, Timothy Sudijono, Liyang (Sophie) Sun, and Tian Xie

[Latest version] [arXiv]

We extend Stein’s unbiased risk estimate (SURE) to compound selection decisions.

Abstract

This paper proposes methods for producing compound selection decisions in a Gaussian sequence model. Given unknown, fixed parameters $\mu_{1:n}$ and known $\sigma_{1:n}$ with observations $Y_i \sim N(\mu_i, \sigma_i^2)$, the decision maker would like to select a subset of indices $S$ so as to maximize utility $\frac{1}{n}\sum_{i\in S} (\mu_i - K_i)$, for known costs $K_i$. Inspired by Stein's unbiased risk estimate (SURE), we introduce an almost unbiased estimator, called ASSURE, for the expected utility of a proposed decision rule. ASSURE allows a user to choose a welfare-maximizing rule from a pre-specified class by optimizing the estimated welfare, thereby producing selection decisions that borrow strength across noisy estimates. We show that ASSURE produces decision rules that are asymptotically no worse than the optimal but infeasible decision rule in the pre-specified class. We apply ASSURE to the selection of Census tracts for economic opportunity, the identification of discriminating firms, and the analysis of $p$-value decision procedures in A/B testing.
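
As a minimal illustration of the objective only (not of ASSURE itself), the sketch below evaluates the utility $\frac{1}{n}\sum_{i\in S} (\mu_i - K_i)$ for two selections. All numbers and the names `selection_utility`, `oracle`, and `naive` are hypothetical and chosen for illustration. The rule that selects every index with $Y_i > K_i$ is a simple benchmark assumed here, not a rule taken from the paper.

```python
def selection_utility(mu, costs, selected):
    """Utility (1/n) * sum over selected i of (mu_i - K_i)."""
    n = len(mu)
    return sum(mu[i] - costs[i] for i in selected) / n

# Hypothetical true parameters, known costs, and noisy observations.
mu = [1.0, -0.5, 2.0, 0.2]
costs = [0.5, 0.0, 1.0, 0.5]
y = [0.3, 0.4, 2.5, 0.9]

# Infeasible oracle: select exactly the indices with mu_i > K_i.
oracle = [i for i in range(len(mu)) if mu[i] > costs[i]]

# Naive benchmark (an assumption for illustration): select when Y_i > K_i.
naive = [i for i in range(len(y)) if y[i] > costs[i]]

print("oracle utility:", selection_utility(mu, costs, oracle))
print("naive utility:", selection_utility(mu, costs, naive))
```

Because the $\mu_i$ are unknown in practice, the utility of a rule can only be computed directly in a simulation like this one, which is why an estimator of expected utility is needed.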

Testing Monotonicity in a Finite Population, with Jonathan Roth and Jann Spiess

[Latest version]

The standard definition of identifiability in design-based, finite-population settings suggests that the data are more informative than they actually are. We illustrate with inference on whether all treatment effects are of the same sign.

Abstract

We consider the extent to which we can learn from a completely randomized experiment whether everyone has treatment effects that are weakly of the same sign, a condition we call monotonicity. From a classical sampling perspective, it is well-known that monotonicity is untestable. By contrast, we show from the design-based perspective -- in which the units in the population are fixed and only treatment assignment is stochastic -- that the distribution of treatment effects in the finite population (and hence whether monotonicity holds) is formally identified. We argue, however, that the usual definition of identification is unnatural in the design-based setting because it imagines knowing the distribution of outcomes over different treatment assignments for the same units. We thus evaluate the informativeness of the data by the extent to which it enables frequentist testing and Bayesian updating. We show that frequentist tests can have nontrivial power against some alternatives, but power is generically limited. Likewise, we show that there exist (non-degenerate) Bayesian priors that never update about whether monotonicity holds. We conclude that, despite the formal identification result, the ability to learn about monotonicity from data in practice is severely limited.

Certified Decisions, with Isaiah Andrews

[Latest version] [arXiv]

We connect statistical inference with statistical decisions by thinking of inference as providing guarantees for decisions. This turns out to be essentially without loss: certified decisions implicitly conduct inference. Such certified decisions give downstream decision-makers safety guarantees.

Abstract

Hypothesis tests and confidence intervals are ubiquitous in empirical research, yet their connection to subsequent decision-making is often unclear. We develop a theory of certified decisions that pairs recommended decisions with inferential guarantees. Specifically, we attach _P-certificates_---upper bounds on loss that hold with probability at least $1-\alpha$---to recommended actions. We show that such certificates allow "safe," risk-controlling adoption decisions for ambiguity-averse downstream decision-makers. We further prove that it is without loss to limit attention to P-certificates arising as minimax decisions over confidence sets, or what Manski (2021) terms "as-if decisions with a set estimate." A parallel argument applies to E-certified decisions obtained from e-values in settings with unbounded loss.

Reinterpreting demand estimation

[Latest version] [arXiv]

I translate two models of demand estimation (Berry and Haile, 2014, 2024) to the Neyman–Rubin model and show a Vytlacil (2002)-style equivalence result.

Abstract

This paper bridges the demand estimation and causal inference literatures by interpreting nonparametric structural assumptions as restrictions on counterfactual outcomes. It offers nontrivial and equivalent restatements of key demand estimation assumptions in the Neyman-Rubin potential outcomes model, for both settings with market-level data (Berry and Haile, 2014) and settings with demographic-specific market shares (Berry and Haile, 2024). The reformulation highlights a latent homogeneity assumption underlying structural demand models: The relationship between counterfactual outcomes is assumed to be identical across markets. This assumption is strong, but necessary for identification of market-level counterfactuals. Viewing structural demand models as misspecified but approximately correct reveals a tradeoff between specification flexibility and robustness to latent homogeneity.

Empirical Bayes shrinkage (mostly) does not correct the measurement error in regression, with Jiaying Gu and Soonwoo Kwon

[Latest version] [arXiv]

Abstract

In the value-added literature, it is often claimed that regressing on empirical Bayes shrinkage estimates corrects for the measurement error problem in linear regression. We clarify the conditions needed; we argue that these conditions are stronger than those needed for classical measurement error correction, which we advocate for instead. Moreover, we show that the classical estimator cannot be improved without stronger assumptions. We extend these results to regressions on nonlinear transformations of the latent attribute and find generically slow minimax estimation rates.

Potential weights and implicit causal designs in linear regression

[Latest version] [arXiv]

I introduce a simple and generic diagnostic for design-based causal interpretation of regression estimands.

Abstract

When we interpret linear regression as estimating causal effects justified by quasi-experimental treatment variation, what do we mean? This paper characterizes the necessary implications when linear regressions are interpreted causally. A minimal requirement for causal interpretation is that the regression estimates some contrast of individual potential outcomes under the true treatment assignment process. This requirement implies linear restrictions on the true distribution of treatment. Solving these linear restrictions leads to a set of implicit designs. Implicit designs are plausible candidates for the true design if the regression were to be causal. The implicit designs serve as a framework that unifies and extends existing theoretical results across starkly distinct settings (including multiple treatment, panel, and instrumental variables). They lead to new theoretical insights for widely used but less understood specifications.

Optimal Conditional Inference in Adaptive Experiments, with Isaiah Andrews

[arXiv]

We study batched bandit experiments and identify a small free lunch for adaptive inference. For $\varepsilon$-greedy-type experiments, we characterize the optimal conditional inference procedure given the history of bandit assignment probabilities.

Abstract

We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.

Resting / subsumed papers

Mean-variance constrained priors have finite maximum Bayes risk in the normal location model

[arXiv]

Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models, with Daniel Chen and Greg Lewis

[Accepted at NeurIPS 2020 Workshop on Machine Learning for Economic Policy] [arXiv]

Causal Inference and Matching Markets

Undergraduate thesis advised by Scott Duke Kominers and David C. Parkes. Awarded the Thomas T. Hoopes Prize at Harvard College.

[Simulable Mechanisms] [Cutoff Mechanisms] [Regression discontinuity with endogenous cutoff]

Publications

Empirical Bayes When Estimation Precision Predicts Parameters

[2026, Econometrica (Lead article)] [Paper (final manuscript)] [arXiv] [R package] [Slides]

I introduce a new empirical Bayes shrinkage method in the normal location model $Y_i \mid \theta_i, \sigma_i \sim N(\theta_i, \sigma_i^2)$. I do not assume $\theta_i$ is independent from $\sigma_i$. I find that my proposed method selects much higher mobility tracts on average than the standard empirical Bayes shrinkage procedure.

Abstract

Empirical Bayes methods usually maintain a prior independence assumption: The unknown parameters of interest are independent from the known standard errors of the estimates. This assumption is often theoretically questionable and empirically rejected. This paper instead models the conditional distribution of the parameter given the standard errors as a flexibly parametrized family of distributions, leading to a family of methods that we call CLOSE. This paper establishes that (i) CLOSE is rate-optimal for squared error Bayes regret, (ii) squared error regret control is sufficient for an important class of economic decision problems, and (iii) CLOSE is worst-case robust when our assumption on the conditional distribution is misspecified. Empirically, using CLOSE leads to sizable gains for selecting high-mobility Census tracts. Census tracts selected by CLOSE are substantially more mobile on average than those selected by the standard shrinkage method.

The purpose of an estimator is what it does: Misspecification, estimands, and over-identification, with Isaiah Andrews and Otavio Tecchio

[Forthcoming, 2025 ESWC Monograph] [Latest version] [arXiv]

We review results on estimation under model misspecification and provide a new result on the interpretation of J-statistics. Invited contribution for the 2025 World Congress of the Econometric Society.

Abstract

In over-identified models, misspecification---the norm rather than exception---fundamentally changes what estimators estimate. Different estimators imply different estimands rather than different efficiency for the same target. A review of recent applications of generalized method of moments in the American Economic Review suggests widespread acceptance of this fact: There is little formal specification testing and widespread use of estimators that would be inefficient were the model correct, including the use of "hand-selected" moments and weighting matrices. Motivated by these observations, we review and synthesize recent results on estimation under model misspecification, providing guidelines for transparent and robust empirical research. We also provide a new theoretical result, showing that Hansen's J-statistic measures, asymptotically, the range of estimates achievable at a given standard error. Given the widespread use of inefficient estimators and the resulting researcher degrees of freedom, we thus particularly recommend the broader reporting of J-statistics.

Nonparametric Treatment Effect Identification in School Choice

[2026, Journal of Econometrics] [arXiv]

I characterize the effect of treatment effect heterogeneity in settings where school choice mechanisms are used to estimate the causal effects of schools.

Abstract

This paper studies nonparametric identification and estimation of causal effects in centralized school assignment. In many centralized assignment algorithms, students are subjected to both lottery-driven variation and regression discontinuity (RD) driven variation. We characterize the full set of identified atomic treatment effects (aTEs), defined as the conditional average treatment effect between a pair of schools, given student characteristics. Atomic treatment effects are the building blocks of more aggregated notions of treatment contrasts, and common approaches to estimating aggregations of aTEs can mask important heterogeneity. In particular, many aggregations of aTEs put zero weight on aTEs driven by RD variation, and estimators of such aggregations put asymptotically vanishing weight on the RD-driven aTEs. We provide a diagnostic and recommend new aggregation schemes. Lastly, we provide estimators and accompanying asymptotic results for inference for those aggregations.

Logs with zeros? Some problems and solutions, with Jonathan Roth

[2024, Quarterly Journal of Economics] [Paper (final manuscript)] [arXiv] [Development Impact Blog]

It turns out that one can’t define away the $\log(0)$ problem. Fundamentally, there is a trilemma on defining scale-invariant average effects. We discuss a few fixes.

Abstract

Many economic settings involve an outcome $Y$ that is weakly positive but can equal zero (e.g. earnings). In such settings, it is common to estimate an average treatment effect (ATE) for a transformation of the outcome that behaves like $\log(Y)$ when $Y$ is large but is defined at zero (e.g. $\log(1+Y)$, $\mathrm{arcsinh}(Y)$). This paper argues that ATEs for such log-like transformations should not be interpreted as approximating a percentage effect, since unlike a percentage, they depend arbitrarily on the units of the outcome when the treatment affects the extensive margin. Intuitively, this dependence arises because an individual-level percentage effect is not well-defined for individuals whose outcome changes from zero to non-zero when receiving treatment, and the units of the outcome implicitly determine how much weight the ATE places on such extensive margin changes. We further establish that when the outcome can equal zero, there is no treatment effect parameter that is an average of individual-level treatment effects, unit-invariant, and point-identified. We discuss a variety of alternative approaches that may be sensible in settings with an intensive and extensive margin, including (i) expressing the ATE in levels as a percentage (e.g. using Poisson regression), (ii) explicitly calibrating the value placed on the intensive and extensive margins, and (iii) estimating separate effects for the two margins (e.g. using Lee bounds). We illustrate these approaches in three empirical applications.
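
The unit dependence described above can be seen in a small numerical sketch. The data and the function name `ate_log1p` are hypothetical and serve only as illustration: the treatment moves one individual from zero to positive earnings, and the same outcomes are then re-expressed in cents rather than dollars.

```python
import math

def ate_log1p(y_control, y_treated, scale=1.0):
    """Difference in mean log(1 + scale * Y) between treated and control."""
    mean_t = sum(math.log1p(scale * y) for y in y_treated) / len(y_treated)
    mean_c = sum(math.log1p(scale * y) for y in y_control) / len(y_control)
    return mean_t - mean_c

# Hypothetical outcomes: the first individual moves from 0 to 5 under
# treatment (an extensive-margin change); the others are unaffected.
y_control = [0.0, 10.0, 20.0]
y_treated = [5.0, 10.0, 20.0]

ate_dollars = ate_log1p(y_control, y_treated, scale=1.0)   # outcome in dollars
ate_cents = ate_log1p(y_control, y_treated, scale=100.0)   # same data in cents

print("ATE for log(1+Y), dollars:", ate_dollars)
print("ATE for log(1+Y), cents:", ate_cents)
```

The two estimates differ even though the underlying data are identical, so neither can be read as a percentage effect.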

Semiparametric Estimation of Long-Term Treatment Effects, with David M. Ritzwoller

[2023, Journal of Econometrics] [arXiv] [Code]

We compute the semiparametric efficiency bounds for two models of long-term treatment effects and introduce the accompanying double/debiased machine learning estimators as well as sieve two-step estimators. Simulation evidence shows that our estimation strategies improve bias and variance properties.

Abstract

Long-term outcomes of experimental evaluations are necessarily observed after long delays. We develop semiparametric methods for combining the short-term outcomes of an experimental evaluation with observational measurements of the joint distribution of short-term and long-term outcomes to estimate long-term treatment effects. We characterize semiparametric efficiency bounds for estimation of the average effect of a treatment on a long-term outcome in several instances of this problem. These calculations facilitate the construction of semiparametrically efficient estimators. The finite-sample performance of these estimators is analyzed with a simulation calibrated to a randomized evaluation of the long-term effects of a poverty alleviation program.

Synthetic Control As Online Linear Regression

[2023, Econometrica] [arXiv] [NBER SI 2022 Labor Studies Method Session]

It turns out that synthetic control has a connection with online convex optimization, which I use to derive novel guarantees.

Abstract

This paper notes a simple connection between synthetic control and online learning. Specifically, we recognize synthetic control as an instance of Follow‐The‐Leader (FTL). Standard results in online convex optimization then imply that, even when outcomes are chosen by an adversary, synthetic control predictions of counterfactual outcomes for the treated unit perform almost as well as an oracle weighted average of control units' outcomes. Synthetic control on differenced data performs almost as well as oracle weighted difference‐in‐differences, potentially making it an attractive choice in practice. We argue that this observation further supports the use of synthetic control estimators in comparative case studies.
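
A minimal sketch of the Follow-The-Leader reading, under simplifying assumptions that are not from the paper: there are only two control units, so the simplex constraint reduces to a single weight in $[0, 1]$ with a closed-form solution. The data and the names `ftl_weight` and `ftl_predictions` are hypothetical. In each period the weight is refit on all past periods and then used to predict the treated unit's current outcome.

```python
def ftl_weight(treated, control_a, control_b):
    """Weight on control_a minimizing past squared error, restricted to [0, 1]."""
    num = sum((a - b) * (y - b) for y, a, b in zip(treated, control_a, control_b))
    den = sum((a - b) ** 2 for a, b in zip(control_a, control_b))
    if den == 0.0:
        return 0.5  # no informative history yet; any weight fits equally well
    # The objective is a one-dimensional convex quadratic, so clipping the
    # unconstrained minimizer to [0, 1] gives the constrained minimizer.
    return min(1.0, max(0.0, num / den))

def ftl_predictions(treated, control_a, control_b):
    """Predict each period using weights fit on strictly earlier periods."""
    preds = []
    for t in range(len(treated)):
        w = ftl_weight(treated[:t], control_a[:t], control_b[:t])
        preds.append(w * control_a[t] + (1 - w) * control_b[t])
    return preds

# Hypothetical data: the treated unit equals 0.25 * A + 0.75 * B exactly.
control_a = [1.0, 2.0, 3.0, 4.0]
control_b = [5.0, 3.0, 6.0, 2.0]
treated = [4.0, 2.75, 5.25, 2.5]

preds = ftl_predictions(treated, control_a, control_b)
print(preds)
```

In this noiseless example the weight is recovered after one period of history, so every later prediction matches the treated outcome.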

Efficient estimation of average derivatives in NPIV models: Simulation comparisons of neural network estimators, with Xiaohong Chen and Elie Tamer

[2023, Journal of Econometrics] [arXiv]

We conduct a large Monte Carlo study on using neural networks to estimate models of nonparametric instrumental variables.

Abstract

Artificial Neural Networks (ANNs) can be viewed as nonlinear sieves that can approximate complex functions of high dimensional variables more effectively than linear sieves. We investigate the performance of various ANNs in nonparametric instrumental variables (NPIV) models of moderately high dimensional covariates that are relevant to empirical economics. We present two efficient procedures for estimation and inference on a weighted average derivative (WAD): an orthogonalized plug-in with optimally-weighted sieve minimum distance (OP-OSMD) procedure and a sieve efficient score (ES) procedure. Both estimators for WAD use ANN sieves to approximate the unknown NPIV function and are $\sqrt{n}$-asymptotically normal and first-order equivalent. We provide a detailed practitioner’s recipe for implementing both efficient procedures. We compare their finite-sample performances in various simulation designs that involve smooth NPIV function of up to 13 continuous covariates, different nonlinearities and covariate correlations. Some Monte Carlo findings include: (1) tuning and optimization are more delicate in ANN estimation; (2) given proper tuning, both ANN estimators with various architectures can perform well; (3) easier to tune ANN OP-OSMD estimators than ANN ES estimators; (4) stable inferences are more difficult to achieve with ANN (than spline) estimators; (5) there are gaps between current implementations and approximation theories. Finally, we apply ANN NPIV to estimate average partial derivatives in two empirical demand examples with multivariate covariates.

JUE Insight: The (Non-) Effect of Opportunity Zones on Housing Prices, with Edward L. Glaeser and David Wessel

[2022, Journal of Urban Economics] [NBER Working Paper] [Replication files] [Updated Working Paper (updated data)] [Bloomberg] [Brookings]

We rule out large immediate price effects on residential real estate from the Opportunity Zone program.

Abstract

Will the Opportunity Zones (OZ) program, America’s largest new place-based policy in decades, generate neighborhood change? We compare single-family housing price growth in OZs with price growth in areas that were eligible but not included in the program. We also compare OZs to their nearest geographic neighbors. Our most credible estimates rule out price impacts greater than 0.5 percentage points with 95% confidence, suggesting that, so far, home buyers don’t believe that this subsidy will generate major neighborhood change. OZ status reduces prices in areas with little employment, perhaps because buyers think that subsidizing new investment will increase housing supply. Mixed evidence suggests that OZs may have increased residential permitting.

Auctioneers Sometimes Prefer Entry Fees to Extra Bidders, with Scott Duke Kominers

[2021, International Journal of Industrial Organization (EARIE Special Issue)]

Auctioneers can profit from entry fees, even though they create a thin market by doing so.

Abstract

We investigate a market thickness–market power tradeoff in an auction setting with endogenous entry. We find that charging admission fees can sometimes dominate the benefit of recruiting additional bidders, even though the fees themselves implicitly reduce competition at the auction stage. We also highlight that admission fees and reserve prices are different instruments in a setting with uncertainty over entry costs, and that optimal mechanisms in such settings may be more complex than simply setting a reserve price. Our results provide a counterpoint to the broad intuition of Bulow and Klemperer (1996) that market thickness often takes precedence over market power in auction design.

A Semantic Approach to Financial Fundamentals, with Suproteem K. Sarkar

[ACL 2020, Proceedings of the Second Workshop on Financial Technology and NLP (FinNLP)]

We explore using embeddings from large language models (BERT) for financial applications.

Abstract

The structure and evolution of firms’ operations are essential components of modern financial analyses. Traditional text-based approaches have often used standard statistical learning methods to analyze news and other text relating to firm characteristics, which may shroud key semantic information about firm activity. In this paper, we present the Semantically-Informed Financial Index (SIFI), an approach to modeling firm characteristics and dynamics using embeddings from transformer models. As opposed to previous work that uses similar techniques on news sentiment, our methods directly study the business operations that firms report in filings, which are legally required to be accurate. We develop text-based firm classifications that are more informative about fundamentals per level of granularity than established metrics, and use them to study the interactions between firms and industries. We also characterize a basic model of business operation evolution. Our work aims to contribute to the broader study of how text can provide insight into economic behavior.
