Marc Bellemare writes the following in a Metrics Monday post:

Those of us who do applied work for a living will have at some point noticed that, depending on which variables we include in X on the right-hand side (RHS) of an equation like
(1) y = a + bX + cD + e,
the coefficient c on the treatment variable D might go from significant to insignificant or vice versa.
That this is true is the very reason why it is common practice in applied work to present several specifications of equation (1) in the same table, ranging from the most parsimonious (i.e., a regression of y on D alone) to slightly less parsimonious (i.e., a regression of y on D and ever increasing subsets of X) to the least parsimonious (i.e., a regression of y on D and all the controls in X).
The issue of what goes on the RHS of equation (1) is getting a lot of attention in the applied literature. Two prominent examples are Emily Oster's forthcoming JBES article "Unobservable Selection and Coefficient Stability: Theory and Evidence" and Pei, Pischke, and Schwandt's (2017) NBER working paper titled "Poorly Measured Confounders are More Useful on the Left than on the Right."
Oster provides a method to assess just how much coefficient (as in coefficient c in equation 1) stability tells us about selection on unobservables. Pei et al. develop a test of identifying assumptions that treats putative additional controls as dependent variables in equation (1).
I expect both methods to become part of the applied econometrician's toolkit over the next five to 10 years. At the very least, I expect a bare-bones regression of y on D alone to become something that has to be included in a paper, along with a discussion of why the controls that were included on the RHS of equation (1) were retained for analysis.


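Bellemare's point about presenting a range of specifications, from a regression of y on D alone up to the full set of controls, can be illustrated with a small simulation. Everything below (the variable names x1, x2, the coefficients, the data-generating process) is hypothetical, chosen so that x1 confounds D and y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data-generating process: x1 confounds D and y; x2 affects y only.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
D = 0.7 * x1 + rng.normal(size=n)              # treatment correlated with x1
y = 1.0 * D + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def ols(y, cols):
    """OLS of y on an intercept plus the given columns; returns coefficients."""
    X = np.column_stack([np.ones(len(y))] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Three specifications, from most to least parsimonious.
c_alone = ols(y, [D])[1]           # y on D only: biased upward via x1
c_partial = ols(y, [D, x1])[1]     # y on D and x1: bias removed
c_full = ols(y, [D, x1, x2])[1]    # y on D and all controls

print(f"y ~ D:           c = {c_alone:.2f}")
print(f"y ~ D + x1:      c = {c_partial:.2f}")
print(f"y ~ D + x1 + x2: c = {c_full:.2f}")
```

With a true treatment effect of 1.0, the parsimonious specification overstates c because D absorbs the omitted x1; once x1 enters, the estimate settles near the truth, which is exactly the coefficient movement the specification table is meant to display.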
Here is the abstract of the Oster paper:

A common approach to evaluating robustness to omitted variable bias is to observe coefficient movements after inclusion of controls. This is informative only if selection on observables is informative about selection on unobservables. Although this link is known in theory in existing literature, very few empirical articles approach this formally. I develop an extension of the theory that connects bias explicitly to coefficient stability. I show that it is necessary to take into account coefficient and R-squared movements. I develop a formal bounding argument. I show two validation exercises and discuss application to the economics literature. Supplementary materials for this article are available online.
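Oster's bounding argument can be sketched numerically. A widely cited approximation of her bias-adjusted estimator (a simplification of the full result, stated here in my own notation) is beta* = beta_ctrl - delta * (beta_short - beta_ctrl) * (Rmax - R2_ctrl) / (R2_ctrl - R2_short), where the "short" quantities come from regressing y on D alone and the "ctrl" quantities from the regression adding the observed controls:

```python
def oster_beta_star(beta_short, r2_short, beta_ctrl, r2_ctrl,
                    r2_max, delta=1.0):
    """Bias-adjusted coefficient under an approximate version of Oster's
    bounding argument. delta=1 means selection on unobservables is as
    strong as selection on observables; r2_max is the hypothesized
    R-squared of a regression including all confounders."""
    movement = (beta_short - beta_ctrl) * (r2_max - r2_ctrl) / (r2_ctrl - r2_short)
    return beta_ctrl - delta * movement

# Purely illustrative numbers: c shrinks from 1.0 to 0.8 while
# R-squared rises from 0.2 to 0.5, and we posit r2_max = 0.8.
print(oster_beta_star(1.0, 0.2, 0.8, 0.5, r2_max=0.8))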


And the abstract of the Pei, Pischke, and Schwandt paper:

Researchers frequently test identifying assumptions in regression based research designs (which include instrumental variables or difference-in-differences models) by adding additional control variables on the right hand side of the regression. If such additions do not affect the coefficient of interest (much) a study is presumed to be reliable. We caution that such invariance may result from the fact that the observed variables used in such robustness checks are often poor measures of the potential underlying confounders. In this case, a more powerful test of the identifying assumption is to put the variable on the left hand side of the candidate regression. We provide derivations for the estimators and test statistics involved, as well as power calculations, which can help applied researchers interpret their findings. We illustrate these results in the context of various strategies which have been suggested to identify the returns to schooling.
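The left-hand-side test Pei et al. propose can be sketched as follows: instead of checking whether adding a candidate control w moves c, regress w itself on the treatment and test whether D predicts it. The setup below is a hypothetical illustration in which w is a noisy proxy for the true confounder, so it does little on the right-hand side but still fails the balance test on the left:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

confounder = rng.normal(size=n)
D = 0.5 * confounder + rng.normal(size=n)   # treatment is NOT as-good-as-random
# w is a noisy proxy for the confounder: weak as a control, testable as an outcome.
w = confounder + 3.0 * rng.normal(size=n)

def slope_and_t(y, x):
    """Slope and t-statistic from a bivariate OLS of y on x (plus intercept)."""
    x_c, y_c = x - x.mean(), y - y.mean()
    b = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - b * x_c
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (x_c @ x_c))
    return b, b / se

b, t = slope_and_t(w, D)
print(f"w ~ D: slope = {b:.3f}, t = {t:.1f}")  # D significantly predicts w
```

Because measurement error in w attenuates its usefulness as a regressor but not its correlation with D, the regression of w on D detects the violated identifying assumption more powerfully than a coefficient-stability check would.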

Bellemare also mentions a paper by Gabriel Lenz and Alexander Sahn titled "Achieving Statistical Significance with Covariates." Below is its abstract.

An important and understudied area of hidden researcher discretion is the use of covariates. Researchers choose which covariates to include in statistical models and these choices affect the size and statistical significance of estimates reported in studies. How often does the statistical significance of published findings depend on these discretionary choices? The main hurdle to studying this problem is that researchers never know the true model and can always make a case that their choices are most plausible, closest to the true data generating process, or most likely to rule out alternative explanations. We attempt to surmount this hurdle through a meta-analysis of articles published in the American Journal of Political Science (AJPS). In almost 40% of observational studies, we find that researchers achieve conventional levels of statistical significance through covariate adjustments. Although that discretion may be justified, researchers almost never disclose or justify it.

In a follow-up post, Bellemare relays the following comment sent to him by Dan Sacks of Indiana University:

The basic issue is that it seems fine to me if the precision of your coefficient is sensitive to the inclusion of pre-determined covariates, as long as the expected value is not. That is, in such cases it seems fine to emphasize the precisely estimated result.
...the estimated coefficient c on D might or might not be statistically significant, depending on what is included in the control vector X. The usual concern in the applied literature—which of course I share completely—is that if we don’t condition on a sufficient set of confounders, then c is estimated with bias. We all want to avoid bias. Bias is about expected values, though, not statistical significance, and it is not obvious to me that we should be worried about models in which including covariates changes the statistical significance (but not the expected value) of the results. Including pre-determined regressors which are uncorrelated with D but (conditionally) correlated with Y will generally reduce var(e), reducing the standard error of c and possibly leading to statistical significance. The fact that our results are only significant if we control for some set of X’s does not necessarily mean that there is bias – only that we might be underpowered without enough controls.
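Sacks' point can be illustrated with a simulation (all numbers hypothetical): a pre-determined covariate x that is independent of D but strongly correlated with y leaves the expected value of c essentially unchanged while shrinking its standard error:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

def last_coef_and_se(y, cols):
    """OLS estimate and standard error for the last column's coefficient."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[-1], np.sqrt(cov[-1, -1])

D = rng.normal(size=n)                 # treatment
x = rng.normal(size=n)                 # pre-determined, independent of D
y = 0.3 * D + 2.0 * x + rng.normal(size=n)

c0, se0 = last_coef_and_se(y, [D])       # y on D alone
c1, se1 = last_coef_and_se(y, [x, D])    # y on x and D
print(f"without x: c = {c0:.2f} (se {se0:.2f})")
print(f"with x:    c = {c1:.2f} (se {se1:.2f})")  # similar c, smaller se
```

Since x is uncorrelated with D, dropping it induces no omitted variable bias; it only inflates the residual variance, so the estimate of c is noisier and may miss conventional significance thresholds—exactly the "underpowered, not biased" case Sacks describes.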