W. Bentley MacLeod of Columbia University has written an NBER paper (ungated version) titled "Viewpoint: The Human Capital Approach to Inference" (H/T Francis Diebold). Below is its abstract.

The purpose of this essay is to discuss the “human capital” approach to inference. Observed decisions by experts can be used to organize data on their decisions using simple machine learning techniques. The fact that the human capital of these experts is heterogeneous implies that errors in decision making are inevitable, which in turn allows us to identify the conditional average treatment effect for a wider class of situations than would be possible with randomized control trials. This point is illustrated with some data from medical decision making in the context of treating depression, heart disease, and adverse childbirth events.

Diebold summarizes the paper as "...using economic theory in combination with machine learning to estimate conditional average treatment effects better than can be done with randomized control trials."


The human capital approach to inference can be used in situations where we have a large number of persons to be treated by skilled decision makers. If we were to do a randomized control trial (RCT), then individuals would be randomly allocated to treatment and control, and then we would compare the outcomes. The problem is that in many cases, particularly in medical decision making, the optimal treatment varies with the characteristics of the patient. For example, some individuals face adverse reactions to drugs, and others have a natural immunity to disease, leading to heterogeneous responses to both treatment and placebo. The potential variation is substantial, which is why physicians spend years studying different possible conditions, and associating them with the appropriate treatment.
Let us now suppose that in addition to having a large set of patients, information on their characteristics, treatment, and outcomes, we also have them matched to physicians, with a large number of patients for each physician. We then proceed by using the fact that these physicians are experts, and hence on average their treatments are helpful. Assuming that there are only two treatment choices, A or B, we can use the decisions by the physicians to organize the data by the probability that a physician chooses A for patient i with characteristics xi. This yields a propensity score η(xi). This is a straightforward machine learning exercise – given features xi, what is the likelihood that choice A will be made.
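This first step — fitting η(xi), the probability that a physician chooses A given patient features xi — is a standard supervised-learning problem. A minimal sketch on synthetic data (the data-generating process, variable names, and the choice of plain logistic regression here are all illustrative assumptions, not taken from the paper) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's setting: each patient i has
# features x_i, and an expert physician chooses treatment A (1) or
# B (0) with a probability that depends on x_i.
n, d = 2000, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -1.0, 0.5])          # assumed decision rule
choice_A = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)

def fit_propensity(X, y, steps=500, lr=0.1):
    """Logistic regression by gradient descent: eta(x) = P(choice A | x)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))       # current predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # average log-loss gradient step
    return w

w_hat = fit_propensity(X, choice_A)
eta = 1 / (1 + np.exp(-(X @ w_hat)))         # propensity score eta(x_i)
```

In practice any probabilistic classifier (gradient-boosted trees, a neural network) could play the same role; the only requirement is a calibrated estimate of the physicians' choice probability.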
One approach to machine learning would be to stop at this point. Namely, use the data to build a model of how expert physicians make choices. There is a huge literature studying this problem. For example, we can view the recent work to produce self-driving cars as one in which the machine is learning to be as good as a human at such a task. However, we can do a bit more. Once we have the propensity score, then we can proceed, as in Rosenbaum and Rubin (1983), to estimate the effect of choice conditional upon the propensity score. We differ from the standard propensity score approach in two ways. The first is that we are concerned with the conditional average treatment effect (CATE) - the effect of treatment conditional upon characteristics xi. As individual characteristics change, the optimal choice may change. The hope is that if we make a choice conditional upon the score xi, this can result in better outcomes on average for individuals with this score.
Second, the goal of the propensity score estimator is to provide better control for observable characteristics, and the endogenous selection of individuals based upon their characteristics into treatment. In our case, since we have information on who treats, we can use the fact that human capital is limited, and hence physicians not only make errors, but also vary in the frequency with which mistakes are made. This allows us to measure the effect of treatment conditional upon patient characteristics, or CATE, and physician identity. We can ask which physicians get better performance, and what are the characteristics of their decisions that achieve better outcomes.
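A simple way to see how the estimated propensity score feeds into a CATE estimate is stratification in the spirit of Rosenbaum and Rubin (1983): bin patients by score and compare mean outcomes of those who received A versus B within each bin. The sketch below uses synthetic data with a deliberately heterogeneous effect (all quantities and the binning scheme are illustrative assumptions, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative synthetic data:
# eta : estimated propensity scores in (0, 1)
# a   : 1 if the physician chose treatment A, else 0
# y   : observed outcome; here A helps only high-eta patients
n = 5000
eta = rng.random(n)
a = (rng.random(n) < eta).astype(int)
effect = np.where(eta > 0.5, 1.0, 0.0)       # heterogeneous true effect
y = a * effect + rng.normal(scale=0.5, size=n)

def cate_by_stratum(eta, a, y, n_strata=5):
    """Difference in mean outcomes (A minus B) within propensity strata."""
    edges = np.quantile(eta, np.linspace(0, 1, n_strata + 1))
    bins = np.clip(np.digitize(eta, edges[1:-1]), 0, n_strata - 1)
    return np.array([y[(bins == s) & (a == 1)].mean()
                     - y[(bins == s) & (a == 0)].mean()
                     for s in range(n_strata)])

cate = cate_by_stratum(eta, a, y)
# The low-eta strata should show roughly zero effect, the high-eta
# strata an effect near one - heterogeneity an overall average hides.
```

The same within-stratum comparison could additionally be grouped by physician identity, which is the extra step the human capital approach exploits: physicians with similar patients but different error rates generate the variation needed to identify the CATE.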


The human capital approach begins with the hypothesis that we can use the decisions of experts to organize individuals into treatment groups that have similar characteristics, and hence the treatment effect within these groups is more homogeneous. Here machine learning techniques can be very useful because of their potential to categorize large amounts of data efficiently.
Second, even though experts are skilled, they necessarily make mistakes. Without mistakes there can be no learning - a randomized control trial is an extreme case of learning by forced randomization over possible treatments.


  • From a medical standpoint, invasive procedures are considered always desirable after a heart attack, yet physicians at better hospitals withhold invasive procedures from patients for whom they are inappropriate, namely older patients. This is consistent with the hypothesis that physicians base their decisions on factors beyond simple medical necessity.
  • The high rate of cesarean sections in the US is commonly attributed to financial incentives. That held for low-risk births, but for high-risk births cesareans were if anything performed too rarely. Averaging over both groups and accounting for the number of women at risk, the average cesarean rate in New Jersey fell well below the medically optimal rate. This illustrates the danger of looking only at the average treatment effect: one misses pronounced heterogeneity in the optimal choice of treatment.
  • It is well known that randomized controlled trials are quite ill-suited to antidepressants. For the treatment of mental illness there is not yet enough data to apply the human capital approach, but a large-scale data collection effort might move things forward.