A comment by a commenter named Indy on the Interfluidity post I introduced yesterday contained the phrase "causal density." Googling it, the top hit was a blog post introducing an article Jim Manzi wrote for City Journal. So today I will briefly introduce that Manzi article.


In early 2009, the United States was engaged in an intense public debate over a proposed $800 billion stimulus bill designed to boost economic activity through government borrowing and spending. James Buchanan, Edward Prescott, Vernon Smith, and Gary Becker, all Nobel laureates in economics, argued that while the stimulus might be an important emergency measure, it would fail to improve economic performance. Nobel laureates Paul Krugman and Joseph Stiglitz, on the other hand, argued that the stimulus would improve the economy and indeed that it should be bigger. Fierce debates can be found in frontier areas of all the sciences, of course, but this was as if, on the night before the Apollo moon launch, half of the world’s Nobel laureates in physics were asserting that rockets couldn’t reach the moon and the other half were saying that they could. Prior to the launch of the stimulus program, the only thing that anyone could conclude with high confidence was that several Nobelists would be wrong about it.

But the situation was even worse: it was clear that we wouldn’t know which economists were right even after the fact. Suppose that on February 1, 2009, Famous Economist X had predicted: “In two years, unemployment will be about 8 percent if we pass the stimulus bill, but about 10 percent if we don’t.” What do you think would happen when 2011 rolled around and unemployment was still at 10 percent, despite the passage of the bill? It’s a safe bet that Professor X would say something like: “Yes, but other conditions deteriorated faster than anticipated, so if we hadn’t passed the stimulus bill, unemployment would have been more like 12 percent. So I was right: the bill reduced unemployment by about 2 percent.”

Another way of putting the problem is that we have no reliable way to measure counterfactuals—that is, to know what would have happened had we not executed some policy—because so many other factors influence the outcome. This seemingly narrow problem is central to our continuing inability to transform social sciences into actual sciences. Unlike physics or biology, the social sciences have not demonstrated the capacity to produce a substantial body of useful, nonobvious, and reliable predictive rules about what they study—that is, human social behavior, including the impact of proposed government programs.

The missing ingredient is controlled experimentation, which is what allows science positively to settle certain kinds of debates. How do we know that our physical theories concerning the wing are true? In the end, not because of equations on blackboards or compelling speeches by famous physicists but because airplanes stay up. Social scientists may make claims as fascinating and counterintuitive as the proposition that a heavy piece of machinery can fly, but these claims are frequently untested by experiment, which means that debates like the one in 2009 will never be settled. For decades to come, we will continue to be lectured by what are, in effect, Keynesian and non-Keynesian economists.

Over many decades, social science has groped toward the goal of applying the experimental method to evaluate its theories for social improvement. Recent developments have made this much more practical, and the experimental revolution is finally reaching social science. The most fundamental lesson that emerges from such experimentation to date is that our scientific ignorance of the human condition remains profound. Despite confidently asserted empirical analysis, persuasive rhetoric, and claims to expertise, very few social-program interventions can be shown in controlled experiments to create real improvement in outcomes of interest.


With giant constructions like these, if you push ahead over the objections, build the thing, and it works, that settles the argument. With economic policy, however, even after a policy has actually been implemented, the debate over whether it really had an effect drags on and on (e.g., criticisms of the New Deal, or Taylor's critique of fiscal policy). There, one might say, lies a difficulty peculiar to economics.
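Manzi's counterfactual problem can be made concrete with a toy simulation of my own (all numbers here are invented and are not from the article): when a policy is adopted precisely because conditions are deteriorating, a naive comparison of outcomes with and without the policy can get even the sign of the effect wrong.

```python
import random

random.seed(0)

# Each simulated economy has two potential outcomes for unemployment:
# y0 without the stimulus and y1 with it.  Only one is ever observed.
n = 100_000
observations = []
for _ in range(n):
    shock = random.gauss(0, 2)       # unobserved "other conditions"
    y0 = 10 + shock                  # unemployment without the stimulus
    y1 = y0 - 2                      # the stimulus truly cuts it by 2 points
    # Governments adopt the stimulus exactly when conditions worsen,
    # so adoption is correlated with the unobserved shock.
    adopted = shock > 0
    observations.append((adopted, y1 if adopted else y0))

with_policy = [y for a, y in observations if a]
without_policy = [y for a, y in observations if not a]
naive = (sum(with_policy) / len(with_policy)
         - sum(without_policy) / len(without_policy))

# The naive estimate comes out positive -- the stimulus "looks" harmful --
# even though the true effect is -2.
print(round(naive, 2))
```

Because only one of the two potential outcomes is ever observed for any unit, no amount of observational data by itself pins down the effect; that is exactly why Professor X's after-the-fact story can never be refuted.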


Manzi goes on to explain controlled experimentation in more concrete terms, and in the following passage he uses the phrase "causal density" mentioned at the outset.

But clinical trials place an enormous burden on being sure that the treatment under evaluation is the only difference between the two groups. And as experiments began to move from fields like classical physics to fields like therapeutic biology, the number and complexity of potential causes of the outcome of interest—what I term “causal density”—rose substantially. It became difficult even to identify, never mind actually hold constant, all these causes. For example, how could an experimenter in 1800, when modern genetics remained undiscovered, possibly ensure that the subjects in the test group had the same genetic predisposition to a disease under study as those in the control group?

The solution to that problem was the randomization technique devised by C. S. Peirce. The technique has since been applied to the social sciences as well, where such studies are called randomized field trials (RFTs).
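To see why randomization helps, here is a minimal sketch of my own (with made-up numbers): because assignment is a coin flip, independent of every cause of the outcome whether observed or not, the two groups are comparable and a simple difference in means recovers the true effect.

```python
import random

random.seed(0)

n = 100_000
true_effect = -2.0

# Collapse the many causes Manzi calls "causal density" into one
# latent score per subject that drives the outcome.
latent = [random.gauss(0, 2) for _ in range(n)]

# Random assignment: a coin flip, independent of every latent cause.
treated = [random.random() < 0.5 for _ in range(n)]
outcome = [10 + z + (true_effect if t else 0.0)
           for z, t in zip(latent, treated)]

t_group = [y for y, t in zip(outcome, treated) if t]
c_group = [y for y, t in zip(outcome, treated) if not t]
estimate = sum(t_group) / len(t_group) - sum(c_group) / len(c_group)

print(round(estimate, 2))   # close to the true effect of -2
```

In a field trial one cannot hold the latent causes constant, or even name them all; randomization sidesteps that by balancing them between the groups in expectation.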


But what do we know from the social-science experiments that we have already conducted? After reviewing experiments not just in criminology but also in welfare-program design, education, and other fields, I propose that three lessons emerge consistently from them.

First, few programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.

Second, within this universe of programs that are far more likely to fail than succeed, programs that try to change people are even more likely to fail than those that try to change incentives. A litany of program ideas designed to push welfare recipients into the workforce failed when tested in those randomized experiments of the welfare-reform era; only adding mandatory work requirements succeeded in moving people from welfare to work in a humane fashion. And mandatory work-requirement programs that emphasize just getting a job are far more effective than those that emphasize skills-building. Similarly, the list of failed attempts to change people to make them less likely to commit crimes is almost endless—prisoner counseling, transitional aid to prisoners, intensive probation, juvenile boot camps—but the only program concept that tentatively demonstrated reductions in crime rates in replicated RFTs was nuisance abatement, which changes the environment in which criminals operate. (This isn’t to say that direct behavior-improvement programs can never work; one well-known program that sends nurses to visit new or expectant mothers seems to have succeeded in improving various social outcomes in replicated independent RFTs.)

And third, there is no magic. Those rare programs that do work usually lead to improvements that are quite modest, compared with the size of the problems they are meant to address or the dreams of advocates.

Experiments are surely changing the way we conduct social science. The number of experiments reported in major social-science journals is growing rapidly across education, criminology, political science, economics, and other areas. In academic economics, several recent Nobel Prizes have been awarded to laboratory experimentalists, and leading indicators of future Nobelists are rife with researchers focused on RFTs.

It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.

At the moment, it is certain that we do not have anything remotely approaching a scientific understanding of human society. And the methods of experimental social science are not close to providing one within the foreseeable future. Science may someday allow us to predict human behavior comprehensively and reliably. Until then, we need to keep stumbling forward with trial-and-error learning as best we can.


*1: Incidentally, in recent years companies have sprung up that run such experiments as a business (is a blue or a white envelope more effective for direct mail? which of two window displays contributes more to sales? and so on), and Manzi himself founded one such firm in 1999. In that line of business, he says, the "causal density" problem is covered by the quantity of experiments rather than their quality.