Apollo 11 went to the moon a long, long time ago, and yet...

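The Interfluidity post I introduced yesterday drew a comment from a commenter named Indy that used the term "causal density". When I googled it, the top result was a blog post introducing an article Jim Manzi wrote for City Journal. So today I will briefly introduce that Manzi article.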

First, the opening section.

In early 2009, the United States was engaged in an intense public debate over a proposed $800 billion stimulus bill designed to boost economic activity through government borrowing and spending. James Buchanan, Edward Prescott, Vernon Smith, and Gary Becker, all Nobel laureates in economics, argued that while the stimulus might be an important emergency measure, it would fail to improve economic performance. Nobel laureates Paul Krugman and Joseph Stiglitz, on the other hand, argued that the stimulus would improve the economy and indeed that it should be bigger. Fierce debates can be found in frontier areas of all the sciences, of course, but this was as if, on the night before the Apollo moon launch, half of the world’s Nobel laureates in physics were asserting that rockets couldn’t reach the moon and the other half were saying that they could. Prior to the launch of the stimulus program, the only thing that anyone could conclude with high confidence was that several Nobelists would be wrong about it.
But the situation was even worse: it was clear that we wouldn’t know which economists were right even after the fact. Suppose that on February 1, 2009, Famous Economist X had predicted: “In two years, unemployment will be about 8 percent if we pass the stimulus bill, but about 10 percent if we don’t.” What do you think would happen when 2011 rolled around and unemployment was still at 10 percent, despite the passage of the bill? It’s a safe bet that Professor X would say something like: “Yes, but other conditions deteriorated faster than anticipated, so if we hadn’t passed the stimulus bill, unemployment would have been more like 12 percent. So I was right: the bill reduced unemployment by about 2 percent.”
Another way of putting the problem is that we have no reliable way to measure counterfactuals—that is, to know what would have happened had we not executed some policy—because so many other factors influence the outcome. This seemingly narrow problem is central to our continuing inability to transform social sciences into actual sciences. Unlike physics or biology, the social sciences have not demonstrated the capacity to produce a substantial body of useful, nonobvious, and reliable predictive rules about what they study—that is, human social behavior, including the impact of proposed government programs.
The missing ingredient is controlled experimentation, which is what allows science positively to settle certain kinds of debates. How do we know that our physical theories concerning the wing are true? In the end, not because of equations on blackboards or compelling speeches by famous physicists but because airplanes stay up. Social scientists may make claims as fascinating and counterintuitive as the proposition that a heavy piece of machinery can fly, but these claims are frequently untested by experiment, which means that debates like the one in 2009 will never be settled. For decades to come, we will continue to be lectured by what are, in effect, Keynesian and non-Keynesian economists.
Over many decades, social science has groped toward the goal of applying the experimental method to evaluate its theories for social improvement. Recent developments have made this much more practical, and the experimental revolution is finally reaching social science. The most fundamental lesson that emerges from such experimentation to date is that our scientific ignorance of the human condition remains profound. Despite confidently asserted empirical analysis, persuasive rhetoric, and claims to expertise, very few social-program interventions can be shown in controlled experiments to create real improvement in outcomes of interest.

(My Japanese translation)
2009年初め、政府の借り入れと財政支出による経済の活性化を目的とした8000億ドルの景気刺激策の法案を巡り、公の場での激しい議論が米国で巻き起こった。いずれもノーベル経済学賞受賞者であるジェームズ・ブキャナン、エドワード・プレスコット、バーノン・スミス、そしてゲーリー・ベッカーは、その刺激策は重要な非常手段であるかもしれないが、経済のパフォーマンスを改善することには失敗するだろう、と論じた。一方、やはりノーベル経済学賞受賞者であるポール・クルーグマンとジョセフ・スティグリッツは、刺激策は経済の状況を改善するし、実際にはもっと大型の刺激策が必要なのだ、と論じた。科学の先端領域では激しい議論は付き物だが、この議論は、アポロを月に打ち上げる前の晩に、世界のノーベル物理学賞受賞者の半数がロケットは月に到達できないと言い、残りの半数ができると言っているようなものだった。刺激策が実施される前に皆が自信を持って言えたことは、この件に関してノーベル賞受賞者の幾人かの意見が間違っている、ということだけだった。

しかし、状況は実はもっと厄介なものだった。実際に計画が実施された後になっても、どの経済学者が正しかったのか我々には知る術が無いことは明らかだった。2009年2月1日に、有名な経済学者Xが「刺激策の法案を成立させれば2年後の失業率は約8%になっているだろうが、成立させなければ約10%になるだろう」と言ったものとする。法案が成立したにも関わらず、いざ2011年になっても失業率が依然として10%近辺に留まっていたとしたら、どうなるだろうか？　教授Xはまず間違いなく次のようなことを言うだろう：「ああ、しかし他の要因が予想より急激に悪化したのだから、もし刺激策の法案を通していなければ、失業率は12%くらいまで高まっていただろう。従って私は正しかったのだ。法案は失業率を約2%減少させた。」

この問題を別の言い方で表現すると、我々は反実仮想の結果を測定すること――即ち、ある政策が実施されていなければどうなっていたかを知ること――のための信頼できる手法を持ち合わせていない、ということである。というのは、その政策以外にも余りにも多くの要因が結果を左右するからである。これは一見範囲が限定された問題のように見えるが、社会科学を本当の科学に転換することがいつまで経ってもできない主要な理由となっている。物理学や生物学と違い、社会科学は、研究対象――即ち、政府の計画案が人々に与える影響などの、人間の社会的行動――についての有益かつ自明ではない信頼できる予測ルールの十分な体系を生み出す力を誇示することができなかった。

そこで欠落している要素は、制御された実験である。その実験こそが、科学がある種の議論に確かに決着を付けることを可能ならしめるものなのである。翼に関する物理学の理論が真であることをどうやって確かめるのか？　結局のところ、黒板に書かれた方程式や有名な物理学者の説得力のある講演ではなく、飛行機が飛ぶことで確かめるのである。社会科学者たちは、重機械が空を飛べる、という命題と同じくらい魅力的で直観に反する主張を提示することがあるだろうが、そうした主張は実験によって検証されていないことが多い。ということは、2009年の時のような議論には決して決着が付くことが無い、ということだ。我々はこれからの数十年も、突き詰めればケインジアンと非ケインジアンのどちらかに帰着する経済学者たちの話を聞かされ続けることになるだろう。

過去数十年もの間、社会科学は、社会改善のための理論を評価するのに実験的手法を適用するという目標に向けて模索を続けてきた。最近の進展はその現実的可能性を大きく高め、実験革命は遂に社会科学にも到達しつつある。今までのそうした実験の試みから得られた最も基本的な教訓は、人間の条件に関する我々の科学的無知が依然として深い、ということである。自信ありげに提示された実証分析、説得力のある言辞、および、専門的見解と称する主張にも関わらず、関心対象の成果に実質的な改善をもたらすことが制御された実験で示された社会プログラムの介入はほとんど存在しない。

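In my entry of the 21st, I wrote:

With giant structures like these, once they are built over the opposition and turn out to work, the debate is settled. With economic policy, however, the argument over whether the policy really had any effect drags on and on even after it has been implemented (e.g., critiques of the New Deal, or Taylor's critique of fiscal policy). That, I think, is a difficulty peculiar to economics.

Manzi makes much the same argument here, and names the difficulty of running controlled experiments as the source of that peculiar difficulty.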

Manzi then explains controlled experiments in more concrete terms, and in the following passage he uses the term "causal density" that I mentioned at the start.

But clinical trials place an enormous burden on being sure that the treatment under evaluation is the only difference between the two groups. And as experiments began to move from fields like classical physics to fields like therapeutic biology, the number and complexity of potential causes of the outcome of interest—what I term “causal density”—rose substantially. It became difficult even to identify, never mind actually hold constant, all these causes. For example, how could an experimenter in 1800, when modern genetics remained undiscovered, possibly ensure that the subjects in the test group had the same genetic predisposition to a disease under study as those in the control group?
(My Japanese translation)
しかしながら、臨床実験は、2つのグループの違いが間違いなく評価対象の治療法だけであることに大いに依存している。そして、実験が、古典物理学のような分野から、治療のための生物学のような分野に移るにつれ、関心の対象である結果を生み出す可能性を持つ原因の数と複雑さ――私が「原因密度（causal density）」と呼ぶもの――は飛躍的に増加する。そうしたすべての原因を一定に保つことはおろか、特定することさえ難しくなるのだ。例えば、現代の遺伝学を未だ知らない1800年の実験者が、研究対象となる病気に対する遺伝的条件について、検証グループに属する被験者と比較グループに属する被験者とで同一であることをどうやって保証できようか？

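The answer to that problem is randomization, a technique devised by C. S. Peirce. It has also been applied in the social sciences, where it goes by the name "randomized field trials" (RFTs).
Even with this technique, however, the experimental results it yields still have a problem: how far do they generalize?
In physics, it does little harm to assume that a law found in one place (the law of gravity, say) will not stop holding at some other place or time. In clinical trials, too, it is permissible to assume that a vaccine that worked in one test group will work for all of humanity. In the social sciences, however, such generalizations tend not to hold. Manzi illustrates this with the history of research in criminology*1.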
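
As a rough illustration of why random assignment helps when the causes of an outcome cannot all be identified, here is a minimal Python sketch. Everything in it is an assumption made up for illustration and is not taken from Manzi's article: the function name `estimate_effect`, the single hidden factor, and the effect sizes are all hypothetical. It compares a self-selected comparison, in which an unobserved factor drives both program take-up and the outcome, with a coin-flip assignment.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # hypothetical effect of the program on the outcome


def estimate_effect(n, randomize, seed=0):
    """Return mean(treated) - mean(control) for a simulated population."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)  # a cause the experimenter cannot observe
        if randomize:
            in_program = rng.random() < 0.5  # coin flip, independent of hidden
        else:
            in_program = hidden > 0  # self-selection driven by the hidden cause
        # The hidden cause moves the outcome whether or not the person is treated.
        outcome = 2.0 * hidden + rng.gauss(0, 1)
        if in_program:
            outcome += TRUE_EFFECT
        (treated if in_program else control).append(outcome)
    return mean(treated) - mean(control)


naive = estimate_effect(20_000, randomize=False)
randomized = estimate_effect(20_000, randomize=True)
print(f"self-selected groups: {naive:.2f}")
print(f"randomized groups:    {randomized:.2f}")
```

With these made-up numbers, the self-selected comparison should land far above the true effect of 1.0, while the randomized one should land close to it, even though the hidden factor is never measured. What the sketch cannot address is the generalization problem: a trial run in one population says nothing by itself about a population in which `TRUE_EFFECT` happens to be different.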

He then closes on a somewhat pessimistic note, as follows.

But what do we know from the social-science experiments that we have already conducted? After reviewing experiments not just in criminology but also in welfare-program design, education, and other fields, I propose that three lessons emerge consistently from them.
First, few programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.
Second, within this universe of programs that are far more likely to fail than succeed, programs that try to change people are even more likely to fail than those that try to change incentives. A litany of program ideas designed to push welfare recipients into the workforce failed when tested in those randomized experiments of the welfare-reform era; only adding mandatory work requirements succeeded in moving people from welfare to work in a humane fashion. And mandatory work-requirement programs that emphasize just getting a job are far more effective than those that emphasize skills-building. Similarly, the list of failed attempts to change people to make them less likely to commit crimes is almost endless—prisoner counseling, transitional aid to prisoners, intensive probation, juvenile boot camps—but the only program concept that tentatively demonstrated reductions in crime rates in replicated RFTs was nuisance abatement, which changes the environment in which criminals operate. (This isn’t to say that direct behavior-improvement programs can never work; one well-known program that sends nurses to visit new or expectant mothers seems to have succeeded in improving various social outcomes in replicated independent RFTs.)
And third, there is no magic. Those rare programs that do work usually lead to improvements that are quite modest, compared with the size of the problems they are meant to address or the dreams of advocates.
Experiments are surely changing the way we conduct social science. The number of experiments reported in major social-science journals is growing rapidly across education, criminology, political science, economics, and other areas. In academic economics, several recent Nobel Prizes have been awarded to laboratory experimentalists, and leading indicators of future Nobelists are rife with researchers focused on RFTs.
It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.
At the moment, it is certain that we do not have anything remotely approaching a scientific understanding of human society. And the methods of experimental social science are not close to providing one within the foreseeable future. Science may someday allow us to predict human behavior comprehensively and reliably. Until then, we need to keep stumbling forward with trial-and-error learning as best we can.

(My Japanese translation)
しかし、我々が既に実施した社会科学の実験からは何が学べたのだろうか？　犯罪学に限らず、社会福祉プログラムの設計、教育、その他諸々の分野の実験を調べた結果として、私は以下の3つの教訓が共通して得られた、と主張したい。

第一に、適切に無作為抽出と繰り返しが実施された実験によって機能することが証明されたプログラムはほとんど無い、と言う点である。実験の支持者たちや分析者たちは複雑でもっともらしい議論を展開するが、我々は直観に反するような新たな社会計画や政策の有効性を訴える主張に対しては、かなりの程度懐疑的であるべきである。また、経済政策や社会政策については、試行錯誤の積み重ねによる社会的進化の過程を置き換えることには慎重であるべきである。

第二に、成功するよりも失敗する確率の方が遥かに高いこうしたプログラムの母集団の中でも、人間を変えようとする計画の方が、インセンティブを変えようとする計画よりもさらに失敗する可能性が高い、という点である。生活保護受給者を働かせようとする様々な計画は、福祉改革の時代に実施された無作為抽出の実験において、次から次へと失敗した。働くことを義務化することだけが、人々を生活保護から労働に人道的な形で移行させることに成功した。また、働くことを義務化した計画の中でも、単に職を得ることに力点を置いた計画の方が、技術の習得に力点を置いた計画よりも遥かに効果的だった。同様に、人々の犯罪への性向を変えようとして失敗した試みは枚挙に暇が無い――獄中でのカウンセリング、出所者への移行期支援、集中的な保護観察、未成年者に対するブートキャンプ、等々。しかし、再現された無作為実地試験で暫定的にせよ犯罪率の減少を示した唯一のプログラム概念は、ニューサンス除去（nuisance abatement）、即ち、犯罪者が活動する環境を変える手法だった。（ただし、このことは、直接的な行動改善計画が決して上手くいかないことを意味しているわけではない。母親なりたての人や妊婦のもとに看護師を訪問させるというある有名な計画は、無作為実地試験の独立した繰り返しにおいて、様々な社会的結果を改善するという成果を生み出すことが示された。）

第三に、魔法は存在しない、という点である。機能することが確認された数少ない計画においても、解決を目指した問題の規模や主唱者たちの夢に比べれば、改善効果は概ね極めて緩やかなものに留まる。

実験は確かに社会科学の方法を変えつつある。主な社会科学の学術誌で報告される実験の数は急速に伸びており、その分野も、教育、犯罪学、政治科学、経済学、等々にまたがる。学界の経済学においては、近年のノーベル賞の幾つかは実験経済学者に授与された。また、将来のノーベル経済学賞を占う主な指標においても、無作為実地試験を専門とする研究者が候補として数多く挙げられている。

こうしてみると、我々は社会科学における実験革命のとば口に立っており、この革命は想像も出来ないような数々の発見をもたらすのだ、と論じたい誘惑に駆られる。しかし、そうした議論に対しては懐疑的であるべきである。実験革命というのは、対象が複雑性を増すに連れてその力が減衰していく巨大な波のようなものなのだ。物理学はその革命により完全に生まれ変わった。治療のための生物学では、原因密度が物理学より高いものの、生物学的反応の均一性という仮定に大きく依拠することにより、無作為実験から得られた結果を信頼性のある形で一般化することができた。だが、社会科学での原因密度はさらに高いため、適切に実施された無作為実地試験の結果といえども、一般化することは危険である。人間社会の理解に関して真の革命を達成するためには、社会科学を生物学にまで還元する必要がありそうだが、そうしたことは、今のところ、科学の範疇を超えている。

現時点では、人間社会の科学的な理解に多少なりとも近づいているものは我々の手元には存在しない、と言って良い。実験的な社会科学の手法が近い将来にそうしたものを提供できる見込みは乏しい。いつかは、科学によって人間の行動を確実かつ完全に予測できる日が来るかもしれない。それまでは、試行錯誤によって可能な限りの学習をしつつ、こけつまろびつ前に進むしか無さそうである。

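*1: Incidentally, in recent years companies have appeared that run such experiments as a business (is a blue or a white envelope more effective for direct mail, which of two window displays contributes more to sales, and so on), and Manzi himself reportedly founded one such company in 1999. In that line of business, the "causal density" problem is apparently handled through the quantity of experiments rather than their quality.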