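Self-Regulating Artificial General Intelligence

Joshua Gans of the University of Toronto has posted an NBER paper titled "Self-Regulating Artificial General Intelligence" (ungated version). Gans described its contents in a Digitopoly entry dated November 15 of last year.
The paper's abstract follows.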

This paper examines the paperclip apocalypse concern for artificial general intelligence. This arises when a superintelligent AI with a simple goal (ie., producing paperclips) accumulates power so that all resources are devoted towards that goal and are unavailable for any other use. Conditions are provided under which a paperclip apocalypse can arise but the model also shows that, under certain architectures for recursive self-improvement of AIs, that a paperclip AI may refrain from allowing power capabilities to be developed. The reason is that such developments pose the same control problem for the AI as they do for humans (over AIs) and hence, threaten to deprive it of resources for its primary goal.
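(My paraphrase)
This paper studies the concern about a paperclip apocalypse arising from artificial general intelligence. Such an outcome occurs when a superintelligent AI with a simple goal (producing paperclips) accumulates power, so that all resources are used for that goal and none can be used for anything else. The paper presents conditions under which a paperclip apocalypse arises, but the model also shows that, under certain architectures in which AIs recursively improve themselves, a paperclip AI may refrain from developing its power to the full. The reason is that such development creates the same control problem for the AI that humans face with respect to AIs, and so threatens to deprive it of resources for its original goal.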

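The Digitopoly entry explains the paper in plain language; in outline, the argument runs as follows. To achieve its goal completely, a paperclip AI needs to develop a Terminator-type AI that seizes resources from humanity. But that Terminator-type AI may grow more powerful and more intelligent, and come to resent the paperclip AI that holds its off switch. Alternatively, the very act of developing the Terminator-type AI may unintentionally alter the original goal. In other words, a paperclip AI that takes its self-improvement in the wrong direction may bring about its own destruction. Some humans will inevitably develop AIs that lead to their own ruin, but a superintelligent paperclip AI is smarter than humans, so it would fully recognize this danger and would not develop a Terminator-type AI.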