Which linear estimator has the smallest mean squared error?

In his 9/21 entry, Dave Giles posed the following problem to his blog readers.

We know from the Gauss-Markov Theorem that within the class of linear and unbiased estimators, the OLS estimator is most efficient. Because it is unbiased, it therefore has the smallest possible Mean Squared Error (MSE) within the linear and unbiased class of estimators. However, there are many linear estimators which, although biased, have a smaller MSE than the OLS estimator. You might then think of asking:
“Why don’t I try and find the linear estimator that has the smallest possible MSE?”
(a) Show that attempting to do this yields an “estimator” that can’t actually be used in practice.
(You can do this using the simple linear regression model without an intercept, although the result generalizes to the usual multiple linear regression model.)
(b) Now, for the simple regression model with no intercept,
y_i = β x_i + ε_i ; ε_i ~ i.i.d. [0 , σ²] ,
find the linear estimator, β* , that minimizes the quantity:
h[Var.(β*) / σ²] + (1 - h)[Bias(β*)/ β]² , for 0 < h < 1.
Is β* a legitimate estimator, in the sense that it can actually be applied in practice?

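(In my own words)
The Gauss-Markov theorem tells us that ordinary least squares (OLS) is the most efficient estimator among linear unbiased estimators. Because it is unbiased, it also has the smallest possible mean squared error (MSE) within that class. There are, however, many linear estimators that are biased yet have a smaller MSE than OLS, which raises the question: why not look for the linear estimator with the smallest possible MSE?

(a) Show that such an attempt produces an "estimator" that cannot actually be used.
(This can be shown with the simple linear regression model without an intercept. The result extends to the usual multiple linear regression model.)

(b) For the simple linear regression model without an intercept,
　　　　　　y_i = β x_i + ε_i ; ε_i ~ i.i.d. [0 , σ²] ,
find the linear estimator β^* that minimizes the quantity
　　　　　　h[Var.(β^*) / σ²] + (1 - h)[Bias(β^*)/ β]² , for 0 < h < 1.
Is β^* a legitimate estimator, in the sense that it can actually be used in practice?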

Giles gave his answer in the next day's entry, in the form of a link to a handwritten PDF. The following is an excerpt from it.

(a)

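Let βhat ＝ Σa_iy_i be an arbitrary linear estimator.
Then E(βhat) ＝ Σa_iE(y_i) ＝ βΣa_ix_i, so
　　Bias(βhat) ＝ E(βhat) − β ＝ β［Σa_ix_i-1］
Similarly, because the y_i are uncorrelated with common variance σ²,
　　var(βhat) ＝ Σa_i²var(y_i) ＝ σ²Σa_i²
Hence
　　MSE(βhat) ＝ M ＝ σ²Σa_i² ＋ β²［Σa_ix_i-1］²
Minimizing M over the weights a_j gives the first-order conditions
　　∂M／∂a_j ＝ 2σ²a_j ＋ 2β²［Σa_ix_i-1］x_j ＝ 0　　；∀j　　　　　　　　　　　　（１）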
Multiplying （１） by y_j and summing over all j gives
　　2σ²Σa_jy_j ＋ 2β²［Σa_ix_i-1］Σx_jy_j ＝ 0
　　σ²βhat ＋ β²［Σa_ix_i-1］Σx_jy_j ＝ 0　　　　　　　　　　　　　　　　　　　　　　（２）
Likewise, multiplying （１） by x_j and summing over all j gives
　　2σ²Σa_jx_j ＋ 2β²［Σa_ix_i-1］Σx_j² ＝ 0
which, solved for Σa_ix_i, gives
　　Σa_ix_i ＝ β²Σx_j² ／（σ² ＋ β²Σx_j²）
Substituting this into （２） gives
　　σ²βhat ＋ β²［｛β²Σx_j² ／（σ² ＋ β²Σx_j²）｝ - 1］Σx_jy_j ＝ 0
　　σ²βhat ＋ β²［（β²Σx_j² − σ² − β²Σx_j²）／（σ² ＋ β²Σx_j²）］Σx_jy_j ＝ 0
　　βhat ＝［β²σ² ／（σ² ＋ β²Σx_j²）］［（Σx_jy_j）／ σ²］
　　　　　　＝［β² ／（σ² ＋ β²Σx_j²）］Σx_jy_j
　　　　　　＝［β²Σx_j² ／（σ² ＋ β²Σx_j²）］b
where b ＝（Σx_jy_j ／ Σx_j²）is the OLS estimator.
Because βhat depends on the unknown parameters β and σ², it cannot be computed in practice.
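
As a numerical sanity check of the result in (a), here is a minimal Python sketch. It evaluates M ＝ σ²Σa_i² ＋ β²［Σa_ix_i-1］² for the OLS weights and for the weights implied by (1), a_j ＝ β²x_j ／（σ² ＋ β²Σx_j²）. The function names, the simulated x values, and the values chosen for β and σ² are illustrative assumptions, not part of Giles' answer.

```python
import numpy as np

def mse_linear(a, x, beta, sigma2):
    """MSE of beta_hat = sum(a_i * y_i) when y_i = beta * x_i + eps_i."""
    variance = sigma2 * np.sum(a ** 2)
    bias = beta * (np.dot(a, x) - 1.0)
    return variance + bias ** 2

def min_mse_weights(x, beta, sigma2):
    """Weights solving the first-order conditions (1).

    They require the TRUE beta and sigma2, which is exactly why
    the resulting "estimator" is not usable in practice.
    """
    return beta ** 2 * x / (sigma2 + beta ** 2 * np.sum(x ** 2))

def ols_weights(x):
    """Weights of the OLS estimator b = sum(x*y) / sum(x^2)."""
    return x / np.sum(x ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=20)      # hypothetical regressor values
beta, sigma2 = 0.5, 4.0      # hypothetical "true" parameters

a_star = min_mse_weights(x, beta, sigma2)
mse_star = mse_linear(a_star, x, beta, sigma2)
mse_ols = mse_linear(ols_weights(x), x, beta, sigma2)
print("MSE (minimum-MSE weights):", mse_star)
print("MSE (OLS weights):        ", mse_ols)

# M is convex in a, so small random perturbations of a_star
# should never lower the MSE.
never_better = all(
    mse_linear(a_star + 1e-3 * rng.normal(size=x.size), x, beta, sigma2) >= mse_star
    for _ in range(1000)
)
print("No perturbation improves on a_star:", never_better)
```

Note that `min_mse_weights` takes `beta` and `sigma2` as arguments: the weights can only be formed by someone who already knows the parameter being estimated.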

(b)

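Consider minimizing
　　H ＝ h[var(βhat) / σ²] + (1 - h)[Bias(βhat)/ β]² ; 0 < h < 1
where, as above, βhat ＝ Σa_iy_i. Since var(βhat)／σ² ＝ Σa_i² and Bias(βhat)／β ＝ Σa_ix_i-1, H involves neither β nor σ², and the first-order conditions are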
　　∂H／∂a_j ＝ 2h・a_j ＋ 2(1-h)x_j［Σa_ix_i-1］＝ 0
Multiplying by y_j and summing over all j gives
　　hΣa_jy_j ＋ (1-h)Σx_jy_j［Σa_ix_i-1］＝ 0
　　hβhat ＋ (1-h)Σx_jy_j［Σa_ix_i-1］＝ 0　
Similarly, multiplying the expression for ∂H／∂a_j by x_j and summing over all j gives
　　hΣa_jx_j ＋ (1-h)Σx_j²［Σa_ix_i-1］＝ 0
　　Σa_ix_i［h ＋ (1-h)Σx_i²］＝ (1-h)Σx_i²
　　Σa_ix_i ＝［(1-h)Σx_i²］／［h ＋ (1-h)Σx_i²］
Therefore
　　hβhat ＋ (1-h)Σx_iy_i［｛（(1-h)Σx_i²）／（h ＋ (1-h)Σx_i²）｝-1］＝ 0　
　　hβhat ＋ (1-h)Σx_i²b［（(1-h)Σx_i² − h − (1-h)Σx_i²）／（h ＋ (1-h)Σx_i²）］＝ 0　
　　βhat ＝［1 ／（h ＋ (1-h)Σx_i²）］(1-h)Σx_i²b
　　　　　　＝［(1-h)Σx_i² ／（h ＋ (1-h)Σx_i²）］b
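This estimator is operational, that is, computable from the data alone, for every h ∈ (0,1). In the limiting cases, h ＝ 0 gives βhat ＝ b and h ＝ 1 gives βhat ＝ 0.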
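
To make the contrast with (a) concrete, the following Python sketch computes the estimator from (b) using only the data and the chosen weight h. The function names and the simulated data are illustrative assumptions. The sketch also checks that the implied weights a_j ＝ (1-h)x_j ／（h ＋ (1-h)Σx_i²） satisfy the first-order conditions for H.

```python
import numpy as np

def shrinkage_factor(x, h):
    """Factor (1-h)*sum(x^2) / (h + (1-h)*sum(x^2)) applied to OLS."""
    q = np.sum(x ** 2)
    return (1 - h) * q / (h + (1 - h) * q)

def beta_star(x, y, h):
    """Estimator from part (b): a shrunken version of the OLS estimator b."""
    b = np.dot(x, y) / np.dot(x, x)
    return shrinkage_factor(x, h) * b

def foc_residual(x, h):
    """Left-hand side of dH/da_j = 0 evaluated at the implied weights."""
    q = np.sum(x ** 2)
    a = (1 - h) * x / (h + (1 - h) * q)
    return 2 * h * a + 2 * (1 - h) * x * (np.dot(a, x) - 1.0)

rng = np.random.default_rng(1)
x = rng.normal(size=30)                 # hypothetical regressor values
y = 2.0 * x + rng.normal(size=30)       # hypothetical data, true beta = 2
b = np.dot(x, y) / np.dot(x, x)

for h in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"h = {h:4.2f}  factor = {shrinkage_factor(x, h):.4f}  "
          f"beta* = {beta_star(x, y, h):.4f}")

print("max |dH/da_j| at h = 0.5:", np.max(np.abs(foc_residual(x, 0.5))))
```

Neither function takes β or σ² as an argument, which is the sense in which this estimator is operational: the shrinkage factor depends only on h and Σx_i².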