Semiparametric Conditional Factor Models: Estimation and Inference

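An NBER paper by this title has been posted (ungated (arXiv) version): "Semiparametric Conditional Factor Models: Estimation and Inference," by Qihui Chen (The Chinese University of Hong Kong, Shenzhen), Nikolai Roussanov (University of Pennsylvania), and Xiaoliang Wang (Hong Kong University of Science and Technology).
The abstract follows.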

This paper introduces a simple and tractable sieve estimation of semiparametric conditional factor models with latent factors. We establish large-N-asymptotic properties of the estimators without requiring large T. We also develop a simple bootstrap procedure for conducting inference about the conditional pricing errors as well as the shapes of the factor loading functions. These results enable us to estimate conditional factor structure of a large set of individual assets by utilizing arbitrary nonlinear functions of a number of characteristics without the need to pre-specify the factors, while allowing us to disentangle the characteristics' role in capturing factor betas from alphas (i.e., undiversifiable risk from mispricing). We apply these methods to the cross-section of individual U.S. stock returns and find strong evidence of large nonzero pricing errors that combine to produce arbitrage portfolios with Sharpe ratios above 3. We also document a significant decline in apparent mispricing over time.
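(My translation)
This paper introduces a simple and tractable sieve estimation*1 of semiparametric conditional factor models with latent factors. We establish the large-N asymptotic properties of the estimators without requiring a large T. We also develop a simple bootstrap procedure for conducting inference about the conditional pricing errors as well as the shapes of the factor loading functions. These results allow us to estimate the conditional factor structure of a large set of individual assets using arbitrary nonlinear functions of a number of characteristics, without having to specify the factors in advance. At the same time, they allow us to separate the characteristics' role in capturing factor betas from their role in capturing alphas (that is, to separate undiversifiable risk from mispricing). We apply these methods to the cross-section of individual U.S. stock returns and find strong evidence of large nonzero pricing errors. Combining those pricing errors yields arbitrage portfolios with Sharpe ratios above 3. We also show that apparent mispricing declines significantly over time.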

The opening of the paper's main text follows.

Over the half-century that passed since publication of Fama and MacBeth (1973) financial economists have continued to grapple with their central question: whether asset returns are proportional, on average, to these assets’ exposures to systematic risk. The debate has centered on the role of asset characteristics that appear to be related to average returns, and whether this relationship represents “mispricing” or, instead, the characteristics’ role in capturing dynamically changing risk exposures. The challenge is that neither the nature of such systematic sources of risk nor the role of characteristics in capturing time-varying and asset-specific exposures to these sources of risk is known ex ante.
We consider the following semiparametric factor model
　　y_it = α(z_it) + β(z_it)′f_t + ε_it, 　 i = 1, . . . , N, t = 1, . . . , T, 　　　(1)
where f_t is a K ×1 vector of unobserved factors, β(·) is a K ×1 vector of unknown factor loading functions, α(·) is an unknown intercept function, ε_it is the idiosyncratic component that cannot be explained by the common component, and y_it and z_it—an M × 1 vector of covariates—are observed. Our main focus is on cross-sectional asset pricing, where y_it are asset return realizations while z_it are pre-specified asset characteristics (i.e. they are known at the beginning of period t). In this case (1) describes a conditional factor model, in the sense that it captures time-variation in asset return exposures to the common factors (i.e., β(z_it)) as well as the pricing errors (i.e., α(z_it)), which are both functions of characteristics (i.e., z_it). As emphasized by Cochrane (2011), this model is central to empirical asset pricing, since it potentially allows for distinguishing between “risk” and “mispricing” explanations of the role of characteristics in predicting asset returns. Pooling the information in a multitude of stock characteristics and summarizing the common variation using a small number of factors would amount to “taming the zoo” of factors that proliferate in empirical asset pricing. The challenge to doing so is threefold: first, the identities of the common factors f_t are unknown since the factors are latent; second, the functional forms of the alpha and beta functions are also generally unknown; finally, the cross-sectional dimension N is typically much larger than the sample time-series length T, which renders standard tools of factor analysis inapplicable, especially when conditional covariances are time-varying.
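(My translation)
In the half-century since Fama and MacBeth (1973) was published, financial economists have continued to grapple with its central question: whether asset returns are, on average, proportional to those assets' exposures to systematic risk. The debate has centered on the role of asset characteristics that appear to be related to average returns, and on whether that relationship represents "mispricing" or instead reflects the characteristics' role in capturing dynamically changing risk exposures. The difficulty is that neither the nature of such systematic sources of risk, nor the role of characteristics in capturing time-varying and asset-specific exposures to those sources, is known in advance.
We consider the following semiparametric factor model: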
　　y_it = α(z_it) + β(z_it)′f_t + ε_it, 　 i = 1, . . . , N, t = 1, . . . , T, 　　　(1)
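Here f_t is a K × 1 vector of unobserved factors, β(·) is a K × 1 vector of unknown factor loading functions, α(·) is an unknown intercept function, ε_it is the idiosyncratic component not explained by the common component, and y_it and z_it (an M × 1 vector of covariates) are observed. Our main focus is cross-sectional asset pricing, in which y_it are realized asset returns and z_it are predetermined asset characteristics (that is, they are known at the beginning of period t). In this case, equation (1) describes a conditional factor model, in the sense that it captures time variation in both the exposures of asset returns to the common factors (β(z_it)) and the pricing errors (α(z_it)). Both are functions of the characteristics (z_it). As Cochrane (2011) emphasized, this model is central to empirical asset pricing because it potentially makes it possible to distinguish between the "risk" and "mispricing" explanations of the role characteristics play in predicting asset returns. Pooling the information in a multitude of stock characteristics and summarizing the common variation with a small number of factors would amount to "taming the zoo"*2 of factors proliferating in empirical asset pricing. The challenges in doing so are threefold. First, the identities of the common factors f_t are unknown, because the factors are latent. Second, the functional forms of the alpha and beta functions are also generally unknown. Finally, the cross-sectional dimension N is typically much larger than the time-series length T of the sample. This makes the standard tools of factor analysis inapplicable, especially when conditional covariances vary over time.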
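
To make equation (1) concrete, here is a minimal Python sketch that simulates a small panel from the model and then recovers the latent factor from period-by-period cross-sectional regressions on a polynomial sieve followed by a principal-component step. Everything specific in it is an assumption for illustration and is not taken from the paper: K = 1 and M = 1, the particular alpha and beta functions, the polynomial basis, the sample sizes, and the function names. It should be read as a toy version of the general idea (large N, small T), not as the authors' estimator.

```python
import numpy as np

def simulate_panel(N=500, T=12, seed=0):
    """Simulate y_it = alpha(z_it) + beta(z_it) * f_t + eps_it with K = 1, M = 1.

    The functional forms below are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, size=(T, N))   # characteristic, known at start of t
    f = rng.normal(0.0, 1.0, size=T)          # latent factor (unobserved in practice)
    alpha = 0.1 * z**2                        # assumed pricing-error function
    beta = 1.0 + 0.5 * z                      # assumed factor-loading function
    eps = rng.normal(0.0, 0.5, size=(T, N))   # idiosyncratic component
    y = alpha + beta * f[:, None] + eps
    return y, z, f

def sieve_basis(z_t, degree=2):
    """Polynomial sieve basis with columns 1, z, z^2, ..., z^degree."""
    return np.vander(z_t, degree + 1, increasing=True)

def cross_sectional_coefs(y, z, degree=2):
    """For each period t, regress y_t on the sieve basis of z_t across the N assets."""
    T = y.shape[0]
    coefs = np.empty((T, degree + 1))
    for t in range(T):
        phi = sieve_basis(z[t], degree)
        coefs[t] = np.linalg.lstsq(phi, y[t], rcond=None)[0]
    return coefs

def estimate_latent_factor(y, z, degree=2):
    """Leading principal component of the time-demeaned sieve coefficients.

    Under the sieve approximation, coefs[t] is roughly a + B * f_t, so after
    demeaning over t the first component tracks f_t up to sign and scale.
    """
    coefs = cross_sectional_coefs(y, z, degree)
    centered = coefs - coefs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

y, z, f_true = simulate_panel()
f_hat = estimate_latent_factor(y, z)
print("abs. correlation with true factor:", abs(np.corrcoef(f_hat, f_true)[0, 1]))
```

Note that only 12 time periods are used here: each cross-sectional regression draws its precision from the 500 assets, which is the sense in which large N can substitute for large T. The factor is identified only up to sign and scale, so the comparison uses the absolute correlation.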

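Characteristics versus risk brings to mind the dispute between Daniel-Titman and Fama-French that I introduced in "ノーベル賞受賞研究におけるファーマとシラーの対立点 - himaginary’s diary" (on where Fama and Shiller disagree in their Nobel-winning research). On this point the paper says the following.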

While some of the factor models have an explicit justification based on economic theory, many implicitly rely on the idea that factors capture common variation in portfolio returns, thus appealing to arbitrage pricing theory and its extensions (Ross, 1976; Chamberlain and Rothschild, 1982; Connor and Korajczyk, 1986, 1988). Since implementing the latter requires knowledge of the conditional covariance matrix of returns, which is infeasible to estimate when N is larger than T, most studies rely on stock characteristics to proxy for (imperfectly measured) factor exposures. However, this makes distinguishing between the two types of explanations virtually impossible, as exemplified by the “characteristics versus covariances” debate (Daniel and Titman, 1997). Our method is perfectly suited for resolving this debate, since it allows characteristics to simultaneously appear in both pricing errors and conditional covariances with unobserved common factors, which they also help recover.
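(My translation)
Some factor models have an explicit justification based on economic theory, but many rely implicitly on the idea that factors capture common variation in portfolio returns, and thus appeal to arbitrage pricing theory and its extensions (Ross, 1976; Chamberlain and Rothschild, 1982; Connor and Korajczyk, 1986, 1988). Implementing the latter requires knowledge of the conditional covariance matrix of returns, which cannot be estimated when N is larger than T, so most studies rely on stock characteristics as proxies for (imperfectly measured) factor exposures. However, as the "characteristics versus covariances" debate (Daniel and Titman, 1997) illustrates, this makes it virtually impossible to distinguish between the two types of explanation. Our method is perfectly suited to resolving this debate, because it allows characteristics to appear simultaneously in both the pricing errors and the conditional covariances with the unobserved common factors. The characteristics also help recover those unobserved common factors.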
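
The remark that the covariance matrix cannot be estimated when N exceeds T can be illustrated with a small Python sketch. It uses the plain (unconditional) sample covariance of simulated i.i.d. returns, which is a simplifying assumption of mine and not the paper's setting; the function name and the sizes N = 50, T = 10 are likewise hypothetical. With T observations, the N × N sample covariance matrix has rank at most T − 1, so it is singular whenever N ≥ T.

```python
import numpy as np

def sample_cov_rank(N, T, seed=0):
    """Rank of the N x N sample covariance matrix built from T return observations."""
    rng = np.random.default_rng(seed)
    returns = rng.normal(size=(T, N))        # T periods (rows), N assets (columns)
    cov = np.cov(returns, rowvar=False)      # N x N sample covariance
    return np.linalg.matrix_rank(cov)

N, T = 50, 10
rank = sample_cov_rank(N, T)
print(f"N = {N}, T = {T}, rank of sample covariance = {rank}")
# Demeaning removes one degree of freedom, so the rank cannot exceed T - 1 < N.
```

A singular matrix of this kind cannot be inverted, and most of its eigenvalues are exactly zero by construction, which is one way to see why characteristics end up being used as proxies for factor exposures.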

*1: cf. Sieve estimator - Wikipedia. Among the studies introduced on this blog, it is mentioned here and here.

*2: cf. 高次元のファクターモデルとファクター動物園 - himaginary’s diary (on high-dimensional factor models and the factor zoo).