
'''Gradient boosting''' (勾配ブースティング) is a [[機械学習|machine learning]] technique for tasks such as [[回帰分析|regression]] and [[分類 (統計学)|classification]]. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.<ref name=":1">{{Cite journal|last=Piryonesi|first=S. Madeh|last2=El-Diraby|first2=Tamer E.|date=2020-03-01|title=Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index|url=https://ascelibrary.org/doi/abs/10.1061/%28ASCE%29IS.1943-555X.0000512|journal=Journal of Infrastructure Systems|volume=26|issue=1|pages=04019036|language=EN|DOI=10.1061/(ASCE)IS.1943-555X.0000512|ISSN=1943-555X}}</ref><ref name="hastie2009">{{Cite book|last=Hastie|isbn=978-0-387-84857-0|archiveurl=https://web.archive.org/web/20091110212529/http://www-stat.stanford.edu/~tibs/ElemStatLearn/|pages=337&ndash;384|chapter=10. Boosting and Additive Trees|location=New York|publisher=Springer|chapterurl=http://www-stat.stanford.edu/~tibs/ElemStatLearn/|edition=2nd|first=T.|title=The Elements of Statistical Learning|year=2009|first3=J. H.|last3=Friedman|first2=R.|last2=Tibshirani|archivedate=2009-11-10}}</ref> When decision trees are the weak learners, the resulting predictor is called gradient-boosted trees, and it often outperforms a [[ランダムフォレスト|random forest]].<ref name=":0">{{Cite journal|last=Piryonesi|first=S. Madeh|last2=El-Diraby|first2=Tamer E.|date=2021-02-01|title=Using Machine Learning to Examine Impact of Type of Performance Indicator on Flexible Pavement Deterioration Modeling|url=http://ascelibrary.org/doi/10.1061/%28ASCE%29IS.1943-555X.0000602|journal=Journal of Infrastructure Systems|volume=27|issue=2|pages=04021005|language=en|DOI=10.1061/(ASCE)IS.1943-555X.0000602|ISSN=1076-0342}}</ref> Like other [[ブースティング|boosting]] methods, it builds the model in a stage-wise fashion, but it generalizes them by allowing the optimization of an arbitrary [[微分可能関数|differentiable]] [[損失関数|loss function]].

== History ==
The idea of gradient boosting originated in Leo Breiman's observation that boosting can be interpreted as an optimization algorithm on a suitable cost function.<ref name="Breiman1997">{{Cite journal|last=Breiman|first=L.|date=June 1997|title=Arcing The Edge|url=https://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf|journal=Technical Report 486|publisher=Statistics Department, University of California, Berkeley}}</ref> Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman,<ref name="Friedman1999Feb">{{Cite journal|last=Friedman|first=J. H.|date=February 1999|title=Greedy Function Approximation: A Gradient Boosting Machine|url=https://statweb.stanford.edu/~jhf/ftp/trebst.pdf}}</ref><ref name="Friedman1999Mar">{{Cite journal|last=Friedman|first=J. H.|date=March 1999|title=Stochastic Gradient Boosting|url=https://statweb.stanford.edu/~jhf/ftp/stobst.pdf}}</ref> while Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean presented the more general functional gradient boosting perspective.<ref name="MasonBaxterBartlettFrean1999a">
{{Cite conference|last=Mason|first1=L.|last2=Baxter|first2=J.|last3=Bartlett|first3=P. L.|last4=Frean|first4=Marcus|year=1999|title=Boosting Algorithms as Gradient Descent|book-title=Advances in Neural Information Processing Systems 12|editor=S.A. Solla and T.K. Leen and K. Müller|publisher=MIT Press|pages=512–518|url=http://papers.nips.cc/paper/1766-boosting-algorithms-as-gradient-descent.pdf}}</ref> <ref name="MasonBaxterBartlettFrean1999b">
{{Cite journal|last=Mason|first=L.|last2=Baxter|first2=J.|last3=Bartlett|first3=P. L.|last4=Frean|first4=Marcus|date=May 1999|title=Boosting Algorithms as Gradient Descent in Function Space|url=https://www.maths.dur.ac.uk/~dma6kp/pdf/face_recognition/Boosting/Mason99AnyboostLong.pdf}}</ref> The latter two papers introduced the view of boosting algorithms as iterative functional gradient descent algorithms, that is, algorithms that optimize a cost function over function space by repeatedly choosing a function (a weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of [[機械学習|machine learning]] and [[統計学|statistics]] beyond regression and classification.

== Informal introduction ==
This section follows the exposition of gradient boosting by Li.<ref>{{Cite web|url=http://www.chengli.io/tutorials/gradient_boosting.pdf|title=A Gentle Introduction to Gradient Boosting|author=Cheng Li|accessdate=2021-10-06}}</ref>

Like other boosting methods, gradient boosting iteratively combines weak learners into a single strong learner. It is easiest to explain in the [[回帰分析|least-squares regression]] setting, where

* <math>\hat y_i = F(x_i)</math> is the predicted value, and
* <math>y_i</math> is the observed value.

Here <math> i </math> indexes the training set and <math>n</math> is the number of samples in the training set. The goal is to train a model <math>F</math> that predicts <math>\hat{y} = F(x)</math> for an unseen <math>x</math> by minimizing the mean squared error <math>\tfrac{1}{n}\sum\nolimits_i(F(x_i) - y_i)^2</math>.

Now consider a gradient boosting algorithm with <math>M</math> stages. At stage <math>m</math> (<math>1 \le m \le M</math>), suppose we have some imperfect model <math>F_m</math>. For small <math>m</math>, this model may simply return the mean of <math>y</math> (<math>\hat y_i = \bar y</math>). To improve <math>F_m</math>, the algorithm adds a new estimator <math>h_m(x)</math>, ideally chosen so that

: <math>
F_{m+1}(x) = F_m(x) + h_m(x) = y
</math>

or, equivalently,

: <math>
h_m(x) = y - F_m(x)
</math> 

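Gradient boosting therefore fits {{Mvar|h}} to the [[残差|residual]] <math>y - F_m(x)</math>. As in other boosting methods, each <math>F_{m+1}</math> attempts to correct the errors of its predecessor <math>F_{m}</math>. The idea generalizes to loss functions other than squared error, and to classification and ranking problems, through the observation that the residual <math>h_m(x)</math> of a model is proportional to the negative gradient of the squared-error loss with respect to <math>F(x)</math>: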

: <math>
L_{\rm MSE} = \frac{1}{2}\left(y - F(x)\right)^2
</math>

: <math>
h_m(x) = - \frac{\partial L_{\rm MSE}}{\partial F} = y - F(x).
</math>

Gradient boosting can therefore be viewed as a [[最急降下法|gradient descent]] algorithm, and generalizing it amounts to "plugging in" a different loss and its gradient.
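
The residual-fitting loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: inputs are one-dimensional, the weak learners are decision stumps, and the helper names (<code>fit_stump</code>, <code>predict_stump</code>, <code>boost</code>) and the toy data are hypothetical.

```python
# Sketch only: least-squares boosting on 1-D inputs with decision stumps.
# fit_stump, predict_stump, boost and the toy data are illustrative assumptions.

def fit_stump(xs, rs):
    """Fit a stump (threshold, left_value, right_value) to targets rs by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x: it would leave the right side empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def boost(xs, ys, n_stages):
    """Return the training predictions after n_stages residual-fitting steps."""
    preds = [sum(ys) / len(ys)] * len(xs)  # initial model: the mean of y
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]  # y - F_m(x)
        h = fit_stump(xs, residuals)  # h_m is fitted to the residuals
        preds = [p + predict_stump(h, x) for p, x in zip(preds, xs)]  # F_{m+1} = F_m + h_m
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
print(mse(ys, boost(xs, ys, 0)), mse(ys, boost(xs, ys, 5)))
```

On this toy data the training error shrinks as stages are added, because each stump removes part of the remaining residual.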

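== Algorithm ==
In many [[教師あり学習|supervised learning]] problems there is an output variable {{Mvar|y}} and a vector of input variables {{Mvar|x}}, related to each other by some probability distribution. The goal is to find a function <math>\hat{F}(x)</math> that best approximates the output variable from the values of the input variables. This is formalized by introducing a loss function <math>L(y, F(x))</math> and minimizing its expectation:

: <math>\hat{F} = \underset{F}{\arg\min} \, \mathbb{E}_{x,y}[L(y, F(x))].</math>

The gradient boosting method assumes a real-valued {{Mvar|y}} and seeks the approximation <math>\hat{F}(x)</math> in the form of a weighted sum of functions <math>h_i (x)</math> from some class <math>\mathcal{H}</math>, called base learners (or weak learners):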

: <math>\hat{F}(x) = \sum_{i=1}^M \gamma_i h_i(x) + \mbox{const.}</math>

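We are usually given a training set <math>\{ (x_1,y_1), \dots, (x_n,y_n) \} </math> of known sample values of {{Mvar|x}} and corresponding values of {{Mvar|y}}. In accordance with the empirical risk minimization principle, the method looks for an approximation <math>\hat{F}(x)</math> that minimizes the average value of the loss function on the training set, i.e., minimizes the empirical risk. It does so by starting with a model consisting of a constant function <math>F_0(x)</math> and expanding it incrementally in a [[貪欲法|greedy]] fashion:

: <math>F_0(x) = \underset{\gamma}{\arg\min} {\sum_{i=1}^n {L(y_i, \gamma)}},</math>
: <math>F_m(x) = F_{m-1}(x) + \underset{h_m \in \mathcal{H}}{\operatorname{arg\,min}} \left[{\sum_{i=1}^n {L(y_i, F_{m-1}(x_i) + h_m(x_i))}}\right],</math>

where <math> h_m \in \mathcal{H} </math> is a base learner function.

Unfortunately, choosing the best function {{Mvar|h}} at each step for an arbitrary loss function {{Mvar|L}} is, in general, a computationally infeasible optimization problem. The approach is therefore restricted to a simplified version of the problem.

The idea is to apply a [[最急降下法|steepest descent]] step to this minimization problem.

The basic idea behind steepest descent is to find a local minimum of the loss function by iteratively updating <math>F_m(x)</math>: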


: <math>F_m(x) = F_{m-1}(x) - \gamma \sum_{i=1}^n {\nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))}</math>

where <math>\gamma > 0</math>. For sufficiently small <math>\gamma</math>, this implies that <math>L(y_i, F_{m}(x_i)) \le L(y_i, F_{m-1}(x_i))</math>.

The step size <math>\gamma</math> can itself be optimized by choosing the value for which the loss function attains its minimum:

: <math>\gamma_m = \underset{\gamma}{\arg\min} {\sum_{i=1}^n {L\left(y_i, F_{m}(x_i) \right)}} = \underset{\gamma}{\arg\min} {\sum_{i=1}^n {L\left(y_i, F_{m-1}(x_i) -
  \gamma \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i)) \right)}}.</math>


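In the continuous case, i.e., when <math>\mathcal{H} </math> is taken to be the set of arbitrary differentiable functions on <math> \R</math>, the model is updated according to the following equations: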

: <math>
  F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad
  \gamma_m = \underset{\gamma}{\operatorname{arg\,min}} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).
 </math>

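Here the derivatives are taken with respect to the functions <math> F_i </math> for <math> i \in \{ 1,..,m \}</math>, and <math>\gamma_m</math> is the step length. In the discrete case, i.e., when the set <math>\mathcal{H}</math> is finite, we choose the candidate function {{Mvar|h}} closest to the gradient of {{Mvar|L}}; the coefficient {{Mvar|&gamma;}} of this candidate can then be computed by [[線型探索|line search]] on the equations above. Note that this approach is a heuristic and therefore yields an approximation rather than an exact solution to the given problem. In pseudocode, the generic gradient boosting method is as follows.<ref name="Friedman1999Feb" /><ref name="hastie2009" />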

{{枠の始まり|blue}}
Input: training set <math>\{(x_i, y_i)\}_{i=1}^n,</math> a differentiable loss function <math>L(y, F(x)),</math> number of iterations {{Mvar|M}}.

Algorithm:

# Initialize model with a constant value:
#: <math>F_0(x) = \underset{\gamma}{\arg\min} \sum_{i=1}^n L(y_i, \gamma).</math>
# For {{Mvar|m}} = 1 to {{Mvar|M}}:
## Compute so-called ''pseudo-residuals'':
##: <math>r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)} \quad \mbox{for } i=1,\ldots,n.</math>
## Fit a base learner (or weak learner, e.g. tree) <math>h_m(x)</math> to pseudo-residuals, i.e. train it using the training set <math>\{(x_i, r_{im})\}_{i=1}^n</math>.
## Compute multiplier <math>\gamma_m</math> by solving the following [[Line search|one-dimensional optimization]] problem:
##: <math>\gamma_m = \underset{\gamma}{\operatorname{arg\,min}} \sum_{i=1}^n L\left(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\right).</math>
## Update the model:
##: <math>F_m(x) = F_{m-1}(x) + \gamma_m h_m(x).</math>
# Output <math>F_M(x).</math>
{{枠の終わり}}
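
The pseudocode above can be turned into a small runnable sketch. The following Python code is an illustration under stated assumptions rather than Friedman's implementation: it uses the absolute-error loss <math>L(y, F) = |y - F|</math>, one-dimensional inputs, decision stumps as base learners, and a ternary search on the hypothetical bracket <math>[0, 10]</math> as the one-dimensional optimizer. All helper names and the toy data are made up for the example.

```python
# Sketch only: generic gradient boosting with a pluggable loss (here absolute error).
# All helper names, the search bracket [0, 10] and the toy data are assumptions.
from statistics import median


def loss(y, f):
    return abs(y - f)


def neg_gradient(y, f):
    """Pseudo-residual -dL/dF for L = |y - F| (taken as 0 at the kink)."""
    return (y > f) - (y < f)


def fit_stump(xs, rs):
    """Least-squares stump (threshold, left_value, right_value); needs two distinct xs."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def line_search(objective, lo=0.0, hi=10.0, iters=100):
    """Ternary search for the minimizer of a convex 1-D objective on [lo, hi]."""
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(a) <= objective(b):
            hi = b
        else:
            lo = a
    return (lo + hi) / 2


def total_loss(ys, preds):
    return sum(loss(y, p) for y, p in zip(ys, preds))


def gradient_boost(xs, ys, n_stages):
    """Return the training predictions F_M(x_i)."""
    preds = [median(ys)] * len(xs)  # step 1: the median minimizes sum |y_i - gamma|
    for _ in range(n_stages):
        r = [neg_gradient(y, p) for y, p in zip(ys, preds)]  # step 2.1: pseudo-residuals
        h = fit_stump(xs, r)  # step 2.2: fit the base learner
        hx = [predict_stump(h, x) for x in xs]
        gamma = line_search(  # step 2.3: one-dimensional optimization
            lambda g: sum(loss(y, p + g * v) for y, p, v in zip(ys, preds, hx)))
        preds = [p + gamma * v for p, v in zip(preds, hx)]  # step 2.4: update
    return preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 9.0]
print(total_loss(ys, gradient_boost(xs, ys, 0)), total_loss(ys, gradient_boost(xs, ys, 1)))
```

Swapping in another differentiable loss only requires changing <code>loss</code>, <code>neg_gradient</code> and the initial constant; the loop itself stays the same.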

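== Gradient tree boosting ==
Gradient boosting is typically used with [[決定木|decision trees]] (especially CART trees) of a fixed size as base learners. For this special case, Friedman proposes a modification to the gradient boosting method that improves the quality of fit of each base learner.

In generic gradient boosting, at the ''m''-th step a decision tree <math>h_m(x)</math> is fitted to the pseudo-residuals. Let <math>J_{m}</math> be the number of its leaves. The tree partitions the input space into <math>J_{m}</math> disjoint regions <math>R_{1m}, \ldots, R_{J_{m}m}</math> and predicts a constant value in each region. Using the [[指示関数|indicator function]], the output of <math>h_m(x)</math> for an input ''x'' can be written as the sum

: <math>h_m(x) = \sum_{j=1}^{J_{m}} b_{jm} \mathbf {1}_{R_{jm}}(x)</math>

where <math>b_{jm}</math> is the value predicted in the region <math>R_{jm}</math>.<ref>Note: in case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient <math>b_{jm}</math> for the region <math>R_{jm}</math> is equal to just the value of output variable, averaged over all training instances in <math>R_{jm}</math>.</ref>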

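Next, the coefficients <math>b_{jm}</math> are multiplied by some value <math>\gamma_m</math>, chosen by [[線型探索|line search]] so as to minimize the loss function, and the model is updated as follows:

: <math>
  F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad
  \gamma_m = \underset{\gamma}{\operatorname{arg\,min}} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).
 </math>

Friedman proposes to modify this algorithm so that it chooses a separate optimal value <math>\gamma_{jm}</math> for each of the tree's regions, instead of a single <math>\gamma_m</math> for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients <math>b_{jm}</math> from the tree-fitting procedure can then simply be discarded, and the model update rule becomes:

: <math>
  F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_{m}} \gamma_{jm} \mathbf {1}_{R_{jm}}(x), \quad
  \gamma_{jm} = \underset{\gamma}{\operatorname{arg\,min}} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma).
 </math>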
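
To make the per-region values <math>\gamma_{jm}</math> concrete, the sketch below computes them for one illustrative case. It assumes the absolute-error loss, for which the minimizer of <math>\sum |y_i - F_{m-1}(x_i) - \gamma|</math> over a region is the median of the residuals in that region, and a tree with two leaves split at a fixed, hypothetical threshold. The function name and the data are made up for the example.

```python
# Sketch only: per-region step gamma_jm of TreeBoost for the absolute-error loss.
# The function name, the fixed split and the toy data are illustrative assumptions.
from statistics import median


def treeboost_leaf_values(xs, ys, preds, threshold):
    """For L = |y - F|, gamma_jm is the median of y_i - F_{m-1}(x_i) over x_i in R_jm.

    The two regions are those of a stump: x <= threshold and x > threshold.
    """
    left = [y - p for x, y, p in zip(xs, ys, preds) if x <= threshold]
    right = [y - p for x, y, p in zip(xs, ys, preds) if x > threshold]
    return median(left), median(right)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 9.0]
preds = [3.5] * len(xs)  # F_{m-1}: a constant model

gamma_left, gamma_right = treeboost_leaf_values(xs, ys, preds, threshold=3.0)
# Model update: add the value of the region that contains x.
new_preds = [p + (gamma_left if x <= 3.0 else gamma_right) for x, p in zip(xs, preds)]
print(gamma_left, gamma_right)
```

For the squared-error loss the same construction would use the mean of the residuals in each region instead of the median.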

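=== Size of trees ===
<math>J</math>, the number of terminal nodes in a tree, is a parameter of the method that can be adjusted for the data set at hand. It controls the maximum allowed level of [[交互作用|interaction]] between variables in the model. With <math>J = 2</math> (decision stumps), no interaction between variables is allowed. With <math>J = 3</math>, the model may include effects of the interaction between up to two variables, and so on.

Hastie et al. comment that typically <math>4 \leq J \leq 8</math> works well for boosting and that results are fairly insensitive to the choice of <math>J</math> in this range; <math>J = 2</math> is insufficient for many applications, and <math>J > 10</math> is unlikely to be required.<ref name="hastie2009" />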

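== Regularization ==
Fitting the training set too closely can degrade the model's ability to generalize. Several so-called [[正則化|regularization]] techniques reduce this [[オーバーフィッティング|overfitting]] effect by constraining the fitting procedure.

One natural regularization parameter is the number of gradient boosting iterations ''M'' (i.e., the number of trees in the model when the base learner is a decision tree). Increasing ''M'' reduces the error on the training set, but setting it too high may lead to [[オーバーフィッティング|overfitting]]. An optimal value of ''M'' is often selected by monitoring prediction error on a separate validation data set. Besides controlling ''M'', several other regularization techniques are used.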
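
Selecting ''M'' on a validation set can be sketched as follows. The code is a minimal illustration that assumes squared-error loss, one-dimensional inputs and decision stumps; the function names and both toy data sets are hypothetical.

```python
# Sketch only: choosing the number of stages M on a held-out validation set.
# Helper names and the toy data are illustrative assumptions.

def fit_stump(xs, rs):
    """Least-squares stump (threshold, left_value, right_value); needs two distinct xs."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def validation_errors(train_x, train_y, valid_x, valid_y, max_stages):
    """Validation MSE after 0, 1, ..., max_stages boosting stages."""
    f0 = sum(train_y) / len(train_y)
    train_pred = [f0] * len(train_x)
    valid_pred = [f0] * len(valid_x)
    errors = [mse(valid_y, valid_pred)]
    for _ in range(max_stages):
        h = fit_stump(train_x, [y - p for y, p in zip(train_y, train_pred)])
        train_pred = [p + predict_stump(h, x) for p, x in zip(train_pred, train_x)]
        valid_pred = [p + predict_stump(h, x) for p, x in zip(valid_pred, valid_x)]
        errors.append(mse(valid_y, valid_pred))
    return errors


train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
valid_x = [1.5, 3.5, 5.5]
valid_y = [1.5, 3.5, 5.5]

errors = validation_errors(train_x, train_y, valid_x, valid_y, max_stages=20)
best_m = min(range(len(errors)), key=errors.__getitem__)  # M with the lowest validation error
print(best_m, errors[best_m])
```

In practice the loop is usually stopped once the validation error has failed to improve for some number of stages, rather than always running to a fixed maximum.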

Another regularization parameter is the depth of the trees. The larger this value, the more likely the model is to overfit the training data.

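=== Shrinkage ===
An important part of the gradient boosting method is regularization by shrinkage, which modifies the update rule as follows:

: <math>F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x), \quad 0 < \nu \leq 1,</math>

where the parameter <math>\nu</math> is called the "learning rate".

Empirically, it has been found that using small learning rates (such as <math>\nu < 0.1</math>) yields dramatic improvements in a model's generalization ability over gradient boosting without shrinkage (<math>\nu = 1</math>).<ref name="hastie2009" />
However, a lower learning rate requires more iterations, which increases computation time both during training and when querying the model.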
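
The trade-off between the learning rate and the number of iterations can be seen in a small sketch. It assumes squared-error loss with least-squares stumps, for which the line-search step <math>\gamma_m</math> equals 1, so the update reduces to <math>F_m = F_{m-1} + \nu h_m</math>. The helper names and the data are hypothetical, and the example only shows the slower decrease of the ''training'' error; it does not demonstrate the generalization benefit.

```python
# Sketch only: shrinkage in least-squares boosting with decision stumps.
# With least-squares stumps and squared error, gamma_m = 1, so the update is F + nu * h.
# Helper names and the toy data are illustrative assumptions.

def fit_stump(xs, rs):
    """Least-squares stump (threshold, left_value, right_value); needs two distinct xs."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def train_mse(xs, ys, n_stages, nu):
    """Training MSE after n_stages updates with learning rate nu."""
    preds = [sum(ys) / len(ys)] * len(xs)
    for _ in range(n_stages):
        h = fit_stump(xs, [y - p for y, p in zip(ys, preds)])
        preds = [p + nu * predict_stump(h, x) for p, x in zip(preds, xs)]  # shrunken update
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
for nu, stages in [(1.0, 5), (0.1, 5), (0.1, 200)]:
    print(nu, stages, train_mse(xs, ys, stages, nu))
```

With <code>nu = 0.1</code> the model needs many more stages to reach the training error that <code>nu = 1.0</code> reaches in a few, which is the computational cost mentioned above.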

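=== Stochastic gradient boosting ===
Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bootstrap aggregation ([[バギング|bagging]]) method.<ref name="Friedman1999Mar" /> Specifically, he proposed that at each iteration of the algorithm, a base learner should be fitted on a subsample of the training set drawn at random without replacement.<ref>Note that this is different from bagging, which samples with replacement because it uses samples of the same size as the training set.</ref> Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

The subsample size is some constant fraction <math>f</math> of the size of the training set. When <math>f = 1</math>, the algorithm is deterministic and identical to the one described above. Smaller values of <math>f</math> introduce randomness into the algorithm and help prevent [[オーバーフィッティング|overfitting]]. The algorithm also becomes faster, because regression trees are fitted to smaller data sets at each iteration. Friedman found that <math>0.5 \leq f \leq 0.8 </math> leads to good results for small and moderately sized training sets.<ref name="Friedman1999Mar" /> Therefore, <math>f</math> is typically set to 0.5, meaning that one half of the training set is used to build each base learner.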
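
The subsampling step can be sketched as below. The code assumes squared-error loss, one-dimensional inputs and decision stumps; the names <code>draw_subsample</code> and <code>stochastic_boost</code>, the seeds and the data are hypothetical. Indices are sorted after sampling only so that <math>f = 1</math> reproduces the deterministic algorithm exactly.

```python
# Sketch only: stochastic gradient boosting with subsampling without replacement.
# Helper names, seeds and the toy data are illustrative assumptions.
import random


def fit_stump(xs, rs):
    """Least-squares stump; falls back to a constant if xs has a single distinct value."""
    mean = sum(rs) / len(rs)
    best = (float("inf"), float("inf"), mean, mean)  # constant fallback
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def draw_subsample(n, f, rng):
    """Indices of a random subsample of size about f * n, drawn without replacement."""
    k = max(1, int(f * n))
    return sorted(rng.sample(range(n), k))


def stochastic_boost(xs, ys, n_stages, f, seed=0):
    """Return training predictions; each stump sees only a fraction f of the data."""
    rng = random.Random(seed)
    preds = [sum(ys) / len(ys)] * len(xs)
    for _ in range(n_stages):
        idx = draw_subsample(len(xs), f, rng)
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - preds[i] for i in idx]  # residuals on the subsample only
        h = fit_stump(sub_x, sub_r)
        preds = [p + predict_stump(h, x) for p, x in zip(preds, xs)]  # update on all points
    return preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
print(stochastic_boost(xs, ys, 5, f=0.5, seed=1))
```

Because each stump is fitted on fewer points, the per-iteration cost drops roughly in proportion to <math>f</math>.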

Also, as in bagging, subsampling allows one to define an out-of-bag error of the prediction performance improvement by evaluating predictions on those observations that were not used in building the next base learner. Out-of-bag estimates help avoid the need for an independent validation data set, but they often underestimate the actual performance improvement and the optimal number of iterations.<ref name="gbm-vignette">Ridgeway, Greg (2007). [https://cran.r-project.org/web/packages/gbm/gbm.pdf Generalized Boosted Models: A guide to the gbm package.]</ref> <ref>[https://www.analyticsvidhya.com/blog/2015/09/complete-guide-boosting-methods/ Learn Gradient Boosting Algorithm for better predictions (with codes in R)]</ref>

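=== Number of observations in leaves ===
Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in the trees' terminal nodes. During tree construction, any split that would lead to a node containing fewer than this number of training set instances is ignored.

Imposing this limit helps to reduce the variance of the predictions at the leaves.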
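
The effect of the minimum leaf size on split selection can be sketched with a single stump. The parameter name <code>min_leaf</code>, the function name and the data are assumptions made for this illustration; real libraries expose the constraint under their own names.

```python
# Sketch only: ignoring splits that would create leaves with too few observations.
# fit_stump, min_leaf and the toy data are illustrative assumptions.

def fit_stump(xs, rs, min_leaf=1):
    """Best least-squares split (threshold, left_value, right_value), or None.

    Splits that leave fewer than min_leaf observations on either side are skipped.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # this split violates the minimum leaf size
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return None if best is None else best[1:]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rs = [0.0, 0.0, 0.0, 0.0, 0.0, 10.0]  # a single outlying residual

print(fit_stump(xs, rs, min_leaf=1))  # isolates the outlier in its own leaf
print(fit_stump(xs, rs, min_leaf=2))  # the outlier must share a leaf with a neighbour
```

Without the constraint the stump devotes a whole leaf to one observation; with <code>min_leaf=2</code> the leaf value is averaged over two observations, which is the variance reduction described above.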

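=== Penalizing tree complexity ===
Another useful regularization technique for gradient boosted [[決定木|trees]] is to penalize the complexity of the learned model.<ref>Tianqi Chen. [http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf Introduction to Boosted Trees]</ref> The model complexity can be defined as proportional to the number of leaves in the learned trees. Jointly optimizing the loss and the model complexity corresponds to a post-pruning algorithm that removes branches which fail to reduce the loss by a threshold. Other kinds of regularization, such as an <math>\ell_2</math> penalty, can also be applied to prevent [[オーバーフィッティング|overfitting]].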
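
As a hedged illustration of such a penalty, a complexity term for a tree <math>h_m</math> with <math>J_m</math> leaves and leaf values <math>b_{jm}</math> could be written as follows. The symbols <math>\Omega</math>, <math>\alpha</math> and <math>\lambda</math> are assumptions introduced here for the sketch and are not notation used elsewhere in this article.

: <math>\Omega(h_m) = \alpha J_m + \frac{\lambda}{2} \sum_{j=1}^{J_m} b_{jm}^2, \quad \alpha \ge 0, \ \lambda \ge 0.</math>

Under this sketch, each stage would minimize the training loss plus <math>\Omega(h_m)</math>. The term <math>\alpha J_m</math> charges a fixed cost per leaf, so an extra split is roughly worth keeping only if it lowers the loss by more than <math>\alpha</math>, while the second term is an <math>\ell_2</math> penalty that shrinks the leaf values towards zero.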

== Usage ==
Gradient boosting is also used in the field of learning to rank. The commercial web search engines [[Yahoo!]]<ref>Cossock, David and Zhang, Tong (2008). [http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf Statistical Analysis of Bayes Optimal Subset Ranking] {{Webarchive|url=https://web.archive.org/web/20100807162855/http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf|date=2010-08-07}}, page 14.</ref> and [[ヤンデックス|Yandex]]<ref name="snezhinsk">[http://webmaster.ya.ru/replies.xml?item_no=5707&ncrnd=5118 Yandex corporate blog entry about new ranking model "Snezhinsk"] (in Russian)</ref> use variants of gradient boosting in their machine-learned ranking engines. Gradient boosting is also used for data analysis in high energy physics. At the Large Hadron Collider (LHC), variants of gradient boosting deep neural networks (DNN) succeeded in reproducing the results of non-machine-learning analysis methods on data sets used in the discovery of the [[ヒッグス粒子|Higgs boson]].<ref>{{Cite arXiv|arxiv=2001.06033|class=stat.ML|last=Lalchand|first=Vidhi|title=Extracting more from boosted decision trees: A high energy physics case study}}</ref>

== Names ==
The method goes by a variety of names.
Friedman introduced his regression technique as a "Gradient Boosting Machine" (GBM).<ref name="Friedman1999Feb" />
Mason, Baxter et al. described the generalized abstract class of algorithms as "functional gradient boosting".<ref name="MasonBaxterBartlettFrean1999a" /> <ref name="MasonBaxterBartlettFrean1999b" /> Friedman et al. describe an advancement of gradient boosted models as Multiple Additive Regression Trees (MART),<ref>{{Cite journal|last=Friedman|first=Jerome|year=2003|title=Multiple Additive Regression Trees with Application in Epidemiology|journal=Statistics in Medicine|volume=22|issue=9|pages=1365–1381|DOI=10.1002/sim.1501|PMID=12704603}}</ref> and Elith et al. describe that approach as "Boosted Regression Trees" (BRT).<ref>{{Cite journal|last=Elith|first=Jane|year=2008|title=A working guide to boosted regression trees|journal=Journal of Animal Ecology|volume=77|issue=4|pages=802–813|ref=Elith2008|DOI=10.1111/j.1365-2656.2008.01390.x|PMID=18397250}}</ref>

A popular open-source implementation for [[R言語|R]] calls it a "Generalized Boosting Model",<ref name="gbm-vignette" /> whereas packages that extend this work use "BRT".<ref>{{Cite web|author=Elith|first=Jane|title=Boosted Regression Trees for ecological modeling|url=https://cran.r-project.org/web/packages/dismo/vignettes/brt.pdf|website=CRAN|publisher=CRAN|accessdate=31 August 2018}}</ref> Yet another name is TreeNet, after an early commercial implementation by Dan Steinberg of Salford Systems, one of the researchers who pioneered tree-based methods.<ref>https://www.kdnuggets.com/2013/06/exclusive-interview-dan-steinberg-salford-systems-data-mining-solutions-provider.html</ref> XGBoost is a popular modern implementation with extensions such as second-order optimization.

== Disadvantages ==
While boosting can increase the accuracy of a base learner such as a decision tree or linear regression, it sacrifices intelligibility and interpretability.<ref name=":1" /> <ref>{{Cite journal|last=Wu|first=Xindong|last2=Kumar|first2=Vipin|last3=Ross Quinlan|first3=J.|last4=Ghosh|first4=Joydeep|last5=Yang|first5=Qiang|last6=Motoda|first6=Hiroshi|last7=McLachlan|first7=Geoffrey J.|last8=Ng|first8=Angus|last9=Liu|first9=Bing|date=2008-01-01|title=Top 10 algorithms in data mining|journal=Knowledge and Information Systems|volume=14|issue=1|pages=1–37|language=en|DOI=10.1007/s10115-007-0114-2|ISSN=0219-3116}}</ref> It can also be harder to implement because of its higher computational demand.

== See also ==
* [[AdaBoost]]
* [[ランダムフォレスト|Random forest]]
* [[CatBoost]]
* [[LightGBM]]
* [[XGBoost]]
* [[決定木|Decision tree]]

== Footnotes ==
=== Notes ===
{{Reflist|group="注"}}
=== Citations ===
{{Reflist}}

== Further reading ==
* {{Cite book|first=Bradley|last=Boehmke|first2=Brandon|last2=Greenwell|chapter=Gradient Boosting|pages=221–245|title=Hands-On Machine Learning with R|publisher=Chapman & Hall|year=2019|isbn=978-1-138-49568-5}}

== External links ==
* [http://explained.ai/gradient-boosting/index.html How to explain gradient boosting]
* [https://blog.datarobot.com/gradient-boosted-regression-trees Gradient boosted regression trees]
* [https://github.com/microsoft/LightGBM LightGBM]

{{統計学}}

{{DEFAULTSORT:こうはいふーすていんく}}
[[Category:機械学習]]
[[Category:統計学]]
[[Category:決定木]]
[[Category:分類アルゴリズム]]