{{Infobox Software
| 名称 = XGBoost
| ロゴ = XGBoost_logo.png
| スクリーンショット = 
| 説明文 = 
| 開発元 = The XGBoost Contributors
| 初版 = {{Start date and age|2014|03|27}}
| 最新版 = {{wikidata|property|reference|edit|P348}}
| 最新版発表日 = {{start date and age2|{{wikidata|qualifier|raw|P348|P577}}}}
| プログラミング言語 = [[C++]]
| 対応OS = {{hlist-comma|[[Linux]]|[[macOS]]|[[Windows]]}}
| 種別 = [[機械学習|Machine learning]]
| ライセンス = [[Apache License 2.0]]
| 公式サイト = {{official URL}}
}}

'''XGBoost'''<ref name="source-code">{{Cite web|url=https://github.com/dmlc/xgboost|title=GitHub project webpage|accessdate=2021-10-06}}</ref> is an [[オープンソースソフトウェア|open-source software]] library that provides a [[正則化|regularizing]] [[勾配ブースティング|gradient boosting]] [[フレームワーク|framework]] for [[C++]], [[Java]], [[Python]]<ref name="xgboost-python">{{Cite web|url=https://pypi.python.org/pypi/xgboost/|title=Python Package Index PYPI: xgboost|accessdate=2016-08-01}}</ref>, [[R言語|R]]<ref name="xgboost-cran">{{Cite web|url=https://cran.r-project.org/web/packages/xgboost/index.html|title=CRAN package xgboost|accessdate=2016-08-01}}</ref>, [[Julia (プログラミング言語)|Julia]]<ref name="xgboost-julia">{{Cite web|url=http://pkg.julialang.org/?pkg=XGBoost#XGBoost|title=Julia package listing xgboost|accessdate=2016-08-01}}</ref>, [[Perl]]<ref name="xgboost-perl">{{Cite web|url=https://metacpan.org/pod/AI::XGBoost|title=CPAN module AI::XGBoost|accessdate=2020-02-09}}</ref>, and [[Scala]].
It runs on [[Linux]], [[Microsoft Windows|Windows]]<ref name="xgboost-windows">{{Cite web|title=Installing XGBoost for Anaconda in Windows|url=https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows?lang=en|accessdate=2016-08-01}}</ref>, and [[macOS]]<ref name="xgboost-macos">{{Cite web|title=Installing XGBoost on Mac OSX|url=https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_on_Mac_OSX?lang=en|accessdate=2016-08-01}}</ref>.
According to the project description, it aims to provide a "scalable, portable and distributed [[勾配ブースティング|gradient boosting]] (GBM, GBRT, GBDT) library".
Besides running on a single machine, it also runs on the distributed processing frameworks [[Apache Hadoop]], [[Apache Spark]], Apache Flink, and Dask<ref name="Dask-docs">{{Cite web|title=Dask Homepage|url=https://www.dask.org|accessdate=2021-10-06}}</ref><ref>{{Cite web|title=Distributed XGBoost with Dask — xgboost 1.5.0-dev documentation|url=https://xgboost.readthedocs.io/en/latest/tutorials/dask.html|accessdate=2021-07-15|website=xgboost.readthedocs.io}}</ref>.

It has gained popularity and attention as the algorithm of choice for many winning teams in machine learning competitions<ref name="xgboost-competition-winners">{{Cite web|title=XGBoost - ML winning solutions (incomplete list)|url=https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions|accessdate=2016-08-01}}</ref>.

Other algorithms that are likewise based on gradient boosting include [[LightGBM]] and [[CatBoost]].

== History ==
XGBoost started as a research project by Tianqi Chen, a member of the Distributed (Deep) Machine Learning Community (DMLC) group<ref name="history">{{Cite web|url=http://homes.cs.washington.edu/~tqchen/2016/03/10/story-and-lessons-behind-the-evolution-of-xgboost.html|title=Story and Lessons behind the evolution of XGBoost|accessdate=2016-08-01}}</ref>. Initially, it was a terminal application that could be configured using a libsvm configuration file. It became widely known in machine learning competition circles after it was used in the winning solution of the Higgs Machine Learning Challenge. [[Python]] and R packages followed soon afterwards, and package implementations were later created for [[Java]], [[Scala]], Julia, [[Perl]], and other languages. This brought XGBoost to more developers and made it popular in the [[Kaggle]] community, where it has been used in many competitions<ref name="xgboost-competition-winners" />.

It was soon integrated with many other packages, making it easier to use in their respective communities.
It has been integrated with [[scikit-learn]] for [[Python]] users and with the [https://cran.rstudio.com/web/packages/caret/vignettes/caret.html caret] package for [[R言語|R]] users.
It can also be integrated into dataflow frameworks such as [[Apache Spark]], [[Apache Hadoop]], and Apache Flink<ref name="xgboost4j">{{Cite web|url=https://xgboost.readthedocs.io/en/latest/jvm/index.html|title=XGBoost4J|accessdate=2016-08-01}}</ref> using the abstracted Rabit<ref name="rabit">{{Cite web|url=https://github.com/dmlc/rabit|title=Rabit - Reliable Allreduce and Broadcast Interface|accessdate=2016-08-01}}</ref> and XGBoost4J. XGBoost is also available on OpenCL for [[FPGA]]s<ref name="xgboost FPGA">{{Cite web|url=https://github.com/InAccel/xgboost|title=XGBoost on FPGAs|accessdate=2019-08-01}}</ref>.
An efficient, scalable implementation of XGBoost was published by Tianqi Chen and Carlos Guestrin<ref name="paper">{{Cite conference|last=Chen|editor3-first=Alexander J.|title=XGBoost: A Scalable Tree Boosting System |conference=the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|location=San Francisco, California, USA |publisher=Association for Computing Machinery|pages=785–794|doi=10.1145/2939672.2939785|editor4-first=Charu C.|editor4-last=Aggarwal|editor3-last=Smola|first1=Tianqi|editor2-first=Mohak|editor2-last=Shah|editor1-first=Balaji|editor1-last=Krishnapuram|first2=Carlos|last2=Guestrin|date=2016-08-13}}</ref>.

== Features ==
XGBoost has the following features, which distinguish it from other [[勾配ブースティング|gradient boosting]] algorithms<ref>{{Cite web|url=https://medium.com/hackernoon/gradient-boosting-and-xgboost-90862daa6c77|title=Gradient Boosting and XGBoost|author=Gandhi|first=Rohith|date=2019-05-24|website=Medium|language=en|accessdate=2020-01-04}}</ref><ref>{{Cite web|url=https://towardsdatascience.com/boosting-algorithm-xgboost-4d9ec0207d|title=Boosting algorithm: XGBoost|date=2017-05-14|website=Towards Data Science|language=en|accessdate=2020-01-04}}</ref><ref>{{Cite web|url=https://syncedreview.com/2017/10/22/tree-boosting-with-xgboost-why-does-xgboost-win-every-machine-learning-competition/|title=Tree Boosting With XGBoost – Why Does XGBoost Win "Every" Machine Learning Competition?|date=2017-10-22|website=Synced|language=en-US|accessdate=2020-01-04}}</ref>.

* Clever penalization of trees
* A proportional shrinking of leaf nodes
* Newton Boosting
* Extra randomization parameter
* Implementation on single, distributed systems and out-of-core computation
* Automatic Feature selection
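
The tree penalization and leaf shrinkage items can be illustrated with a small sketch. It assumes the regularized objective from the Chen and Guestrin paper cited above, where each tree is penalized by a per-leaf cost and an L2 term on the leaf values. The function names <code>leaf_weight</code> and <code>split_gain</code> and the parameter names <code>lam</code> and <code>gamma</code> are chosen here for illustration and are not the library's API.

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal value of a leaf under an L2 penalty `lam`.

    G and H are the sums of first and second derivatives of the loss
    over the training rows that fall into the leaf. A larger `lam`
    shrinks the leaf value towards zero.
    """
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Reduction in the regularized objective obtained by a split.

    (GL, HL) and (GR, HR) are the derivative sums of the left and right
    children; `gamma` is the cost charged for each additional leaf.
    A split is only worthwhile when the returned gain is positive.
    """
    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Illustrative numbers only: the same leaf statistics with and without the penalty.
unpenalized = leaf_weight(-4.0, 2.0, lam=0.0)
penalized = leaf_weight(-4.0, 2.0, lam=2.0)
gain = split_gain(-4.0, 2.0, 4.0, 2.0, lam=2.0, gamma=1.0)
print(unpenalized, penalized, gain)
```

With <code>lam = 0</code> and <code>gamma = 0</code> the leaf value reduces to the plain Newton step, which is the unregularized case described in the algorithm section.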

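== Algorithm ==
XGBoost works as the Newton-Raphson method in function space. Unlike ordinary [[勾配ブースティング|gradient boosting]], which works as [[勾配降下法|gradient descent]] in function space, it uses a second-order Taylor approximation of the loss function, which provides the connection to the Newton-Raphson method.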
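
The second-order step can be written out as a short derivation sketch. Here <math>\hat{g}_m(x_i)</math> and <math>\hat{h}_m(x_i)</math> denote the first and second derivatives of the loss at the current model <math>\hat{f}_{(m-1)}</math>, <math>\phi</math> is a candidate base learner, and <math>\hat{h}_m(x_i) > 0</math> is assumed. Expanding the loss to second order around the current model gives
:<math>L\left(y_i, \hat{f}_{(m-1)}(x_i)+\phi(x_i)\right) \approx L\left(y_i, \hat{f}_{(m-1)}(x_i)\right) + \hat{g}_m(x_i)\,\phi(x_i) + \frac{1}{2}\hat{h}_m(x_i)\,\phi(x_i)^2.</math>
Completing the square in <math>\phi(x_i)</math> turns the last two terms into
:<math>\hat{g}_m(x_i)\,\phi(x_i) + \frac{1}{2}\hat{h}_m(x_i)\,\phi(x_i)^2 = \frac{1}{2}\hat{h}_m(x_i)\left[-\frac{\hat{g}_m(x_i)}{\hat{h}_m(x_i)}-\phi(x_i)\right]^2 - \frac{\hat{g}_m(x_i)^2}{2\,\hat{h}_m(x_i)}.</math>
The final term does not depend on <math>\phi</math>, so minimizing the approximate loss over <math>\phi</math> amounts to a weighted least-squares fit to the targets <math>-\hat{g}_m(x_i)/\hat{h}_m(x_i)</math> with weights <math>\hat{h}_m(x_i)</math>. For a single unknown value this is exactly the Newton-Raphson step <math>-\hat{g}/\hat{h}</math>.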

The generic unregularized XGBoost algorithm is as follows.{{枠の始まり|blue}}
Input: training set <math>\{(x_i, y_i)\}_{i=1}^N</math>, a differentiable loss function <math>L(y, F(x))</math>, a number of weak learners <math>M</math> and a learning rate <math>\alpha</math>.

Algorithm:

# Initialize model with a constant value:
#: <math>\hat{f}_{(0)}(x) = \underset{\theta}{\arg\min} \sum_{i=1}^N L(y_i, \theta).</math>
# For {{Mvar|m}} = 1 to {{Mvar|M}}:
## Compute the 'gradients' and 'hessians':
##: <math>\hat{g}_m(x_i)=\left[\frac{\partial L(y_i,f(x_i))}{\partial f(x_i)} \right]_{f(x)=\hat{f}_{(m-1)}(x)}.</math>
##: <math>\hat{h}_m(x_i)=\left[\frac{\partial^2 L(y_i,f(x_i))}{\partial f(x_i)^2} \right]_{f(x)=\hat{f}_{(m-1)}(x)}.</math>
## Fit a base learner (or weak learner, e.g. tree) using the training set <math>\displaystyle\{x_i,-\frac{\hat{g}_m(x_i)}{\hat{h}_m(x_i)}\}_{i=1}^{N}</math> by solving the optimization problem below:
##:<math>\hat{\phi}_m=\underset{\phi \in \mathbf{\Phi}}{\arg\min}\sum_{i=1}^{N}\frac{1}{2}\hat{h}_m(x_i)\left[-\frac{\hat{g}_m(x_i)}{\hat{h}_m(x_i)}-\phi(x_i) \right]^2.</math>
##:<math> \hat{f}_m(x)=\alpha \hat{\phi}_m(x).</math>
## Update the model:
##: <math>\hat{f}_{(m)}(x) = \hat{f}_{(m-1)}(x) + \hat{f}_m(x).</math>
# Output <math>\hat{f}(x)=\hat{f}_{(M)}(x)=\sum_{m=0}^{M}\hat{f}_m(x).</math>
{{枠の終わり}}
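
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the loss is taken to be the logistic loss on labels 0 and 1, the base learner is a one-split regression stump on a single numeric feature, and the names <code>fit_stump</code> and <code>newton_boost</code> are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets, weights):
    """Weighted least-squares regression stump on one numeric feature."""
    idx = range(len(xs))

    def wmean(rows):
        return sum(weights[i] * targets[i] for i in rows) / sum(weights[i] for i in rows)

    values = sorted(set(xs))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        lv = wmean([i for i in idx if xs[i] <= t])
        rv = wmean([i for i in idx if xs[i] > t])
        err = sum(weights[i] * (targets[i] - (lv if xs[i] <= t else rv)) ** 2 for i in idx)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all feature values equal: fall back to a constant
        c = wmean(list(idx))
        return lambda x: c
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def newton_boost(xs, ys, M=10, alpha=0.5):
    """Unregularized Newton boosting for logistic loss; returns a raw-score function."""
    mean_y = sum(ys) / len(ys)
    # Step 1: constant minimizing the logistic loss (assumes both classes are present).
    f0 = math.log(mean_y / (1.0 - mean_y))
    learners = []
    scores = [f0] * len(xs)
    for _ in range(M):
        ps = [sigmoid(s) for s in scores]
        g = [p - y for p, y in zip(ps, ys)]             # gradients
        h = [max(p * (1.0 - p), 1e-12) for p in ps]     # hessians, floored for safety
        targets = [-gi / hi for gi, hi in zip(g, h)]    # Newton targets -g/h
        phi = fit_stump(xs, targets, h)                 # weighted fit with weights h
        learners.append(phi)
        scores = [s + alpha * phi(x) for s, x in zip(scores, xs)]  # model update
    return lambda x: f0 + alpha * sum(phi(x) for phi in learners)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
F = newton_boost(xs, ys)
print(sigmoid(F(0.0)), sigmoid(F(7.0)))
```

The weighted mean of <math>-\hat{g}/\hat{h}</math> with weights <math>\hat{h}</math> inside each side of the stump equals the negative sum of gradients divided by the sum of hessians, so each leaf takes one Newton step scaled by the learning rate <code>alpha</code>.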

== Awards ==
; 2016
* John Chambers Award<ref name="john-chambers">{{Cite web|url=http://stat-computing.org/awards/jmc/winners.html|title=John Chambers Award Previous Winners|accessdate=2016-08-01}}</ref>
* High Energy Physics Meets Machine Learning Award (HEP Meets ML)<ref name="hep-meets-ml">{{Cite web|url=https://higgsml.lal.in2p3.fr/prizes-and-award/award/|title=HEP meets ML Award|accessdate=2016-08-01}}</ref>

== See also ==
* [[勾配ブースティング|Gradient boosting]]
* [[LightGBM]]
* [[CatBoost]]

== References ==
{{Reflist}}

== External links ==
* {{official website}}
* {{GitHub|dmlc/xgboost}}
* [https://xgboost.readthedocs.io/ XGBoost Documentation]

[[Category:2014年のソフトウェア]]
[[Category:C++でプログラムされた自由ソフトウェア]]
[[Category:Apache Licenseのソフトウェア]]
[[Category:機械学習]]