
{{pathnav|ニューラルネットワーク|frame=1}}
{{Machine learning bar}}
'''Backpropagation''' (Japanese: '''誤差逆伝播法''', ''gosa gyaku denpa hō''<!-- The reading "denpan" is incorrect -->)<ref>It is sometimes also called 逆誤差伝搬法 (''gyaku gosa denpan hō''), but, like 電波伝搬 for [[電波伝播]], this spelling is an error that arose from a misreading.</ref> is a [[機械学習|learning]] [[アルゴリズム|algorithm]] for [[ニューラルネットワーク|neural networks]].<ref>"We describe a new learning procedure, '''back-propagation''', for networks of neurone-like units." p.533 of Rumelhart (1986)</ref>

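== Overview ==
Backpropagation is an [[アルゴリズム|algorithm]] that can update, i.e. [[機械学習|learn]], the weights of a [[ニューラルネットワーク|neural network]], a kind of [[数理モデル|mathematical model]], regardless of the number of layers. It is the main training method used in [[ディープラーニング|deep learning]].

The algorithm proceeds as follows: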

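# Present a training sample to the neural network.
# Compute the network's output and compare it with the desired output to obtain the error at the output layer; from it, compute the error of each output neuron.
# For each neuron, compute what its output should have been and a scaling factor, i.e. how much higher or lower the actual output must be adjusted to match the desired output. This is called the local error.
# Adjust the weights of each neuron so that the local error decreases.
# Assign responsibility ("blame") for the local error to the neurons in the preceding layer, giving more responsibility to neurons connected by larger weights.
# Repeat the same procedure for the neurons one layer further back, using the blame assigned to each neuron as its error.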
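The steps above can be sketched for a tiny network with one hidden layer of sigmoid units and a single sigmoid output. This is a minimal illustration, assuming a squared-error loss and plain gradient descent; the function names (`forward`, `backprop`, `step`), the network size and the learning rate are hypothetical choices, not part of any particular library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Hidden layer h = sigmoid(W1 x + b1); single output y = sigmoid(W2 . h + b2)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def backprop(x, t, W1, b1, W2, b2):
    """Return the squared error E = 0.5 * (y - t)**2 for one sample and dE/d(parameter)."""
    h, y = forward(x, W1, b1, W2, b2)  # steps 1-2: present the sample, compute the output
    loss = 0.5 * (y - t) ** 2
    # Step 3: local error of the output neuron = (y - t) * sigmoid'(z), where sigmoid'(z) = y(1 - y).
    delta_out = (y - t) * y * (1.0 - y)
    gW2 = [delta_out * hj for hj in h]
    gb2 = delta_out
    # Steps 5-6: each hidden neuron's blame is proportional to its weight to the output.
    delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in delta_h]
    gb1 = list(delta_h)
    return loss, (gW1, gb1, gW2, gb2)

def step(params, grads, lr):
    """Step 4: move every weight against its gradient so that the error decreases."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2

random.seed(0)
params = ([[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)],  # W1: 3 hidden x 2 inputs
          [0.0, 0.0, 0.0],                                                # b1
          [random.uniform(-1, 1) for _ in range(3)],                      # W2
          0.0)                                                            # b2
x, t = [1.0, 0.0], 1.0
loss_before, grads = backprop(x, t, *params)
params = step(params, grads, lr=0.1)
loss_after, _ = backprop(x, t, *params)
print(loss_before, loss_after)
```

In this sketch all gradients are computed with the weights as they were before the update, and only then applied; the output neuron's local error `delta_out` is passed back through `W2`, so a hidden neuron connected by a larger weight receives more of the blame.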

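As the name suggests, the error (and with it the learning) propagates backwards from the output nodes to the nodes before them. Technically, backpropagation computes the gradient of the error with respect to the network's modifiable weights<ref>Paul J. Werbos (1994). The Roots of Backpropagation. From Ordered Derivatives to Neural Networks and Political Forecasting. New York, NY: John Wiley & Sons, Inc.</ref>. This gradient is almost always used in [[確率的勾配降下法|stochastic gradient descent]], a simple algorithm for minimizing the error. The term "backpropagation" is also used in a broader sense for the whole procedure, including both the gradient computation and stochastic gradient descent. Backpropagation usually converges quickly, but to a local minimum of the network's error (the smallest value within a restricted region; see [[極値|extremum]]), not necessarily to the global one. The [[活性化関数|activation functions]] used by [[人工神経|artificial neurons]] (or "nodes") must be [[微分法|differentiable]]. Backpropagation is also closely related to the [[ガウス・ニュートン法|Gauss–Newton method]].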

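The backpropagation algorithm has been rediscovered several times, and it can be viewed as a special case of the general technique of [[自動微分|automatic differentiation]] in reverse accumulation mode.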
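To illustrate the connection with reverse-mode automatic differentiation, the following sketch records a small computation graph of scalars and then applies the chain rule from the output back to the inputs. The class `Var` and its methods are hypothetical names for this example only; real frameworks are far more general.

```python
import math

class Var:
    """A scalar node in a computation graph for reverse-mode automatic differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs (parent node, local derivative d(self)/d(parent))
        self.grad = 0.0         # accumulates d(output)/d(self) during backward()

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def tanh(self):
        y = math.tanh(self.value)
        return Var(y, ((self, 1.0 - y * y),))

    def backward(self):
        """Apply the chain rule from this output node back to every node it depends on."""
        order, seen = [], set()

        def visit(node):  # depth-first: each node is appended after the nodes it depends on
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # output first, inputs last
            for parent, local in node.parents:
                parent.grad += local * node.grad

# One neuron y = tanh(w * x + b); a single backward pass yields all three partial derivatives.
x, w, b = Var(0.5), Var(2.0), Var(-1.0)
y = (w * x + b).tanh()
y.backward()
print(w.grad, x.grad, b.grad)  # 0.5 2.0 1.0
```

Because gradients are accumulated with `+=`, a node used more than once (for example `a * a`) correctly receives the sum of the contributions from each use.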

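Training a network with backpropagation is a [[数理最適化|mathematical optimization]] problem, so either batch learning or online learning can be used; in practice, mini-batch learning with [[確率的勾配降下法|stochastic gradient descent]] is typical.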
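As a minimal sketch of mini-batch learning, the following fits a one-weight linear model with stochastic gradient descent and reshuffles the data at every epoch. The model, data, batch size and learning rate are illustrative assumptions; in a neural network, the gradients `gw` and `gb` would be produced by backpropagation rather than by a closed-form expression.

```python
import random

def minibatches(data, batch_size, rng):
    """Shuffle a copy of the data once per epoch, then yield slices of at most batch_size samples."""
    data = data[:]
    rng.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def train_linear(data, epochs=200, batch_size=4, lr=0.05, seed=0):
    """Mini-batch SGD for y_hat = w*x + b, minimizing the mean squared error of each batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(data, batch_size, rng):
            n = len(batch)
            # Gradients of E = (1 / 2n) * sum((w*x + b - y)**2) with respect to w and b.
            gw = sum((w * x + b - y) * x for x, y in batch) / n
            gb = sum((w * x + b - y) for x, y in batch) / n
            w -= lr * gw
            b -= lr * gb
    return w, b

data = [(k / 10, 2.0 * (k / 10) + 1.0) for k in range(-10, 11)]  # noiseless y = 2x + 1
w, b = train_linear(data)
print(round(w, 3), round(b, 3))
```

Setting `batch_size=1` turns this into online learning, and `batch_size=len(data)` into batch learning.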

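=== Purpose ===
Given an error function <math>E(\hat{y}, y)</math> for a network <math>\hat{y}=f(x;w)</math>, if the slope of <math>E</math> at the current weight <math>w_k = a_{now}</math>, i.e. the partial derivative <math>\left.{\partial E \over \partial w_k}\right|_{w_k=a_{now}}</math>, is known, the optimization technique called the [[勾配法|gradient method]] can update (that is, learn) <math>w_k</math> so that the error <math>E</math> decreases. The purpose of backpropagation as a learning algorithm is to obtain these gradient values and thereby learn the weights. Because it computes the huge number of partial derivatives quickly by [[自動微分|automatic differentiation]], it makes optimization over extremely high-dimensional weight spaces practical.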
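For an assumed toy model <math>\hat{y} = w x</math> with <math>E = \tfrac{1}{2}(\hat{y} - y)^2</math>, the partial derivative at <math>w = a_{now}</math> can be computed analytically, checked against a finite difference, and used for one gradient step that lowers the error. The finite difference needs two evaluations of <math>E</math> per weight, which is why computing all partial derivatives at once matters when there are millions of weights. The helper names and the learning rate `eta` are hypothetical.

```python
def error(w, x, y):
    """E(y_hat, y) = 0.5 * (y_hat - y)**2 for the one-weight model y_hat = f(x; w) = w * x."""
    return 0.5 * (w * x - y) ** 2

def grad_error(w, x, y):
    """Analytic partial derivative dE/dw = (w * x - y) * x, by the chain rule."""
    return (w * x - y) * x

def numerical_grad(f, w, eps=1e-6):
    """Central finite difference: a slow check costing two evaluations of f per weight."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

x, y = 2.0, 3.0
a_now = 0.5                                   # current weight w_k = a_now
g = grad_error(a_now, x, y)                   # (1.0 - 3.0) * 2.0 = -4.0
g_num = numerical_grad(lambda w: error(w, x, y), a_now)
eta = 0.1                                     # learning rate (hypothetical value)
a_next = a_now - eta * g                      # gradient-method update w_k <- a_now - eta * dE/dw_k
print(g, g_num, error(a_now, x, y), error(a_next, x, y))
```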

== Techniques ==
Various techniques have been proposed to make ([[深層ニューラルネットワーク|deep]]) neural network models trained with backpropagation converge quickly and to a good solution.

[[ヤン・ルカン|Yann LeCun]] et al. summarized the standard techniques in 1998<ref>{{cite journal
|author=Yann LeCun
|author2=Leon Bottou
|author3=Genevieve B. Orr
|author4=Klaus-Robert Muller
|title=Efficient BackProp
|year=1998
|url=http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
}}</ref>, and Xavier Glorot et al. verified and extended them in 2010<ref name="Glorot2010">{{cite journal
|title=Understanding the difficulty of training deep feedforward neural networks
|author=Xavier Glorot
|author2=Yoshua Bengio
|year=2010
|url=http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
}}</ref>. They are summarized below; see the respective papers for details.

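* In online learning, reshuffle the training data after every full pass through it.
* Transform the inputs linearly so that they have zero mean, have their linear [[相関|correlations]] removed by [[主成分分析|principal component analysis]] (PCA), and have unit variance. The PCA step may be omitted if it is too costly.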
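* If the target (output) values pass through an activation function, they should lie within the range bounded by the points where the magnitude of the activation function's second derivative is largest, so that the output units do not saturate. For <math>1.7159 \tanh(2x/3)</math> these points lie at approximately <math>x = \pm 1</math>, where the function is also approximately ±1, so the targets are −1 to 1. For tanh(''x'') they lie at <math>x = \pm 0.5 \cosh^{-1}(2) \approx \pm 0.65848</math>, where <math>\tanh(x) = \pm 1/\sqrt{3} \approx \pm 0.57735</math>, so the targets are about −0.577 to 0.577.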
* Initial weights: drawn from a zero-mean [[連続一様分布|continuous uniform distribution]] scaled so that the signals in each layer have mean 0 and variance about 1<ref name="deep_tutorial">[http://deeplearning.net/tutorial/mlp.html Multilayer Perceptron — DeepLearning 0.1 documentation]</ref>
** Fan-in based (Yann LeCun): <math>U(-\sqrt{3/\text{fan}_\text{in}}, \sqrt{3/\text{fan}_\text{in}})</math>
** Fan-in and fan-out based (Xavier Glorot): <math>U\left(-\sqrt{6 / (\text{fan}_\text{in} + \text{fan}_\text{out})}, \sqrt{6 / (\text{fan}_\text{in} + \text{fan}_\text{out})}\right)</math>
* Gradient method: many parameter-update rules have been proposed and are in use (see [[確率的勾配降下法#%E5%AD%A6%E7%BF%92%E7%8E%87%E3%81%AE%E8%AA%BF%E6%95%B4%E6%96%B9%E6%B3%95%E3%81%8A%E3%82%88%E3%81%B3%E5%A4%89%E7%A8%AE|stochastic gradient descent § learning-rate adjustment and variants]]).
* Activation function
** Passes through the origin, i.e. <math>f(0) = 0</math>
*** Examples: <math>\tanh(x)</math>, <math>\frac{x}{1 + |x|}</math><ref name="Glorot2010" />; by contrast, the standard [[シグモイド関数|sigmoid function]] is unsuitable because ''f''(0) = 0.5
** Input/output range <math>f(\pm 1) = \pm 1</math><ref>As recommended by Yann LeCun et al.</ref>
*** Example: <math>1.7159 \tanh(2x/3)</math>
** [[正規化線形関数|ReLU]] ([[ランプ関数|ramp function]], also called the analog threshold element<ref>{{Cite book|和書|author=福島邦彦|year=1989|title=神経回路と情報処理|publisher=朝倉書店|isbn=978-4254120639}}</ref>): empirically good performance<ref>{{cite journal|author=Xavier Glorot|title=Deep Sparse Rectifier Neural Networks|url=http://jmlr.csail.mit.edu/proceedings/papers/v15/glorot11a/glorot11a.pdf|journal=Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS-11)|volume=15|pages=315-323|author2=Antoine Bordes|author3=Yoshua Bengio}}</ref><ref name="nature201505">{{cite journal|author=Yann LeCun|date=2015-05-28|title=Deep learning|journal=Nature|volume=521|issue=7553|pages=436-444|doi=10.1038/nature14539|author2=Yoshua Bengio|author3=Geoffrey Hinton}}</ref>
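The initialization rules and activation-function properties listed above can be checked numerically. This is a sketch with hypothetical helper names (`lecun_uniform`, `glorot_uniform`, `scaled_tanh`) using only the Python standard library; deep-learning frameworks provide their own initializers.

```python
import math
import random

def lecun_uniform(fan_in, fan_out, rng):
    """U(-a, a) with a = sqrt(3 / fan_in), so each weight has variance a**2 / 3 = 1 / fan_in."""
    a = math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

def glorot_uniform(fan_in, fan_out, rng):
    """U(-a, a) with a = sqrt(6 / (fan_in + fan_out)): variance 2 / (fan_in + fan_out)."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

def scaled_tanh(x):
    """LeCun's activation 1.7159 * tanh(2x/3): f(0) = 0 and f(+-1) is approximately +-1."""
    return 1.7159 * math.tanh(2.0 * x / 3.0)

# Empirical check: the mean squared weight under the fan-in rule is close to 1 / fan_in.
rng = random.Random(0)
W = lecun_uniform(fan_in=100, fan_out=200, rng=rng)
flat = [v for row in W for v in row]
var = sum(v * v for v in flat) / len(flat)
print(var * 100)  # close to 1

# For tanh, |f''(x)| = 2 |tanh(x)| (1 - tanh(x)**2) peaks at x = 0.5 * acosh(2),
# where tanh(x) = 1 / sqrt(3); this is where tanh targets should be placed.
x_peak = 0.5 * math.acosh(2.0)
print(x_peak, math.tanh(x_peak), scaled_tanh(1.0))
```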

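== Acceleration ==
=== GPU ===
Matrix multiplication is a strength of [[GPGPU]], which can compute it quickly. In [[Python]], libraries such as [[Theano]] are available, as are machine-learning libraries that use them indirectly.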
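To see why matrix-multiplication hardware helps, the forward and backward passes of a fully connected linear layer over a whole mini-batch can be written as matrix products. The sketch below uses plain Python lists and hypothetical function names; a GPU or an optimized matrix library would perform the same products in parallel.

```python
def matmul(A, B):
    """Plain matrix product of lists of rows; this is the operation a GPU parallelizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def linear_forward(X, W):
    """Batch forward pass Y = X W, with X of shape (batch, n_in) and W of shape (n_in, n_out)."""
    return matmul(X, W)

def linear_backward(X, W, dY):
    """Given dE/dY, return dE/dW = X^T dY (summed over the batch) and dE/dX = dY W^T."""
    return matmul(transpose(X), dY), matmul(dY, transpose(W))

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 samples, 2 inputs
W = [[0.5, -1.0, 0.0], [0.25, 0.0, 1.0]]   # 2 inputs, 3 outputs
Y = linear_forward(X, W)
dY = [[1.0, 0.0, 0.0] for _ in range(3)]   # hypothetical gradient arriving from the next layer
dW, dX = linear_backward(X, W, dY)
print(Y, dW, dX)
```

The product `X^T dY` adds up the per-sample weight gradients of the whole batch in one operation, so a mini-batch costs a few large matrix products rather than many small loops.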

=== Parallelization on CPUs ===
The simplest way to make effective use of a CPU's [[メニーコア|many cores]] and SIMD instructions is to use a matrix-operation library, such as the [[Intel Math Kernel Library]] for Intel CPUs.

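Backpropagation is an iterative process that can take a very long time to complete. On [[マルチコア|multicore]] computers, [[スレッド (コンピュータ)|multithreading]] can substantially reduce the time needed to converge. With batch learning, running the backpropagation algorithm in multiple threads is relatively simple.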

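The training data is split into chunks of roughly equal size, one per thread. Each thread performs forward and backward propagation on its chunk and accumulates its own sum of the weight and threshold deltas. At the end of each iteration, all threads are paused, their deltas are summed, and the sums are applied to the neural network; this is repeated for every iteration. The [[Encog|Encog Neural Network Framework]] uses this multithreaded backpropagation technique<ref name="MultiProp">J. Heaton. [http://www.heatonresearch.com/encog/mprop/compare.html Applying Multithreading to Resilient Propagation and Backpropagation]</ref>.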
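The multithreaded batch scheme described above can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. This shows the structure only, assuming a one-weight linear model in place of a neural network and hypothetical function names: each worker sums the deltas for its share of the data, collecting the results waits for all workers, and the summed deltas are applied once per iteration. In CPython, the global interpreter lock prevents pure-Python threads from computing in parallel, so an actual speedup would require native code or separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

def sample_gradient(w, b, x, y):
    """Deltas of 0.5 * (w*x + b - y)**2 with respect to the weight w and threshold b."""
    err = w * x + b - y
    return err * x, err

def chunk_gradient(w, b, chunk):
    """Work done by one thread: sum the deltas over its share of the training data."""
    gw, gb = 0.0, 0.0
    for x, y in chunk:
        dw, db = sample_gradient(w, b, x, y)
        gw += dw
        gb += db
    return gw, gb

def parallel_batch_iteration(w, b, data, n_threads, lr):
    """One batch iteration: split the data, sum the per-thread deltas, then apply them once."""
    chunks = [data[i::n_threads] for i in range(n_threads)]  # near-equal shares
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Collecting every result waits for all threads: the per-iteration synchronization point.
        partial = list(pool.map(lambda chunk: chunk_gradient(w, b, chunk), chunks))
    gw = sum(p[0] for p in partial) / len(data)
    gb = sum(p[1] for p in partial) / len(data)
    return w - lr * gw, b - lr * gb

data = [(k / 10, 3.0 * (k / 10) - 2.0) for k in range(-10, 11)]  # noiseless y = 3x - 2
w, b = 0.0, 0.0
for _ in range(500):
    w, b = parallel_batch_iteration(w, b, data, n_threads=4, lr=0.5)
print(w, b)
```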

== History ==
Neural-network learning methods equivalent to backpropagation (the chain rule combined with gradient descent) have been rediscovered many times.

* [[1960年|1960]], {{仮リンク|バーナード・ヴィドロー|en|Bernard Widrow}} & [[マーシャン・ホフ|Marcian Hoff]]<ref>{{cite journal|author=Bernard Widrow|month=August|year=1960|title=Adaptive Switching Circuits|url=http://www-isl.stanford.edu/people/widrow/papers/c1960adaptiveswitching.pdf|journal=IRE WESCON Convention Record|volume=4|pages=96-104|author2=M.E. Hoff, Jr.}}</ref><ref>{{cite journal|author=Bernard Widrow|year=1995|title=Perceptrons, Adalines, and Backpropagation|url=http://isl-www.stanford.edu/~widrow/papers/bc1995perceptronsadalines.pdf|author2=Michael A. Lehr}}</ref>: the Widrow–Hoff rule (delta rule), i.e. [[確率的勾配降下法|stochastic gradient descent]] on the output error of a two-layer neural network without hidden layers
* [[1967年|1967]], [[甘利俊一|Shun-ichi Amari]]<ref>{{cite journal|author=Shun-ichi Amari|month=June|year=1967|title=Theory of adaptive pattern classifiers|journal=IEEE Transactions on Electronic Computers|volume=EC-16|pages=299–307|doi=10.1109/PGEC.1967.264666}}</ref><ref>{{cite journal|author=Shun-ichi Amari|year=2013|title=Dreaming of mathematical neuroscience for half a century|journal=Neural Networks|volume=37|pages=48–51}}</ref>: a three-layer neural network with a hidden layer
* [[1969年|1969]], {{仮リンク|アーサー・E・ブライソン|en|Arthur E. Bryson}} & {{仮リンク|何毓琦|en|Yu-Chi Ho}}<ref>{{Cite book|title=Artificial Intelligence A Modern Approach|author=Stuart Russell and [[ピーター・ノーヴィグ|Peter Norvig]]|quote=The most popular method for learning in multilayer networks is called Back-propagation. It was first invented in 1969 by Bryson and Ho, but was largely ignored until the mid-1980s.|page=578}}</ref><ref>{{Cite book|title=Applied optimal control: optimization, estimation, and control|authors=Arthur Earl Bryson, Yu-Chi Ho|year=1969|pages=481|publisher=Blaisdell Publishing Company or Xerox College Publishing}}</ref>: proposed as an optimization method for multistage dynamic systems
* [[1974年|1974]], {{仮リンク|ポール・ワーボス|en|Paul Werbos}}<ref>Paul J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, 1974</ref>: suggested its application to neural networks
* [[1986年|1986]], [[デビッド・ラメルハート|David Rumelhart]], [[ジェフリー・ヒントン|Geoffrey Hinton]] and {{仮リンク|ロナルド・J・ウィリアムス|en|Ronald J. Williams}}<ref name="Alpaydin2010">{{Cite book|last=Alpaydın|first=Ethem|title=Introduction to machine learning|year=2010|publisher=MIT Press|location=Cambridge, Mass.|isbn=978-0-262-01243-0|edition=2nd ed.|quote=...and hence the name ''backpropagation'' was coined (Rumelhart, Hinton, and Williams 1986a).|page=250}}</ref><ref name="Rumelhart1986">{{Cite journal|last=Rumelhart|first=David E.|date=8 October 1986|title=Learning representations by back-propagating errors|journal=Nature|volume=323|issue=6088|pages=533–536|doi=10.1038/323533a0|coauthors=Hinton, Geoffrey E., Williams, Ronald J.}}</ref>: reinvented it under the name backpropagation, short for ''backwards propagation of errors'' ([[誤差伝播|error propagation]] backwards); the name has been standard ever since
* In 21st-century [[ディープラーニング|deep learning]] (networks of four or more layers), backpropagation is widely used as the training method.

== Limitations ==
* Because the loss surface can have local minima, gradient descent is not guaranteed to converge to the global minimum (Rumelhart, 1986<ref>"The most obvious drawback of the learning procedure is that the error-surface may contain local minima so that gradient descent is not guaranteed to find a global minimum." p.536 of Rumelhart, et al. (1986). [https://www.nature.com/articles/323533a0 ''Learning representations by back-propagating errors'']. Nature.</ref>)
* If the gradient [[勾配消失問題|vanishes]] at even one point, the layers below it (closer to the input) stop learning, so the more layers a network has, the more likely vanishing gradients become
* Activation functions with regions where the gradient is close to 0 make vanishing gradients more likely
* {{要出典範囲|Learning does not necessarily converge|date=2022年6月}}
* {{要出典範囲|If the variances of the input dimensions differ too much, the weights tend to concentrate where the variance is small|date=2022年6月}}
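A toy illustration of vanishing gradients: in a chain of sigmoid units, backpropagation multiplies the gradient by <math>w \sigma'(z)</math> at every layer, and <math>\sigma'(z) \le 0.25</math>. The sketch below assumes one unit per layer, a weight of 1 and the same pre-activation <math>z</math> in every layer, which is a deliberate simplification; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0

def backprop_factor(depth, weight=1.0, z=0.0):
    """Product of the per-layer factors weight * sigmoid'(z) that scales the gradient
    reaching the first layer of a chain of `depth` sigmoid units (toy assumption)."""
    factor = 1.0
    for _ in range(depth):
        factor *= weight * sigmoid_grad(z)
    return factor

for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(depth))  # shrinks as 0.25 ** depth

print(sigmoid_grad(10.0))  # a saturated unit passes back an even smaller gradient
```

Even in the best case (<math>z = 0</math>) the factor shrinks by a factor of 4 per layer, and saturated units, where the sigmoid's gradient is close to 0, make it shrink much faster.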

== Notes ==
{{Reflist}}

== See also ==
* [[ディープラーニング|Deep learning]]

== External links ==
*[http://www-ailab.elcom.nitech.ac.jp/lecture/neuro/menu.html Introduction to Neural Networks] ([[岩田彰]])
* Chapter 7 [http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf The backpropagation algorithm] of [http://page.mi.fu-berlin.de/rojas/neural/index.html ''Neural Networks - A Systematic Introduction''] by Raul Rojas (ISBN 978-3540605058)
*[http://www.codeproject.com/Articles/19323/Image-Recognition-with-Neural-Networks Implementation of BackPropagation in C#]
*[http://ai4r.rubyforge.org/neuralNetworks.html Implementation of BackPropagation in Ruby]
*[https://amazedsaint-articles.blogspot.com/2006/06/brainnet-ii-inside-story-of-brainnet.html Backpropagation Algorithm Explained In Simple English: Along with a sample implementation in Microsoft.NET]
*[http://www.tek271.com/articles/neuralNet/IntoToNeuralNets.html Quick explanation of the backpropagation algorithm]
*[http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html Graphical explanation of the backpropagation algorithm]
*[http://www.speech.sri.com/people/anand/771/html/node37.html Concise explanation of the backpropagation algorithm using math notation]
*[http://www.matematica.ciens.ucv.ve/dcrespin/Pub/backprop.pdf Backpropagation for mathematicians]
*[http://www.codeproject.com/KB/recipes/BP.aspx Implementation of BackPropagation in C++]
*[http://sourceforge.net/projects/backprop1 Implementation of BackPropagation in Java]
*[http://arctrix.com/nas/python/bpnn.py Implementation of BackPropagation in Python]
*[http://freedelta.free.fr/r/php-code-samples/artificial-intelligence-neural-network-backpropagation/ Implementation of BackPropagation in PHP]
*[http://en.wikiversity.org/wiki/Learning_and_Neural_Networks Backpropagation neural network tutorial at the Wikiversity]

{{Normdaten}}
{{DEFAULTSORT:はつくふろはけえしよん}}
[[Category:機械学習アルゴリズム]]
[[Category:人工ニューラルネットワーク]]
[[Category:教師あり学習]]