劣勾配法のソースを表示

'''劣勾配法'''（れつこうばいほう、{{Lang-en-short|Subgradient methods}}）とは、[[劣微分]]を用いた[[凸最適化]]の解法である。1960年代から1970年代にかけて{{仮リンク|ナウム・ショア|en|Naum Z. Shor}}によって編み出された解法であり、微分不可能な目的関数に対して収束性を持つことが知られている。目的関数が微分可能な関数で無制約な問題の場合は[[最急降下法]]と同様の探索方向が使用される。

劣勾配法は2階微分可能な連続凸最小化問題に対してニュートン法より収束が遅いが、ニュートン法は微分不可能な点を持つ問題に対して適用することができないことから、汎用性が高い解法である。

近年では、凸最適化問題に対して[[内点法]]が提案されているが、射影劣勾配法やバンドル法といった解法も研究がなされている。劣勾配法などは計算にかかるメモリの量が比較的少量で済むことから、高次元の凸最適化問題に対しては適した解法である。

射影劣勾配法は大規模問題に対して分解法と共に使用されることが多い。分解法を用いることで問題を分割して問題を安易に扱うことができる。

== 古典的な劣勾配法の規則 ==
定義域 <math>\Reals^n.</math> において[[凸関数]]を <math>f : \Reals^n \to \Reals</math> とする。
最も古典的な劣勾配法は以下の式によって反復点が更新される:
<math display=block>x^{(k+1)}=x^{(k)} - \alpha_k g^{(k)} \ </math>
ただし <math>g^{(k)}</math> は 点 <math>x^{(k)} \ </math> における <math> f \ </math> の[[劣勾配]]を表し、<math>x^{(k)}</math> は <math>k</math> 回目の <math>x</math> を表す。 
もし <math>f \ </math> が微分可能関数であるならば、劣勾配は勾配 <math>\nabla f</math> と等しい。
ある反復において劣勾配 <math>-g^{(k)}</math> が <math>x^{(k)}</math> の <math>f \ </math> における降下方向ではない可能性もあり得る。したがって反復を通じて最良の目的関数値 <math>f_{\rm{best}} \ </math> を記録する必要があり、これは:
<math display=block>f_{\rm{best}}^{(k)}=\min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}</math>
と表される。

=== ステップサイズ規則 ===
劣勾配法にはいくつかのステップサイズ規則が知られている。本記事では収束性が[[証明 (数学)|証明]]されている古典的なステップサイズ規則について説明する。
*Constant step size: <math>\alpha_k=\alpha.</math>
*Constant step length: <math>\alpha_k=\gamma/\lVert g^{(k)} \rVert_2.</math> ただし、<math>\lVert x^{(k+1)} - x^{(k)} \rVert_2=\gamma.</math>
*Square summable but not summable step size: 以下の性質を満たすもの<math display="block">\alpha_k\geq0,\qquad\sum_{k=1}^\infty \alpha_k^2 < \infty,\qquad \sum_{k=1}^\infty \alpha_k=\infty.</math>
*Nonsummable diminishing: 以下の性質を満たすもの<math display="block">\alpha_k \geq 0,\qquad \lim_{k\to\infty} \alpha_k=0,\qquad \sum_{k=1}^\infty \alpha_k=\infty.</math>
*Nonsummable diminishing step lengths: <math>\alpha_k=\gamma_k/\lVert g^{(k)} \rVert_2.</math> ただし、<math display="block">\gamma_k \geq 0,\qquad \lim_{k\to\infty} \gamma_k=0,\qquad \sum_{k=1}^\infty \gamma_k=\infty.</math>
上記のステップサイズの規則ではステップサイズは反復開始前にあらかじめ固定するオフライン型に分類される。つまり各ステップサイズは各反復における情報を利用しない。このオフライン型の規則は微分可能関数に対する降下法で用いられるオンライン型のステップサイズの規則とは異なった規則となっている。具体的には微分可能関数の最小化問題に対する手法では[[ウルフ条件]]を満たすステップサイズを選択する。このときステップサイズは各反復における点や探索方向を用いて決定される。（改良型を含む）劣勾配法におけるステップサイズの規則に関する内容は Bertsekas{{Sfn|Bertsekas|2015}}および Bertsekas、Nedic、Ozdaglar{{Sfn|Bertsekas|2003}} の著書にまとめられている。

=== 収束の結果 ===
constant step-length を使用し劣勾配の[[ユークリッドノルム]]が1となるようにスケーリングした場合、劣勾配法は最小値に十分近い値へ収束することが{{仮リンク|ナウム・ショア|en|Naum Z. Shor|label=ショア}}により示されている<ref>The approximate convergence of the constant step-size (scaled) subgradient method is stated as Exercise 6.3.14(a) in [[:en:Dimitri P. Bertsekas|Bertsekas]]（636頁）: {{Harvnb|Bertsekas|1999|p=636}}, Bertsekas attributes this result to Shor: {{Harvnb|Shor|1985}}</ref>。すなわち、
:<math>\lim_{k\to\infty} f_{\rm{best}}^{(k)} - f^* <\epsilon</math>
が成り立つ。古典的なこれらの劣勾配法は収束が遅いことから、現在では一般的な問題に対して推奨されていないが<ref name="Lem"/><ref name="KLL"/>、特定の問題ではその問題特有の性質を活かすことで簡単に適応するできるため、広く用いられている。

== 射影劣勾配法とバンドル法 ==
1970年代、凸最適化問題に対して降下法の一種のバンドル法{{Efn|{{lang-en-short|bundle methods}}}}を{{仮リンク|クロード・ルマレシャル|en|Claude Lemaréchal}}と{{仮リンク|フィル・ウルフ|en|Philip Wolfe (mathematician)}}によって提案された{{Sfn|Bertsekas|1999}}。バンドル法は提案当時と現在において違う意味合いで用いられていた。現在知られている改良型のバンドル法や収束性の解析についてはKiwielによってによってまとめられた<ref>{{cite book |last=Kiwiel |first=Krzysztof |title=Methods of Descent for Nondifferentiable Optimization |publisher=[[Springer Verlag]] |location=Berlin |year=1985 |pages=362 |isbn=978-3540156420 |mr=0797754 }}</ref>。
現在のバンドル法はボリス・ポリャク（1969）の射影劣勾配法から編み出されたステップサイズ決定のための''[[等位集合|Level]] Control''規則を用いている。しかし、特定の問題では射影劣勾配法の方がバンドル法よりも優位性を持っている<ref name="Lem">{{cite book | last=Lemaréchal|first=Claude|author-link=:en:Claude Lemaréchal|chapter=Lagrangian relaxation|pages=112–156|title=Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000|editor=Michael Jünger and Denis Naddef|series=Lecture Notes in Computer Science|volume=2241|publisher=Springer-Verlag| location=Berlin|year=2001|isbn=3-540-42877-1|mr=1900016|doi=10.1007/3-540-45586-8_4|s2cid=9048698 }}</ref><ref name="KLL">{{cite journal|last1=Kiwiel|first1=Krzysztof&nbsp;C.|last2=Larsson |first2=Torbjörn|last3=Lindberg|first3=P.&nbsp;O.|title=Lagrangian relaxation via ballstep subgradient methods|url=http://rcin.org.pl/Content/139438/PDF/RB-2002-76.pdf |journal=[[:en:Mathematics of Operations Research|Mathematics of Operations Research]]|volume=32|date=August 2007|number=3|pages=669–686|mr=2348241|doi=10.1287/moor.1070.0261}}</ref>。

== 制約付き最適化問題 ==
=== 射影勾配法 ===
劣勾配法を拡張させた解法として射影劣勾配法が挙げられる。以下の[[数理最適化|最適化]]問題:
:minimize <math>f(x) \ </math> subject to <math display=block>x \in \mathcal{C}</math>

を考える。ただし、<math>\mathcal{C}</math> は[[凸集合]]を表す。 
射影劣勾配法は以下の式によって値を更新していく:
<math display=block>x^{(k+1)}=P \left(x^{(k)} - \alpha_k g^{(k)}\right)</math>
ただし、<math>P</math> は <math>\mathcal{C}</math> の射影、かつ <math>g^{(k)}</math> は <math>x^{(k)}</math> における <math>f \ </math> の劣勾配を表す。

=== 一般の制約 ===
劣勾配法は不等式制約付き最適化問題に対する解法として拡張することができる。以下の最適化問題を考える:

:minimize <math>f_0(x) \ </math> subject to <math display=block>f_i (x) \leq 0,\quad i=1,\ldots,m</math>

ただし、<math>f_i</math> は凸関数である。不等式制約付き最適化問題においても無制約最適化問題と同様に更新式は
<math display=block>x^{(k+1)}=x^{(k)} - \alpha_k g^{(k)} \ </math>
となる。ただし、<math>\alpha_k>0</math> はステップサイズであり、<math>g^{(k)}</math> は <math>x \ </math> における目的関数・制約の関数の劣勾配を表す。すなわち、
<math display=block>g^{(k)}=
\begin{cases} 
  \partial f_0 (x)  & \text{ if } f_i(x) \leq 0 \; \forall i=1 \dots m \\
  \partial f_j (x)  & \text{ for some } j \text{ such that } f_j(x) > 0 
\end{cases}</math>
と表される。ただし、<math>\partial f</math> は <math>f \ </math> の[[劣微分]]である。現在の反復点が制約を満たす場合、劣勾配法は目的関数の劣勾配により値を更新する。現在の反復点が制約を満たさない場合、劣勾配法は違反している制約関数の劣勾配から値を更新する。

== 脚注 ==
{{脚注ヘルプ}}
=== 注釈 ===
{{Notelist}}
=== 出典 ===
{{Reflist}}

== 参考文献 ==
* {{cite book |last=Bertsekas |first=Dimitri P. |title=Nonlinear Programming |publisher=Athena Scientific |year=1999 |location=Belmont, MA. |isbn=1-886529-00-0 |ref={{Sfnref|Bertsekas|1999}} }}
* {{cite book|last1=Bertsekas|first1=Dimitri P.|last2=Nedic|first2=Angelia|last3=Ozdaglar|first3=Asuman| title=Convex Analysis and Optimization| edition=Second| publisher=Athena Scientific| year=2003| location=Belmont, MA.| isbn= 1-886529-45-0 |ref={{Sfnref|Bertsekas|2003}} }} 
* {{cite book |last=Bertsekas |first=Dimitri P. |title=Convex Optimization Algorithms |publisher=Athena Scientific |year=2015 |location=Belmont, MA. |isbn=978-1-886529-28-1 |ref={{Sfnref|Bertsekas|2015}} }}
* {{cite book |last=Shor |first=Naum Z. |title=Minimization Methods for Non-differentiable Functions |publisher=[[Springer-Verlag]] |isbn=0-387-12763-1 |year=1985 |ref={{Sfnref|Shor|1985}} }}
* {{cite book|last=Ruszczyński|author-link=:en:Andrzej Piotr Ruszczyński|first=Andrzej|title=Nonlinear Optimization|publisher=[[Princeton University Press]]|location=Princeton, NJ|year=2006|pages=xii+454|isbn=978-0691119151 |mr=2199043}}

== 関連項目 ==
* {{annotated link|確率的勾配降下法}}

== 外部リンク ==
* [http://www.stanford.edu/class/ee364a/ EE364A] and [http://www.stanford.edu/class/ee364b/ EE364B], Stanford's convex optimization course sequence.

{{最適化アルゴリズム}}

{{DEFAULTSORT:れつこうはいほう}}
[[Category:凸解析]]
[[Category:凸最適化]]
[[Category:最適化アルゴリズムとメソッド]]
[[Category:数学に関する記事]]