マージン分類器のソースを表示

'''マージン分類器'''（マージンぶんるいき、{{lang-en-short|margin classifier}}）は[[機械学習]]における分類器の一つ。適当な空間にマップされたサンプルの[[集合]]に対し決定境界 (decision boundary) を設定して、サンプルの各要素と決定境界との[[距離空間|距離]]を評価することによって統計的な分類を行う。たとえば、[[パーセプトロン]]や'''線形判別分析''' (linear discriminant analysis; LDA) に代表される[[線形分類器]]を用いる場合、分類はサンプルがマップされている空間を分割する[[超平面]]によって行われる。この場合、超平面とサンプルの各要素との距離がそれぞれのサンプル要素に対する[[マージン]]となる。なお超平面とサンプル要素との距離を与える[[距離関数]]は、典型的には[[ユークリッド距離]]を用いるが別の距離関数を用いることもままある。
<!--
In [[machine learning]], a '''margin classifer''' is a [[Statistical classification|classifier]] which is able to give an associated distance from the decision boundary for each example.  For instance, if a [[linear classifier]] (e.g. [[perceptron]] or [[linear discriminant analysis]]) is used, the distance (typically [[euclidean distance]], though others may be used) of an example from the separating hyperplane is the margin of that example.
-->

マージンという考えは様々な機械学習における分類アルゴリズムを通じて重要であり、分類器の{{仮リンク|汎化誤差|en|generalization error}}を制限するために用いられる。この制限はしばしば{{仮リンク|VC次元|en|VC dimension}}（ヴァプニク・チェルヴォーネンキス次元、{{lang-en-short|Vapnik&ndash;Chervonenkis dimension}}）によって示される。特に際立っていることは[[ブースティング|ブースティングアルゴリズム]]や[[サポートベクターマシン]]の汎化誤差に対する制限である。
<!--
The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the [[generalization error]] of the classifier.  These bounds are frequently shown using the [[VC dimension]].  Of particular prominence is the generalization [[error bound]] on [[Boosting (meta-algorithm)|boosting]] algorithms and [[support vector machine]]s.
-->

== サポートベクターマシン ==
{{main|サポートベクターマシン}}

== ブースティングアルゴリズム ==
与えられたサンプルを 2 つのクラスに分類する反復[[ブースティング|ブースティングアルゴリズム]]に対するマージンは以下のように定義できる。分類器にはサンプル要素とそれに対するラベルの組 {{math|(''x'', ''y''), (''x'' &isin; ''X'', ''y'' &isin; ''Y'')}} が与えられる。{{mvar|X}} はサンプル空間であり、{{mvar|Y}} は {{mvar|x}} に対するラベル {{mvar|y}} の集合を表し、{{math|1=''Y'' = {{(}}&minus;1, +1{{)}}}} である。反復ブースティングは各反復 {{mvar|j}} で、実際の値を予言する可能な分類器 {{math|''h{{sub|j}}'' &isin; ''C''}} を選択する。ブースティングによって選択されたこの提案は[[実数]] {{math|''&alpha;{{sub|j}}'' &isin; '''R'''}} で重み付けされる。サンプル要素 {{mvar|x}} に対するマージンは結局、次のように定義される。
: <math>\frac{y \sum_{j=1}^t \alpha_j h_j (x)}{\sum |\alpha_j|}.</math>
この定義によって、マージンはサンプル要素に与えられたラベルが正しい場合に正値をとり、ラベルが誤っている場合には負値をとることになる。
<!--
The margin for an iterative [[Boosting (machine learning)|boosting]] algorithm given a set of examples with two classes can be defined as follows.  The classifier is given an example pair <math>(x,y)</math> where <math>x \in X</math> is a domain space and <math>y \in Y = \{-1, +1\}</math> is the label of the example.  The iterative boosting algorithm then selects a classifier <math>h_j \in C</math> at each iteration <math>j</math> where <math>C</math> is a space of possible classifiers that predict real values.  This hypothesis is then weighted by <math>\alpha_j \in R</math> as selected by the boosting algorithm.  At iteration <math>t</math>, The margin of an example <math>x</math> can thus be defined as

: <math>\frac{y \sum_j^t \alpha_j h_j (x)}{\sum |\alpha_j|}</math>

By this definition, the margin is positive if the example is labeled correctly and negative if the example is labeled incorrectly.
-->

ブースティングに対するマージンの定義の仕方は上述の方法が唯一ということはなく、他にも拡張や改変が考えられる。従って問題設定に応じて有効な定義が用いられる{{sfn|Schapire|Freud|Bartlett|Lee|1998}}。
<!--
This definition may be modified and is not the only way to define margin for boosting algorithms.  However, there are reasons why this definition may be appealing.<ref name="Statistics 1686">Robert E. Schapire, Yoav Freund, Peter Bartlett and Wee Sun Lee.(1998) "Boosting the margin: A new explanation for the effectiveness of voting methods", ''The Annals of Statistics'', 26(5):1651–1686</ref>
-->

== マージンに基づくアルゴリズム ==
多くの分類器はそれぞれのサンプル要素に対してマージンを設定できる。しかしながら、限られたいくつかの分類器だけがデータセットからの学習によって得られたマージンの情報を利用できる。
<!--
Many classifiers can give an associated margin for each example.  However, only some classifiers utilize information of the margin while learning from a data set.
-->

多くのブースティングアルゴリズムは、マージンによってサンプルに重み付けをするという発想に依拠している。凸損失関数 {{en|(convex loss function)}} を利用する場合（[[AdaBoost]], {{仮リンク|LogitBoost|en|LogitBoost}}, また {{仮リンク|AnyBoost|en|LogitBoost}} 系のアルゴリズムを使うなど）、高いマージンを持つサンプルはより低いマージンのサンプルより小さな（あるいは等しい）重みがつけられる。このことからブースティングアルゴリズムはマージンの低いサンプルに対して重点的に重みをつけることとなる。
凸損失を利用しない {{仮リンク|BrownBoost|en|BrownBoost}} のようなアルゴリズムでは、マージンはサンプルの重みを左右し得るが、凸損失関数を利用する場合と異なり重みとマージンの間の関係は[[単調写像|単調]]ではなくなる。ブースティングアルゴリズムの中には、最小マージンを最大化するようなものが存在する（たとえば {{harvnb|Warmuth|Glocer|R&auml;tsch|2007}} を参照）。
<!--
Many boosting algorithms rely on the notion of a margin to give weights to examples.  If a convex loss is utilized (as in [[AdaBoost]], [[LogitBoost]], and all members of the [[AnyBoost]] family of algorithms) then an example with higher margin will receive less (or equal) weight than an example with lower margin.  This leads the boosting algorithm to focus weight on low margin examples.  In nonconvex algorithms (e.g. [[BrownBoost]]), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to margin.  There exists boosting algorithms that provably maximize the minimum margin (e.g. see <ref>Manfred Warmuth and Karen Glocer and Gunnar R&auml;tsch. Boosting Algorithms for Maximizing the Soft Margin. In the Proceedings of Advances in Neural Information Processing Systems 20, 2007, pp 1585–1592.</ref>).
-->

[[サポートベクターマシン]]はサンプルを分割する超平面のマージンを最大化する。（超平面によって完全にデータを分離できないような）ノイズありのデータを用いて訓練されたサポートベクターマシンはソフトマージンを最大化する（詳細は[[サポートベクターマシン]]を参照）。
<!--
[[Support vector machine]]s provably maximize the margin of the separating hyperplane.  Support vector machines that are trained using noisy data (there exists no perfect separation of the data in the given space) maximize the soft margin.  More discussion of this can be found in the [[support vector machine]] article.
-->

多数決パーセプトロン {{en|(voted perceptron)}} は古典的な[[パーセプトロン]]の反復適用を基礎とするマージン最大化アルゴリズムである。
<!--
The [[voted-perceptron]] algorithm is a margin maximizing algorithm based on an iterative application of the classic [[perceptron]] algorithm.
-->

== 汎化誤差の制限 ==
マージン分類器の背後にある理論的な動機の一つは、アルゴリズムのパラメタとマージン項によって{{仮リンク|汎化誤差|en|generalization error}}を制限できるということにある。そのような制限の例として [[AdaBoost]] に対するものがある{{sfn|Schapire|Freund|Bartlett|Lee|1998}}。{{mvar|S}} を {{mvar|m}} 個のサンプル要素の集合とする。これらのサンプルは分布 {{mvar|D}} から独立かつ無作為に選ばれたものである。基底の分類器に対する{{仮リンク|VC次元|en|VC dimension}}は {{math|1=''d'', (1 &le; ''d'' &le; ''m'')}} となる。このとき確率 {{math|1=1 &minus; ''&delta;''}} で
:<math>
P_D\left( \frac{y \sum_{j=1}^t \alpha_j h_j (x)}{\sum |\alpha_j|} \leq 0\right)
\leq P_S\left(\frac{y \sum_{j=1}^t \alpha_j h_j (x)}{\sum |\alpha_j|} \leq \theta\right)
+ \mathcal{O}\left(\frac{1}{\sqrt{m}} \sqrt{d\log^2(m/d)/ \theta^2  + \log(1/\delta)}\right),
~~\textrm{for~all}~~\theta > 0
</math>
という制限が得られる。
<!--
One theoretical motivation behind margin classifiers is that their [[generalization error]] may be bound by parameters of the algorithm and a margin term.  An example of such a bound is for the AdaBoost algorithm.<ref name="Statistics 1686"/>  Let <math>S</math> be a set of <math>m</math> examples sampled independently at random from a distribution <math>D</math>.  Assume the VC-dimension of the underlying base classifier is <math>d</math> and <math>m \geq d \geq 1</math>.  Then with probability <math>1-\delta</math> we have the bound

: <math>P_D\left( \frac{y \sum_j^t \alpha_j h_j (x)}{\sum |\alpha_j|} \leq 0\right)          \leq         P_S\left(\frac{y \sum_j^t \alpha_j h_j (x)}{\sum |\alpha_j|} \leq \theta\right)  + O\left(\frac{1}{\sqrt{m}} \sqrt{d\log^2(m/d)/ \theta^2  + \log(1/\delta)}\right)</math>

for all <math>\theta > 0</math>.
-->
==出典==
{{reflist}}

== 参考文献 ==
*{{cite journal
|first1=Robert E. |last1=Schapire
|first2=Yoav      |last2=Freund
|first3=Peter     |last3=Bartlett
|first4=Wee Sun   |last4=Lee
|year=1998
|title=Boosting the margin: A new explanation for the effectiveness of voting methods
|journal=The Annals of Statistics
|volume=26
|issue=5
|pages=1651–1686
|doi=10.1214/aos/1024691352
|ref=harv}}
*{{cite journal
|first1=Manfred |last1=Warmuth
|first2=Karen   |last2=Glocer 
|first3=Gunnar  |last3=R&auml;tsch
|title=Boosting Algorithms for Maximizing the Soft Margin
|journal=Proceedings of Advances in Neural Information Processing Systems
|volume=20
|year=2007
|pages=pp. 1585–1592
|ref=harv}}

{{DEFAULTSORT:まあしんふんるいき}}

[[Category:機械学習]]
[[Category:分類アルゴリズム]]