Kademlia
'''Kademlia''' is a [[分散ハッシュテーブル|distributed hash table]] for decentralized [[Peer to Peer|peer-to-peer]] [[コンピュータネットワーク|computer networks]] designed by [[Petar Maymounkov]] and [[David Mazieres]].<ref name="kademlia-paper">[http://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf Kademlia: A Peer-to-peer Information System Based on the XOR Metric]</ref> Kademlia specifies the structure of the network and the exchange of information through [[ノード (ネットワーク)|node]] lookups. Kademlia nodes communicate among themselves using [[User Datagram Protocol|UDP]]. The participating nodes form a virtual [[オーバーレイ・ネットワーク|overlay network]]. Each node is identified by a number called a ''node ID''. The ''node ID'' serves not only as identification: the Kademlia algorithm uses it to locate values, which are usually file [[ハッシュ関数|hashes]] or keywords. In fact, the ''node ID'' provides a direct map to file hashes, and each node stores information on where to obtain the file or resource.

When searching for some value, the algorithm needs to know the key associated with it and explores the network in several steps. Each step finds nodes that are closer to the key, until the contacted node returns the value or no closer nodes are found. This is very efficient: like many other [[分散ハッシュテーブル|distributed hash tables]], Kademlia contacts only <math>O(\log n)</math> nodes during the search out of a total of <math>n</math> nodes in the system (see [[ランダウの記号|Big O notation]]).

A further advantage of the decentralized structure is its markedly increased resistance to [[DoS攻撃|denial-of-service attacks]]. Even if access to a whole set of nodes is saturated, the effect on overall network availability is limited: the network recovers by knitting itself around these "holes".

== System details ==
<!-- First generation peer-to-peer file sharing networks, such as Napster, relied on a central database to co-ordinate look-ups on the network. Second generation peer-to-peer networks, such as Gnutella, used flooding to locate files, searching every node on the network. Third generation peer-to-peer networks use ''distributed hash tables'' to look up files in the network. ''Distributed hash tables'' store resource '''locations''' throughout the network.

A major criterion for these protocols is locating the desired nodes quickly. The Kademlia algorithm is based on the calculation of the "distance" between two nodes. This distance is computed as the [[排他的論理和|exclusive or]] (XOR) of the two node IDs, taking the result as an [[整数|integer]]. Keys and node IDs have the same format and length, so distance can be calculated among them in exactly the same way.
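As a minimal illustration (the protocol does not prescribe any language; Python is used here only as a sketch), the XOR distance between two IDs is a single bitwise operation:

```python
def xor_distance(id_a: int, id_b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, interpreted as an integer."""
    return id_a ^ id_b

# With 4-bit IDs, 0b0011 and 0b1001 differ in bits 3 and 1:
assert xor_distance(0b0011, 0b1001) == 0b1010

# Keys and node IDs share the same format and length, so the distance
# between a node ID and a key is computed in exactly the same way.
assert xor_distance(0b0011, 0b0011) == 0
```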
This "distance" does not have anything to do with geographical conditions, but designates the distance within the ID range. Thus it can and does happen that, for example, a node in Germany and one in Australia are "neighbours".

XOR was chosen because it shares some properties with the [[距離|geometric distance formula]]. Specifically:
* the distance between a node and itself is zero
* it is symmetric: the "distances" calculated from A to B and from B to A are the same
* it satisfies the triangle inequality: given three points A, B and C, the "distance" from A to C is less than or equal to the sum of the "distances" from A to B and from B to C

These properties provide most of the important behaviours of a real distance measure, while being significantly cheaper to compute.<ref name="kademlia-paper" /> Each Kademlia search iteration comes one bit closer to the target. A '''basic''' Kademlia network with 2<sup>''n''</sup> nodes will take at most ''n'' steps (in the worst case) to find a given node.

=== Routing tables ===
Kademlia routing tables consist of a ''list'' for each bit of the node ID (e.g. if a node ID consists of 128 bits, a node will keep 128 such ''lists''). A list has many entries. Every entry in a ''list'' holds the data necessary to locate another node, typically the ''IP address'', ''port'' and ''node ID'' of that node. Every ''list'' corresponds to a specific distance from the node. Nodes that can go in the ''n''-th ''list'' must have a differing ''n''-th bit from the node's ID, while the first ''n''−1 bits of the candidate ID must match those of the node's ID. This means that it is very easy to fill the first ''list'', as 1/2 of the nodes in the network are far-away candidates. The next ''list'' can use only 1/4 of the nodes in the network (one bit closer than the first), and so on.
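The bucket assignment just described can be sketched as follows (a minimal illustration; real implementations track far more state per entry):

```python
def bucket_index(own_id: int, other_id: int, id_bits: int = 128) -> int:
    """Return which of the id_bits lists other_id belongs in.

    The index equals the number of leading bits shared with own_id,
    so index 0 holds the far half of the ID space, index 1 the next
    quarter, and so on."""
    distance = own_id ^ other_id
    return id_bits - distance.bit_length()

# With 4-bit IDs and own ID 0b0110:
assert bucket_index(0b0110, 0b1110, 4) == 0  # differs in the first bit
assert bucket_index(0b0110, 0b0111, 4) == 3  # shares the first three bits
```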
With an ID of 128 bits, every node in the network will classify other nodes into one of 128 different distances, one specific distance per bit.

As nodes are encountered on the network, they are added to the ''lists''. This includes store and retrieval operations, and even helping other nodes to find a key. Every node encountered will be considered for inclusion in the ''lists''. Therefore the knowledge that a node has of the network is very dynamic. This keeps the network constantly updated and adds resilience to failures or attacks.

In the Kademlia literature, the ''lists'' are referred to as ''k-buckets''. ''k'' is a system-wide number, such as 20. Every ''k-bucket'' is a ''list'' holding up to ''k'' entries, i.e. every node on the network keeps ''lists'' of up to 20 nodes for a particular bit (a particular distance from itself). Since the number of possible candidates for each ''k-bucket'' decreases quickly (there will be very few nodes that are that close), the lower-bit ''k-buckets'' will fully map all nodes in that section of the network. Since the quantity of possible IDs is much larger than any node population can ever be, some of the ''k-buckets'' corresponding to very short distances will remain empty.

[[Image:Dht example.png|thumb|420px|An example network partition]]
Consider the simple network to the right. There are seven nodes, shown as the small circles at the bottom. The node under consideration is node six (binary 110), in black. There are three ''k-buckets'' in this network. Nodes zero, one and two (binary 000, 001 and 010) are candidates for the first ''k-bucket''. Node three (binary 011) is not participating in the network. In the second ''k-bucket'', nodes four and five (binary 100 and 101) are placed. Finally, the third ''k-bucket'' can only contain node seven (binary 111). Each of the three ''k-buckets'' is enclosed in a gray circle. If the size of the ''k-bucket'' were two, then the first ''2-bucket'' could contain only two of the three nodes.
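The partition in the figure can be reproduced with a few lines (node three, binary 011, is absent from the network):

```python
def bucket_for(own_id: int, other_id: int, id_bits: int = 3) -> int:
    # Bucket index = number of leading bits the two IDs share.
    return id_bits - (own_id ^ other_id).bit_length()

# Group the participating nodes as seen from node six (0b110).
buckets: dict[int, list[int]] = {}
for node in (0, 1, 2, 4, 5, 7):          # node 3 is not participating
    buckets.setdefault(bucket_for(6, node), []).append(node)

# First bucket: 000, 001, 010; second: 100, 101; third: 111.
assert buckets == {0: [0, 1, 2], 1: [4, 5], 2: [7]}
```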
It is known that nodes that have been connected for a long time in a network will probably remain connected for a long time in the future.<ref>Stefan Saroiu, P. Krishna Gummadi and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. Technical Report UW-CSE-01-06-02, University of Washington, Department of Computer Science and Engineering, July 2001.</ref> Because of this statistical distribution, Kademlia prefers long-connected nodes to remain stored in the ''k-buckets''. This increases the number of known nodes that will still be valid at some time in the future, and provides for a more stable network.

When a ''k-bucket'' is full and a new node is discovered for that ''k-bucket'', the least recently seen node in the ''k-bucket'' is PINGed. If that node is found to be still alive, the new node is placed in a secondary list, a replacement cache. The replacement cache is used only if a node in the ''k-bucket'' stops responding. In other words: new nodes are used only when older nodes disappear.

=== Protocol messages ===
Kademlia has four messages:
* PING: used to verify that a node is still alive.
* STORE: stores a (key, value) pair in one node.
* FIND_NODE: the recipient of the request will return the ''k'' nodes in its own buckets that are the closest ones to the requested key.
* FIND_VALUE: as FIND_NODE, but if the recipient of the request has the requested key in its store, it will return the corresponding value.
Each [[RPC]] message includes a random value from the initiator. This ensures that when the response is received, it corresponds to the request previously sent.

=== Locating nodes ===
Node lookups can proceed asynchronously. The quantity of simultaneous lookups is denoted by α and is typically three. A node initiates a FIND_NODE request by querying the α nodes in its own ''k-buckets'' that are the closest ones to the desired key.
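Before following the lookup further, the bucket-maintenance rule above (PING the least recently seen entry; park newcomers in a replacement cache) can be sketched as follows. The `ping` callable stands in for the PING RPC and is an assumed parameter:

```python
from collections import OrderedDict, deque

K = 20  # system-wide bucket size

class KBucket:
    """Minimal sketch of one k-bucket with a replacement cache."""

    def __init__(self) -> None:
        self.entries = OrderedDict()         # node_id to contact, oldest first
        self.replacements = deque(maxlen=K)  # newcomers waiting for a slot

    def seen(self, node_id, contact, ping) -> None:
        """Called whenever a node is encountered on the network."""
        if node_id in self.entries:
            self.entries.move_to_end(node_id)        # now most recently seen
        elif len(self.entries) < K:
            self.entries[node_id] = contact          # room left: just add it
        else:
            oldest = next(iter(self.entries))
            if ping(oldest):                         # old node still alive:
                self.entries.move_to_end(oldest)     # keep it, and hold the
                self.replacements.append((node_id, contact))  # newcomer aside
            else:                                    # old node dead: evict
                del self.entries[oldest]
                self.entries[node_id] = contact
```

Promoting a cached replacement when a bucket entry later stops responding is omitted here for brevity.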
When these recipient nodes receive the request, they will look in their ''k-buckets'' and return the ''k'' closest nodes to the desired key that they know of. The requester updates a results list with the node IDs it receives, keeping the ''k'' best ones (the ''k'' nodes that are closest to the searched key) that respond to queries. The requester then selects from these best results and issues the request to them, iterating this process again and again. Because every node has better knowledge of its own surroundings than any other node has, the results received will be nodes that are closer and closer to the searched key each time. The iterations continue until no nodes are returned that are closer than the best previous results. When the iterations stop, the best ''k'' nodes in the results list are the ones in the whole network that are closest to the desired key.

The node information can be augmented with [[ラウンドトリップタイム|round-trip times]] (RTT). This information is used to choose a time-out specific to every consulted node. When a query times out, another query can be initiated, never surpassing α queries at the same time.

=== Locating resources ===
Information is located by mapping it to a key. A [[ハッシュ関数|hash function]] is typically used for the map. The storer nodes will have the information due to a previous STORE message. Locating a value follows the same procedure as locating the closest nodes to a key, except that the search terminates when a node has the requested value in its store and returns this value.

The values are stored at several nodes (''k'' of them) to allow nodes to come and go and still have the value available in some node, i.e. to provide redundancy. Periodically, a node that stores a value will explore the network to find the ''k'' nodes that are closest to the key and replicate the value onto them. This compensates for disappeared nodes.
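The iterative procedure described under ''Locating nodes'' can be condensed into a sketch. It is synchronous for brevity (real clients issue the α queries in parallel with per-node time-outs), and `find_node` models the FIND_NODE RPC:

```python
ALPHA, K = 3, 20  # typical system-wide parameters

def iterative_find_node(target, seeds, find_node):
    """Return the K known nodes closest to target.

    find_node(node, target) models the RPC: it returns up to K contacts
    from that node's buckets.  IDs are plain ints here."""
    shortlist = {n: n ^ target for n in seeds}   # node to XOR distance
    queried = set()
    while True:
        best = sorted(shortlist, key=shortlist.get)[:K]
        pending = [n for n in best if n not in queried][:ALPHA]
        if not pending:              # no unqueried node among the best left
            return best
        for node in pending:
            queried.add(node)
            for peer in find_node(node, target):
                shortlist.setdefault(peer, peer ^ target)
```

Each round only queries nodes that are still among the ''k'' best seen so far, so the search converges toward the key and stops when no closer nodes appear.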
Also, for popular values that might have many requests, the load on the storer nodes is diminished by having a retriever store the value in some node near, but outside of, the ''k'' closest ones. This new storing is called a cache. In this way the value is stored farther and farther away from the key, depending on the quantity of requests. This allows popular searches to find a storer more quickly. Because the value is returned from nodes farther away from the key, possible "hot spots" are alleviated. Caching nodes will drop the value after a certain time, depending on their distance from the key.

Some implementations (e.g. [[Kad Network|Kad]]) have neither replication nor caching. The purpose of this is to remove old information quickly from the system. The node that is providing the file will periodically refresh the information on the network (performing NODE_LOOKUP and STORE messages). When all of the nodes having the file go offline, nobody will refresh its values (sources and keywords) and the information will eventually disappear from the network.

=== Joining the network ===
A node that would like to join the network must first go through a [[Bootstrapping node|bootstrap]] process. In this phase, the joining node needs to know the [[IPアドレス|IP address]] and port of another node (obtained from the user, or from a stored list) that is already participating in the Kademlia network. If the joining node has not yet participated in the network, it computes a [[ランダム|random]] ID number that is supposed not to be already assigned to any other node. It uses this ID until leaving the network.

The joining node inserts the bootstrap node into one of its ''k-buckets''. The new node then performs a NODE_LOOKUP of its own ID against the only other node it knows. This "self-lookup" populates other nodes' ''k-buckets'' with the new node's ID, and populates the new node's ''k-buckets'' with the nodes on the path between it and the bootstrap node.
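The first steps of joining amount to only a few calls. In this sketch the helpers `insert_contact` (place a node in the appropriate ''k-bucket'') and `node_lookup` (the iterative NODE_LOOKUP above) are hypothetical names, not part of any specified API:

```python
import secrets

ID_BITS = 128

def join(known_contact, insert_contact, node_lookup) -> int:
    """Bootstrap sketch: pick a random ID, seed one bucket, self-lookup."""
    self_id = secrets.randbits(ID_BITS)     # fresh ID, reused until leaving
    insert_contact(known_contact)           # 1. learn one existing node
    for neighbour in node_lookup(self_id):  # 2. self-lookup announces the new
        insert_contact(neighbour)           #    ID and fills nearby buckets
    return self_id
```

The subsequent bucket refreshes described next are omitted from the sketch.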
After this, the new node refreshes all ''k-buckets'' farther away than the ''k-bucket'' the bootstrap node falls in. This refresh is just a lookup of a random key within that ''k-bucket'''s range.

Initially, nodes have one ''k-bucket''. When a ''k-bucket'' becomes full, it can be split. The split occurs if the range of nodes in the ''k-bucket'' spans the node's own ID (values to the left and right of it in the binary tree). Kademlia relaxes even this rule for the one ''k-bucket'' of "closest nodes": because a single bucket will typically correspond to the distance at which all of the nodes closest to this node sit, there may be more than ''k'' of them, and we want the node to know them all. It may turn out that a highly unbalanced binary sub-tree exists near the node. If ''k'' is 20, and there are 21 or more nodes with a prefix "xxx0011....." while the new node's ID is "xxx000011001", the new node can keep multiple ''k-buckets'' for those nodes. This guarantees that the network knows about all nodes in the closest region.

=== Accelerated lookups ===
Kademlia uses an ''[[排他的論理和|XOR]] [[metric (mathematics)|metric]]'' to define distance: two node IDs, or a node ID and a key, are XORed, and the result is the distance between them. For each bit, the XOR function returns zero if the two bits are equal and one if the two bits are different. XOR-metric distances satisfy the [[三角不等式|triangle inequality]]: the distance from A to B is shorter than (or equal to) the distance from A to C plus the distance from C to B.

The ''XOR metric'' allows Kademlia to extend routing tables beyond single bits. Groups of bits can be placed in ''k-buckets''; a group of bits is termed a prefix. For a ''b''-bit prefix, there will be 2<sup>''b''</sup>−1 ''k-buckets'' (the missing ''k-bucket'' is a further extension of the routing tree that contains the node's own ID). A ''b''-bit prefix reduces the maximum number of lookups from ''log<sub>2</sub> n'' to ''log<sub>2<sup>b</sup></sub> n''.
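Numerically, since log<sub>2<sup>b</sup></sub> ''n'' = (log<sub>2</sub> ''n'')/''b'', consuming ''b'' bits per routing step divides the hop bound by ''b''. A quick check (the network size ''n'' is an assumed figure, for illustration only):

```python
import math

n = 1_000_000                          # assumed network size
for b in (1, 2, 4):                    # bits consumed per routing step
    buckets_per_level = 2 ** b - 1     # k-buckets needed at each level
    max_hops = math.log(n, 2 ** b)     # equals log2(n) / b
    print(f"b={b}: {buckets_per_level} buckets/level, at most {max_hops:.1f} hops")

# The identity behind the speed-up:
assert math.isclose(math.log(n, 2 ** 4), math.log2(n) / 4)
```

Larger prefixes trade routing-table size (more buckets per level) for fewer hops per lookup.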
These are '''maximum''' values, and the average will be far less, increasing the chance of finding a node in one of the node's own ''k-buckets'' that shares more bits than just the prefix with the target key. Nodes can use mixtures of prefixes in their routing tables, as in the [[Kad Network]] used by [[EMule|eMule]]. The Kademlia network could even be heterogeneous in routing-table implementations, at the expense of complicating the analysis of lookups. -->

== Academic significance ==
<!-- While the XOR metric is not needed to understand Kademlia, it is critical in the analysis of the protocol. The XOR arithmetic forms a [[群 (数学)|group]], and [[抽象代数学|abstract algebra]] allows closed analysis. Other DHT protocols and algorithms required [[Computer simulation|computer simulation]] or complicated formal analysis in order to predict network behavior and correctness. Using groups of bits as routing information also simplifies the algorithms. -->

== Use in file-sharing networks ==
<!-- Kademlia is used in [[ファイル共有ソフト|file-sharing]] networks. By making Kademlia keyword searches, one can find information in the file-sharing network so it can be downloaded. Since there is no central instance storing an index of existing files, this task is divided evenly among all clients: if a node wants to share a file, it processes the contents of the file, calculating from it a number ([[ハッシュ関数|hash]]) that will identify this file within the file-sharing network. The hashes and the node IDs must be of the same length. The node then searches for several nodes whose IDs are close to the hash, and has its own IP address stored at those nodes, i.e. it publishes itself as a source for this file.

A searching client will use Kademlia to search the network for the node whose ID has the smallest distance to the file hash, then will retrieve the sources list stored at that node. Since a key can correspond to many values, e.g. many sources of the same file, every storing node may have different information. The sources are therefore requested from all ''k'' nodes close to the key.
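Publishing, as described, only requires that keys and node IDs share a format. The sketch below assumes a 160-bit SHA-1 key space, as used by several Kademlia-based networks; the choice of hash function is implementation-specific:

```python
import hashlib

def file_key(contents: bytes) -> int:
    """160-bit key identifying a file within the network."""
    return int.from_bytes(hashlib.sha1(contents).digest(), "big")

def keyword_keys(filename: str) -> dict[str, int]:
    """One key per constituent word of the filename."""
    words = filename.lower().replace(".", " ").replace("_", " ").split()
    return {w: int.from_bytes(hashlib.sha1(w.encode()).digest(), "big")
            for w in words}

# A publisher stores its own address at the k nodes closest to each key;
# a search hashes one keyword and asks the nodes closest to that hash.
```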
The file hash is usually obtained from a specially formed Internet [[Magnet URI scheme|link]] found elsewhere, or included within an indexing file obtained from other sources.

Filename searches are implemented using [[キーワード|keywords]]. The filename is divided into its constituent words. Each of these keywords is hashed and stored in the network, together with the corresponding filename and file hash. A search involves choosing one of the keywords, contacting the node with an ID closest to that keyword's hash, and retrieving the list of filenames that contain the keyword. Since every filename in the list has its hash attached, the chosen file can then be obtained in the normal way. -->

== Implementations ==
<!-- Public clients using the Kademlia algorithm (these networks are incompatible with one another):
* Overnet network: [[Overnet]] (integrated in [[EDonkey2000|eDonkey]], no longer available) and [[MLDonkey]]. With [https://kadc.sourceforge.net/ KadC], a C library for handling its Kademlia is available.
* [[Kad Network]]: [[EMule|eMule]] v0.40+, [[MLDonkey]] v2.5-28+, [[Lphant]] v3.50 beta 2+ and [[AMule|aMule]] v2.1.0+.
* [http://www.revconnect.com/ RevConnect] v0.403+.
* BitTorrent Mainline DHT: [[BitTorrent (software)|BitTorrent]] v4.1.0+, [[ΜTorrent|μTorrent]] v1.2+, [[BitSpirit]] v3.0+, [[BitComet]] v0.59+, [[KTorrent]], [[Vuze|Azureus]] 3.0+ (via a plugin), [[Transmission]] 1.70+, BitFlu.pl, and many [[libtorrent]]-based clients: they all share a DHT based on an implementation of the Kademlia algorithm, for trackerless torrents.
* [[Vuze#DHT|Azureus DHT]] v2.3.0.0+: used for decentralized [[BitTorrent]] tracking and various [http://azureus.aelitis.com/wiki/index.php/Distributed_hash_table other features]; ''differing'' from the BitTorrent Mainline DHT.
* [[Osiris_(Serverless_Portal_System)|Osiris sps]] (all versions): used to manage distributed and anonymous web portals.
* Mojito: a Java Kademlia library written for the [[LimeWire]] project.
Mojito is used in LimeWire to provide DHT support for BitTorrent as well as to augment the [[Gnutella]] protocol. See the [http://www.limewire.org/nightly/modules/mojito/api/org/limewire/mojito/class-use/MojitoDHT.html class interface documentation] for more information.<ref>[http://www.slyck.com/story1235.html Mojito and LimeWire]</ref>
* [https://khashmir.sourceforge.net/ Khashmir]: Python implementation of Kademlia, used in mainline BitTorrent with some modifications.
* [http://www.thomas.ambus.dk/plan-x/routing/ Plan-x]: Java implementation.
* [http://www.heim-d.uni-sb.de/~heikowu/SharkyPy/ SharkyPy]: another Python implementation of a Kademlia distributed hash table. LGPL-licensed.
* [https://entangled.sourceforge.net/ Entangled]: Python implementation of Kademlia, also providing a distributed [[tuple space]]. LGPL-licensed.
* [http://retroshare.sf.net/ RetroShare]: Kademlia implementation for secure peer-to-peer messaging and file sharing. -->

== See also ==
* [[Content addressable network]]
* [[Chord]]
* [[Tapestry (DHT)]]
* [[Pastry (DHT)]]
* [[Koorde]]
* [[InterPlanetary File System]]

== External links ==
<!-- * "Kademlia: The optimized Peer network", http://kademlia.scs.cs.nyu.edu/ (possibly dead, [https://web.archive.org/web/20031206222134/http%3a//kademlia.scs.cs.nyu.edu/ mirror]). This has been dead for some time. Links at cs.nyu.edu list Kademlia as a project, but provide no link. -->
* [http://pdos.csail.mit.edu/~petar Petar Maymounkov's academic home page]
* Kademlia specification: https://xlattice.sourceforge.net/components/protocol/kademlia/specs.html
* [http://www.cs.northwestern.edu/~yqiao Yi Qiao] and [http://www.cs.northwestern.edu/~fabianb Fabian E.
Bustamante], [http://www.aqualab.cs.northwestern.edu/publications/YQiao06SUO.pdf USENIX 2006 paper characterizing Overnet and Gnutella]
* {{cite web | url = http://www.barsoom.org/~agthorr/papers/infocom-2006-kad.pdf | last = Stutzbach | first = Daniel | title = Improving Lookup Performance over a Widely-Deployed DHT | publisher = University of Oregon | year = 2006}}
* [http://www.kademlia.ru/ Russian site about the Kad Network]
* [http://boykin.acis.ufl.edu/?p=142 Brunet]

== References ==
<references/>

{{DEFAULTSORT:かてむりあ}}
{{internet-stub}}
[[Category:分散処理]]
[[Category:分散データ共有]]