LEB128のソースを表示


'''LEB128''' または '''Little Endian Base 128''' は [[任意精度演算|任意精度の整数]]を少ないバイト数に格納するのに用いられる[[可変長符号]]圧縮である。

LEB128は[[DWARF]] デバッグファイルフォーマッ<ref name="dwarfspec2">{{Citation|title=DWARF Debugging Information Format Specification Version 2.0, Draft|last=UNIX International|date=July 1993|url=http://dwarfstd.org/doc/dwarf-2.0.0.pdf|chapter=7.8|access-date=2009-07-19}}</ref><ref name="dwarfspec3">{{Cite web |url=http://dwarfstd.org/doc/Dwarf3.pdf |title=DWARF Debugging Information Format Specification Version 3.0 |page=70 |access-date=2009-07-19 |author=Free Standards Group |date=December 2005}}</ref>や [[WebAssembly]] のすべての整数リテラルの符号化方式に使用されている<ref name="wasmint">{{Cite web |url=https://webassembly.github.io/spec/core/binary/values.html#integers |title=Values — Binary Format — WebAssembly 1.1 |access-date=2020-12-31 |author=WebAssembly Community Group |date=2020-11-12}}</ref>。

== 符号化形式 ==
LEB128形式は[[可変長数値表現]] (VLQ) と非常に似ている。主な違いは LEB128 は [[エンディアン|リトルエンディアン]] であるのに対し、可変長数値表現は[[エンディアン|ビッグエンディアン]]であることであり、どちらも小さな数値を1バイトで格納できる一方で、任意の長さの数値をエンコードすることもできる。LEB128には2つのバージョンがあり、符号無しLEB128と符号付きLEB128である。復号する際は、エンコードされた値が符号無しLEB128か符号付きLEB128かを知っている必要がある。

=== 符号無し LEB128 ===
符号無し整数を Unsigned LEB128 ('''ULEB128''') にするには、まず2進数で表現する。

次に数値を7ビットの倍数になるように[[符号拡張|ゼロ拡張]]する（数値が0でない場合、最上位7ビットがすべて0にならないようにする）。数値を7ビットのグループに分割する。各7ビットグループに対して、最下位グループから最上位グループの順に1バイトずつ出力する。各バイトは、そのグループを7つの最下位ビットに持ちます。最後のバイトを除くすべてのバイトの最上位ビットを1に設定する。数値0は通常、単一バイト0x00としてエンコードされふ。WebAssemblyでは、0の代替エンコーディングも許可されている（0x80 0x00、0x80 0x80 0x00、...）。
 MSB ------------------ LSB
       10011000011101100101  元の値の2進数表現
      010011000011101100101  7の倍数ビットに拡張
  0100110  0001110  1100101  7ビットごとに分割
 00100110 10001110 11100101  最後（最上位）のグループを除くすべてに高位1ビットを追加してバイトを形成
     0x26     0x8E     0xE5  16進数表現
 
 → 0xE5 0x8E 0x26            出力 (最下位ビットから最上位ビットの順)

=== 符号付き LEB128 ===
符号付きの数値も同様に表現される。[[2の補数]] で表現された、7の倍数の <math>N</math>ビットから始め、符号なしエンコーディングと同様にグループに分割する。

例えば、符号付き数値-123456は0xC0 0xBB 0x78としてエンコードされる。
 MSB ------------------ LSB
          11110001001000000  123456 の2進数表現
      000011110001001000000  21ビットの
      111100001110110111111  すべてのビットを反転 ([[1の補数]])
      111100001110111000000  1を足す(2の補数)
  1111000  0111011  1000000  7ビットごとに分割
 01111000 10111011 11000000  最後（最上位）のグループを除くすべてに高位1ビットを追加してバイトを形成
     0x78     0xBB     0xC0  16進数表現
 
 → 0xC0 0xBB 0x78            出力 (最下位ビットから最上位ビットの順)

== C言語風 擬似コード ==

=== 符号無し整数のエンコード ===
<syntaxhighlight lang="c">
do {
  byte = low-order 7 bits of value;
  value >>= 7;
  if (value != 0) /* more bytes to come */
    set high-order bit of byte;
  emit byte;
} while (value != 0);
</syntaxhighlight>

=== 符号付き整数のエンコード ===
<syntaxhighlight lang="c">
more = 1;
negative = (value < 0);

/* the size in bits of the variable value, e.g., 64 if value's type is int64_t */
size = no. of bits in signed integer;

while (more) {
  byte = low-order 7 bits of value;
  value >>= 7;
  /* the following is only necessary if the implementation of >>= uses a
     logical shift rather than an arithmetic shift for a signed left operand */
  if (negative)
    value |= (~0 << (size - 7)); /* sign extend */

  /* sign bit of byte is second high-order bit (0x40) */
  if ((value == 0 && sign bit of byte is clear) || (value == -1 && sign bit of byte is set))
    more = 0;
  else
    set high-order bit of byte;
  emit byte;
}
</syntaxhighlight>

=== 符号無し整数の復号 ===
<syntaxhighlight lang="c">
result = 0;
shift = 0;
while (true) {
  byte = next byte in input;
  result |= (low-order 7 bits of byte) << shift;
  if (high-order bit of byte == 0)
    break;
  shift += 7;
}
</syntaxhighlight>

=== 符号付き整数の復号 ===
<syntaxhighlight lang="c">
result = 0;
shift = 0;

/* the size in bits of the result variable, e.g., 64 if result's type is int64_t */
size = number of bits in signed integer;

do {
  byte = next byte in input;
  result |= (low-order 7 bits of byte << shift);
  shift += 7;
} while (high-order bit of byte != 0);

/* sign bit of byte is second high-order bit (0x40) */
if ((shift <size) && (sign bit of byte is set))
  /* sign extend */
  result |= (~0 << shift);
</syntaxhighlight>{{Reflist}}{{Reflist}}

== 使用例 ==

* [[Android (オペレーティングシステム)|Android]] プロジェクトはDalvik 実行可能ファイルフォーマット (.dex) で LEB128 で使用<ref name="dex">{{Cite web |url=https://source.android.com/devices/tech/dalvik/dex-format |title=Dalvik Executable Format |access-date=2021-05-18}}</ref>。
* Hewlett-Packard IA-64の例外処理におけるテーブルの圧縮<ref name="Christophe">{{Cite web |url=https://www.usenix.org/legacy/events/wiess2000/full_papers/dinechin/dinechin_html/ |title=C++ Exception Handling for IA-64 |author=Christophe de Dinechin |access-date=2009-07-19 |date=October 2000}}</ref>。
* [[DWARF]] ファイルフォーマットは、様々なフィールドで符号なしと符号付きの両方のLEB128エンコーディングを使用。
* [[LLVM]] ではカバレッジマッピングフォーマットで使用している<ref>{{Cite web |url=https://llvm.org/docs/CoverageMappingFormat.html#leb128 |title=LLVM Code Coverage Mapping Format |author=LLVM Project |access-date=2016-10-20 |year=2016}}</ref>。LLVMのLEB128エンコーディングとデコーディングの実装は、前述の擬似コードと並んで有用である<ref>{{Cite web |url=https://llvm.org/doxygen/LEB128_8h_source.html |title=LLVM LEB128 encoding and decoding |author=LLVM Project |access-date=2019-11-02 |year=2019}}</ref>。
* [[マイクロソフト|Microsoft]] [[.NET Framework]] では、'''BinaryReader'''と'''BinaryWriter'''クラスで「7ビットエンコード整数」フォーマットをサポートしており、'''BinaryWriter'''で文字列を書き込む際、文字列の長さはこのメソッドでエンコードされている。
* ''[[Minecraft]]'' は、パケット内のデータの長さを測定するためにプロトコルでLEB128を使用<ref>{{Cite web |url=https://wiki.vg/Protocol#VarInt_and_VarLong |title=Minecraft Modern Varint & Varlong |website=wiki.vg |access-date=2020-11-29 |year=2020}}</ref>。
* mpatrolデバッギングツールは、トレースファイルフォーマットでLEB128を使用<ref name="tracefile">{{Cite web |url=https://mpatrol.sourceforge.net/doc/Tracing-file-format.html |title=MPatrol documentation |access-date=2009-07-19 |date=December 2008}}</ref>。
* [[osu!]] は、osu!リプレイ（.osr）フォーマットでLEB128を使用<ref>{{Cite web |url=https://osu.ppy.sh/help/wiki/osu!_File_Formats/Osr_(file_format) |title=Osr (file format) - osu!wiki |website=osu.ppy.sh |language=en |access-date=2017-03-18}}</ref>。
* [[Efficient XML Interchange|W3C Efficient XML Interchange (EXI)]] は、ここで説明されているのとまったく同じ方法で、LEB128を使用して符号無し整数を表現する<ref>{{Cite web |title=Efficient XML Interchange (EXI) Format 1.0 |url=https://www.w3.org/TR/exi/ |access-date=2020-12-31 |date=2014-02-11 |publisher=[[World Wide Web Consortium]] |website=www.w3.org}}</ref>。
* [[WebAssembly]]は、モジュールのポータブルなバイナリエンコーディングで使用。
* I[[XZ Utils|xzファイルフォーマット]]<ref>{{Cite web |url=https://tukaani.org/xz/xz-file-format.txt |title=The .xz File Format |website=tukaani.org |access-date=2017-10-30 |year=2009}}</ref>

== 関連するエンコーディング ==

* [https://web.archive.org/web/20210224160104/http://www.dlugosz.com/ZIP2/VLI.html Dlugoszの可変長整数エンコーディング] ([http://www.dlugosz.com/ZIP2/VLI.html original]) は、最初の3つのサイズブレークで7ビットの倍数を使用しますが、その後は増分が変化します。また、プレフィックスビットを各バイトの先頭ではなく、ワードの先頭にすべて配置します。
* [[ヒューマン・インタフェース・デバイス|ヒューマン・インターフェース・デバイス]]レポートディスクリプタバイトは、2ビットのバイトカウントビットフィールドを使用して、後に続く0、1、2、または4バイトの整数のサイズをエンコードします。常にリトルエンディアンです。符号性、つまり短縮された整数を符号付きで拡張するかどうかは、ディスクリプタタイプに依存します。
* [[LLVM]] ビットコードファイルフォーマットは、類似の技術を使用して<ref>{{Cite web |url=https://llvm.org/docs/BitCodeFormat.html#variable-width-value |title=LLVM Bitcode File Format — LLVM 13 documentation |access-date=2024-10-04}}</ref> 固定の7ビットの代わりに、コンテキストに依存したサイズのビットグループに値を分割し、最上位ビットで継続を示します。
* [[Protocol Buffers]] (Protobuf) は、符号なし整数には同じエンコーディングを使用しますが、符号付き整数は最初のバイトの最下位ビットとして符号を付加します。
* [http://luca.ntop.org/Teaching/Appunti/asn1.html ASN.1 BER, DER] は、各ASN.1タイプの値を8ビットオクテットの文字列としてエンコードします。

== 関連項目 ==

* [[DWARF|DWARF デバッグフォーマット]]
* [[UTF-7]]
== 出典 ==
<references />
[[Category:可逆圧縮アルゴリズム]]