
<!-- Japanese version based on the English article [[w:en:Phase vocoder]] as of 20:33, 22 December 2012 (UTC) -->
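'''Phase vocoder''' is a [[音声分析合成#ボコーダー|vocoder]] that models a [[音声|speech]] signal by its amplitude and [[位相|phase]] in the [[周波数領域|frequency domain]]<ref>"The method specifies the speech signal in terms of its short-time amplitude and phase spectra. For this reason, it is called phase vocoder." Flanagan, et al. (1966). ''[[doi:10.1016/0167-6393(90)90021-Z|Phase Vocoder]]''.</ref><ref group="注釈">
	The phase information of a signal is the [[複素数の偏角|argument]] <math>\theta</math> of the signal written in [[複素数#極形式|polar form]]. Taking into account the time variation of the signal's frequency and phase, it is called the [[:en:Instantaneous phase|instantaneous phase]].

*In [[信号解析|signal analysis]], techniques from [[複素解析|complex analysis]] are generally applied: the real signal <math>x(t)</math> is formally extended to the complex-valued [[:en:Analytic signal|analytic signal]] <math>x_a(t) = x(t) + i\cdot\tilde{x}(t)</math>, whose imaginary part <math>\tilde{x}(t) = H(x)(t)</math> is the [[:en:Hilbert transform|Hilbert transform]] of <math>x(t)</math>. Writing it in polar form <math>x_a(t) = r(t)\cdot e^{i\theta(t)}</math> using [[オイラーの公式|Euler's formula]] gives the argument <math>\theta(t)</math>.
*If the signal is a simple [[余弦関数|cosine]] or [[正弦関数|sine]] function, its argument is the phase angle.
*In Fourier-transform applications such as the phase vocoder, each Fourier series component <math>x_k(t)</math> of the signal is converted to polar form to obtain its phase angle <math>\theta_k(t)</math>.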

:{|
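|''k''-th Fourier series component of the signal:&nbsp;
|<math>x_k(t) </math>
|<math>= a_k\cos(2\pi f_k\cdot t) + b_k\sin(2\pi f_k\cdot t) </math>
|&nbsp;&nbsp;(rectangular form)
|-
|
|
|<math>= r_k\cos(2\pi f_k\cdot t + \phi_k)</math>
|&nbsp;&nbsp;(polar form)
|-
|Phase angle of the component:&nbsp;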
|<math>\theta_k(t)</math>
|<math>= 2\pi f_k\cdot t + \phi_k</math>
|}
(See also "[[アディティブ・シンセシス#周期関数のフーリエ級数展開|Fourier series expansion of periodic functions]]" and "[[アディティブ・シンセシス#周波数の時間発展（インハーモニック形式）|time evolution of frequency]]" in "[[アディティブ・シンセシス|Additive synthesis]]".)
</ref>.
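
The rectangular-to-polar conversion in the note on phase angles can be checked numerically. In this minimal sketch (the function names are hypothetical), <math>r_k = \sqrt{a_k^2 + b_k^2}</math> and <math>\phi_k = \operatorname{atan2}(-b_k, a_k)</math>. The minus sign follows from expanding <math>r_k\cos(2\pi f_k t + \phi_k) = r_k\cos\phi_k\cos(2\pi f_k t) - r_k\sin\phi_k\sin(2\pi f_k t)</math> and matching <math>a_k = r_k\cos\phi_k</math>, <math>b_k = -r_k\sin\phi_k</math>.

```python
import math


def rect_to_polar(a_k, b_k):
    """Return (r_k, phi_k) such that a_k*cos(w t) + b_k*sin(w t) == r_k*cos(w t + phi_k)."""
    r_k = math.hypot(a_k, b_k)
    # minus sign because a_k = r_k*cos(phi_k) and b_k = -r_k*sin(phi_k)
    phi_k = math.atan2(-b_k, a_k)
    return r_k, phi_k


def component_rect(t, f_k, a_k, b_k):
    """k-th Fourier series component in rectangular form."""
    return a_k * math.cos(2 * math.pi * f_k * t) + b_k * math.sin(2 * math.pi * f_k * t)


def component_polar(t, f_k, r_k, phi_k):
    """The same component in polar form; theta_k is its phase angle at time t."""
    theta_k = 2 * math.pi * f_k * t + phi_k
    return r_k * math.cos(theta_k)


if __name__ == "__main__":
    r, phi = rect_to_polar(3.0, 4.0)
    print(r)  # prints 5.0
    # the two forms agree up to rounding
    print(component_rect(0.1, 2.0, 3.0, 4.0), component_polar(0.1, 2.0, r, phi))
```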

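The core of the phase vocoder is the [[短時間フーリエ変換|short-time Fourier transform]] (STFT), and processing proceeds in the following stages:
# '''Analysis''': conversion from the [[時間領域|time-domain]] representation to a [[:en:Time-frequency representation|time-frequency representation]] by the STFT
# '''Modification''': manipulation of the amplitude and phase of any frequency components
# '''Resynthesis''': conversion from the time-frequency representation back to a time-domain representation by the inverse STFT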
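
The three stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a naive O(N²) DFT instead of an FFT, a periodic Hann window, and resynthesis by weighted overlap-add divided by the summed squared window. The function names (<code>stft</code>, <code>modify</code>, <code>istft</code>) and the 64-sample frame with a hop of 16 are illustrative choices, and the modification stage only scales every bin, which changes amplitude alone.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    """Naive DFT, O(N^2); an FFT would be used in practice."""
    n_fft = len(frame)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(frame))
            for k in range(n_fft)]


def idft(spectrum):
    """Naive inverse DFT; only the real part is kept because the signal is real."""
    n_fft = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * n / n_fft) for k, c in enumerate(spectrum)).real / n_fft
            for n in range(n_fft)]


def stft(x, n_fft, hop):
    """1. Analysis: time-domain signal -> list of complex spectra, one per frame."""
    w = hann(n_fft)
    return [dft([x[start + n] * w[n] for n in range(n_fft)])
            for start in range(0, len(x) - n_fft + 1, hop)]


def modify(frames, gain):
    """2. Modification: here the amplitude of every bin is simply scaled by `gain`."""
    return [[gain * c for c in spectrum] for spectrum in frames]


def istft(frames, n_fft, hop, length):
    """3. Resynthesis: weighted overlap-add, divided by the summed squared window."""
    w = hann(n_fft)
    y = [0.0] * length
    norm = [0.0] * length
    for m, spectrum in enumerate(frames):
        frame = idft(spectrum)
        for n in range(n_fft):
            y[m * hop + n] += frame[n] * w[n]
            norm[m * hop + n] += w[n] ** 2
    return [v / s if s > 1e-8 else 0.0 for v, s in zip(y, norm)]


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 5 * n / 64) for n in range(256)]
    y = istft(modify(stft(x, 64, 16), 0.5), 64, 16, len(x))
    # away from the edges the output is 0.5 * x up to rounding
    print(max(abs(y[i] - 0.5 * x[i]) for i in range(64, 192)))
```

With no modification, samples away from the signal edges are reconstructed up to rounding error; dividing by the summed squared window makes this hold for any hop at which that sum stays non-zero.
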
<!-- The computer [[algorithm]] allows [[frequency-domain]] modifications to a digital sound file (typically [[Audio timescale-pitch modification|time expansion/compression and pitch shifting]]). -->
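Because modifications are made in the frequency domain, the phase vocoder can perform operations such as [[タイムストレッチ/ピッチシフト|time stretching and pitch shifting]] of audio signals. In addition, changing the time positions of the STFT analysis frames before resynthesis changes the time evolution of the resynthesized signal, which can be used, for example, for [[タイムストレッチ|time-scale modification]] of sounds.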
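
Time-scale modification by repositioning frames can be sketched as follows: frames are analysed with hop ''H<sub>a</sub>'' and resynthesised with hop ''H<sub>s</sub>'' = rate · ''H<sub>a</sub>'', and each bin's phase is advanced by its estimated instantaneous frequency times ''H<sub>s</sub>'' so that consecutive synthesis frames stay phase-coherent (horizontal coherence). This is a minimal sketch under stated assumptions (naive DFT, periodic Hann window, 64-sample frames, no vertical phase locking); the names <code>time_stretch</code>, <code>rate</code>, <code>hop_a</code> and <code>princarg</code> are hypothetical.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    n_fft = len(frame)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(frame))
            for k in range(n_fft)]


def idft(spectrum):
    # only the real part is kept: the output signal is meant to be real
    n_fft = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * n / n_fft) for k, c in enumerate(spectrum)).real / n_fft
            for n in range(n_fft)]


def princarg(phase):
    """Wrap a phase into [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def time_stretch(x, rate, n_fft=64, hop_a=16):
    """Stretch x by `rate` (> 1 makes it longer) while keeping its frequencies."""
    hop_s = round(rate * hop_a)                              # synthesis hop
    w = hann(n_fft)
    # bin centre frequencies in rad/sample; bins above n_fft/2 alias to negative
    # frequencies, which with integer hops only shifts phases by multiples of 2*pi
    omega = [2 * math.pi * k / n_fft for k in range(n_fft)]
    n_frames = (len(x) - n_fft) // hop_a + 1
    out_len = (n_frames - 1) * hop_s + n_fft
    y, norm = [0.0] * out_len, [0.0] * out_len
    prev_phase, synth_phase = None, None
    for m in range(n_frames):
        # 1. analysis at hop_a
        spec = dft([x[m * hop_a + n] * w[n] for n in range(n_fft)])
        mag = [abs(c) for c in spec]
        phase = [cmath.phase(c) for c in spec]
        # 2. modification: advance each phase by instantaneous frequency * hop_s
        if prev_phase is None:
            synth_phase = phase[:]
        else:
            for k in range(n_fft):
                deviation = princarg(phase[k] - prev_phase[k] - omega[k] * hop_a)
                synth_phase[k] += (omega[k] + deviation / hop_a) * hop_s
        prev_phase = phase
        # 3. resynthesis: weighted overlap-add at hop_s
        frame = idft([mag[k] * cmath.exp(1j * synth_phase[k]) for k in range(n_fft)])
        for n in range(n_fft):
            y[m * hop_s + n] += frame[n] * w[n]
            norm[m * hop_s + n] += w[n] ** 2
    return [v / s if s > 1e-8 else 0.0 for v, s in zip(y, norm)]


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    print(len(time_stretch(x, 1.5)))  # prints 352
```

With <code>rate = 1</code> the synthesis phases equal the analysis phases modulo 2π, so the sketch reduces to plain analysis and resynthesis.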

== Phase coherence problem ==

<!-- The main problem that has to be solved for all case of manipulation of the STFT is the fact that individual signal components (sinusoids, impulses) will be spread over multiple frames and multiple STFT frequency locations (bins).  This is because the STFT analysis is done using overlapping [[Window function|analysis windows]]. The windowing results in [[spectral leakage]] such that the information of individual sinusoidal components is spread over adjacent STFT bins. To avoid border effects of tapering of the  analysis windows, STFT analysis windows overlap in time. This time overlap results in the fact that adjacent STFT analysis are strongly correlated (a sinusoid present in analysis frame at time "t" will be present in the subsequent frames as well).  -->
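The phase coherence problem is the main problem that must be solved whenever the STFT time-frequency representation (STFT representation) is manipulated. Because STFT analysis uses analysis windows ([[窓関数|window functions]]) that overlap in time, individual signal components (sinusoids, impulses) are spread over multiple frames and over multiple STFT frequency locations (bins). In frequency, this spreading is the [[スペクトル漏れ|spectral leakage]] caused by windowing, which distributes the information of a single sinusoidal component over adjacent STFT bins. The analysis windows overlap in time to avoid the border effects of their tapering, and this overlap makes adjacent STFT analyses strongly correlated (a sinusoid present in the analysis frame at time ''t'' will also be present in the subsequent frames).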
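
Both effects can be observed with a single sinusoid whose frequency falls between two bins. In this minimal sketch (naive DFT, periodic Hann window; the helper names and the 10% threshold are assumptions), several neighbouring bins carry a large share of the peak magnitude, and a frame that overlaps the first one by 48 samples shows practically the same magnitude spectrum.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def magnitude_spectrum(x, start, n_fft):
    """|DFT| for bins 0..n_fft/2 of the Hann-windowed frame beginning at `start`."""
    w = hann(n_fft)
    frame = [x[start + n] * w[n] for n in range(n_fft)]
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(frame)))
            for k in range(n_fft // 2 + 1)]


def spread_bins(mag, threshold=0.1):
    """Bins whose magnitude exceeds `threshold` times the peak magnitude."""
    peak = max(mag)
    return [k for k, v in enumerate(mag) if v > threshold * peak]


if __name__ == "__main__":
    n_fft, hop = 64, 16
    # one sinusoid whose frequency lies between bins 10 and 11
    x = [math.sin(2 * math.pi * 10.5 * n / n_fft) for n in range(n_fft + hop)]
    frame0 = magnitude_spectrum(x, 0, n_fft)
    frame1 = magnitude_spectrum(x, hop, n_fft)  # overlaps frame0 by 48 samples
    print(spread_bins(frame0))  # several neighbouring bins, not just one
    print(spread_bins(frame1))  # the same bins in the overlapping frame
```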

<!-- The problem of signal transformation with the phase vocoder is related to the problem that all modifications that are done in the STFT representation need to preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and time frames (horizontal coherence). Except in the case of extremely simple synthetic sounds, these appropriate correlations can only be preserved approximately, and since the invention of the phase vocoder research has been mainly concerned with finding algorithms that would preserve the vertical and horizontal coherence of the STFT representation after the modification. For time scaling operations amplitude coherence is only a minor problem because shifting analysis frames in time has only a minor impact on the amplitude. The phase coherence problem was investigated for quite a while before appropriate solutions emerged. -->
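Every modification made in the STFT representation must preserve the appropriate correlation between adjacent frequency bins (vertical coherence) and between adjacent time frames (horizontal coherence); this requirement is what makes signal transformation with the phase vocoder difficult. Except in the case of extremely simple synthetic sounds, these correlations can be preserved only approximately, and since the invention of the phase vocoder, research has mainly been concerned with finding algorithms that preserve the vertical and horizontal coherence of the STFT representation after modification. For time-scaling operations, amplitude coherence is only a minor problem, because shifting analysis frames in time has only a small effect on the amplitude. The phase coherence problem, by contrast, was investigated for a long time before appropriate solutions emerged.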
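
Horizontal coherence can be made concrete. For a stationary sinusoid of frequency ''ω''<sub>0</sub> (radians per sample), the phase of a bin near its peak advances by ''ω''<sub>0</sub> · hop (modulo 2π) between two frames that are hop samples apart. The sketch below recovers ''ω''<sub>0</sub> from that phase advance. It assumes a naive DFT, a periodic Hann window, and a component within about ''N''/(2 · hop) bins of the chosen bin so that the deviation can be unwrapped unambiguously; the function names are hypothetical.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def bin_value(x, start, n_fft, k):
    """Complex STFT value of bin k for the Hann-windowed frame beginning at `start`."""
    w = hann(n_fft)
    return sum(x[start + n] * w[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n in range(n_fft))


def princarg(phase):
    """Wrap a phase into [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def instantaneous_frequency(x, k, n_fft, hop):
    """Estimate the frequency (rad/sample) of the component in bin k from two frames `hop` apart."""
    omega_k = 2 * math.pi * k / n_fft  # centre frequency of bin k
    phi0 = cmath.phase(bin_value(x, 0, n_fft, k))
    phi1 = cmath.phase(bin_value(x, hop, n_fft, k))
    # deviation of the measured advance from the advance expected at the bin centre
    deviation = princarg(phi1 - phi0 - omega_k * hop)
    return omega_k + deviation / hop


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * 10.3 * n / 64) for n in range(80)]
    print(instantaneous_frequency(x, 10, 64, 16), 2 * math.pi * 10.3 / 64)  # nearly equal
```

Advancing each bin's synthesis phase by this estimate times a new synthesis hop, instead of copying the analysis phase, is what keeps consecutive synthesis frames phase-coherent when the hop is changed.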

== History ==

<!-- The phase vocoder was introduced in 1966 by Flanagan as an algorithm that would preserve horizontal coherence between the phases of bins that represent sinusoidal components.<ref>
{{citation
 | last1     = Flanagan | first1 = J.L. 
 | last2     = Golden   | first2 = R. M.
 | year      = 1966
 | title     = Phase vocoder
 | journal   = Bell System Technical Journal
 | volume    = 45 | pages = 1493&ndash;1509
}}</ref> This original phase vocoder did not take into account the vertical coherence between adjacent frequency bins, and therefore, time stretching with this system did produce sound signals that were missing clarity.  -->
The phase vocoder was introduced by {{lang|en|{{harvtxt|Flanagan|Golden|1966}}}} as an algorithm that preserves horizontal coherence between the phases of the bins that represent sinusoidal components<ref>
{{lang|en|{{citation
 | last1     = Flanagan | first1 = J.L.
 | last2     = Golden   | first2 = R.M.
 | year      = 1966
 | title     = Phase vocoder
 | url       = http://www.alcatel-lucent.com/bstj/vol45-1966/articles/bstj45-9-1493.pdf
 | journal   = Bell System Technical Journal
 | volume    = 45 | issue = 9 | pages = 1493&ndash;1509
}}}}</ref>. This original phase vocoder did not take into account the vertical coherence between adjacent frequency bins, so sound signals time-stretched with it lacked clarity.

<!-- The optimal reconstruction of the sound signal from STFT after amplitude modifications has been proposed by Griffin and Lim in 1984.<ref>
{{citation
 | last1     = Griffin  | first1 = D.
 | last2     = Lim      | first2 = J.
 | year      = 1984
 | title     = Signal Estimation from Modified Short-Time Fourier Transform
 | journal   = IEEE Transactions on Acoustics, Speech and Signal Processing
 | volume    = 32 | issue = 2 | pages = 236&ndash;243
}}</ref> This algorithm does not consider the problem of producing a coherent STFT, but it does allow finding the sound signal that has an STFT that is as close as possible to the modified STFT even if the modified STFT is not coherent (does not represent any signal).  -->
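An optimal (least-squares) method for reconstructing the sound signal from the STFT representation after amplitude modifications was proposed by {{lang|en|{{harvtxt|Griffin|Lim|1984}}}}<ref>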
{{lang|en|{{citation
 | last1     = Griffin  | first1 = D.
 | last2     = Lim      | first2 = J.
 | year      = 1984
 | title     = Signal Estimation from Modified Short-Time Fourier Transform
 | url       = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1164317
 | journal   = IEEE Transactions on Acoustics, Speech and Signal Processing
 | volume    = 32 | issue = 2 | pages = 236&ndash;243
 | doi       = 10.1109/TASSP.1984.1164317
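}}}}</ref>. This algorithm does not address the problem of producing a coherent STFT, but even when the modified STFT is not coherent (that is, does not represent any signal), it finds the sound signal whose STFT is as close as possible to the modified STFT.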
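
As a worked formula, the least-squares estimate can be written in notation assumed here (frame ''m'' starts at sample ''mH'', ''w'' is the analysis window, and ''y<sub>m</sub>'' is the inverse DFT of the ''m''-th modified STFT frame). The estimate minimises the squared difference between the windowed candidate signal and the modified frames; setting the derivative with respect to each ''x''(''n'') to zero gives a window-weighted overlap-add divided by the summed squared window:

:<math>\hat{x} = \arg\min_{x} \sum_{m}\sum_{n} \bigl(w(n - mH)\,x(n) - y_m(n - mH)\bigr)^2</math>
:<math>\hat{x}(n) = \frac{\sum_{m} w(n - mH)\,y_m(n - mH)}{\sum_{m} w^2(n - mH)}</math>

Because the minimisation is carried out over signals, this estimate exists even when the modified frames are not the STFT of any signal.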

<!-- The problem of the vertical coherence remained a major issue for the quality of time scaling operations until 1999 when Laroche and Dolson<ref>
{{citation
 | author1   = Jean Laroche
 | author2   = Mark Dolson
 | year      = 1999
 | title     = Improved Phase Vocoder Time-Scale Modification of Audio
 | url       = http://ieeexplore.ieee.org/iel4/89/16428/00759041.pdf
 | journal   = IEEE Transactions on Speech and Audio Processing
 | volume    = 7 | issue = 3 | pages = 323&ndash;332
 | doi       = 10.1109/89.759041
}}</ref> proposed a rather simple means to preserve phase consistency across spectral bins. The proposition of Laroche and Dolson has to be seen as a turning point in phase vocoder history. It has been shown that by means of ensuring vertical phase consistency  very high quality time scaling transformations can be obtained.  -->
The vertical coherence problem remained a major issue for the quality of time-scaling operations until {{lang|en|{{harvtxt|Laroche|Dolson|1999}}}}<ref>
{{lang|en|{{citation
 | last1     = Laroche | first1 = Jean
 | last2     = Dolson  | first2 = Mark
 | year      = 1999
 | title     = Improved Phase Vocoder Time-Scale Modification of Audio
 | url       = http://ieeexplore.ieee.org/iel4/89/16428/00759041.pdf
 | journal   = IEEE Transactions on Speech and Audio Processing
 | volume    = 7 | issue = 3 | pages = 323&ndash;332
 | doi       = 10.1109/89.759041
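}}}}</ref> proposed a rather simple means of preserving phase consistency across spectral bins. The proposal of Laroche and Dolson should be seen as a turning point in the history of the phase vocoder: it has been shown that ensuring vertical phase consistency makes very high-quality time-scaling transformations possible.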
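
The phase locking idea can be sketched in its "identity phase locking" form. Peaks of the magnitude spectrum are located, only the peak bins receive propagated synthesis phases, and every other bin keeps its analysis phase offset relative to the peak whose region it falls in, so the phase relations between neighbouring bins (vertical coherence) carry over unchanged. This is a simplified sketch: each bin is assigned to its nearest peak (ties go to the lower peak), the peaks' synthesis phases are taken as given, and the function names are hypothetical.

```python
def find_peaks(mag):
    """Bins that are strict local maxima of the magnitude spectrum."""
    return [k for k in range(1, len(mag) - 1) if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]


def identity_phase_lock(analysis_phase, peak_synth_phase):
    """Lock every bin k to its nearest peak p: phi_s[k] = phi_s[p] + phi_a[k] - phi_a[p].

    analysis_phase:   analysis phases of one frame, indexed by bin
    peak_synth_phase: {peak bin p: synthesis phase already propagated for p}
    """
    peaks = sorted(peak_synth_phase)
    locked = []
    for k, phi_a in enumerate(analysis_phase):
        # region of influence: the nearest peak (min keeps the first, i.e. lower, peak on ties)
        p = min(peaks, key=lambda q: abs(q - k))
        locked.append(peak_synth_phase[p] + phi_a - analysis_phase[p])
    return locked


if __name__ == "__main__":
    mag = [0.1, 1.0, 0.5, 0.2, 0.6, 2.0, 0.3]
    print(find_peaks(mag))  # prints [1, 5]
```

Only the peak phases need the usual horizontal propagation; the other bins follow their peak, which is what keeps the cost of the method low.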

<!-- The algorithm proposed by Laroche did not allow preservation of horizontal phase coherence for sound onsets (note onsets). A solution for this problem has been proposed by Roebel.<ref>
{{citation
 | author    = Axel Röbel (IRCAM)
 | year      = 2003
 | title     = A new approach to transient processing in the phase vocoder
 | url       = http://www.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf
 | journal   = DAFx-03: Proc. of the 6th Int. Conference on Digital Audio Effects
}}</ref> -->
The algorithm proposed by Laroche and Dolson could not preserve horizontal phase coherence at sound onsets (note onsets). A solution to this problem was proposed by {{lang|en|{{harvtxt|Röbel|2003}}}}<ref>
{{lang|en|{{citation
 | last      = Röbel | first = Axel (IRCAM)
 | year      = 2003
 | title     = A new approach to transient processing in the phase vocoder
 | url       = http://www.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf
 | journal   = DAFx-03: Proc. of the 6th Int. Conference on Digital Audio Effects
 | archiveurl  = https://web.archive.org/web/20040617224423/http://www.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf
 | archivedate = 2004-06-17
 | deadurldate = 2017-09
}}}}</ref><!--
-->
<!-- An example of software implementation of phase vocoder based signal transformation using means similar to those described here to achieve high quality signal transformation is [[Ircam]]'s SuperVP.<ref>
{{citation
 | title     = SuperVP (Software)
 | url       = http://anasynth.ircam.fr/home/english/software/supervp
 | work      = Analyse-Synthèse
 | publisher = IRCAM
}} </ref> {{Citation needed|date=July 2011}} -->
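. An example of a software implementation of phase-vocoder-based signal transformation that uses improvements like those described here (up to Röbel's proposal) to achieve high-quality results is {{lang|en|[[IRCAM]] SuperVP}}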
<ref>
{{lang|en|{{citation
 | title     = SuperVP (Software)
 | url       = http://anasynth.ircam.fr/home/english/software/supervp
 | publisher = Analysis-Synthesis Team, IRCAM
}}}}<br />
	{{lang|en|SuperVP (Super Phase Vocoder)}} is an extended phase vocoder used at {{lang|en|[[IRCAM]]}}. Its features include [[タイムストレッチ|time stretching]], [[ピッチシフト|pitch shifting]], filtering, [[クロスシンセシス|cross-synthesis]], [[音源分離|source separation]] and remixing, treatment and reconstruction of signal components, and [[ノイズ除去|noise reduction]].
	It is used as the kernel of {{lang|en|[http://anasynth.ircam.fr/home/english/software/audiosculpt AudioSculpt]}}, together with the {{lang|en|[http://anasynth.ircam.fr/home/english/software/pm2 Pm2]}} library, which provides {{lang|en|[[Sinusoidal modeling]]}}.</ref><!-- resolved in January 2013: {{Citation needed|date=July 2011}} -->.

== Use in music ==

<!-- British composer [[Trevor Wishart]] used phase vocoder analyses and transformations of a human voice as the basis for his composition [[VOX 5]] (part of his larger [[VOX Cycle]]).<ref>
{{citation
 | author    = Trevor Wishart
 | year      = 1988
 | title     = The Composition of Vox 5
 | url       = http://www.jstor.org/stable/3680150
 | journal   = Computer Music Journal
 | volume    = 12 | issue = 4 | date = Winter, 1988
 | pages     = 21&ndash;27
 | jstor     = 3680150
}}</ref> [[Transfigured Wind]] by American composer [[Roger Reynolds]] uses the phase vocoder to perform time-stretching of flute sounds.<ref>
{{citation
 | degree    = Ph.D.
 | author    = Xavier Serra
 | year      = 1989
 | title     = A System for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition
 | page      = 12
 | work      = PhD thesis
 | publisher = Stanford University
 | id        = {{citeseerx|10.1.1.76.2306}}
}}</ref> -->
British composer {{lang|en|[[:en:Trevor Wishart|Trevor Wishart]]}} used phase vocoder analyses and transformations of the human voice as the basis for his composition ''{{lang|en|Vox V}}'' (part of his larger ''{{lang|en|[[:en:Vox Cycle|Vox Cycle]]}}'')<ref>
{{lang|en|{{citation
 | last      = Wishart | first = Trevor
 | year      = 1988
 | title     = The Composition of Vox 5
 | url       = http://www.jstor.org/stable/3680150
 | journal   = Computer Music Journal
 | volume    = 12 | issue = 4 | date = Winter, 1988 | pages = 21&ndash;27
 | jstor     = 3680150
}}}}</ref>. ''{{lang|en|Transfigured Wind}}'' by American composer [[ロジャー・レイノルズ|Roger Reynolds]] uses the phase vocoder to time-stretch flute sounds<ref>
{{lang|en|{{citation
 | degree    = Ph.D.
 | last      = Serra | first = Xavier
 | authorlink= Xavier Serra
 | year      = 1989
 | title     = A System for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition
 | page      = 12
 | work      = PhD thesis
 | publisher = Stanford University
 | id        = {{citeseerx|10.1.1.76.2306}}
}}}}</ref>.

<!-- The proprietary [[Auto-Tune]] pitch-correcting software, widely used in commercial music production, is based on the phase vocoder principle.{{Citation needed|date=December 2009}} -->
The [[プロプライエタリソフトウェア|proprietary]] pitch-correction software {{lang|en|[[オートチューン|Auto-Tune]]}}, widely used in commercial music production, is also regarded as being based on the phase vocoder principle<ref>Mary Bellis, [http://inventors.about.com/od/astartinventions/a/Who-Invented-Auto-Tune.htm Who Invented Auto-Tune?, Harold Hildebrand aka Dr Andy Hildebrand Invented Auto-Tune], About.com, retrieved 26 July 2014.</ref><ref name="diaz 2009">Joe Diaz, [http://ocw.mit.edu/courses/music-and-theater-arts/21m-380-music-and-technology-contemporary-history-and-aesthetics-fall-2009/projects/MIT21M_380F09_proj_mtech_3.pdf The Fate of Auto-Tune], [[マサチューセッツ工科大学|Massachusetts Institute of Technology]], 2009.</ref>.

== See also ==

* [[音声分析合成|Speech analysis and synthesis]]
* [[周波数スペクトル|Frequency spectrum]]
* [[スペクトログラム|Spectrogram]]

== Notes ==
{{脚注ヘルプ}}
<references group="注釈"/>

== References ==
<references />

== Further reading ==
* {{lang|en|{{citation
   | last      = Dudley | first = Homer
   | authorlink= :en:Homer Dudley
   | year      = 1939
   | title     = The vocoder
   | periodical= Bell Labs Record
   | volume    = 17 | pages = 122&ndash;126
  }}}}

== External links ==
{{wikibookslang|en|MATLAB Programming/Phase Vocoder and Encoder|Phase vocoder and encoder}}
* {{lang|en|{{citation
   | <!-- last      = Dolson | first = Mark -->
   | title     = The Phase Vocoder: A Tutorial
   | url       = http://www.panix.com/~jens/pvoc-dolson.par
  }}}} {{en icon}}: a tutorial on the phase vocoder
* {{lang|en|{{citation
   | <!-- last1     = Laroche | first1 = Jean | last2 = Dolson | first2 = Mark -->
   | title     = New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
   | url       = http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf
   | <!-- journal   = Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics -->
  }}}} {{en icon}}: phase-vocoder techniques for pitch shifting, harmonizing and other effects
<!-- * [http://www.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf A new Approach to Transient Processing in the Phase Vocoder] -->
* {{lang|en|{{citation
   | <!-- last1     = Grondin | first1 = François | last2 = Vakili | first2 = Arash | last3 = Demers | first3 = Laurier -->
   | chapter   = Phase Vocoder
   | chapterurl= http://www.guitarpitchshifter.com/algorithm.html#33
   | title     = Guitar Pitch Shifter
  }}}} {{en icon}}: an explanation of the phase vocoder with figures and equations
;Software
* {{lang|en|{{citation
   | <!-- last1     = Parikh | first1 = Ravi | last2 = Poppen | first2 = Keegan -->
   | url       = https://web.archive.org/web/20110709004823/http://decabear.com/awesomebox.html
   | title     = AwesomeBox
  }}}} {{en icon}}: open-source pitch-correction software

{{DEFAULTSORT:ふええすほこおた}}
<!-- {{Speech synthesis}} -->{{音声合成}}
<!-- [[Category:Signal processing]] -->[[Category:信号処理]]
<!-- [[Category:Speech synthesis]]  -->[[Category:音声合成]]