Trying out OpenJTalk

Installing OpenJTalk on Ubuntu 10.04 and giving it a try.

Installing OpenRTM (a dependency)

http://www.openrtm.org/openrtm/ja/node/932
Download the install script from the URL above and run it:
[bash]
wget http://www.openrtm.org/OpenRTM-aist/download/install_scripts/pkg_install_python_ubuntu.sh
chmod +x pkg_install_python_ubuntu.sh
sudo ./pkg_install_python_ubuntu.sh
[/bash]

Installing OpenJTalk

[bash]
sudo apt-add-repository ppa:openhri/ppa
sudo apt-get update
sudo apt-get install openhrivoice
[/bash]
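To confirm that the install put everything where the script in the Usage section expects it, you can check for each voice file and the dictionary directory. This is a minimal sketch: the function name check_jtalk_files is hypothetical, and the default paths are simply the ones hard-coded in that script, so adjust them if your package layout differs.

[bash]
#!/bin/sh
# Sketch: check that the HTS voice files and the dictionary directory used
# by talk.sh exist. Defaults are the paths from talk.sh (assumption that
# your openhrivoice install uses the same layout).

# check_jtalk_files [VOICE_DIR] [DIC_DIR]
check_jtalk_files() {
    voice=${1:-/usr/lib/hts-voice/nitech-jp-atr503-m001}
    dic=${2:-/usr/lib/open_jtalk/dic/utf-8}
    missing=0

    # Every model/window file that talk.sh passes to open_jtalk
    for f in tree-dur.inf tree-lf0.inf tree-mgc.inf \
             dur.pdf lf0.pdf mgc.pdf \
             lf0.win1 lf0.win2 lf0.win3 \
             mgc.win1 mgc.win2 mgc.win3 \
             tree-gv-lf0.inf tree-gv-mgc.inf \
             gv-lf0.pdf gv-mgc.pdf gv-switch.inf; do
        if [ ! -f "$voice/$f" ]; then
            echo "missing: $voice/$f" >&2
            missing=1
        fi
    done

    if [ ! -d "$dic" ]; then
        echo "missing dictionary directory: $dic" >&2
        missing=1
    fi

    # Exit status is non-zero if anything was missing
    return $missing
}
[/bash]

Running check_jtalk_files with no arguments checks the default paths; command -v open_jtalk separately confirms that the binary itself is on the PATH.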

Usage

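The basic syntax is open_jtalk [options] [infile]. When no infile is given, the text is read from standard input (the help output further down lists [stdin] as the default), which is what the script below relies on.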
[bash]
#!/bin/sh
# Usage: ./talk.sh "text to synthesize"
# Writes the speech to out.wav and the trace information to out.log.
VOICE=/usr/lib/hts-voice/nitech-jp-atr503-m001
echo "$1" | open_jtalk \
  -td $VOICE/tree-dur.inf \
  -tf $VOICE/tree-lf0.inf \
  -tm $VOICE/tree-mgc.inf \
  -md $VOICE/dur.pdf \
  -mf $VOICE/lf0.pdf \
  -mm $VOICE/mgc.pdf \
  -df $VOICE/lf0.win1 \
  -df $VOICE/lf0.win2 \
  -df $VOICE/lf0.win3 \
  -dm $VOICE/mgc.win1 \
  -dm $VOICE/mgc.win2 \
  -dm $VOICE/mgc.win3 \
  -ef $VOICE/tree-gv-lf0.inf \
  -em $VOICE/tree-gv-mgc.inf \
  -cf $VOICE/gv-lf0.pdf \
  -cm $VOICE/gv-mgc.pdf \
  -k $VOICE/gv-switch.inf \
  -x /usr/lib/open_jtalk/dic/utf-8 \
  -ow out.wav \
  -ot out.log
[/bash]
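Save this as, say, talk.sh, make it executable with chmod +x talk.sh, and run something like ./talk.sh "OK牧場". It writes the synthesized speech to out.wav and a log of the analysis to out.log.
Play the result with aplay out.wav.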
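Because talk.sh always writes to out.wav in the current directory, a wrapper that synthesizes into a temporary file, plays it, and deletes it can be handier when another program (such as a chat bot) calls it repeatedly. This is a sketch under assumptions: the function name say is hypothetical, the option list is copied from talk.sh, and it assumes open_jtalk exits with a non-zero status when synthesis fails.

[bash]
#!/bin/sh
# Sketch: synthesize text into a temporary WAV file, play it, then remove it.
# "say" is a hypothetical name; voice and dictionary paths are from talk.sh.

VOICE=/usr/lib/hts-voice/nitech-jp-atr503-m001
DIC=/usr/lib/open_jtalk/dic/utf-8

say() {
    wav=$(mktemp /tmp/jtalk.XXXXXX) || return 1
    # Feed the text on stdin, as talk.sh does
    printf '%s\n' "$*" | open_jtalk \
        -td "$VOICE/tree-dur.inf" -tf "$VOICE/tree-lf0.inf" -tm "$VOICE/tree-mgc.inf" \
        -md "$VOICE/dur.pdf" -mf "$VOICE/lf0.pdf" -mm "$VOICE/mgc.pdf" \
        -df "$VOICE/lf0.win1" -df "$VOICE/lf0.win2" -df "$VOICE/lf0.win3" \
        -dm "$VOICE/mgc.win1" -dm "$VOICE/mgc.win2" -dm "$VOICE/mgc.win3" \
        -ef "$VOICE/tree-gv-lf0.inf" -em "$VOICE/tree-gv-mgc.inf" \
        -cf "$VOICE/gv-lf0.pdf" -cm "$VOICE/gv-mgc.pdf" \
        -k "$VOICE/gv-switch.inf" \
        -x "$DIC" -ow "$wav"
    status=$?
    # Treat an empty output file as a failure too
    if [ "$status" -eq 0 ] && [ ! -s "$wav" ]; then
        status=1
    fi
    # Play only if synthesis succeeded
    if [ "$status" -eq 0 ]; then
        aplay -q "$wav"
        status=$?
    fi
    rm -f "$wav"
    return $status
}
[/bash]

After sourcing the file, say "OK牧場" speaks the text and leaves nothing behind in the current directory.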

Impressions

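I wasted no time switching the output side of my little voice-input chat from Festival to open_jtalk, and it is far easier to understand than Festival. Not surprising, I suppose.
As a test, I picked up the speech from open_jtalk with a microphone and fed it to Julius as voice input. It recognized at least simple things like 「こんにちは」 ("hello"), 「OK牧場」, and my own name (｀･ω･´)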

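Also, since I run both Julius and open_jtalk on a machine with no display, I looked for a way to select the audio output and microphone input over ssh, and found that alsamixer provides a GUI-like interface right in the terminal.
Julius had been complaining about a missing input device every time it started, which was a real nuisance, but now it works perfectly.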
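alsamixer is interactive. For scripts run over ssh, the amixer command from the same ALSA utilities can set the same controls non-interactively, and arecord -l / aplay -l list the capture and playback devices. A minimal sketch, assuming the card numbers and control names used in the examples (Mic, Master) match your hardware, which they often do not; check them with amixer -c N scontrols first. The function names are hypothetical.

[bash]
#!/bin/sh
# Sketch: non-interactive equivalents of the alsamixer settings.
# Card numbers and control names differ per sound card (assumption);
# list them first with:
#   arecord -l              capture devices
#   aplay -l                playback devices
#   amixer -c N scontrols   simple mixer controls on card N

# enable_capture CARD CONTROL [LEVEL]
#   Set the level of a capture control and turn its capture switch on.
enable_capture() {
    card=$1 control=$2 level=${3:-80%}
    amixer -q -c "$card" sset "$control" "$level" cap
}

# enable_playback CARD CONTROL [LEVEL]
#   Set the level of a playback control and unmute it.
enable_playback() {
    card=$1 control=$2 level=${3:-80%}
    amixer -q -c "$card" sset "$control" "$level" unmute
}
[/bash]

For example, enable_capture 1 Mic 70% followed by enable_playback 0 Master. Running sudo alsactl store afterwards saves the current mixer state so that, on most distributions, it is restored at boot.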

open_jtalk options
[bash]
NAIST Japanese dictionary
mecab-naist-jdic version 0.6.1-20090630 (http://naist-jdic.sourceforge.jp/)
Copyright (C) 2009 Nara Institute of Science and Technology
All rights reserved.

open_jtalk - An HMM-based text to speech system

usage:
    open_jtalk [ options ] [ infile ]
options:                                                          [  def][   min--max]
  -x  dir  : dictionary directory                                 [  N/A]
  -td tree : decision trees file for state duration               [  N/A]
  -tf tree : decision trees file for Log F0                       [  N/A]
  -tm tree : decision trees file for spectrum                     [  N/A]
  -md pdf  : model file for state duration                        [  N/A]
  -mf pdf  : model file for Log F0                                [  N/A]
  -mm pdf  : model file for spectrum                              [  N/A]
  -df win  : window files for calculation delta of Log F0         [  N/A]
  -dm win  : window files for calculation delta of spectrum       [  N/A]
  -ow s    : filename of output wav audio (generated speech)      [  N/A]
  -ot s    : filename of output trace information                 [  N/A]
  -s  i    : sampling frequency                                   [16000][   1--48000]
  -p  i    : frame period (point)                                 [   80][   1--     ]
  -a  f    : all-pass constant                                    [ 0.42][ 0.0--1.0  ]
  -g  i    : gamma = -1 / i (if i=0 then gamma=0)                 [    0][   0--     ]
  -b  f    : postfiltering coefficient                            [  0.0][-0.8--0.8  ]
  -l       : regard input as log gain and output linear one (LSP) [  N/A]
  -u  f    : voiced/unvoiced threshold                            [  0.5][ 0.0--1.0  ]
  -ef tree : decision tree file for GV of Log F0                  [  N/A]
  -em tree : decision tree file for GV of spectrum                [  N/A]
  -cf pdf  : filename of GV for Log F0                            [  N/A]
  -cm pdf  : filename of GV for spectrum                          [  N/A]
  -jf f    : weight of GV for Log F0                              [  0.7][ 0.0--2.0  ]
  -jm f    : weight of GV for spectrum                            [  1.0][ 0.0--2.0  ]
  -k  tree : use GV switch                                        [  N/A]
  -z  i    : audio buffer size                                    [ 1600][   0--48000]
infile:
  text file                                                       [stdin]
note:
  option '-d' may be repeated to use multiple delta parameters.
  generated spectrum and log F0 sequences are saved in natural
  endian, binary (float) format.
[/bash]
Reference 1: open_jtalk

Reference 2: alsamixer

alsamixer, amixer, and GUI mixers - 試験運用中なLinux備忘録

