Voice and vocoder models for Larynx based on the SIWIS dataset.
Used by the rhasspy-tts-larynx-hermes service in Rhasspy.
$ larynx \
--model /path/to/tts-checkpoint.pth.tar \
--vocoder-model /path/to/vocoder-checkpoint.pth.tar \
--output-file /path/to/output.wav \
'Merci beaucoup!'
Run a web server at http://localhost:5002 using Docker:
$ docker run -it -p 5002:5002 \
--device /dev/snd:/dev/snd \
rhasspy/larynx:fr-siwis-1
Endpoints:
- /api/tts: returns WAV audio for text
  - GET with ?text=...
  - POST with text body
- /api/phonemize: returns phonemes for text
  - GET with ?text=...
  - POST with text body
- /process: compatibility endpoint to emulate MaryTTS
  - GET with ?INPUT_TEXT=...
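The endpoints above can be called with curl. This is a minimal sketch, assuming the Docker container is running and reachable at http://localhost:5002. The function names, the LARYNX_URL variable, and the text/plain content type on POST are assumptions made for illustration, not part of Larynx itself.

```shell
#!/bin/sh
# Base URL of the web server (assumed; override by exporting LARYNX_URL).
LARYNX_URL="${LARYNX_URL:-http://localhost:5002}"

# GET /api/tts?text=... and save the returned WAV audio to a file.
# --get together with --data-urlencode makes curl URL-encode the text.
larynx_tts_get() {
    curl --silent --show-error --fail --get \
        --data-urlencode "text=$1" \
        --output "$2" \
        "$LARYNX_URL/api/tts"
}

# POST /api/tts with the text as the request body.
# The text/plain content type is an assumption; the server may accept others.
larynx_tts_post() {
    curl --silent --show-error --fail \
        --header 'Content-Type: text/plain' \
        --data-binary "$1" \
        --output "$2" \
        "$LARYNX_URL/api/tts"
}

# GET /api/phonemize?text=... and print the response to stdout.
larynx_phonemize() {
    curl --silent --show-error --fail --get \
        --data-urlencode "text=$1" \
        "$LARYNX_URL/api/phonemize"
}

# GET /process?INPUT_TEXT=... (MaryTTS compatibility) and save the audio.
larynx_marytts() {
    curl --silent --show-error --fail --get \
        --data-urlencode "INPUT_TEXT=$1" \
        --output "$2" \
        "$LARYNX_URL/process"
}

# Example calls (require a running server, so they are left commented out):
#   larynx_tts_get 'Merci beaucoup!' merci.wav
#   larynx_phonemize 'Merci beaucoup!'
```

Letting curl do the URL encoding avoids problems with spaces, accents, and punctuation in French text.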
TTS model:
- Type: Glow-TTS
- Sample rate: 22050 Hz
- Frequency range: 0-8000 Hz
See configuration for details.
Vocoder model:
- Type: Multi-band MelGAN
- Sample rate: 22050 Hz
- Frequency range: 0-8000 Hz
See configuration for details.
Some files are split into multiple parts so that they can be uploaded to GitHub. This is done with the split command:
split -d -b 25M FILE FILE.part-
They can be recombined simply with:
cat FILE.part-* > FILE
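To check that recombining the parts reproduces the original file, the split and cat steps can be run as a round trip on test data. This is a minimal sketch: the temporary directory, the 3 MiB random file, and the 1M part size are stand-ins chosen for illustration (the real files use 25M parts).

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so no real checkpoint is touched.
workdir="$(mktemp -d)"
original="$workdir/FILE"

# Create a 3 MiB file of random bytes as a stand-in for a model checkpoint.
head -c 3145728 /dev/urandom > "$original"

# Split into 1 MiB parts with numeric suffixes: FILE.part-00, FILE.part-01, ...
split -d -b 1M "$original" "$original.part-"

# Recombine. The shell expands the glob in sorted order,
# so the parts are concatenated in their original sequence.
cat "$original".part-* > "$workdir/FILE.restored"

# cmp exits with a non-zero status if the two files differ.
cmp "$original" "$workdir/FILE.restored" && echo "round trip OK"
```

Running cmp (or comparing checksums) after cat is a cheap way to catch a missing or truncated part before loading a checkpoint.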