French TTS Voices
French text-to-speech voices with phrase-level prosody
From text to talk.
Pick your path.
Call our TTS & STT endpoints directly, wire voice into LiveKit rooms with one plug-in, or spin up an AI assistant on a real phone number.
TTS & STT Endpoints
Production-grade streaming and batch TTS/STT. Low latency, 50+ languages, customizable voices, and SDKs for Node/Python/Browser.
- Streaming for live apps
- Multi-speaker diarization & punctuation
- SDKs, code samples, and latency benchmarks
Sends text to the TTS endpoint and saves the synthesized audio as an MP3 file.
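The endpoint call described above can be sketched in Python with only the standard library. Everything service-specific here is a hypothetical placeholder, not this service's documented API: the URL `https://api.example.com/v1/tts`, the JSON fields `text`, `voice`, and `format`, the bearer-token header, and the voice ID in the usage line. Substitute the real values from the API reference.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical endpoint: replace with the URL from the real API reference.
TTS_URL = "https://api.example.com/v1/tts"


def build_tts_request(text: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for MP3 audio."""
    payload = {"text": text, "voice": voice, "format": "mp3"}  # hypothetical schema
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize_to_mp3(text: str, voice: str, api_key: str, out_path: str) -> Path:
    """Send the request and write the raw response body to an .mp3 file."""
    request = build_tts_request(text, voice, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    path = Path(out_path)
    path.write_bytes(audio)
    return path
```

Usage, with a hypothetical voice ID: `synthesize_to_mp3("Bonjour, comment allez-vous ?", "fr-FR-example", "YOUR_API_KEY", "bonjour.mp3")`. Building the request separately from sending it keeps the payload inspectable without network access.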
LiveKit Plug-in
Plug our real-time speech pipeline into LiveKit rooms: transcribe live sessions, synthesize responses, and stream audio back into the room.
- One-line install, example room demo
- WebRTC + server bridge patterns
- Works in browser & mobile
Connects to a LiveKit room and attaches real-time TTS/STT: transcribes incoming audio and synthesizes outgoing audio.
AI Assistants (Phone)
Deploy a phone-number-based AI assistant in minutes, with inbound and outbound calls, IVR, call recording, and DTMF support.
- Purchase & map a phone number
- Templates: Support Bot, Sales Assistant, Reminder Bot
- PSTN reliability & compliance tools
Creates an AI assistant bound to a phone number with inbound call handling, recording, and DTMF support.
- Spanish voices (Español): 294 TTS voices
- French voices (Français): 98 TTS voices
- German voices (Deutsch): 82 TTS voices
- Indonesian voices (Bahasa Indonesia): 31 TTS voices
- Italian voices (Italiano): 51 TTS voices
- Japanese voices (日本語): 85 TTS voices
- Korean voices (한국어): 171 TTS voices
- Portuguese voices (Português): 277 TTS voices
- Russian voices (Русский): 34 TTS voices
- Chinese voices (中文): 189 TTS voices
French phonology and prosody
Vowels that travel through the nose
French has nasal vowels[1] (/ɛ̃/, /ɑ̃/, /ɔ̃/, plus /œ̃/ in more conservative accents), produced with airflow through both the mouth and the nasal cavity. English has no equivalent phonemes. The words "vin," "bon," and "un" each carry a nasal vowel, and denasalizing it changes the word: "vin" /vɛ̃/ without nasality comes out as "vais" /vɛ/. Combined with tenser articulation and more extreme lip rounding[2] on vowels like /y/ in "tu," this gives French a vowel space that English-trained models simply don't map. Synthesizing these sounds accurately requires models that run where the audio is rendered, not audio piped across providers that flatten the nasal-oral distinction in transit.
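The oral/nasal split leaves a visible footprint in French spelling, which a TTS front end has to decode before synthesis. The sketch below is a toy rule, not a real grapheme-to-phoneme system: a vowel letter or digraph followed by n or m is read as nasal only when that n/m is not followed by another vowel or by a second n/m. It ignores many known exceptions (for example "bien", "ennui", and silent-h words such as "inhabité"), and the names `nasal_vowels` and `GRAPHEME_TO_NASAL` are illustrative.

```python
import re

# IPA targets, written as base letter + combining tilde (U+0303).
E_NASAL = "\u025b\u0303"   # ɛ̃ as in "vin"
A_NASAL = "\u0251\u0303"   # ɑ̃ as in "enfant"
O_NASAL = "\u0254\u0303"   # ɔ̃ as in "bon"
OE_NASAL = "\u0153\u0303"  # œ̃ as in "un" (merged with ɛ̃ for many speakers)

# Toy spelling-to-nasal table; real French orthography has many exceptions.
GRAPHEME_TO_NASAL = {
    "ain": E_NASAL, "aim": E_NASAL, "ein": E_NASAL, "eim": E_NASAL,
    "in": E_NASAL, "im": E_NASAL, "yn": E_NASAL, "ym": E_NASAL,
    "un": OE_NASAL, "um": OE_NASAL,
    "on": O_NASAL, "om": O_NASAL,
    "an": A_NASAL, "am": A_NASAL, "en": A_NASAL, "em": A_NASAL,
}

# The grapheme counts as nasal only if NOT followed by a vowel or another n/m.
_VOWEL_OR_NASAL = "aeiouyàâäéèêëîïôöùûüœnm"
NASAL_PATTERN = re.compile(
    "(" + "|".join(sorted(GRAPHEME_TO_NASAL, key=len, reverse=True)) + ")"
    + f"(?![{_VOWEL_OR_NASAL}])"
)


def nasal_vowels(word: str) -> list[tuple[str, str]]:
    """Return (grapheme, IPA) pairs for the nasal vowels the toy rule finds in one word."""
    return [(m.group(1), GRAPHEME_TO_NASAL[m.group(1)])
            for m in NASAL_PATTERN.finditer(word.lower())]


for w in ["vin", "bon", "un", "bonne", "ami", "enfant"]:
    print(w, nasal_vowels(w))
```

With this rule, "vin", "bon", and "un" each yield one nasal vowel, while "bonne" and "ami" yield none because the n/m is doubled or followed by a vowel.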
Rhythm without a downbeat
English is stress-timed[1]: strong and weak syllables alternate, and unstressed vowels collapse toward schwa[2]. French is closer to syllable-timed[3], distributing duration more evenly across syllables. Where English "I don't want to GO" hammers one word and swallows the rest, French "Je ne veux pas y aller" gives each syllable roughly equal weight[4]. A TTS system built on English stress-timed assumptions will impose strong-weak patterning that sounds immediately wrong. Holding that even rhythm precisely requires inference co-located with audio processing, with no network hops to introduce timing artifacts.
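One way to make "more even" measurable is the normalized Pairwise Variability Index (nPVI), a standard rhythm metric that averages the normalized difference between successive vocalic interval durations; published measurements generally place English above French on it. The Python sketch below computes nPVI. The duration lists are invented for illustration, not measurements of the sentences above.

```python
def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index over successive interval durations.

    nPVI = 100 * mean(|d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2))
    0 means perfectly even timing; larger values mean stronger long-short alternation.
    Durations are assumed to be positive.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    pairs = list(zip(durations, durations[1:]))
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100 * total / len(pairs)


# Invented vocalic durations in milliseconds, for illustration only.
syllable_timed_like = [110, 100, 105, 100, 95, 110, 120]
stress_timed_like = [60, 70, 150, 50, 220]

print(f"even-ish:    {npvi(syllable_timed_like):.1f}")
print(f"alternating: {npvi(stress_timed_like):.1f}")
```

The evenly timed sequence scores far lower than the alternating one, which is the contrast a French voice has to preserve.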
Stress locked to the phrase edge
In English, stress is lexical: it can fall on different syllables in different words and can distinguish words[1] ("REcord" vs. "reCORD"). French stress is predictable and phrase-final[2], landing on the last full (non-schwa) syllable of each prosodic group; it marks boundaries, not meanings. French vowels also keep their quality in unstressed positions[3] rather than reducing: an /o/ stays /o/ regardless of where stress falls. Voice infrastructure that handles French therefore needs to track phrase-level grouping and place prominence at the group edge, running synthesis and telephony in one stack so prosodic boundaries survive intact.
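Once the hard parts (grapheme-to-phoneme conversion and prosodic phrasing) are done, phrase-final placement itself is simple to express. The sketch below assumes pre-syllabified input already split into prosodic groups, with schwa written as ə. The function names and the grouping of the example sentences are illustrative assumptions, not this product's pipeline.

```python
SCHWA = "\u0259"        # ə
STRESS_MARK = "\u02c8"  # IPA primary-stress mark ˈ


def stressed_index(group: list[str]) -> int:
    """Return the index of the last full (non-schwa) syllable in a prosodic group."""
    for i in range(len(group) - 1, -1, -1):
        if SCHWA not in group[i]:
            return i
    raise ValueError("prosodic group has no full syllable")


def mark_stress(groups: list[list[str]]) -> str:
    """Render groups as dotted syllables, marking only the group-final stress."""
    rendered = []
    for group in groups:
        k = stressed_index(group)
        rendered.append(".".join(STRESS_MARK + s if i == k else s
                                 for i, s in enumerate(group)))
    return " | ".join(rendered)


# "Le petit chat | dort": "petit" gets no stress of its own inside the group.
print(mark_stress([["lə", "pə", "ti", "ʃa"], ["dɔʁ"]]))
# → lə.pə.ti.ˈʃa | ˈdɔʁ

# A pronounced final schwa (as in a southern French "une petite") is skipped.
print(mark_stress([["y", "nə", "pə", "ti", "tə"]]))
# → y.nə.pə.ˈti.tə
```

Because the rule looks only at group edges, the same word can be stressed or unstressed depending on where the phrase boundary falls, which is why phrasing has to be decided before synthesis.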