Kokoro spoke "983-4969" as "nine hundred eighty-three dash forty-nine sixty-nine".

Added SpokenKokoroTTSService, which normalizes text just before synthesis (run_tts gets the full sentence). US phone patterns and 4-5 digit runs (street numbers, zips) are spoken one digit at a time; the country code is dropped, and neither "dash" nor the parentheses are spoken. Dates and times are left natural. The normalization is deterministic, so it's robust to whatever the model emits.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
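For illustration, a minimal sketch of the kind of normalization described above, not the actual SpokenKokoroTTSService code. The function name normalize_for_speech, the regexes, and the comma between digit groups are all assumptions made for this example. The way dates and times are skipped (ignoring digit runs that touch "/", ":", "-", or a decimal point) is likewise a guess, and a bare year such as 2024 would still be spelled digit by digit here.

```python
import re

_DIGITS = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]


def _spell(digits: str) -> str:
    """Speak a digit string one digit at a time."""
    return " ".join(_DIGITS[int(d)] for d in digits)


# Hypothetical US phone pattern: optional +1/1 country code (matched but not
# captured, so it is dropped), optional area code with or without parens,
# then a 3-digit and a 4-digit group.
_PHONE = re.compile(
    r"(?<![\w/:.-])"
    r"(?:\+?1[\s.-]?)?"
    r"(?:\(?(\d{3})\)?[\s.-]?)?"
    r"(\d{3})[\s.-](\d{4})"
    r"(?![\w/:-])"
)

# Standalone 4-5 digit runs (street numbers, zips). Runs that touch "/", ":",
# "-", "$", or a decimal/thousands separator are skipped so that dates, times,
# and amounts stay as they are.
_RUN = re.compile(r"(?<![\w/:.,$-])\d{4,5}(?![\w/:-]|[.,]\d)")


def _phone_repl(match: re.Match) -> str:
    # Separators and parens are discarded; only the digit groups are spoken.
    groups = [g for g in match.groups() if g]
    return ", ".join(_spell(g) for g in groups)


def normalize_for_speech(text: str) -> str:
    """Rewrite phone numbers and 4-5 digit runs as spoken digits."""
    text = _PHONE.sub(_phone_repl, text)
    return _RUN.sub(lambda m: _spell(m.group()), text)


if __name__ == "__main__":
    print(normalize_for_speech("Call +1 (555) 983-4969 or visit 1600 Main St, 90210."))
```

Because the rewrite is pure string-to-string, it could be applied to the sentence handed to run_tts without touching anything upstream, which is what keeps it independent of the model's output format.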