Fish Audio S2 Pro Long-Form TTS Lab

Build 1-60 minute multilingual TTS tests with inline prosody control, optional voice cloning, and Space-ready long-form chunking.

Model: fishaudio/s2-pro
Source runtime: Fish Speech
Strengths surfaced here: 80+ languages, free-form [tag] controls, low-latency-oriented presets, and long-form narration planning

Narration script

Generation preset

Target narration length (minutes): 1 to 60

Long-form mode: Smart sections or Single pass

Common control tag

Custom tag

Preset note: Recommended default for multi-minute multilingual narration and medium-length exports.

Add text to preview sectioning, timing, and guidance.

Put style tags directly into the script: [whisper], [excited], [professional broadcast tone]
Use Smart sections for multi-minute and hour-scale narration so long passages are synthesized in stable chunks
Use a clean 5-10 second reference clip plus an exact transcript for the best cloning behavior
One-hour exports will create many sections and can take a long time on hosted GPUs
First run will be slower because the runtime downloads checkpoints and warms the model
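
As a rough illustration of the Smart sections idea described above, the sketch below splits a script into sentence-aligned chunks that each fit a duration budget, ignoring inline [tag] controls when counting words. It is not the Space's actual implementation: the function names, the 150 words-per-minute speaking rate, and the 30-second default budget are all assumptions chosen for the example.

```python
import re

# Assumed average speaking rate; not a value taken from Fish Speech.
WORDS_PER_MINUTE = 150
# Matches inline controls such as [whisper] or [professional broadcast tone].
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def spoken_word_count(text: str) -> int:
    """Count words that would be spoken, skipping inline [tag] controls."""
    return len(TAG_PATTERN.sub(" ", text).split())


def estimate_minutes(script: str) -> float:
    """Estimate narration length from the assumed speaking rate."""
    return spoken_word_count(script) / WORDS_PER_MINUTE


def split_into_sections(script: str, max_seconds: float = 30.0) -> list[str]:
    """Greedily pack whole sentences into sections under a duration budget.

    A single sentence longer than the budget is kept whole in its own section.
    """
    max_words = max(1, int(max_seconds * WORDS_PER_MINUTE / 60))
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]

    sections: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in sentences:
        n = spoken_word_count(sentence)
        # Close the current section if this sentence would exceed the budget.
        if current and current_words + n > max_words:
            sections.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        sections.append(" ".join(current))
    return sections
```

Under these assumptions, a 60-minute script is roughly 9,000 spoken words, which at a 30-second budget yields on the order of 120 sections; this is why hour-scale exports produce many sections and take a long time.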


Generated audio

Download WAV

Generate audio to see render details.

Multilingual examples