Fish Audio S2 Pro Long-Form TTS Lab

Build 1-60 minute multilingual TTS tests with inline prosody control, optional voice cloning, and Space-ready long-form chunking.

  • Model: fishaudio/s2-pro
  • Source runtime: Fish Speech
  • Strengths surfaced here: 80+ languages, free-form [tag] controls, low-latency-oriented presets, long-form narration planning
Generation preset
Long-form mode
Common control tag

Preset note: Recommended default for multi-minute multilingual narration and medium-length exports.

Long-form plan

Add text to preview sectioning, timing, and guidance.

Control tips

  • Put style tags directly into the script: [whisper], [excited], [professional broadcast tone]
  • Use Smart sections for multi-minute and hour-scale narration so long passages are synthesized in stable chunks
  • Use a clean 5-10 second reference clip plus an exact transcript for the best cloning behavior
  • One-hour exports will create many sections and can take a long time on hosted GPUs
  • First run will be slower because the runtime downloads checkpoints and warms the model
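
The chunking idea behind Smart sections can be illustrated with a minimal sketch. This is not the Space's actual implementation: the function names, the 600-character budget, and the 150 words-per-minute speaking rate are all assumptions chosen for illustration. It groups whole sentences into sections under a character budget and gives a rough duration estimate that ignores inline [tag] controls.

```python
import re

# Assumptions for illustration only; these are not values taken from the Space.
MAX_CHARS = 600          # hypothetical per-section character budget
WORDS_PER_MINUTE = 150   # hypothetical average narration speed


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace.

    Scripts written without spaces between sentences would need another rule.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def plan_sections(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into sections of at most max_chars.

    A single sentence longer than max_chars is kept intact as its own section.
    """
    sections, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            sections.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        sections.append(current)
    return sections


def estimate_minutes(text, wpm=WORDS_PER_MINUTE):
    """Rough spoken duration, with [bracketed] control tags excluded."""
    spoken = re.sub(r"\[[^\]]*\]", " ", text)
    return len(spoken.split()) / wpm


if __name__ == "__main__":
    script = "[professional broadcast tone] Welcome back. Today we cover three topics."
    for i, section in enumerate(plan_sections(script, max_chars=40), start=1):
        print(i, section)
    print(f"about {estimate_minutes(script):.2f} min")
```

Because sections break only at sentence boundaries, an inline tag stays with the sentence it was written in. Under these assumed numbers, an hour of narration is roughly 9,000 words, which is why one-hour exports produce many sections.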

Result summary

Generate audio to see render details.

Multilingual examples