Step 5: Audio
Configure voice (style, pacing, language, accent) and background music for your video.
What is the Audio step?
The Audio step controls everything you hear in your video — both the spoken narration and the background music. Toggle voice on or off, choose a language, accent (for English), delivery style, and pacing. Then add AI-generated background music by describing it, and balance the music volume against the voice. This step is optional — defaults produce a natural voiced video with subtle music.
- Enable Voice: Toggle spoken audio on or off. Off creates a silent or music-only video.
- Voice Source: Character-Tied uses a voice matching your selected character. This is the default.
- Language: The spoken language: English, Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Mandarin Chinese, Korean, Arabic, or Russian.
- Accent: For English, choose an accent variant (e.g., American). The accent option is hidden for other languages.
- Delivery Style: The tone and energy: Conversational, Energetic, Calm, Authoritative, or Whispery.
- Pacing: Speaking speed: Slow, Normal, or Fast.
- Music: AI-generated background music. Describe the music you want in the prompt box, or switch it off for no music.
- Volume Mix: Balance between background music and speech. The default is 30% music, optimized for speech clarity.
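The settings above can be pictured as a small configuration record. The following is only an illustrative sketch: the product is configured through its interface, and the `AudioSettings` class, its field names, and the `validate` helper are hypothetical names invented here, not a documented API. The option lists and the three defaults (voice on, Character-Tied, 30% music) come from the descriptions above; language, delivery style, and pacing are left as required fields because their defaults are not stated.

```python
from dataclasses import dataclass
from typing import List, Optional

# Option lists copied from the settings described in this article.
LANGUAGES = (
    "English", "Spanish", "French", "German", "Italian", "Portuguese",
    "Hindi", "Japanese", "Mandarin Chinese", "Korean", "Arabic", "Russian",
)
DELIVERY_STYLES = ("Conversational", "Energetic", "Calm", "Authoritative", "Whispery")
PACING = ("Slow", "Normal", "Fast")


@dataclass
class AudioSettings:
    """Hypothetical model of the Audio step; not an official schema."""
    language: str
    delivery_style: str
    pacing: str
    accent: Optional[str] = None           # only meaningful for English
    enable_voice: bool = True              # documented default: on
    voice_source: str = "Character-Tied"   # documented default
    music_enabled: bool = True
    music_prompt: str = ""                 # empty prompt is allowed
    music_volume_percent: int = 30         # documented default mix


def validate(settings: AudioSettings) -> List[str]:
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    if settings.language not in LANGUAGES:
        problems.append(f"Unsupported language: {settings.language}")
    if settings.delivery_style not in DELIVERY_STYLES:
        problems.append(f"Unknown delivery style: {settings.delivery_style}")
    if settings.pacing not in PACING:
        problems.append(f"Unknown pacing: {settings.pacing}")
    if settings.accent is not None and settings.language != "English":
        problems.append("Accent applies only to English.")
    if not 0 <= settings.music_volume_percent <= 100:
        problems.append("Music volume must be between 0 and 100 percent.")
    return problems
```

For example, `validate(AudioSettings(language="French", delivery_style="Calm", pacing="Normal", accent="American"))` reports the accent problem, mirroring how the interface hides the accent option for non-English languages.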
Configuring Audio
1. Set the voice: Leave Enable Voice on (default) for spoken audio tied to your character, or toggle it off for a music-only video.
2. Choose language and accent: Pick the spoken language from 12 options. For English you can also pick an accent.
3. Pick delivery style and pacing: Choose the tone (Conversational, Energetic, Calm, Authoritative, Whispery) and speed (Slow, Normal, Fast).
4. Describe your music: In the Add Music & Mix section, describe the background music you want (e.g., 'upbeat acoustic with light percussion'), or turn music off.
5. Balance the mix: Use the volume sliders to mix music against speech. The 30% default works well; nudge louder for social content, quieter for educational content.
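To get a feel for what a music percentage means, here is a rough arithmetic sketch. It assumes the percentage is a simple linear share, with the remainder going to speech; the article does not say how the product interprets the value, and the function names `mix_gains` and `music_level_db` are hypothetical. It illustrates the ratio only and is not a description of how the audio is produced.

```python
import math
from typing import Tuple


def mix_gains(music_percent: float = 30.0) -> Tuple[float, float]:
    """Split a 0-100 music share into (music_gain, speech_gain).

    Assumption: a linear share, with speech receiving the remainder.
    """
    if not 0 <= music_percent <= 100:
        raise ValueError("music_percent must be between 0 and 100")
    music_gain = music_percent / 100.0
    speech_gain = 1.0 - music_gain
    return music_gain, speech_gain


def music_level_db(music_percent: float = 30.0) -> float:
    """Music level relative to speech, in decibels, under the same assumption."""
    music_gain, speech_gain = mix_gains(music_percent)
    if music_gain == 0:
        return float("-inf")   # music fully off
    if speech_gain == 0:
        return float("inf")    # music only
    return 20 * math.log10(music_gain / speech_gain)
```

Under this assumption, the 30% default puts the music roughly 7.4 dB below the speech, and a 50% setting puts the two at equal level.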
Note: Voice and music are generated natively by the AI video models as part of the video. There is no separate audio pipeline, so the mix is coherent in one pass.
Tip: Leaving the music prompt empty is fine. The AI falls back to subtle background music matched to your volume setting.