Voice Prompts

Voice generation in PrePrompt takes a written line and speaks it in a chosen character voice. The “prompt” is really two things: the text the voice says and the performance directions that shape how it’s said.

Good written dialogue is not the same as good spoken dialogue. A line that reads fine on the page can sound stiff, rushed, or monotone when delivered. This page covers how to write and direct lines so they land.

Write for the ear, not the eye

Read every line out loud before generating. If you stumble, the model will stumble. Contractions, short sentences, and natural sentence rhythm sound better than formal prose.

Stiff: “I do not believe that I will be attending the meeting tomorrow afternoon.”

Natural: “I don’t think I’m going tomorrow.”

The same applies to numbers and symbols. Spell them out.

Misread: “It’s going to cost $1,250.”

Read right: “It’s going to cost one thousand two hundred and fifty dollars.”

Voice models often mispronounce digit strings, especially for dates, phone numbers, currencies, and units. If you want the number spoken a specific way, write it that way.
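If lines are prepared by a script, the spelling-out step can be automated before the text reaches the voice model. The following is a minimal sketch, not part of PrePrompt: `spell_number` and `spell_dollars` are hypothetical helper names, and the code assumes whole-dollar amounts below one million.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell_number(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in words."""
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only handles 0 to 999,999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = ONES[hundreds] + " hundred"
        return words + (" and " + spell_number(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    words = spell_number(thousands) + " thousand"
    return words + (" " + spell_number(rest) if rest else "")


def spell_dollars(line: str) -> str:
    """Replace whole-dollar amounts such as $1,250 with spoken words."""
    def repl(match: re.Match) -> str:
        amount = int(match.group(1).replace(",", ""))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{spell_number(amount)} {unit}"

    # Digits with optional thousands separators; cents are not handled.
    return re.sub(r"\$(\d+(?:,\d{3})*)", repl, line)


print(spell_dollars("It's going to cost $1,250."))
```

Dates, units, and cents each need their own rules, so treat this as a starting point and still read the result out loud.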

Punctuation is pacing

The most reliable tool you have is punctuation. Most voice models respond to it consistently.

Periods create full stops — a clear beat between sentences.
Commas add short breaks inside a sentence.
Ellipses (…) create a trailing-off effect.
Em dashes (—) signal an abrupt cut or shift.
Question marks shape rising intonation.
Exclamation points push energy and volume.

If a line needs space to land, add a period and break the sentence. If a line needs to rush, drop the comma and let the words run together.

Performance direction

On top of the text, you can give performance direction — bracketed cues that the model interprets as delivery instructions. Think of them as stage directions for the voice.

Emotion: [sad], [angry], [happily], [nervous], [excited], [curious], [tender], [reflective], [wistful].

Delivery style: [whispers], [shouts], [quietly], [loudly], [slowly], [quickly], [matter-of-fact], [conversational], [sarcastic], [serious tone].

Performance beats: [laughs], [sighs], [gasps], [clears throat], [pause], [long pause].

Placement matters:

A tag at the start of a line sets the tone for everything that follows.
A tag mid-sentence shifts the delivery at that point.
You can layer tags for compound direction.

Example

[whispers] I think someone's in the house. [pause] Stay quiet... [gasps] Did you hear that? [shouts] RUN!

One line, four distinct performance shifts. Each tag tells the model what changes.
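To see how a line breaks into performance shifts, it can help to split it into pairs of tags and the text they govern. This is only an illustrative sketch: `split_direction` is a hypothetical helper, and the bracket syntax is assumed to be exactly what the example above shows, with no nested brackets.

```python
import re

# Matches either a bracketed tag or a run of plain text between tags.
TAG_OR_TEXT = re.compile(r"\[([^\[\]]+)\]|([^\[\]]+)")


def split_direction(line: str) -> list[tuple[list[str], str]]:
    """Split a directed line into (tags, text) segments, in order."""
    segments = []
    pending = []  # tags seen since the last piece of spoken text
    for match in TAG_OR_TEXT.finditer(line):
        tag, text = match.group(1), match.group(2)
        if tag is not None:
            pending.append(tag.strip())
        elif text.strip():
            segments.append((pending, text.strip()))
            pending = []
    if pending:
        # Trailing tags with no text after them, e.g. a final [sighs].
        segments.append((pending, ""))
    return segments


line = ("[whispers] I think someone's in the house. [pause] Stay quiet... "
        "[gasps] Did you hear that? [shouts] RUN!")
for tags, text in split_direction(line):
    print(tags, "->", text)
```

Layered tags such as [whispers][nervous] end up in the same segment, which makes it easy to check what direction each piece of text is actually receiving.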

Writing monologues

Long passages without emotional variation produce flat, monotone output — regardless of which voice you use. Break them up.

A five-minute monologue that reads as one continuous block will sound like one continuous block. A five-minute monologue broken into beats with shifting performance direction feels alive.

Flat:

I've been running from this for twenty years. Twenty years of looking over my shoulder waiting for the day it catches up. Today's the day.

Directed:

[serious tone] I've been running from this for twenty years. [pause] Twenty years of looking over my shoulder, [quietly] waiting for the day it catches up. [sighs] Well... [long pause] [matter-of-fact] Today's the day.

Same words. Very different delivery.
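One way to keep long monologues organized is to store them as a list of beats and join them into the final line only at generation time. The sketch below assumes nothing about PrePrompt itself: `Beat` and `render` are hypothetical names, and the output is simply the directed text shown above.

```python
from dataclasses import dataclass, field


@dataclass
class Beat:
    """One stretch of text plus the tags that should precede it."""
    text: str
    tags: list[str] = field(default_factory=list)


def render(beats: list[Beat]) -> str:
    """Join beats into a single directed line with bracketed tags."""
    parts = []
    for beat in beats:
        cues = " ".join(f"[{tag}]" for tag in beat.tags)
        parts.append(f"{cues} {beat.text}".strip())
    return " ".join(parts)


monologue = [
    Beat("I've been running from this for twenty years.", ["serious tone"]),
    Beat("Twenty years of looking over my shoulder,", ["pause"]),
    Beat("waiting for the day it catches up.", ["quietly"]),
    Beat("Well...", ["sighs"]),
    Beat("Today's the day.", ["long pause", "matter-of-fact"]),
]
print(render(monologue))
```

Keeping beats separate makes it easy to change the direction on one beat and regenerate without retyping the whole passage.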

Compatible versus conflicting tags

Tags layer well when they aren’t contradictory.

Works: [whispers][nervous], [excited][quickly], [sad][slowly], [sarcastic][matter-of-fact].

Doesn’t work: [whispers][shouts], [quickly][slowly], [happily][sad].

If two tags pull in opposite directions, the model picks one — unpredictably. Stay coherent.
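A quick automated check can catch contradictory stacks before you generate. This is a sketch under stated assumptions: `find_conflicts` is a hypothetical helper, and the conflict list contains only the three pairs named above, so extend it for your own project.

```python
import re

# Only the contradictory pairs listed on this page; add your own as needed.
CONFLICTS = [
    {"whispers", "shouts"},
    {"quickly", "slowly"},
    {"happily", "sad"},
]

# One or more tags in a row, optionally separated by spaces.
TAG_STACK = re.compile(r"(?:\[[^\[\]]+\]\s*)+")


def find_conflicts(line: str) -> list[tuple[str, ...]]:
    """Return conflicting tag pairs that appear in the same stack."""
    found = []
    for stack in TAG_STACK.finditer(line):
        tags = {t.strip() for t in re.findall(r"\[([^\[\]]+)\]", stack.group())}
        for pair in CONFLICTS:
            if pair <= tags:  # both tags of the pair are layered together
                found.append(tuple(sorted(pair)))
    return found


print(find_conflicts("[whispers][shouts] Get out."))
print(find_conflicts("[whispers][nervous] Get out."))
```

Tags separated by spoken text are treated as sequential shifts rather than conflicts, so [whispers] followed later by [shouts] in the same line passes the check.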

Pauses

Pauses create breath, weight, and comedic timing. Use them deliberately.

[pause] — a short beat
[long pause] — a noticeable silence that lets a moment land

Or use punctuation: ellipses and em dashes naturally introduce micro-pauses.

Keeping a character’s voice consistent

Every Actor in PrePrompt can be assigned a voice. Once assigned, generate every line for that character with the same voice, not a new one each time. Consistent voice ID plus consistent voice settings equals consistent character.

To change emotional delivery, use performance tags, not voice settings. [angry] on the same voice keeps the character’s identity intact. Re-tuning voice settings mid-project changes the character.

If a character genuinely needs a different voice (aged up, injured, a different era), create a new voice variant rather than editing the original.
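If you track voices in your own project notes or scripts, the same rule can be modeled as an immutable profile plus explicit variants. Everything in this sketch is hypothetical: `VoiceProfile`, the `voice_id` values, and the `stability` setting are illustrative stand-ins, not PrePrompt's actual fields or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: settings cannot be edited after creation
class VoiceProfile:
    name: str
    voice_id: str      # hypothetical identifier for the assigned voice
    stability: float   # hypothetical voice setting, kept fixed per character


def direct(profile: VoiceProfile, tag: str, text: str) -> dict:
    """Build a request that changes delivery via a tag, not via settings."""
    return {
        "voice_id": profile.voice_id,
        "stability": profile.stability,
        "text": f"[{tag}] {text}",
    }


detective = VoiceProfile(name="Detective", voice_id="voice-001", stability=0.6)

# Different emotions reuse the same voice and the same settings.
angry_take = direct(detective, "angry", "Get out.")
tender_take = direct(detective, "tender", "Get some rest.")

# A genuinely different voice becomes a new variant; the original is untouched.
older_detective = replace(detective, name="Detective (aged up)", voice_id="voice-002")
```

Freezing the profile is a design choice: any attempt to re-tune the original raises an error, which nudges you toward creating a variant instead.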

Accents and character voices

You can nudge the voice toward an accent or vocal character with tags:

[British accent], [Southern drawl], [French accent], [pirate voice], [old man voice].

These are approximations — models vary in how strongly they hold an accent, and performance beats can pull the voice back toward its base. If an accent is central to the character, pick a voice that already has it rather than forcing one with tags.

What to avoid

Digit strings where the reading matters — spell them out.
Symbols — “$”, “%”, “@” often read incorrectly. Write “dollars,” “percent,” “at.”
Abbreviations the model might not know — “Dr.” may read as “doctor” or “drive” depending on context. Spell them out for safety.
Walls of text with no performance direction — you’ll get monotone.
Conflicting tags — pick one primary delivery.
Too many tags per line — three or four is plenty. Twenty breaks the line.
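Several items on this list can be checked mechanically before a line is generated. The following is a rough sketch, not a PrePrompt feature: `lint_line` is a hypothetical helper, and the four-tag limit is taken from the guidance above.

```python
import re

MAX_TAGS = 4        # "three or four is plenty"
SYMBOLS = "$%@"     # symbols this page says are often misread


def lint_line(line: str) -> list[str]:
    """Return warnings for common voice-prompt problems in one line."""
    warnings = []
    # Ignore anything inside brackets so tag names are not linted as speech.
    spoken = re.sub(r"\[[^\[\]]*\]", " ", line)
    if re.search(r"\d", spoken):
        warnings.append("digits: spell numbers out")
    for symbol in SYMBOLS:
        if symbol in spoken:
            warnings.append(f"symbol {symbol!r}: write the word instead")
    tag_count = len(re.findall(r"\[[^\[\]]+\]", line))
    if tag_count > MAX_TAGS:
        warnings.append(f"{tag_count} tags: keep it to {MAX_TAGS} or fewer")
    return warnings


print(lint_line("It's going to cost $1,250."))
print(lint_line("[whispers] Stay quiet."))
```

Abbreviations and monotone walls of text are harder to detect automatically, so those still need a read-through by ear.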

Audio Node: Generating voiceover and dialogue in the pipeline.

Actors and Wardrobe: Assigning a voice to a character and keeping it consistent.

Image Prompts: Writing the visual prompts your voice lines play over.

Timeline Editor: Placing voice lines on the audio track and syncing to frames.

FAQ

Do I have to use bracketed tags? No. Punctuation and sentence rhythm do most of the work on their own. Tags let you push delivery in specific directions when plain text isn’t enough.

Why does my character sound robotic reading a phone number? The model is reading the digits as written instead of the way a person would say them. Write “555 2340” as “five five five, two three four zero” or “five fifty-five, twenty-three forty.”

My monologue sounds flat even though the text is emotional. Why? Long passages without tag shifts or sentence-level variation default to monotone. Break the line into shorter sentences, add performance direction at key beats, and let punctuation pace the delivery.

Can two characters share the same voice? Technically yes, but they’ll sound identical — which undercuts the scene. Assign each named character a distinct voice early and reuse it consistently.

How do I get precise timing for a specific pause? Use [pause] or [long pause]. For tight sync with a visual beat, generate the line, then nudge it on the Timeline — the audio track snaps to frame boundaries.

Will the same line read the same way twice? Close, but not identical. Voice generation is stochastic. If you need a specific take, generate two or three and pick the one that lands.
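For the phone-number question above, the digit-by-digit reading can be produced with a few lines of code if you are preparing many lines at once. This is a minimal sketch: `speak_digits` is a hypothetical helper, and it assumes the groups in the written number are where you want the pauses.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_digits(number: str) -> str:
    """Read a number digit by digit, with a comma pause between groups."""
    groups = re.findall(r"\d+", number)
    return ", ".join(
        " ".join(DIGIT_WORDS[int(digit)] for digit in group)
        for group in groups
    )


print(speak_digits("555 2340"))
```

The comma between groups gives the voice a short break, in line with the punctuation guidance earlier on this page.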
