ElevenLabs Text to Speech: A Complete Beginner's Tutorial

ElevenLabs is an AI voice generator with a free plan. With its Text to Speech tool you can turn written text into a realistic, human-sounding voiceover with no microphone, recording setup, or prior experience. This tutorial walks you through the whole process, from a blank page to a downloaded audio file.

What you need: A free ElevenLabs account. You get 10,000 characters per month on the free plan, which is plenty to get started.

1. Open the Text to Speech tool

Go to elevenlabs.io/app/speech-synthesis/text-to-speech and sign in. You’ll land on the main Text to Speech editor.

The interface has three main areas:

Left panel — navigation for all of ElevenLabs’ tools (Voices, Studio, Sound Effects, etc.)
Right panel — the Settings sidebar where you configure voice, model, speed, and output format
Center — the large text input area where you type or paste your script

2. Type or paste your text

Click anywhere in the large center area and start typing — or paste text you’ve already written. You can enter up to 5,000 characters per generation on the free plan.

Some quick tips for better results:

Use punctuation naturally. Commas and periods tell the AI where to pause, and question marks tell it when to raise its tone.
Write in full sentences. Fragments and bullet points can sometimes produce unnatural pacing.
Use the built-in prompts (“Narrate a story”, “Tell a silly joke”, “Record an advertisement”) at the bottom if you want inspiration.
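
If you draft scripts outside the editor, a quick check before pasting can catch the issues mentioned above. The following is only a rough sketch: the function name check_script is made up for this tutorial, the 5,000-character limit is the free-plan figure quoted earlier, and "fragment" simply means a line that does not end in sentence punctuation.

```python
# Assumption: the per-generation limit this tutorial quotes for the free plan.
MAX_CHARS_PER_GENERATION = 5000

# Characters that may legitimately follow the final punctuation mark.
CLOSING_MARKS = "\"'”’)"


def check_script(text: str, limit: int = MAX_CHARS_PER_GENERATION) -> dict:
    """Report the character count and any lines that lack closing punctuation."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    fragments = []
    for line in lines:
        # Ignore trailing quotes/brackets, then look at the last real character.
        core = line.rstrip(CLOSING_MARKS)
        if not core.endswith((".", "!", "?", "…")):
            fragments.append(line)
    return {
        "characters": len(text),
        "within_limit": len(text) <= limit,
        "fragments": fragments,
    }


report = check_script("Welcome to the channel.\nToday: three quick tips\nReady? Let's go!")
print(report["characters"], report["within_limit"])
print("Possible fragments:", report["fragments"])
```

Lines reported as fragments are not errors. They are just candidates to rewrite as full sentences if the pacing sounds off.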

3. Choose a voice

On the right-hand Settings panel, click the Voice row (it shows the current voice name). This opens the voice library.

You’ll see dozens of voices organised by name and description. Each entry shows a brief style label like “Deep, Resonant and Comforting” or “Easygoing and Effortless”. You can:

Search by name using the search bar at the top
Filter by Language, Accent, Category, Gender, or Age using the filter chips
Preview any voice by clicking the play icon (▶) on the right side of its row
Select a voice by clicking its name — a checkmark appears when it’s active

For most YouTube videos, podcasts, and explainers, voices like Adam (neutral American male), Charlotte (warm British female), or Daniel (radio news host) work well as starting points.

4. Adjust the settings

Once you’re back on the main Settings panel you’ll see several sliders and options. Here’s what each one does:

Model — Eleven Multilingual v2 is the default and works great for most content. If you need maximum expressiveness, click “Try Eleven v3” (uses more credits).

Speed — Controls how fast the voice speaks. The default (centre position) is natural. Slide right for faster delivery, left for slower and more deliberate speech.

Stability — Higher stability means a more consistent, even tone. Lower stability adds more variation and emotion. Keep it above 30% to avoid distortion (ElevenLabs will warn you if it drops too low).

Similarity — How closely the AI sticks to the original voice characteristics. Higher values sound more “locked in”; lower values give slightly more natural variation.

Style Exaggeration — Amplifies the expressive style of the voice. Useful for dramatic narration; keep it low for neutral business content.

Output Format — MP3 44.1 kHz (128 kbps) is the default and works for almost everything. Leave this unless you need a specific format.

Speaker Boost — A small quality enhancement. Leave it on.
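
For readers who later move from the web editor to the ElevenLabs API, the settings above correspond roughly to fields in a request body. The sketch below only builds the request and sends nothing. Treat everything in it as an assumption to check against the official API reference: the endpoint URL, the field names (stability, similarity_boost, style, use_speaker_boost, speed), the 0 to 1 scale for the sliders, and the default values. build_tts_request is a hypothetical helper, not part of any ElevenLabs library.

```python
import json

# Assumed endpoint; confirm it in the official API reference before use.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, *, stability=0.5, similarity=0.75,
                      style=0.0, speaker_boost=True, speed=1.0,
                      model_id="eleven_multilingual_v2",
                      output_format="mp3_44100_128"):
    """Hypothetical helper: return (url, json_body) without sending anything."""
    # Sliders shown as percentages in the editor are assumed to map to 0..1 here.
    for name, value in (("stability", stability),
                        ("similarity", similarity),
                        ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    url = f"{API_BASE}/{voice_id}?output_format={output_format}"
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,            # Stability slider
            "similarity_boost": similarity,    # Similarity slider
            "style": style,                    # Style Exaggeration slider
            "use_speaker_boost": speaker_boost,  # Speaker Boost toggle
            "speed": speed,                    # Speed slider
        },
    }
    return url, body


# "YOUR_VOICE_ID" is a placeholder, not a real voice identifier.
url, body = build_tts_request("Hello and welcome.", "YOUR_VOICE_ID", stability=0.45)
print(url)
print(json.dumps(body, indent=2))
```

Keeping the settings in one place like this also makes it easy to note which combination produced a take you liked.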

5. Generate and download your audio

When your text and settings are ready, click the Generate speech button at the bottom right of the screen.

Generation typically takes 3–10 seconds depending on the length of your text. When it finishes, an audio player appears at the bottom of the screen showing:

The first few words of your text as a label
The voice name and timestamp
A play/pause button and scrubber so you can preview the result
A Download button (↓) on the right to save the MP3 to your computer
A Share button to copy a shareable link
A Regenerate speech button if you want a slightly different take with the same settings

Click the download icon and your MP3 will save to your computer, ready to drop into a video editor, podcast platform, or wherever you need it.

Tips for getting the best results

Regenerate if it sounds off. ElevenLabs adds slight variation each time, so clicking Regenerate often gives a better take without changing any settings.
Use punctuation to control pacing. An em dash (—) creates a longer pause than a comma. Ellipses (…) create a trailing effect.
Check your credit balance. The counter in the bottom bar shows remaining characters for the month. Long scripts use more credits. Split them into shorter chunks if needed, so that a retake only spends credits on the part you regenerate.
Re-download from History. Every generation is saved automatically in the History tab (top of the Settings panel), so you can re-download past outputs without regenerating.
Try different voices for the same script. Preview and switch voices without retyping anything — your text stays in the editor while you browse.
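
Splitting a long script by hand gets tedious, so here is a minimal sketch of one way to do it automatically. It assumes the 5,000-character per-generation figure quoted earlier and breaks only after sentence-ending punctuation, so no chunk ends mid-sentence. The function name split_script is made up for this example.

```python
import re


def split_script(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking after sentences."""
    # Split wherever ., !, ? or … is followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("A single sentence exceeds the limit; shorten it by hand.")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate      # the sentence still fits in the current chunk
        else:
            chunks.append(current)   # close the current chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


script = "First point. Second point! Third point? Done."
for number, chunk in enumerate(split_script(script, limit=25), start=1):
    print(number, len(chunk), chunk)
```

Paste each chunk into the editor as its own generation. The total characters across all chunks still count against your monthly balance.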

What to do next

Once you’re comfortable with basic Text to Speech, ElevenLabs has more tools worth exploring:

Voice Changer — Transform your own recorded voice into a different AI voice in real time
Sound Effects — Generate custom background audio from a text description
Dubbing — Automatically translate and re-voice video content into other languages
Speech to Text — Transcribe audio files into text with high accuracy