Introduction to SSML
Speech Synthesis Markup Language (SSML) gives you precise control over how text-to-speech (TTS) engines pronounce your content. While TTS engines do a reasonable job with plain text, SSML lets you fine-tune pronunciation, pacing, emphasis, and prosody to create natural, professional-sounding speech output.
Why Use SSML?
Plain text works for simple cases, but real-world applications often need more control:
- Ambiguous pronunciations — Is "read" past or present tense? Is "1/2" spoken as "one half" or "January second"?
- Domain-specific terms — Product names, acronyms, and technical terms often need pronunciation hints.
- Natural pacing — Pauses between sentences, slower speech for important information, or faster delivery for disclaimers.
- Emphasis and tone — Highlighting key words or adjusting prosody to convey meaning.
- Mixed content — Spelling out codes, reading phone numbers digit-by-digit, or handling currency and dates correctly.
SSML addresses all of these scenarios with a standardized, portable markup language.
Standards Compliance
Capacity Private Cloud TTS supports SSML 1.1 as defined by the W3C. This means your SSML markup is portable and follows industry-standard conventions. For a complete list of supported elements, see SSML Elements.
Basic Structure
SSML documents are XML with a root <speak> element. Here's a minimal example:
<speak> Hello, and welcome to our service. </speak>
Within the <speak> element, you can mix plain text with SSML tags to control synthesis.
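When SSML is assembled in application code, plain text needs to be XML-escaped before it is placed inside <speak>; otherwise a character such as & makes the document malformed. The following Python sketch shows one way to do this with the standard library. The helper names wrap_in_speak and is_well_formed are illustrative assumptions, not part of any product API, and the root check assumes an un-namespaced <speak> element like the examples on this page.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_in_speak(text: str) -> str:
    """Wrap plain text in a <speak> root, escaping &, < and >."""
    return f"<speak>{escape(text)}</speak>"


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML and its root is <speak>.

    This only checks XML well-formedness and the root tag name; it does
    not validate the document against the SSML schema.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return root.tag == "speak"


if __name__ == "__main__":
    ssml = wrap_in_speak("Terms & conditions apply.")
    print(ssml)
    print(is_well_formed(ssml))
```

A check like this is cheap to run before sending a prompt to the synthesizer, and it catches the most common authoring mistake (unescaped special characters) early.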
Common Use Cases
Adding pauses:
Use <break> to insert silence, giving listeners time to absorb information:
<speak> Your account balance is $1,250.00. <break time="500ms"/> Would you like to make a payment? </speak>
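If pauses are generated programmatically, it can help to validate the attribute values before emitting the tag. The sketch below is a hypothetical helper (break_tag is not an existing API). It accepts the strength keywords defined by SSML 1.1 and uses a deliberately conservative pattern for time values, so some forms that the specification permits, such as ".5s", are rejected here.

```python
import re

# Strength keywords defined for <break> in SSML 1.1.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}

# Conservative time pattern: digits, optional decimal part, then "ms" or "s".
_TIME = re.compile(r"^\d+(\.\d+)?(ms|s)$")


def break_tag(time=None, strength=None) -> str:
    """Build a <break/> tag, raising ValueError on unrecognized values."""
    attrs = []
    if time is not None:
        if not _TIME.match(time):
            raise ValueError(f"invalid break time: {time!r}")
        attrs.append(f'time="{time}"')
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"invalid break strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    return "<break" + "".join(" " + a for a in attrs) + "/>"


if __name__ == "__main__":
    print(break_tag(time="500ms"))
    print(break_tag(strength="strong"))
```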
Controlling pronunciation:
Use <say-as> to specify how content should be interpreted:
<speak> Your confirmation code is <say-as interpret-as="characters">ABC123</say-as>. Please call us at <say-as interpret-as="telephone">+1-800-555-1234</say-as>. </speak>
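Values such as confirmation codes and phone numbers usually come from application data, so the markup around them tends to be generated rather than typed by hand. As a minimal sketch, assuming a hypothetical say_as helper, the following Python wraps a value in <say-as> and escapes both the content and the attribute values using the standard library.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(value: str, interpret_as: str, fmt=None) -> str:
    """Wrap a value in <say-as>, escaping content and attribute values.

    quoteattr() returns the attribute value already wrapped in quotes.
    """
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt is not None:
        attrs += f" format={quoteattr(fmt)}"
    return f"<say-as {attrs}>{escape(value)}</say-as>"


if __name__ == "__main__":
    code = say_as("ABC123", "characters")
    phone = say_as("+1-800-555-1234", "telephone")
    print(f"<speak>Your confirmation code is {code}. "
          f"Please call us at {phone}.</speak>")
```

Which interpret-as and format values are honored depends on the engine, so the helper passes them through unchanged rather than validating them.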
Adding emphasis:
Use <emphasis> to stress important words:
<speak> This action <emphasis level="strong">cannot</emphasis> be undone. </speak>
Adjusting speech rate and pitch:
Use <prosody> for fine-grained control over delivery:
<speak> <prosody rate="slow">Please listen carefully to the following terms.</prosody> <prosody rate="fast" pitch="-10%">Terms and conditions apply. See website for details.</prosody> </speak>
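For prompts with several differently paced segments, building the document as an element tree avoids manual escaping and unbalanced tags. The sketch below uses Python's xml.etree.ElementTree; the function name prosody_speak and the (text, attributes) input shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def prosody_speak(segments) -> str:
    """Build <speak> containing one <prosody> element per segment.

    segments is a list of (text, attrs) pairs, where attrs is a dict of
    prosody attributes such as {"rate": "slow"}.
    """
    speak = ET.Element("speak")
    for text, attrs in segments:
        element = ET.SubElement(speak, "prosody", attrs)
        element.text = text
        element.tail = " "  # keep whitespace between adjacent segments
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(prosody_speak([
        ("Please listen carefully to the following terms.", {"rate": "slow"}),
        ("Terms and conditions apply.", {"rate": "fast", "pitch": "-10%"}),
    ]))
```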
Custom pronunciation with phonemes:
Use <phoneme> when the TTS engine mispronounces a word:
<speak> Welcome to <phoneme alphabet="ipa" ph="ˈkwɪrki">Quirky</phoneme> Software. </speak>
SSML in VoiceXML Applications
SSML integrates naturally with VoiceXML 2.0. You can embed SSML within <prompt>, <audio>, <choice>, and <enumerate> elements:
<prompt>
<speak>
Your appointment is scheduled for
<say-as interpret-as="date" format="mdy">03/15/2026</say-as>
at <say-as interpret-as="time">2:30pm</say-as>.
</speak>
</prompt>
Quick Reference
| Element | Purpose |
| --- | --- |
| <speak> | Root element for all SSML content |
| <break> | Insert a pause (by time or strength) |
| <say-as> | Specify how to interpret content (digits, date, currency, etc.) |
| <emphasis> | Add stress to words or phrases |
| <prosody> | Control rate, pitch, and volume |
| <phoneme> | Specify exact pronunciation using a phonetic alphabet |
| <sub> | Substitute spoken text for written abbreviations |
| <voice> | Switch between different voices |
| <audio> | Insert pre-recorded audio |
For complete element documentation including attributes and examples, see SSML Elements.
Best Practices
- Start simple — Use plain text first, then add SSML only where needed. Over-marking text can make it harder to maintain.
- Test with real listeners — Synthetic speech can sound different from what you expect. Test your prompts with actual users.
- Use <say-as> for data — Dates, times, currency, and codes are common sources of mispronunciation. Always mark them explicitly.
- Be consistent — If you pronounce a product name a certain way in one prompt, use the same markup everywhere.
- Consider localization — Different languages have different text normalization rules. See our language-specific guides for details.
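One way to keep product-name pronunciation consistent across prompts is to hold the pronunciations in a single table and apply it to every prompt before synthesis. The Python sketch below illustrates the idea. The PRONUNCIATIONS table and the apply_pronunciations function are hypothetical, and the single IPA entry is taken from the <phoneme> example earlier on this page.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical project-wide table: written form -> IPA pronunciation.
PRONUNCIATIONS = {"Quirky": "ˈkwɪrki"}


def apply_pronunciations(text: str, table=PRONUNCIATIONS) -> str:
    """Escape text and wrap each whole-word term from the table in <phoneme>."""
    if not table:
        return escape(text)
    # Longest terms first so overlapping entries prefer the longer match.
    terms = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        term = match.group(1)
        parts.append(escape(text[last:match.start()]))
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(table[term])}>'
            f"{escape(term)}</phoneme>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    body = apply_pronunciations("Welcome to Quirky Software.")
    print(f"<speak>{body}</speak>")
```

Because every prompt passes through the same table, changing a pronunciation becomes a one-line edit instead of a search through individual prompts.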
Related Documentation
- SSML Elements — Complete reference for all supported elements and attributes
- W3C SSML 1.1 Specification — The official standard
