Microsoft Text To Speech Engine

Mar 03, 2009 The Microsoft Speech SDK 5.1 adds Automation support to the features of the previous version of the Speech SDK. You can now use the Win32 Speech API (SAPI) to develop speech applications with Visual Basic®, ECMAScript and other Automation languages.

Microsoft Speech API 5.3

This is the documentation for Microsoft Speech API (SAPI) 5.3, the native API for Windows.

These are interfaces, structures, and enumerations that have been added for the SAPI 5.3 release:

New SAPI 5.3 Interfaces
New SAPI 5.3 Enumerations
New SAPI 5.3 Structures

This topic also includes conceptual material that describes and explains the new scenarios that SAPI 5.3 supports:

W3C Speech Synthesis Markup Language
W3C Speech Recognition Grammar Specification
Semantic Interpretation

New Managed API for Speech

Windows Vista includes a new .NET namespace, System.Speech, that allows developers to speech-enable applications, especially those based on the Windows Presentation Foundation. Authors of managed applications can use this in addition to, or as an alternative to, SAPI. For more information, see the System.Speech.* namespaces in the Windows SDK Class Library. They are:

System.Speech.AudioFormat
System.Speech.Recognition
System.Speech.Recognition.SrgsGrammar
System.Speech.Synthesis
System.Speech.Synthesis.TtsEngine

New SAPI 5.3 Interfaces

The new interfaces in SAPI 5.3 are:

ISpEnginePronunciation
ISpEventSource2
ISpGrammarBuilder2
ISpPhoneticAlphabetConverter
ISpPhoneticAlphabetSelection
ISpPhrase2
ISpPrivateEngineCallEx
ISpRecoContext2
ISpRecognizer2
ISpRecoGrammar2
ISpRecoResult2
ISpSerializeState
ISpShortcut
ISpSRAlternates2
ISpSREngine2
ISpSREngineSite2
ISpXMLRecoResult
ISpeechResourceLoader
ISpeechRecoResultDispatch
ISpeechXMLRecoResult

New SAPI 5.3 Enumerations

The new enumerations in SAPI 5.3 are:

DISPID_SpeechXMLRecoResult
PHONETICALPHABET
SPADAPTATIONRELEVANCE
SPADAPTATIONSETTINGS
SPCOMMITFLAGS
SPGRAMMAROPTIONS
SPMATCHINGMODE
SPPRONUNCIATIONFLAGS
SPSHORTCUTTYPE
SPXMLRESULTOPTIONS
SpeechEmulationCompareFlags

New SAPI 5.3 Structures

The new structures in SAPI 5.3 are:

SPEVENTEX
SPNORMALIZATIONLIST
SPRULE
SPSEMANTICERRORINFO
SPSHORTCUTPAIR
SPSHORTCUTPAIRLIST

W3C Speech Synthesis Markup Language

SAPI 5.3 supports the W3C Speech Synthesis Markup Language (SSML) version 1.0, which is defined at http://www.w3.org/TR/speech-synthesis. SSML lets developers mark up voice characteristics, speed, volume, pitch, emphasis, and pronunciation so that TTS sounds more natural in their applications.
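To make the markup concrete, here is a minimal Python sketch that builds a small SSML 1.0 document with the standard library. The function name build_ssml and the sample sentence are illustrative assumptions, not part of SAPI; the element and attribute names (speak, prosody, emphasis, rate, level) come from the W3C SSML 1.0 specification. The sketch only produces the markup and does not call a speech engine.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace behind xml:lang


def build_ssml(lead_in: str, key_phrase: str, rate: str = "slow") -> str:
    """Return an SSML document that slows the sentence and stresses one phrase.

    build_ssml is a hypothetical helper for illustration only.
    """
    # Serialize SSML elements without a prefix (xmlns="..." on the root).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"},
    )
    # <prosody rate="..."> controls speaking speed for everything inside it.
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody", {"rate": rate})
    prosody.text = lead_in + " "
    # <emphasis level="strong"> marks the phrase that should stand out.
    emphasis = ET.SubElement(prosody, f"{{{SSML_NS}}}emphasis", {"level": "strong"})
    emphasis.text = key_phrase
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your flight leaves at", "nine thirty"))
```

The resulting string is the kind of input an SSML-aware synthesizer would accept; how it is handed to SAPI depends on the calling API and is outside this sketch.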

In addition to SSML, SAPI 5.3 continues to support the proprietary SAPITTS markup language for annotating text for TTS rendering. SSML and SAPITTS map onto each other closely, closely enough that most SSML can be transformed into SAPITTS. Indeed, this is what SAPI does when it receives SSML, so underlying TTS engines that were built for SAPITTS do not also need to support SSML.

SAPI 5.3 does not add a new device driver interface (DDI) through which TTS engines could accept SSML directly.
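The idea of translating SSML into the older markup can be illustrated with a toy Python converter. This is not SAPI's actual transformation, which is internal to the runtime. It handles only two SSML elements, and the target tag names (emph, silence with a msec attribute) are assumed here to match the SAPI 5 TTS markup; the function name ssml_to_sapitts_sketch is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SSML_PREFIX = "{http://www.w3.org/2001/10/synthesis}"


def ssml_to_sapitts_sketch(ssml: str) -> str:
    """Translate a tiny SSML subset into SAPI-style TTS markup (illustrative)."""

    def convert(node: ET.Element) -> str:
        tag = node.tag.replace(SSML_PREFIX, "")
        # Text inside the element, with each child converted recursively and
        # followed by the text that trails it.
        inner = escape(node.text or "") + "".join(
            convert(child) + escape(child.tail or "") for child in node
        )
        if tag == "emphasis":
            return f"<emph>{inner}</emph>"
        if tag == "break":
            # Only millisecond durations such as "500ms" are handled here.
            time = node.get("time", "0ms")
            msec = time[:-2] if time.endswith("ms") else "0"
            return f'<silence msec="{msec}"/>'
        # <speak> and any unhandled element: keep the content, drop the tag.
        return inner

    return convert(ET.fromstring(ssml))


if __name__ == "__main__":
    sample = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">Wait<break time="500ms"/> for '
        "<emphasis>it</emphasis>.</speak>"
    )
    print(ssml_to_sapitts_sketch(sample))
```

Because an engine only ever sees the translated markup, it needs no SSML parser of its own, which is the design point the text above makes.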

W3C Speech Recognition Grammar Specification

SAPI 5.3 supports the definition of context-free grammars using the W3C Speech Recognition Grammar Specification (SRGS), with these two important constraints:

It does not support the use of SRGS to specify DTMF (touch-tone) grammars.
It supports SRGS only in its XML form, not as augmented BNF (ABNF).

SRGS is defined at http://www.w3.org/TR/speech-grammar.
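As an illustration of the XML form, the following Python sketch holds a small SRGS grammar for yes/no answers and lists the phrases its root rule accepts. The grammar content, the rule id answer, and the helper load_phrases are assumptions made for the example. The helper mirrors the first constraint above by rejecting grammars whose mode is not voice; it inspects the XML only and performs no recognition.

```python
import xml.etree.ElementTree as ET
from typing import List

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # SRGS 1.0 namespace

# A hypothetical voice grammar in the XML form of SRGS.
YES_NO_GRAMMAR = """
<grammar version="1.0" xml:lang="en-US" mode="voice" root="answer"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="answer" scope="public">
    <one-of>
      <item>yes</item>
      <item>yeah</item>
      <item>yep</item>
      <item>no</item>
      <item>nope</item>
    </one-of>
  </rule>
</grammar>
"""


def load_phrases(grammar_xml: str) -> List[str]:
    """Return the phrases listed under the grammar's root rule."""
    root = ET.fromstring(grammar_xml)
    # SRGS defaults to voice mode; DTMF grammars are rejected in this sketch.
    if root.get("mode", "voice") != "voice":
        raise ValueError("only voice-mode grammars are handled")
    rule = root.find(f"{{{SRGS_NS}}}rule[@id='{root.get('root')}']")
    if rule is None:
        raise ValueError("root rule not found")
    return [(item.text or "").strip() for item in rule.iter(f"{{{SRGS_NS}}}item")]


if __name__ == "__main__":
    print(load_phrases(YES_NO_GRAMMAR))
```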

In addition to SRGS, SAPI 5.3 continues to support the proprietary SAPI CFG XML format for specifying a grammar.

Semantic Interpretation

SAPI 5.3 enables an SRGS grammar to be annotated with semantic information, so that a recognition result may contain not only the recognized text but also the semantic interpretation of that text. For example, the recognized text of a yes/no grammar might be 'yes', 'yeah' or 'yep', but the semantic meaning of all of these is 'yes'. This makes it easier for applications to consume recognition results, as well as empowering grammar authors to provide a full spectrum of possible utterances without burdening the developer with the interpretation task.

The annotation of semantic information within SRGS can be either of the following:

A string literal containing the semantic value.
A JScript statement that ultimately returns a string containing the semantic value.
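The string-literal form can be sketched in Python as follows. The grammar is a hypothetical yes/no grammar whose items carry a tag element holding the semantic value; the tag-format value semantics/1.0-literals is the identifier the W3C Semantic Interpretation specification defines for string-literal tags, and whether a particular SAPI configuration expects exactly that value is an assumption here. The helper build_semantic_table is illustrative: it is a lookup over the grammar text, not a recognizer.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical grammar: several spoken forms, two semantic values.
TAGGED_GRAMMAR = """
<grammar version="1.0" xml:lang="en-US" mode="voice" root="answer"
         tag-format="semantics/1.0-literals"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="answer" scope="public">
    <one-of>
      <item>yes<tag>yes</tag></item>
      <item>yeah<tag>yes</tag></item>
      <item>yep<tag>yes</tag></item>
      <item>no<tag>no</tag></item>
      <item>nope<tag>no</tag></item>
    </one-of>
  </rule>
</grammar>
"""


def build_semantic_table(grammar_xml: str) -> Dict[str, str]:
    """Map each spoken phrase to the string literal in its <tag> element."""
    root = ET.fromstring(grammar_xml)
    table: Dict[str, str] = {}
    for item in root.iter(f"{NS}item"):
        phrase = (item.text or "").strip()
        tag = item.find(f"{NS}tag")
        # Without a tag, fall back to the recognized text itself.
        has_tag = tag is not None and tag.text
        table[phrase] = tag.text.strip() if has_tag else phrase
    return table


if __name__ == "__main__":
    table = build_semantic_table(TAGGED_GRAMMAR)
    for heard in ("yes", "yeah", "yep", "nope"):
        print(heard, "->", table[heard])
```

An application that consumes the semantic value only has to compare against yes or no, regardless of which variant the user actually said.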

In addition to the annotation of SRGS, SAPI also provides results that contain not only the recognized text but also the semantic information as a hierarchy of name-value pairs.
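One way to picture a result that carries both text and a hierarchy of name-value pairs is the small Python data structure below. The class names SemanticNode and RecognitionResult, the slash-separated path syntax, and the flight-booking sample are all assumptions made for illustration; they are not SAPI types.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SemanticNode:
    """One node in the hierarchy: an optional value plus named children."""
    value: Optional[str] = None
    children: Dict[str, "SemanticNode"] = field(default_factory=dict)

    def lookup(self, path: str) -> Optional[str]:
        """Follow a slash-separated path of names and return the value found."""
        node = self
        for name in path.split("/"):
            node = node.children[name]  # raises KeyError for unknown names
        return node.value


@dataclass
class RecognitionResult:
    """Recognized text together with its semantic interpretation."""
    text: str
    semantics: SemanticNode


# Hypothetical result for the utterance "fly from seattle to boston".
result = RecognitionResult(
    text="fly from seattle to boston",
    semantics=SemanticNode(children={
        "origin": SemanticNode(children={"city": SemanticNode("SEA")}),
        "destination": SemanticNode(children={"city": SemanticNode("BOS")}),
    }),
)

if __name__ == "__main__":
    print(result.text)
    print(result.semantics.lookup("origin/city"))
    print(result.semantics.lookup("destination/city"))
```

The application reads named values from the hierarchy instead of re-parsing the recognized text.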