Speech Recognition and Text-to-Speech

Phone Password Manager supports both speech-to-text (speech recognition) and text-to-speech (TTS). Speech recognition converts spoken words to text, and TTS plays back text as spoken words.

To support these functions, Phone Password Manager requires a speech engine to be installed on the system. Phone Password Manager uses the Microsoft Speech API (SAPI) as its programming interface and supports SAPI version 5.1 and later. Any SAPI-compliant speech engine can be used with Phone Password Manager.

Speech recognition is provided by the Speech Service, which is installed during a complete installation or if selected during a custom installation.

The Speech Service can only be installed once on a Phone Password Manager server. If you install a second instance of Phone Password Manager on the same server, then the Speech Service will be unable to run on the new instance.

When speech recognition is enabled, users can speak their profile IDs, new password values, and key recovery strings instead of entering them on the numeric keypad.

To set up speech recognition:

On the Phone Password Manager server, copy psynch.speech.psl and speech.psl from the samples* directory to the <instance>\script\ directory.
Modify the idtel.cfg file, located in the <instance>\service\ directory, by changing ScriptName as follows:
```
ScriptName = "psynch.speech.psl"
```

Configure the Speech Service.

Modify idtel.cfg by changing the SpeechService Dll line as follows:

For a local Speech Service, specify speechapi.dll:

SpeechService "" = { 
  Dll = "speechapi.dll" 
  //Server = <server> 
  //Port = <port> 
  //Timeout = <timeout> 
 }

For a remote Speech Service, specify speechapix.dll:

SpeechService "" = { 
  Dll = "speechapix.dll" 
  Server = <speech service server name or IP address> 
  Port = <speech service port> 
 }

Restart the Phone Password Manager service.

Speech recognition is now configured and ready to use. To test speech recognition, place a phone call to the IVR server and try to use speech instead of the numeric keypad to enter your details.
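The two idtel.cfg changes in the procedure above can be sketched as a small Python helper. This is only an illustration, not a tool shipped with the product: the function names are hypothetical, the ScriptName line is assumed to have the `ScriptName = "..."` shape shown earlier, and the Server and Port values are written unquoted to mirror the placeholders in the examples (check what your idtel.cfg expects).

```python
import re


def set_script_name(cfg_text: str, script_name: str) -> str:
    """Return cfg_text with the first ScriptName assignment set to script_name.

    Assumes the line looks like: ScriptName = "something.psl"
    """
    pattern = re.compile(r'^([ \t]*ScriptName[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{script_name}"', cfg_text, count=1
    )
    if count == 0:
        raise ValueError("no ScriptName line found in the configuration text")
    return new_text


def render_speech_service(server=None, port=None) -> str:
    """Build a SpeechService block.

    No server  -> local Speech Service, speechapi.dll.
    With server -> remote Speech Service, speechapix.dll plus Server and Port.
    """
    if server is None:
        lines = ['  Dll = "speechapi.dll"']
    else:
        if port is None:
            raise ValueError("a remote Speech Service needs a port")
        lines = [
            '  Dll = "speechapix.dll"',
            f"  Server = {server}",  # quoting is an assumption; see lead-in
            f"  Port = {port}",
        ]
    return 'SpeechService "" = {\n' + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    # "old.psl" is a placeholder for whatever script name is currently set.
    print(set_script_name('ScriptName = "old.psl"\n', "psynch.speech.psl"))
    print(render_speech_service())               # local
    print(render_speech_service("speech01", 5000))  # remote; host and port are made up
```

Generating the block from one function keeps the local and remote variants from drifting apart. The Phone Password Manager service still has to be restarted afterwards for the change to take effect.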

Configuring the speech service

The Speech Service can be configured using the following options, which are located in the idtel.cfg file:

Option	Description
VoiceActivityDetectThreshold	Controls the sensitivity of the input threshold for the Speech Service. The range of possible values for this option is between -54 and +3; the default value is -40. Lowering the numeric value lowers the input threshold, which increases the sensitivity of the Speech Service. Raising the numeric value raises the input threshold, which decreases the sensitivity of the Speech Service. For example, a value of -54 recognizes even the quietest sounds, whereas a value of +3 only recognizes louder sounds.
SpeechRecognitionMode	Controls which speech recognition mode is used. Possible values: 0 – enables "File based mode", which creates a file in the temp directory before processing the audio file for speech recognition. 1 – enables "Stream mode", which does not create a file, but simply analyzes the stream of audio for speech recognition. This was the only mode available in releases before Bravura Security Fabric version 8.0. By default, stream mode is enabled.
KeepIntermediateSpeechFiles	Controls whether or not to save the audio files created when SpeechRecognitionMode is set to "File based mode." Possible values: 0 – files are not saved; they are deleted after speech recognition is complete. 1 – files are saved in the temp directory: C:\Documents and Settings\psadmin\Local Settings\temp
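As a sanity check before editing idtel.cfg, the documented ranges and defaults for these options can be expressed as a short Python validator. This is a sketch under assumptions: the class and function names are hypothetical, the threshold is treated as an integer, and no default is assumed for KeepIntermediateSpeechFiles because none is stated above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpeechOptions:
    voice_activity_detect_threshold: int = -40  # documented default
    speech_recognition_mode: int = 1            # documented default: stream mode
    keep_intermediate_speech_files: Optional[int] = None  # no documented default


def validate(opts: SpeechOptions) -> List[str]:
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if not -54 <= opts.voice_activity_detect_threshold <= 3:
        problems.append("VoiceActivityDetectThreshold must be between -54 and +3")
    if opts.speech_recognition_mode not in (0, 1):
        problems.append("SpeechRecognitionMode must be 0 (file based) or 1 (stream)")
    keep = opts.keep_intermediate_speech_files
    if keep is not None and keep not in (0, 1):
        problems.append("KeepIntermediateSpeechFiles must be 0 or 1")
    # Stream mode creates no file, so there is likely nothing to keep.
    if keep == 1 and opts.speech_recognition_mode == 1:
        problems.append(
            "KeepIntermediateSpeechFiles = 1 likely has no effect in stream mode"
        )
    return problems


if __name__ == "__main__":
    for problem in validate(SpeechOptions(voice_activity_detect_threshold=10)):
        print(problem)
```

The last check reflects the descriptions above: intermediate files are only created in file based mode, so asking to keep them while in stream mode is probably a configuration mistake rather than an error.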

Building .wav files using SAPI

Use the voicebuild program to create audio .wav files based on a vocal script .txt file using SAPI.

View voicebuild usage information.
