Speech-to-Text Conversion
You can test the API methods on the interactive API browser page without writing code.
Some of the resources described on the page may not be available by default due to the type of application (see Creating and authorizing applications).
There are two types of speech-to-text conversion:
Voice navigation rule (voice_helper) in the voice menu.
Sending recognized text by events during a conversation.
It operates only when the speech_to_text function is activated in the client configuration (only a platform administrator can configure this setting).
Voice Navigation Rule in Voice Menu
Before defining voice navigation rules, a platform administrator must enable the voice_helper rule in the voice menu. Otherwise the rule cannot be created, and if this functionality is disabled, an existing rule in the voice menu settings is ignored. The API application permissions must be all (these are also granted by the platform administrator). The rule is configured in the context options like other voice menu rules, and the number of rules is likewise unlimited. During recognition, both the final recognition result and guesses are available (the guesses are available if recognition is interrupted by dialing or by a timeout).
Description of Data Structures
Name | Type | Description |
---|---|---|
action | string | voice_helper: to set the voice navigation rule. |
sound | integer | The identifier of the sound file to be played. It can be obtained using the resource: GET /client/{client_id}/sound/ |
voice_helper_digits_max | integer | The number of extension dialing digits after which speech recognition ends. If the final recognition result is not yet available when the required number of digits has been dialed, the guesses are used. |
voice_helper_rules | string | List of forwarding rules as an array of objects {"to_option": option_number, "transcription": "recognized word"} |
voice_helper_timeout | integer | The time, in milliseconds, after which speech recognition ends. If the final result is not available within this time, the guesses are used. The minimum value is 3000. |
voice_helper_classic_term | boolean | Enables or disables classic extension dialing from the keypad (disabled by default; dialing is then used only to interrupt recognition, see the voice_helper_digits_max option). |
max_pause_between_words | integer | The maximum allowed pause between the client's words. Specified in milliseconds. Values:
|
language | string | The language to use for speech recognition. If a language is already selected in the client settings, this language is applied (only if it is in the list of available languages). Available values:
|
voice_helper_long_speech_expected | boolean | Improves the accuracy of voice-to-text conversion. If callers are expected to use short phrases or words, it is better not to set it to True. If callers will use long phrases, set it to True. |
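The fields above can be assembled into a rule body programmatically. The following is a minimal Python sketch, not an official client: the function name build_voice_helper_rule and its argument layout are hypothetical, and the only check it enforces is the documented minimum of 3000 ms for voice_helper_timeout. The "|" separator between alternative words follows the examples on this page.

```python
def build_voice_helper_rule(rules, timeout_ms=7000, digits_max=2, sound=None):
    """Assemble a voice_helper rule body (hypothetical helper).

    rules maps a target option to the list of words that lead to it,
    for example {1: ["hi", "hello"], 2: ["bye"]}.
    """
    # The table above states that the minimum timeout is 3000 ms.
    if timeout_ms < 3000:
        raise ValueError("voice_helper_timeout must be at least 3000 ms")
    body = {
        "action": "voice_helper",
        "voice_helper_timeout": timeout_ms,
        "voice_helper_digits_max": digits_max,
        "voice_helper_rules": [
            # Alternative words are joined with "|" as in the page examples.
            {"to_option": option, "transcription": "|".join(words)}
            for option, words in rules.items()
        ],
    }
    # The sound identifier is optional; omit it to keep the default.
    if sound is not None:
        body["sound"] = sound
    return body
```

Calling build_voice_helper_rule({1: ["hi", "hello"], 2: ["bye"]}) yields a dictionary that can be serialized to JSON as the request body.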
Creation of Voice Menu Rule
For example, a call comes to an extension number with extension_id 204 in context 1 on the "start" option. We assume that the context is empty and contains no rules yet. We add a rule and send a request:
System response (other parameters are irrelevant and are excluded from the example):
{
…
"voice_helper_sound": 52,
"voice_helper_timeout": 7000,
"id": 39,
"voice_helper_digits_max": "2",
"voice_helper_rules": [
{
"transcription": "hi|hello|whatsup",
"to_option": 1
},
{
"transcription": "bye|goodbye",
"to_option": 2
}
],
"final": true,
"action": "voice_helper",
…
}
The response contains the identifier of the created rule: "id": 39. The request creates a voice navigation rule that plays the default melody.
You must create the to_option options yourself (they are not created automatically); these are the options the call goes to when the caller says a word from the transcription. The transcription value can contain either an exact word or part of a word.
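To check locally how a set of transcriptions might route a phrase, a small sketch such as the one below can help. The platform's actual matching logic is not described on this page; the code assumes case-insensitive substring matching and first-match-wins ordering, and the function name pick_option is hypothetical.

```python
def pick_option(voice_helper_rules, recognized_text):
    """Return the to_option of the first rule with a matching fragment.

    Assumption: a fragment matches when it occurs anywhere in the
    recognized text, ignoring case. The real service may behave differently.
    """
    text = recognized_text.lower()
    for rule in voice_helper_rules:
        # Alternatives inside one transcription are separated by "|".
        for fragment in rule["transcription"].split("|"):
            if fragment and fragment.lower() in text:
                return rule["to_option"]
    return None  # nothing matched


RULES = [
    {"transcription": "hi|hello|whatsup", "to_option": 1},
    {"transcription": "bye|goodbye", "to_option": 2},
]
```

Under this assumption, short fragments can match unintentionally: the word "this" contains "hi", so pick_option(RULES, "this") returns 1. Keeping fragments distinctive reduces such overlaps.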
Creation of Sound Greeting
If you need a specific audio greeting that tells the caller which word is expected, prepare a sound file in advance. The "Sound Files" resource lets you upload a file and find out its identifier. Then update the voice helper rule parameters by specifying the identifier of the required sound file.
Updating Voice Helper Rule Settings
You can update any rule parameter using the method
PUT /api/ver1.0/extension/{extension_id}/ivr/context/{context_id}/options/{option_digits}/rules/{rule_id}
For example, to set a new greeting with the required file identifier SOUND_ID, send the above request with the body {"sound": SOUND_ID}
Any other parameter of the voice_helper rule can be updated the same way. For example, you can replace voice_helper_rules with the request
PUT /api/ver1.0/extension/{extension_id}/ivr/context/{context_id}/options/{option_digits}/rules/{rule_id} with the body
{"voice_helper_rules": [
{
"to_option": 1,
"transcription": "food|meal"
},
{
"to_option": 2,
"transcription": "comics|comic|mix"
}
]
}
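The update requests above can be prepared with the Python standard library. This is a sketch under stated assumptions: the host in BASE_URL is a placeholder, the JSON Content-Type header is assumed, authorization is omitted because it is not covered here, and build_rule_update is a hypothetical helper name. The request is only constructed, not sent.

```python
import json
import urllib.request

# Placeholder host; only the /api/ver1.0 path prefix comes from this page.
BASE_URL = "https://example.invalid/api/ver1.0"


def build_rule_update(extension_id, context_id, option_digits, rule_id, changes):
    """Build (without sending) a PUT request that updates rule parameters."""
    url = (
        f"{BASE_URL}/extension/{extension_id}/ivr/context/{context_id}"
        f"/options/{option_digits}/rules/{rule_id}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        # Assumption: the API accepts a JSON request body.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example: set sound 52 on rule 39 of the "start" option in context 1.
request = build_rule_update(204, 1, "start", 39, {"sound": 52})
```

Passing the resulting object to urllib.request.urlopen would send it once the real host and authorization headers are filled in.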
It is not recommended to set many conditions in one voice_helper_rules list (preferably no more than 500).
The to_option value is one of the context options (start, invalid, timeout, 1, 2, 3, 4, etc.). For example, the voice_helper rule is created in the start option, and the options 1-10 (or, for example, 4-40) are used for voice navigation.
The options are created by the request
POST /api/ver1.0/extension/{extension_id}/ivr/context/{context_id}/options/ with its body
{"digits": "string"}
where string is a context option (start, invalid, timeout, 1, 2, 3, 4, etc.).
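Because the target options are not created automatically, it can be convenient to derive the option-creation bodies from the rule list itself. The sketch below does only that; the helper name option_bodies is hypothetical, and each returned dictionary is meant to be sent as the body of the POST request shown above.

```python
def option_bodies(voice_helper_rules):
    """Return one {"digits": ...} body per distinct to_option value.

    Order of first appearance is kept and duplicates are dropped.
    """
    # dict.fromkeys preserves insertion order and removes repeats.
    distinct = dict.fromkeys(str(rule["to_option"]) for rule in voice_helper_rules)
    return [{"digits": digits} for digits in distinct]


bodies = option_bodies([
    {"to_option": 1, "transcription": "food|meal"},
    {"to_option": 2, "transcription": "comics|comic|mix"},
    {"to_option": 1, "transcription": "lunch"},
])
```

Here bodies holds two entries, one for option "1" and one for option "2".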
Getting Recognized Data by Remote Server
It is possible to get recognition data on your remote server.
The "Call Interactive" function, used as an action of a voice menu (IVR) context option, sends an HTTP request to the specified URL and processes the response. The request always carries a fixed set of parameters with information about the call in the IVR. To control the actions taken after recognition, the "Call Interactive" function has these optional request parameters:
voice_navigator_DTMF: the extension digits dialed from the telephone terminal during the voice menu rule action='voice_helper';
voice_navigator_STT: the speech recognized during the voice menu rule action='voice_helper'.
For example, a "Call Interactive" rule that sends the POST request http://mysite.com/myscript?check_number can return the desired greeting as TTS together with the additional options play_now="false" and save_to_var="true". In this case, a voice_helper rule with the play_sound_from_variable option specified ignores the greeting set in the rule itself.
The call enters the starting context (start) where, in addition to the standard context options (start, timeout, invalid), custom options are configured, for example "1" with call_interactive and "2" with voice_helper. The system waits for the caller to say something or to dial an option (this is declared in the voice_helper rule). For example, if the caller says "operator", the call goes to option "0". If the caller says any of the specified words (for example: "know, date, ready, readiness, shipped, shipment, goods, invoice") or dials 1, the call goes to option "1", where the "Call Interactive" rule sends the POST request http://mysite.com/myscript?check_stt_res. The server then receives the data:
voice_navigator_STT=%D1%85%D0%BE%D1%87%D1%83+%D1%83%D0%B7%D0%BD%D0%B0%D1%82%D1%8C+%D0%B4%D0%B0%D1%82%D1%83+%D0%B3%D0%BE%D1%82%D0%BE%D0%B2%D0%BD%D0%BE%D1%81%D1%82%D0%B8+%D0%BA+%D0%BE%D1%82%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B5+%D1%82%D0%BE%D0%B2%D0%B0%D1%80%D0%B0
After URL decoding this is voice_navigator_STT=хочу узнать дату готовности к отгрузке товара (Russian for "I want to know the date when the goods will be ready for shipment")
or voice_navigator_DTMF=1
If no option in the voice_helper rule was activated (neither "0" nor "1" was chosen), the default greeting plays in the start context, prompting the caller to connect with the operator ("Say 'operator' or press '0'"). In this case, you can add a "Call Interactive" rule with the request POST http://mysite.com/myscript?no_option_voice_helper, where voice_navigator_STT, if present, contains whatever was said instead of the words for "0" and "1" (for example, if the caller asks: "Where did I get to?"). After that, you can set further actions either by controlling them from "Call Interactive" or through static rules in the IVR.
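On the receiving server, the two optional parameters can be extracted with the Python standard library. The sketch below assumes the parameters arrive as a form-encoded string (as the sample data above suggests); read_navigation is a hypothetical helper name, and SAMPLE is the encoded value copied from this page.

```python
from urllib.parse import parse_qs

# Encoded sample copied from the example above.
SAMPLE = (
    "voice_navigator_STT=%D1%85%D0%BE%D1%87%D1%83+%D1%83%D0%B7%D0%BD%D0%B0"
    "%D1%82%D1%8C+%D0%B4%D0%B0%D1%82%D1%83+%D0%B3%D0%BE%D1%82%D0%BE%D0%B2"
    "%D0%BD%D0%BE%D1%81%D1%82%D0%B8+%D0%BA+%D0%BE%D1%82%D0%B3%D1%80%D1%83"
    "%D0%B7%D0%BA%D0%B5+%D1%82%D0%BE%D0%B2%D0%B0%D1%80%D0%B0"
)


def read_navigation(form_body):
    """Return (recognized_text, dialed_digits); either may be None.

    Assumption: form_body is an application/x-www-form-urlencoded string.
    parse_qs turns "+" into spaces and decodes percent-escapes as UTF-8.
    """
    fields = parse_qs(form_body, keep_blank_values=True)
    stt = fields.get("voice_navigator_STT", [None])[0]
    dtmf = fields.get("voice_navigator_DTMF", [None])[0]
    return stt, dtmf


text, digits = read_navigation(SAMPLE)
```

A handler can then branch on whichever value is present: digits when the caller dialed, text when the caller spoke.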
Events with Recognized Text
To receive events with the final recognized text during a conversation, use the following scheme. When an event arrives for an extension number (dial-in for incoming calls in the IVR, answer for incoming and outgoing calls from an extension number of the "phone terminal" type), store the extension_id and the CallAPIID. The extension_id is the extension number identifier and depends on the event's CallFlow: it is CalledExtensionID for in and CallerExtensionID for out. Then use the resource:
PUT /extension/{extension_id}/speech_to_text/{call_api_id}
Description of Data Structures
Name | Type | Description |
---|---|---|
extension_id | string | Identifier of the extension number |
call_api_id | string | The identifier of the call to begin speech recognition |
action | string | Action, may be start or stop |
direction | string | Direction of the recognized speech relative to extension_id: out if the voice goes from the extension number, in if the voice goes to the extension number |
url | string | URL to send events with speech-recognized text |
If you repeat the same action with the same direction for the same conversation, you will get an error message.
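The scheme above can be sketched in Python as follows. Several details are assumptions rather than documented behavior: the event is modeled as a dictionary keyed by the field names mentioned above, action, direction and url are sent as a JSON body, the base URL is a placeholder, and both helper names are hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def extension_for(event):
    """Pick the extension identifier according to the event's CallFlow."""
    # CalledExtensionID for "in", CallerExtensionID for "out".
    key = "CalledExtensionID" if event["CallFlow"] == "in" else "CallerExtensionID"
    return event[key]


def build_stt_request(base_url, event, action, direction, callback_url):
    """Build a PUT request that starts or stops speech recognition."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    url = (
        f"{base_url}/extension/{extension_for(event)}"
        f"/speech_to_text/{event['CallAPIID']}"
    )
    # Assumption: the parameters are passed as a JSON body.
    body = {"action": action, "direction": direction, "url": callback_url}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Since repeating the same action and direction for one conversation returns an error, a caller may want to remember which (CallAPIID, direction) pairs have already been started.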
The events look like this:
You can read the content length from Content-Length.
The recognized text becomes readable after URL-decoding it as UTF-8.