You can test API methods on the interactive API browser page without writing code.
Some of the resources described on this page may not be available by default, depending on the application type (see Creating and authorizing applications).

There are two types of speech-to-text conversion:

Voice navigation rule (voice_helper) in the voice menu.
Sending recognized text by events during a conversation.

Speech-to-text operates only when the speech_to_text function is activated in the client configuration (only a platform administrator can change this setting).

Voice Navigation Rule in Voice Menu

Before voice navigation rules can be defined, a platform administrator must enable the voice_helper rule for the voice menu. While this functionality is disabled, the rule cannot be created, and an existing rule in the voice menu settings is ignored. The API application permissions must be all (they are also granted by the platform administrator). The rule is configured in the context options in the same way as other voice menu rules, and the number of rules is not limited. During recognition, both the final recognition result and intermediate guesses are available (the guesses are available if recognition is interrupted by dialing or by a timeout).

Description of Data Structures

Name	Type	Description
action	string	voice_helper: sets the voice navigation rule. The main options for this action: sound, voice_helper_digits_max, voice_helper_rules, voice_helper_timeout, voice_helper_classic_term, voice_helper_final_count
sound	integer	The identifier of the sound file to be played; it can be obtained using the resource GET /client/{client_id}/sound/. While the file is playing, recognition of the caller's speech begins. Recognition ends when the caller presses the terminator button #, dials the configured number of extension digits (voice_helper_digits_max), or the timeout expires (voice_helper_timeout)
voice_helper_digits_max	integer	The number of dialed extension digits after which speech recognition ends. If the final recognition result is not yet available when the required number of digits has been dialed, the guesses are used
voice_helper_rules	string	A list of forwarding rules in the form of an array of {"to_option": option_number, "transcription": "recognized word"}. The context option referenced by to_option must exist, otherwise it is skipped during the call; its existence is not checked when the rule is created. The option is created with the request POST /extension/{extension_id}/ivr/context/{context_id}/options/. The transcription can have the form "hi|hello": words are separated by "|", spaces are stripped or must be absent, and character case does not matter. It is not necessary to specify an entire word: if the word or word part specified in the transcription is contained in a recognized word, this is considered a match
voice_helper_timeout	integer	The time, in milliseconds, after which speech recognition ends. If the final result is not available within this time, the guesses are used. The minimum value is 3000
voice_helper_classic_term	boolean	Enables or disables classic extension dialing from the keypad (disabled by default, in which case dialing is used only to interrupt recognition; see the option voice_helper_digits_max). The logic is similar to the extension dialing of the IVR rule Play sound, but before going to the option or to the extension number, voice_helper_rules are checked; if there is a match, it is used for the transition instead of the extension number (speech has priority over dialing, although dialing ends recognition of the speech)

voice_helper_final_count	integer	The number of attempts the caller has to say something that matches voice_helper_rules. The final recognition is determined by a pause in speech (only guesses are available until the final recognition). Usually, final_count is set to one to three attempts
max_pause_between_words	integer	The maximum allowed pause between the caller's words, in milliseconds. Values: 500 is the minimum, 1500 is the default, 10000 is the maximum
language	string	The language to be used for speech recognition. If a language is already selected in the client settings, that language is used (only if it is in the list of available languages). Available values: ru (Russian), en (English, default)
voice_helper_long_speech_expected	boolean	Needed for more accurate conversion of voice to text. If callers are expected to use short phrases or single words, it is better not to set it to True; if callers will use long phrases, set it to True
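The matching behaviour described for voice_helper_rules (alternatives separated by "|", spaces stripped, case ignored, partial words accepted) can be illustrated with a small sketch. This is only an interpretation of the description in the table, not the platform's actual implementation; the function name match_rule is hypothetical.

```python
from typing import Dict, List, Optional


def match_rule(recognized_text: str, rules: List[Dict]) -> Optional[int]:
    """Return to_option of the first rule whose transcription matches, else None."""
    # Character case does not matter, so everything is compared in lower case.
    words = recognized_text.lower().split()
    for rule in rules:
        # Alternatives are separated by "|"; spaces inside them are stripped.
        alternatives = [a.replace(" ", "").lower() for a in rule["transcription"].split("|")]
        for alt in alternatives:
            # A word part is enough: the alternative only has to be contained
            # in one of the recognized words.
            if alt and any(alt in word for word in words):
                return rule["to_option"]
    return None


rules = [
    {"to_option": 1, "transcription": "hi|hello|whatsup"},
    {"to_option": 2, "transcription": "bye|goodbye"},
]
print(match_rule("Hello operator", rules))  # 1
print(match_rule("Goodbye", rules))         # 2
print(match_rule("thanks", rules))          # None
```

Under this reading, very short alternatives match more than intended: "hi" is contained in "this", so longer word parts are the safer choice.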

Creation of Voice Menu Rule

For example, a call comes to the extension number with extension_id 204 in context 1 on the "start" option. We assume that the context is empty and contains no rules yet. To add a rule, send a request:

Interactive API Browser

In the authentication section, enter the Application_ID and Application_Secret values obtained during the application registration in the client_id and client_secret fields, then click the "Try it out!" button. If the application data is correct, an access token (access_token) is returned in the response. Enter the received access token into the "Access Token Field" at the top of the page and save it by clicking the "Set Token" button.

Method	Resource	Data
PUT	/extension/34414/valet_park/3232240860-37b98fb2-a66c-11ea-ae69-6fd86d94a0e0	{ "action": "park", "slot" : "6" }

cURL Program

To send requests using the cURL program, set the values of the url and access_token variables:

export url="https://<hostname>/api/ver1.0"
(where <hostname> is the hostname of the IP telephony provider's API server),

export access_token="8SNsrS0jV35vfmKqKeKtRrHfpbg4UX"
(the received access token).

Send a request:

Code Block

language	shell

curl \
-H "Authorization: Bearer ${access_token}" \
-H "Content-Type: application/json" \
-d '{
  "action": "park",
  "slot" : "6"
}' \
-X PUT ${url}/extension/34414/valet_park/3232240860-37b98fb2-a66c-11ea-ae69-6fd86d94a0e0

In Python3

To send requests in Python, set the values of the variables URL and ACCESS_TOKEN:

URL = "https://<hostname>/api/ver1.0"
(where <hostname> is the hostname of the IP telephony provider's API server),

ACCESS_TOKEN = "8SNsrS0jV35vfmKqKeKtRrHfpbg4UX"
(the received access token).

Code Block

language	python

#!/usr/bin/python

import requests

url = f'{URL}/extension/34414/valet_park/3232240860-37b98fb2-a66c-11ea-ae69-6fd86d94a0e0'
headers = {
    'Authorization': f'Bearer {ACCESS_TOKEN}',
    'Content-Type': 'application/json'
}
data = '''{
    "action": "park",
    "slot" : "6"
}'''
response = requests.put(url, headers=headers, data=data)
print(response.text)
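For the voice navigation rule discussed in this section, the request could be assembled roughly as follows. This is a sketch under assumptions: the creation path is inferred from the documented update path (the same path without {rule_id}) and is not confirmed here, the host name is hypothetical, and the function names are illustrative. The request is only prepared, not sent.

```python
import json
import urllib.request


def build_voice_helper_rule(sound_id, rules, timeout_ms=7000, digits_max=2, final_count=2):
    """Assemble the body of a voice_helper rule from the documented options."""
    if timeout_ms < 3000:
        # The documented minimum for voice_helper_timeout is 3000 ms.
        raise ValueError("voice_helper_timeout must be at least 3000 ms")
    return {
        "action": "voice_helper",
        "sound": sound_id,
        "voice_helper_timeout": timeout_ms,
        "voice_helper_digits_max": digits_max,
        "voice_helper_final_count": final_count,
        "voice_helper_rules": rules,
    }


def build_create_request(base_url, access_token, extension_id, context_id, option, body):
    """Prepare (but do not send) the request that adds the rule to an option."""
    # ASSUMPTION: the creation path mirrors the update path without {rule_id}.
    url = f"{base_url}/extension/{extension_id}/ivr/context/{context_id}/options/{option}/rules/"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = build_voice_helper_rule(
    sound_id=52,
    rules=[
        {"to_option": 1, "transcription": "hi|hello|whatsup"},
        {"to_option": 2, "transcription": "bye|goodbye"},
    ],
)
request = build_create_request(
    "https://pbx.example.com/api/ver1.0",  # hypothetical host
    "ACCESS_TOKEN_PLACEHOLDER", 204, 1, "start", body,
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```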

System response (other parameters are irrelevant and are excluded from the example):

Code Block

language	json

{
…
  "voice_helper_sound": 52,
  "voice_helper_timeout": 7000,
  "id": 39,
  "voice_helper_digits_max": "2",
  "voice_helper_rules": [
    { 
      "transcription": "hi|hello|whatsup",
      "to_option": 1
    },
    {
      "transcription": "bye|goodbye",
      "to_option": 2
    }
  ],
  "final": true,
  "action": "voice_helper",
…
}

The response contains the identifier of the created rule: "id": 39. The above request creates a voice navigation rule that plays the default melody.

You must create the options referenced in to_option yourself (they are not created automatically); the call goes to these options when the caller says a word from the transcription. The transcription can contain either a whole word or a part of a word.

Creation of Sound Greeting

If you need a specific audio greeting that tells the caller which words are expected, prepare a sound file in advance. The resource "Sound Files" lets you upload the file and find out its identifier. Then update the voice helper rule parameters by specifying the identifier of the required sound file.

Updating Voice Helper Rule Settings

You can update any rule parameter using the method
PUT /api/ver1.0/extension/{extension_id}/ivr/context/{context_id}/options/{option_digits}/rules/{rule_id}

For example, to set an up-to-date greeting with the required file identifier SOUND_ID, you should send the above request with the body
{"sound": SOUND_ID}

Any parameter of the voice_helper rule can be updated in the same way. For example, you can replace voice_helper_rules with the request
PUT /api/ver1.0/extension/{extension_id}/ivr/context/{context_id}/options/{option_digits}/rules/{rule_id} with the body

Code Block

language	json

{"voice_helper_rules": [
    {
      "to_option": 1,
      "transcription": "food|meal"
    },
    {
      "to_option": 2,
      "transcription": "comics|comic|mix"
    }
  ]
}
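A minimal sketch of preparing such an update in Python, using only the standard library. The path is the documented PUT resource; the host name, token placeholder, and the helper name build_rule_update are assumptions for illustration. Only the fields passed in changes are included in the body.

```python
import json
import urllib.request


def build_rule_update(base_url, access_token, extension_id, context_id,
                      option_digits, rule_id, changes):
    """Prepare (but do not send) a PUT carrying only the rule fields to change."""
    url = (f"{base_url}/extension/{extension_id}/ivr/context/{context_id}"
           f"/options/{option_digits}/rules/{rule_id}")
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


BASE_URL = "https://pbx.example.com/api/ver1.0"  # hypothetical host
TOKEN = "ACCESS_TOKEN_PLACEHOLDER"

# Replace the forwarding rules of rule 39 in option "start" of context 1.
update = build_rule_update(BASE_URL, TOKEN, 204, 1, "start", 39, {
    "voice_helper_rules": [
        {"to_option": 1, "transcription": "food|meal"},
        {"to_option": 2, "transcription": "comics|comic|mix"},
    ]
})
print(update.get_method(), update.full_url)
# To actually send it: urllib.request.urlopen(update)
```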

It is not recommended to set many conditions in one voice_helper_rules list (preferably no more than 500).

The to_option value refers to a context option (start, invalid, timeout, 1, 2, 3, 4, etc.). For example, the voice_helper rule is created in the option start, and the options 1-10 (or, for example, 4-40) are used for voice navigation.

The options are created by the request
POST /api/ver1.0/extension/{extension_id}/ivr/context/{context_id}/options/ with the body

Code Block

language	json

{"digits": "string"}

where string is the name of the context option (start, invalid, timeout, 1, 2, 3, 4, etc.).
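Because the options referenced by to_option are not created automatically, it can help to derive the list of required option bodies from the rules themselves. The sketch below only builds the JSON bodies for the POST request above; the helper name option_bodies is hypothetical, and sending the requests is left out.

```python
import json


def option_bodies(rules):
    """Return one {"digits": ...} body per distinct to_option, in first-seen order."""
    seen = []
    for rule in rules:
        digits = str(rule["to_option"])  # option names are passed as strings
        if digits not in seen:
            seen.append(digits)
    return [json.dumps({"digits": digits}) for digits in seen]


rules = [
    {"to_option": 1, "transcription": "food|meal"},
    {"to_option": 2, "transcription": "comics|comic|mix"},
    {"to_option": 1, "transcription": "lunch"},
]
for body in option_bodies(rules):
    print(body)
```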

Getting Recognized Data by Remote Server

You can receive recognition data on your own remote server.

The "Call Interactive" function, used as an action of a voice menu (IVR) context option, initiates an HTTP request to the specified URL and processes the response. A fixed set of parameters describing the call in the IVR is transmitted with the request. To control actions after recognition, the "Call Interactive" function supports the following optional request parameters:

voice_navigator_DTMF: the extension digits dialed from the telephone terminal during the voice menu rule action='voice_helper';
voice_navigator_STT: the text recognized from the caller's speech during the voice menu rule action='voice_helper'.

The "Call Interactive" function with the POST request http://mysite.com/myscript?check_number returns the desired greeting with additional options besides TTS (play_now="false", save_to_var="true"). In this case, voice_helper with the option play_sound_from_variable specified ignores the greeting set in the rule itself.

The call enters the starting context (start) where, in addition to the standard context options (start, timeout, invalid), custom options are configured, for example: "1" with 'call_interactive' and "2" with 'voice_helper'. The system waits for the caller to say something or to dial an option (this is declared in the voice_helper rule). For example, if the caller says "operator", the call goes to option "0"; if the caller says any of the specified words (for example: "know, date, ready, readiness, shipped, shipment, goods, invoice") or dials 1, the call goes to option "1". There the "Call Interactive" rule sends a POST request to http://mysite.com/myscript?check_stt_res, and the server receives the data: voice_navigator_STT=%D1%85%D0%BE%D1%87%D1%83+%D1%83%D0%B7%D0%BD%D0%B0%D1%82%D1%8C+%D0%B4%D0%B0%D1%82%D1%83+%D0%B3%D0%BE%D1%82%D0%BE%D0%B2%D0%BD%D0%BE%D1%81%D1%82%D0%B8+%D0%BA+%D0%BE%D1%82%D0%B3%D1%80%D1%83%D0%B7%D0%BA%D0%B5+%D1%82%D0%BE%D0%B2%D0%B0%D1%80%D0%B0
After URL decoding (translated into English): voice_navigator_STT=I want to know the date when the goods will be ready for shipment
or voice_navigator_DTMF=1

If no option in the voice_helper rule was activated (neither "0" nor "1" was selected), the default greeting of the start context plays, prompting the caller to connect to the operator ("Say "operator" or press "0""). In this case, you can add "Call Interactive" with the request POST http://mysite.com/myscript?no_option_voice_helper, where voice_navigator_STT, if present at all, contains a value other than those leading to "0" and "1" (for example, if the caller asks: "Where did I get to?"). After that, you can set further actions both by controlling them from "Call Interactive" and by static rules in the IVR.
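On the receiving side, the two optional parameters can be read with the standard library. The sketch below assumes the parameters arrive as a form-encoded string (as the sample data suggests); how the rest of the "Call Interactive" request is structured is not covered here, and the function name read_navigation is hypothetical.

```python
from urllib.parse import parse_qs


def read_navigation(form_data: str) -> dict:
    """Extract the dialed digits and the recognized text, if present."""
    # parse_qs turns "+" into spaces and decodes %XX sequences as UTF-8.
    params = parse_qs(form_data)
    return {
        "dtmf": params.get("voice_navigator_DTMF", [None])[0],
        "stt": params.get("voice_navigator_STT", [None])[0],
    }


print(read_navigation("voice_navigator_DTMF=1"))
print(read_navigation("voice_navigator_STT=%D1%85%D0%BE%D1%87%D1%83+%D1%83%D0%B7%D0%BD%D0%B0%D1%82%D1%8C"))
print(read_navigation(""))  # neither parameter was sent
```

Both keys are returned so that the calling script can decide which input to act on when only one of them, or neither, is present.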

Events with Recognized Text

To receive events with the final recognized text during a conversation, use the following scheme. On an event in an extension number (dial-in for incoming calls in the IVR, answer for incoming and outgoing calls of an extension number of the "phone terminal" type), remember the CallAPIID and the extension_id (the extension number identifier: depending on the event's CallFlow, it is CalledExtensionID for in and CallerExtensionID for out), and use the resource:
PUT /extension/{extension_id}/speech_to_text/{call_api_id}

Description of Data Structures

Name	Type	Description
extension_id	string	Identifier of the extension number
call_api_id	string	The identifier of the call to begin speech recognition
action	string	Action, may be start or stop
direction	string	Direction of the recognized speech relative to extension_id: out if the voice goes from the extension number, in if the voice goes to the extension number
url	string	URL to send events with speech-recognized text

Info
Repeating the same action with the same direction for the same conversation returns an error.
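The scheme above can be sketched as follows: choose the extension identifier from the event according to CallFlow, then prepare the PUT request. The event is represented as a plain dictionary and the body is sent as JSON; both are assumptions, as are the host name, the callback URL, and the function names. The request is prepared but not sent.

```python
import json
import urllib.request


def extension_id_for(event: dict) -> str:
    """Pick the extension identifier according to the event's CallFlow."""
    if event["CallFlow"] == "in":
        return event["CalledExtensionID"]
    if event["CallFlow"] == "out":
        return event["CallerExtensionID"]
    raise ValueError(f"unexpected CallFlow: {event['CallFlow']!r}")


def build_stt_request(base_url, access_token, event, action, direction, callback_url):
    """Prepare (but do not send) the request that starts or stops recognition."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    url = f"{base_url}/extension/{extension_id_for(event)}/speech_to_text/{event['CallAPIID']}"
    body = {"action": action, "direction": direction, "url": callback_url}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # ASSUMPTION: JSON body
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


event = {
    "CallFlow": "in",
    "CalledExtensionID": "34414",
    "CallAPIID": "3232240860-15e67fb4-8f9a-11ea-90b9-05381d8e3a62",
}
stt_request = build_stt_request(
    "https://pbx.example.com/api/ver1.0",  # hypothetical host
    "ACCESS_TOKEN_PLACEHOLDER", event, "start", "out",
    "https://crm.example.com/stt-events",  # hypothetical event receiver
)
print(stt_request.get_method(), stt_request.full_url)
```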

The events look like this:

Code Block

language	html

POST / HTTP/1.1
Host: 213.170.66.158:12345
user-agent: Ringme Event Generator
Accept: 
Accept-Encoding: gzip, deflate
Content-Length: 394
Content-Type: application/x-www-form-urlencoded

call_api_id=3232240860-15e67fb4-8f9a-11ea-90b9-05381d8e3a62&extension_id=34414&direction=out&text=%D0%90+%D0%B2%D0%BE%D0%B4%D0%B5+%D0%BC%D0%B8%D0%BC%D0%BE+%D1%81%D0%B5%D0%B9%D1%87%D0%B0%D1%81+%D0%BB%D0%B5%D1%82+%D1%81%D0%B5%D0%B3%D0%BE+%D0%BA%D0%BE%D0%BC%D0%B0%D1%80%D0%B0+%D0%B7%D0%B0%D0%B1%D0%BE%D1%82%D0%B0+%D0%BB%D1%8E%D0%B4%D0%B8+%D1%87%D0%B0%D1%81%D1%83+%D1%81+15+%D0%B3%D0%BE%D0%B4%D0%B0
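A receiving endpoint might decode such an event body as shown in this sketch. It uses a shortened copy of the sample body; the RecognizedText class and parse_event function are hypothetical names, and only the four fields visible in the sample are handled.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass
class RecognizedText:
    call_api_id: str
    extension_id: str
    direction: str
    text: str


def parse_event(body: bytes) -> RecognizedText:
    """Decode a form-encoded event body into its fields."""
    # keep_blank_values=True keeps "text=" when nothing was recognized.
    params = parse_qs(body.decode("ascii"), keep_blank_values=True, encoding="utf-8")
    return RecognizedText(
        call_api_id=params["call_api_id"][0],
        extension_id=params["extension_id"][0],
        direction=params["direction"][0],
        text=params["text"][0],
    )


sample = (b"call_api_id=3232240860-15e67fb4-8f9a-11ea-90b9-05381d8e3a62"
          b"&extension_id=34414&direction=out&text=%D0%90+%D0%B2%D0%BE%D0%B4%D0%B5")
parsed = parse_event(sample)
print(parsed.extension_id, parsed.direction, parsed.text)
```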

The length of the body can be read from the Content-Length header.
The recognized text becomes readable after URL decoding as UTF-8.

"Extension Number" Section Resources

PUT /extension/{extension_id}/speech_to_text/{call_api_id}

URL Options

Name	Type
extension_id	string
call_api_id	string

Request Options

Name	Type
action	string
direction	string
url	string
