Methods
This section provides details for the existing ASR REST API methods.
RECOGNIZE
Recognizes speech in the audio sent to it. The audio content must be sent in the body of the HTTP message and can be RAW (16-bit linear PCM with a sample rate of 8 kHz or 16 kHz, depending on the installed acoustic model (AM)) or encoded. Supported encoded formats are MP3, OPUS, VORBIS, PCM aLaw/uLaw, GSM, FLAC and WAV. Recognition is performed synchronously and the result is returned in the HTTP response.
Request
POST /asr-server/rest/recognize
HTTP Headers
Accept |
(Optional) Content type of the recognition results. Valid values: application/json, application/xml.
Default value: application/json. |
User-Agent |
ID of the device and/or application generating the audio. Useful for application log purposes. |
Content-Length |
Indicates the number of bytes in the content. |
Content-Type |
Indicates the format of the streamed audio. Valid formats:
|
Speech recognition can be configured to adjust to the specific characteristics of the application. These settings are configured using parameters defined as headers of the HTTP request. The complete list of parameters is shown in the section Configuration. The following example shows the configuration of the endpointer.levelThreshold and decoder.confidenceThreshold parameters:
POST /asr-server/rest/recognize?lm=builtin:slm/general HTTP/1.1
Host: 127.0.0.1:8025
User-Agent: curl/7.47.0
Accept: */*
Content-Type: audio/wav
endpointer.levelThreshold: 10
decoder.confidenceThreshold: 30
Content-Length: ...
[binary content]
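The request above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the function name build_recognize_request and the BASE_URL constant are hypothetical, and the host and port are simply the ones used in the example. It only builds the request; sending it requires a running server.

```python
import urllib.parse
import urllib.request

# Assumption: the server listens on the host and port used in the examples.
BASE_URL = "http://127.0.0.1:8025"


def build_recognize_request(audio, lm, params=None, content_type="audio/wav",
                            accept="application/json", base_url=BASE_URL):
    """Build (but do not send) a POST request for /asr-server/rest/recognize."""
    # The language model URI goes in the query string; ':' and '/' are kept
    # unescaped so the URL matches the form shown in the request example.
    query = urllib.parse.urlencode({"lm": lm}, safe=":/")
    url = f"{base_url}/asr-server/rest/recognize?{query}"

    headers = {"Content-Type": content_type, "Accept": accept}
    # Recognition parameters are passed as additional HTTP headers.
    for name, value in (params or {}).items():
        headers[name] = str(value)

    # urllib normalizes header-name capitalization; HTTP header names are
    # case-insensitive, so this sketch assumes the server accepts that.
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_recognize_request(
        b"...audio bytes...",  # placeholder, not real audio
        "builtin:slm/general",
        {"endpointer.levelThreshold": 10, "decoder.confidenceThreshold": 30},
    )
    print(request.get_method(), request.full_url)
    # Sending requires a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

The Content-Length header is not set explicitly because urllib computes it from the body when the request is sent.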
Request parameters
lm |
Language model URI. This parameter is required; if it is omitted, an error is returned. The URI must start with one of the following prefixes:
|
Result (HTTP status = 200)
If the HTTP request returns a ‘200’ status code, the body of the response will have the following structure. The format of the response can be defined by the HTTP header ‘Accept’, selecting JSON (Accept: application/json) or XML (Accept: application/xml). The default format is JSON.
recognition_result
alternatives
result_status
alternatives: a list of the likely recognition results
index: results alternatives index
text: recognition text
score: confidence score or rate
interpretations: a list of the interpretation results, as defined in the grammar. In the case of free speech models, the list is empty.
result_status: recognition status. It can be one of the following values:
RECOGNIZED: recognition completed successfully
NO_MATCH: recognition completed successfully, but no matches were found in the grammar
NO_INPUT_TIMEOUT: the recognizer was unable to detect the start of speech before the timer ran out
EARLY_SPEECH: the streamed audio did not have an initial stretch of silence (the speech started before recognition started)
MAX_SPEECH: the server received more audio than it was able to process
RECOGNITION_TIMEOUT: no final result could be generated before the timer ran out
NO_SPEECH: unable to detect any speech in the streamed audio
CANCELED: recognition canceled
FAILURE: unknown server error
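A client typically branches on result_status before reading the alternatives. The sketch below is one possible way to do that in Python. The names ResultStatus and best_text are hypothetical, and choosing the alternative with the highest score is an assumption of this sketch, not documented server behavior.

```python
from enum import Enum


class ResultStatus(Enum):
    """Status values listed for result_status."""
    RECOGNIZED = "RECOGNIZED"
    NO_MATCH = "NO_MATCH"
    NO_INPUT_TIMEOUT = "NO_INPUT_TIMEOUT"
    EARLY_SPEECH = "EARLY_SPEECH"
    MAX_SPEECH = "MAX_SPEECH"
    RECOGNITION_TIMEOUT = "RECOGNITION_TIMEOUT"
    NO_SPEECH = "NO_SPEECH"
    CANCELED = "CANCELED"
    FAILURE = "FAILURE"


def best_text(result):
    """Return the highest-scoring text of one result, or None if unusable.

    `result` is a single recognition result already decoded into a dict.
    """
    # Raises ValueError for a status that is not in the list above.
    status = ResultStatus(result["result_status"])
    if status is not ResultStatus.RECOGNIZED:
        return None
    alternatives = result.get("alternatives") or []
    if not alternatives:
        return None
    # Assumption: a higher score means a more reliable alternative.
    best = max(alternatives, key=lambda alt: alt.get("score", 0))
    return best["text"]
```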
Result (HTTP status <> 200)
If the HTTP request returns an error with a status code other than ‘200’, the body of the response will have the following structure.
ErrorResponse
code: Error code (see Error codes).
message: Complementary message explaining the reason for the failure.
Examples
REST call with JSON result:
curl -X POST \
--header "Content-Type: audio/wav" \
--header "decoder.maxSentences: 1" \
--data-binary '@/opt/cpqd/asr/samples/audio/ptbr/87431_8k.wav' \
"http://127.0.0.1:8025/asr-server/rest/recognize?lm=builtin:grammar/digits"
Result:
[{
"alternatives": [{
"text": "oito sete quatro três um",
"interpretations": ["87431"],
"words": [{
"text": "oito",
"score": 100,
"start_time": 0.3901262,
"end_time": 0.95921874
}, {
"text": "sete",
"score": 100,
"start_time": 0.99,
"end_time": 1.7068747
}, {
"text": "quatro",
"score": 100,
"start_time": 1.74,
"end_time": 2.28
}, {
"text": "três",
"score": 100,
"start_time": 2.2800765,
"end_time": 2.8498626
}, {
"text": "um",
"score": 100,
"start_time": 2.9167604,
"end_time": 3.2101758
}],
"score": 100,
"lm": "builtin:grammar/digits",
"interpretation_scores": [100]
}],
"segment_index": 0,
"last_segment": true,
"final_result": true,
"start_time": 0.24,
"end_time": 3.52,
"result_status": "RECOGNIZED"
}]
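The JSON result above is a list of segments, each carrying alternatives with per-word timings. As an illustration only, the following Python sketch extracts the word timings from such a body. The function name word_timings and the SAMPLE constant are hypothetical; SAMPLE is a shortened copy of the result shown above, and picking the highest-scoring alternative is an assumption of this sketch.

```python
import json

# Shortened copy of the JSON result shown above (two words only).
SAMPLE = """
[{
  "alternatives": [{
    "text": "oito sete",
    "interpretations": ["87"],
    "words": [
      {"text": "oito", "score": 100, "start_time": 0.3901262, "end_time": 0.95921874},
      {"text": "sete", "score": 100, "start_time": 0.99, "end_time": 1.7068747}
    ],
    "score": 100
  }],
  "segment_index": 0,
  "result_status": "RECOGNIZED"
}]
"""


def word_timings(body):
    """Return (word, start_time, end_time) tuples for every segment in `body`."""
    timings = []
    for segment in json.loads(body):
        alternatives = segment.get("alternatives") or []
        if not alternatives:
            continue  # e.g. a segment with no recognized speech
        # Assumption: the highest-scoring alternative is the one to use.
        best = max(alternatives, key=lambda alt: alt.get("score", 0))
        for word in best.get("words", []):
            timings.append((word["text"], word["start_time"], word["end_time"]))
    return timings


if __name__ == "__main__":
    for text, start, end in word_timings(SAMPLE):
        print(f"{text}: {start:.2f}s to {end:.2f}s")
```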
REST call with XML result:
curl -X POST \
--header "Content-Type: audio/wav" \
--header "Accept: application/xml" \
--header "decoder.maxSentences: 1" \
--data-binary '@/opt/cpqd/asr/samples/audio/ptbr/87431_8k.wav' \
"http://127.0.0.1:8025/asr-server/rest/recognize?lm=builtin:grammar/digits"
Result:
<ArrayList>
<item>
<segment_index>0</segment_index>
<last_segment>true</last_segment>
<final_result>true</final_result>
<start_time>0.24</start_time>
<end_time>3.52</end_time>
<result_status>RECOGNIZED</result_status>
<alternatives>
<alternative>
<text>oito sete quatro três um</text>
<score>100</score>
<lm>builtin:grammar/digits</lm>
<interpretations>
<interpretation>87431</interpretation>
</interpretations>
<interpretation_scores>
<interpretation_score>100</interpretation_score>
</interpretation_scores>
<words>
<word>
<text>oito</text>
<score>100</score>
<start_time>0.3901258</start_time>
<end_time>0.95921737</end_time>
</word>
<word>
<text>sete</text>
<score>100</score>
<start_time>0.99</start_time>
<end_time>1.7068772</end_time>
</word>
<word>
<text>quatro</text>
<score>100</score>
<start_time>1.74</start_time>
<end_time>2.28</end_time>
</word>
<word>
<text>três</text>
<score>100</score>
<start_time>2.2800772</start_time>
<end_time>2.8498623</end_time>
</word>
<word>
<text>um</text>
<score>100</score>
<start_time>2.9167345</start_time>
<end_time>3.210177</end_time>
</word>
</words>
</alternative>
</alternatives>
</item>
</ArrayList>
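The XML result carries the same fields as the JSON one and can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree. The function name parse_recognize_xml and the SAMPLE_XML constant are hypothetical, and the element names are taken from the example above rather than from a published schema.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML result shown above (words omitted).
SAMPLE_XML = """
<ArrayList>
  <item>
    <segment_index>0</segment_index>
    <result_status>RECOGNIZED</result_status>
    <alternatives>
      <alternative>
        <text>oito sete quatro três um</text>
        <score>100</score>
        <interpretations>
          <interpretation>87431</interpretation>
        </interpretations>
      </alternative>
    </alternatives>
  </item>
</ArrayList>
"""


def parse_recognize_xml(body):
    """Convert an XML recognition result into a list of plain dicts."""
    root = ET.fromstring(body)
    results = []
    for item in root.findall("item"):
        alternatives = []
        for alt in item.findall("alternatives/alternative"):
            alternatives.append({
                "text": alt.findtext("text"),
                # A missing score is treated as 0 in this sketch.
                "score": float(alt.findtext("score", default="0")),
                "interpretations": [
                    node.text
                    for node in alt.findall("interpretations/interpretation")
                ],
            })
        results.append({
            "result_status": item.findtext("result_status"),
            "alternatives": alternatives,
        })
    return results


if __name__ == "__main__":
    print(parse_recognize_xml(SAMPLE_XML))
```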
Result with error (JSON):
{
"code":"ERR_LM_NOT_FOUND",
"message":"Language Model not found: builtin:grammar/booh"
}
Result with error (XML):
<ErrorResponse>
<code>ERR_LM_NOT_FOUND</code>
<message>Language Model not found: builtin:grammar/booh</message>
</ErrorResponse>
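Because the body has a different structure when the status code is not 200, a client needs to check the status before decoding the result. A minimal Python sketch of that check for JSON bodies follows. The names AsrError and parse_response are hypothetical, and the sketch assumes the error body always contains the code and message fields described above.

```python
import json


class AsrError(Exception):
    """Raised when the server answers with an HTTP status other than 200."""

    def __init__(self, http_status, code, message):
        super().__init__(f"{code} (HTTP {http_status}): {message}")
        self.http_status = http_status
        self.code = code
        self.message = message


def parse_response(http_status, body):
    """Return the decoded result for status 200, otherwise raise AsrError."""
    payload = json.loads(body)
    if http_status == 200:
        return payload
    # Any other status carries an ErrorResponse with `code` and `message`.
    raise AsrError(http_status, payload.get("code"), payload.get("message"))
```

A caller can then catch AsrError and inspect its code attribute, for example to tell a missing language model apart from other failures.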
INTERPRET
Performs semantic interpretation of a text supplied by the client, using the indicated grammar, as RECOGNIZE does for audio. The text must be sent in the HTTP message body. Interpretation is performed synchronously and the result is returned in the HTTP response.
Request
POST /asr-server/rest/interpret
HTTP Headers
Accept |
(Optional) Content type of the interpretation results. Valid values: application/json, application/xml.
Default value: application/json. |
User-Agent |
ID of the device and/or application generating the request. Useful for application log purposes. |
Content-Length |
Indicates the number of bytes in the content. |
Content-Type |
Indicates the format of the text sent in the message body. Valid formats:
|
Request parameters
lm |
Language model URI. This parameter is required; if it is omitted, an error is returned. The URI must start with one of the following prefixes:
|
Result
The result is an object with the same structure as the RECOGNIZE result, but only some of the fields are meaningful and should be used:
recognition_result
alternatives
result_status
alternatives: a list of the likely recognition results
text: recognition text
score: confidence score or rate
interpretations: a list of the interpretation results, as defined in the grammar. In the case of free speech models, the list is empty.
result_status: interpretation status. It can be one of the following values:
RECOGNIZED: recognition completed successfully
NO_MATCH: recognition completed successfully, but no matches were found in the grammar
CANCELED: recognition canceled
FAILURE: unknown server error
Examples
REST call with JSON result:
curl -X POST \
--header "Content-Type: text/plain" \
--data 'oito sete quatro três um' \
"http://127.0.0.1:8025/asr-server/rest/interpret?lm=builtin:grammar/digits"
Result:
{
"alternatives": [{
"text": "oito sete quatro três um",
"interpretations": ["87431"],
"score": 100,
"lm": "builtin:grammar/digits",
"interpretation_scores": [100]
}],
"result_status": "RECOGNIZED"
}
REST call with XML result:
curl -X POST \
--header "Content-Type: text/plain" \
--header "Accept: application/xml" \
--data 'oito sete quatro três um' \
"http://127.0.0.1:8025/asr-server/rest/interpret?lm=builtin:grammar/digits"
Result:
<recognition_result>
<result_status>RECOGNIZED</result_status>
<alternatives>
<alternative>
<text>oito sete quatro três um</text>
<score>100</score>
<lm>builtin:grammar/digits</lm>
<interpretations>
<interpretation>87431</interpretation>
</interpretations>
<interpretation_scores>
<interpretation_score>100</interpretation_score>
</interpretation_scores>
</alternative>
</alternatives>
</recognition_result>
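An INTERPRET call differs from RECOGNIZE mainly in that the body is text instead of audio. The following Python sketch builds such a request without sending it. The function name build_interpret_request is hypothetical, the host and port are the ones used in the examples, and encoding the text as UTF-8 is an assumption of this sketch, not a documented requirement.

```python
import urllib.parse
import urllib.request


def build_interpret_request(text, lm, accept="application/json",
                            base_url="http://127.0.0.1:8025"):
    """Build (but do not send) a POST request for /asr-server/rest/interpret."""
    # Keep ':' and '/' unescaped so the URL matches the curl examples.
    query = urllib.parse.urlencode({"lm": lm}, safe=":/")
    url = f"{base_url}/asr-server/rest/interpret?{query}"
    headers = {"Content-Type": "text/plain", "Accept": accept}
    # Assumption: the server expects the text body encoded as UTF-8.
    body = text.encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_interpret_request(
        "oito sete quatro três um",
        "builtin:grammar/digits",
        accept="application/xml",
    )
    print(request.get_method(), request.full_url)
    # Sending requires a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```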