SRGS grammars
The CPQD ASR accepts speech recognition grammars as text files in the SRGS (Speech Recognition Grammar Specification) format.
Grammar files can be stored in a local folder on the CPQD ASR server, or on a web server reachable over HTTP or HTTPS. In either case, the location of the grammar file must be specified when running recognition.
As an example, create the /opt/grammar/ directory on the CPQD ASR server:
$ sudo mkdir -p /opt/grammar
$ sudo chmod 777 /opt/grammar
$ cd /opt/grammar
Next, create the grammar file pizza.gram, using the SRGS ABNF format:
#ABNF 1.0 UTF-8;
tag-format <semantics/1.0>;
root $pedido;
$pedido = [$gostaria] $pizza [$por_favor] { out = rules.pizza };
$por_favor = por favor | por gentileza;
$gostaria = [eu] (quero | queria | gostaria de) [uma];
$pizza = [pizza | de | pizza de] $sabor { out = rules.sabor };
$sabor = calabresa | queijo | vegetariana;
This grammar will recognize sentences such as these:
eu quero pizza de calabresa por gentileza
pizza calabresa por gentileza
pizza vegetariana por favor
queijo
queria calabresa
queria de vegetariana por gentileza
queria pizza queijo
queria pizza vegetariana por gentileza
quero pizza vegetariana por gentileza
vegetariana por gentileza
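To check by hand which sentences the grammar covers, the rules can be approximated with regular expressions. The sketch below is only an illustration: the names `parse_pedido`, `PEDIDO` and the other constants are hypothetical, and the recognizer itself works on audio, not on text.

```python
import re

# Regular-expression rendering of each grammar rule (an approximation for
# illustration only; this is not how the CPQD ASR evaluates the grammar).
SABOR = r"(?P<sabor>calabresa|queijo|vegetariana)"
PIZZA = rf"(?:(?:pizza de|pizza|de) )?{SABOR}"
GOSTARIA = r"(?:eu )?(?:quero|queria|gostaria de)(?: uma)?"
POR_FAVOR = r"(?:por favor|por gentileza)"
PEDIDO = re.compile(rf"(?:{GOSTARIA} )?{PIZZA}(?: {POR_FAVOR})?")


def parse_pedido(sentence):
    """Return the flavor word if the sentence is covered by the grammar, else None."""
    match = PEDIDO.fullmatch(sentence.strip().lower())
    return match.group("sabor") if match else None


if __name__ == "__main__":
    for sentence in [
        "eu quero pizza de calabresa por gentileza",
        "queijo",
        "quero pizza de pedra",
    ]:
        print(f"{sentence!r} -> {parse_pedido(sentence)}")
```

A sentence outside the grammar, such as the last one in the loop, yields `None`.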
Next, we need a WAV audio file containing a sentence like those in the examples above. The CPQD ASR provides a few audio files that we can use in our test. For Brazilian Portuguese, the files can be found in /opt/cpqd/asr/samples/audio/ptbr. In our test, let’s use the file pizza_veg_audio_8k.wav, which contains the sentence ‘eu quero uma pizza vegetariana’ (“I want a vegetarian pizza”). Because this audio file has an 8 kHz sample rate, we need an acoustic model for Portuguese and 8 kHz audio.
For recognition, we can use the Linux curl command. Note how the result contains the recognized sentence:
$ curl --header "Content-Type: audio/wav" \
--data-binary "@/opt/cpqd/asr/samples/audio/ptbr/pizza_veg_audio_8k.wav" \
"http://127.0.0.1:8025/asr-server/rest/recognize?lm=file:///opt/grammar/pizza.gram"
[{"alternatives":[{"text":"eu quero uma pizza vegetariana","interpretations":["pizza_vegetariana"],"words":[{"text":"eu","score":93,"start_time":3.0829227,"end_time":3.1989636},{"text":"quero","score":97,"start_time":3.208283,"end_time":3.4463},{"text":"uma","score":86,"start_time":3.4472868,"end_time":3.5715106},{"text":"pizza","score":96,"start_time":3.6009455,"end_time":3.990146},{"text":"vegetariana","score":100,"start_time":4.03105,"end_time":4.8281574}],"score":94,"lm":"file:///opt/cpqd/asr/samples/grammar/ptbr/pizza.gram","interpretation_scores":[94]}],"segment_index":0,"last_segment":true,"final_result":true,"start_time":2.91,"end_time":5.12,"result_status":"RECOGNIZED"}]
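The response is a JSON array with one object per segment, so it is easy to process in a script. The following sketch pulls out the main fields; the `summarize` helper is hypothetical, the embedded response is an abridged copy of the one above (per-word details omitted), and treating the first alternative as the best one is an assumption.

```python
import json

# Abridged copy of the response shown above (the "words" list is omitted).
RESPONSE = (
    '[{"alternatives":[{"text":"eu quero uma pizza vegetariana",'
    '"interpretations":["pizza_vegetariana"],"score":94,'
    '"interpretation_scores":[94]}],"segment_index":0,"last_segment":true,'
    '"final_result":true,"start_time":2.91,"end_time":5.12,'
    '"result_status":"RECOGNIZED"}]'
)


def summarize(response_text):
    """Return one summary dict per segment found in the response."""
    summary = []
    for segment in json.loads(response_text):
        alternatives = segment.get("alternatives", [])
        # Assumption: the first alternative is the most likely one.
        best = alternatives[0] if alternatives else {}
        summary.append(
            {
                "status": segment.get("result_status"),
                "text": best.get("text"),
                "interpretations": best.get("interpretations", []),
                "score": best.get("score"),
            }
        )
    return summary


if __name__ == "__main__":
    for item in summarize(RESPONSE):
        print(item)
```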
Now, let’s test an invalid input, in other words, a sentence not covered by the grammar. Let’s use the pizza_pedra_audio_8k.wav file, in which the speaker orders a stone pizza. Run the same curl command:
$ curl --header "Content-Type: audio/wav" \
--data-binary "@/opt/cpqd/asr/samples/audio/ptbr/pizza_pedra_audio_8k.wav" \
"http://127.0.0.1:8025/asr-server/rest/recognize?lm=file:///opt/grammar/pizza.gram"
[{"alternatives":[],"segment_index":0,"last_segment":true,"final_result":true,"start_time":2.9,"end_time":5.04,"result_status":"NO_MATCH"}]
This time, the command returns a NO_MATCH result with an empty list of alternatives, indicating that the spoken sentence did not match the grammar.
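A client application usually needs to tell these two outcomes apart. As a minimal sketch, assuming the response layout shown in the two examples, the hypothetical helper `recognized_text` below returns the recognized sentence, or `None` when no segment was recognized:

```python
import json


def recognized_text(response_text):
    """Return the text of the first recognized segment, or None if there is none."""
    for segment in json.loads(response_text):
        alternatives = segment.get("alternatives", [])
        if segment.get("result_status") == "RECOGNIZED" and alternatives:
            # Assumption: the first alternative is the most likely one.
            return alternatives[0]["text"]
    return None


if __name__ == "__main__":
    no_match = (
        '[{"alternatives":[],"segment_index":0,"last_segment":true,'
        '"final_result":true,"start_time":2.9,"end_time":5.04,'
        '"result_status":"NO_MATCH"}]'
    )
    text = recognized_text(no_match)
    print(text if text is not None else "Sentence not covered by the grammar")
```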
When the grammar is stored on a web server, the grammar URL must be specified instead of the local path, for example: http://webcontent.com/grammar/pizza.gram.
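The same request can be prepared from a script. The sketch below builds, without sending, a request equivalent to the curl commands above, with the grammar location passed in the lm parameter. The `build_request` helper is hypothetical, and passing an HTTP grammar URL through lm in the same way as a file:// URI is an assumption based on the examples in this section.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint used by the curl examples in this section.
RECOGNIZE_URL = "http://127.0.0.1:8025/asr-server/rest/recognize"


def build_request(grammar_uri, audio_bytes):
    """Build a POST request that sends WAV audio and names the grammar in 'lm'."""
    # safe=":/" leaves the grammar URI unescaped, as in the curl commands.
    query = urlencode({"lm": grammar_uri}, safe=":/")
    return Request(
        f"{RECOGNIZE_URL}?{query}",
        data=audio_bytes,
        headers={"Content-Type": "audio/wav"},
    )


if __name__ == "__main__":
    for uri in (
        "file:///opt/grammar/pizza.gram",
        "http://webcontent.com/grammar/pizza.gram",
    ):
        request = build_request(uri, b"")  # empty body, for display only
        print(request.get_method(), request.full_url)
```

To actually run recognition, read the WAV file as bytes and pass the resulting request to `urllib.request.urlopen`.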
The CPQD ASR also accepts SRGS grammars in XML format. By convention, such grammars are stored in files with the .grxml extension. In the above examples, we used the pizza.gram grammar in ABNF format; however, we could have used the equivalent pizza.grxml grammar, listed below, with the exact same results.
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
mode="voice"
root="pedido"
tag-format="semantics/1.0"
version="1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd">
<rule id="pedido">
<item repeat="0-1">
<item>
<ruleref uri="#gostaria"/>
</item>
</item>
<ruleref uri="#pizza"/>
<item repeat="0-1">
<item>
<ruleref uri="#por_favor"/>
</item>
</item>
<tag> out = rules.pizza </tag>
</rule>
<rule id="por_favor">
<one-of>
<item>por favor</item>
<item>por gentileza</item>
</one-of>
</rule>
<rule id="gostaria">
<item repeat="0-1">
<item>eu</item>
</item>
<one-of>
<item>quero</item>
<item>queria</item>
<item>gostaria de</item>
</one-of>
<item repeat="0-1">
<item>uma</item>
</item>
</rule>
<rule id="pizza">
<item repeat="0-1">
<one-of>
<item>pizza</item>
<item>de</item>
<item>pizza de</item>
</one-of>
</item>
<ruleref uri="#sabor"/>
<tag> out = rules.sabor </tag>
</rule>
<rule id="sabor">
<one-of>
<item>calabresa</item>
<item>queijo</item>
<item>vegetariana</item>
</one-of>
</rule>
</grammar>
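When writing an XML grammar by hand, it is easy to mistype a rule name. The following sketch, which is not part of the CPQD ASR, uses Python's standard XML parser to check that the root rule exists and that every local ruleref points to a defined rule. The `check_rule_references` helper and the short `EXCERPT` grammar are hypothetical, and references to external grammars are not handled.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Small stand-in grammar; in practice, read the contents of a .grxml file.
EXCERPT = """
<grammar xmlns="http://www.w3.org/2001/06/grammar" root="pizza" version="1.0">
  <rule id="pizza">
    <ruleref uri="#sabor"/>
  </rule>
  <rule id="sabor">
    <one-of><item>calabresa</item><item>queijo</item></one-of>
  </rule>
</grammar>
"""


def check_rule_references(grxml_text):
    """Return (root_is_defined, sorted list of referenced but undefined rules)."""
    grammar = ET.fromstring(grxml_text.strip())
    defined = {rule.get("id") for rule in grammar.iter(f"{{{SRGS_NS}}}rule")}
    # Only local references of the form "#name" are considered here.
    referenced = {
        ref.get("uri", "").lstrip("#")
        for ref in grammar.iter(f"{{{SRGS_NS}}}ruleref")
    }
    return grammar.get("root") in defined, sorted(referenced - defined)


if __name__ == "__main__":
    root_ok, missing = check_rule_references(EXCERPT)
    print("root rule defined:", root_ok)
    print("undefined rules:", missing)
```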
Note
The ASR library uses the file extension to help determine the grammar format (ABNF or XML). Files with the .gram extension are always read as ABNF grammars, and files with the .grxml extension are always read as XML grammars. If the file has any other extension, the ASR library tries to infer the grammar format from the file's content. However, it is best practice to always use the conventional extensions.
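The rule described in this note can be pictured with the sketch below. It is an illustration only: `guess_grammar_format` is a hypothetical function, and the content checks (a leading #ABNF header or a leading "<") are assumptions, since the actual inference performed by the ASR library is not documented here.

```python
from pathlib import Path


def guess_grammar_format(filename, content=""):
    """Return "ABNF", "XML", or None when the format cannot be determined."""
    suffix = Path(filename).suffix
    # The conventional extensions always decide the format.
    if suffix == ".gram":
        return "ABNF"
    if suffix == ".grxml":
        return "XML"
    # Assumed fallback: look at how the content starts.
    head = content.lstrip()
    if head.startswith("#ABNF"):
        return "ABNF"
    if head.startswith("<"):
        return "XML"
    return None


if __name__ == "__main__":
    print(guess_grammar_format("pizza.gram"))
    print(guess_grammar_format("pizza.grxml"))
    print(guess_grammar_format("pizza.txt", "#ABNF 1.0 UTF-8;"))
```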