Syntax and available features
This chapter describes the grammar types recognized by the ASR library, and lists the SRGS specification features not supported by the library.
SRGS grammars
A grammar is a text document with rules that limit the sequences of words (i.e. the sentences) the speech recognition engine can recognize.
The SRGS specification defines two grammar formats:
The ABNF (Augmented Backus-Naur Form) format represents a grammar using a syntax that extends the original BNF notation with constructs that make repetition patterns easy to express. The format is concise and easy for people to read and modify, which makes it well suited to grammar development and testing.
The XML format represents the grammar as an XML document. For human users, it is more verbose and harder to edit than ABNF. However, it is well suited to machine-to-machine interactions, since generating and parsing XML documents are ubiquitous capabilities in modern computer systems.
The ASR library accepts grammar documents in both formats, with no need to convert between them. Some features of the SRGS standard are not supported by the CPQD ASR library. These features are:
Repetition probabilities (SRGS, section 2.5.1). Although repetitions are supported, repetition probabilities are not.
Language attachments (SRGS, section 2.7). The ASR library currently supports only Brazilian Portuguese. Language attachments in a grammar document do not cause processing errors; they are ignored, and a warning message is issued each time such a construct appears in the grammar.
Recursion. Rules that reference themselves, either directly or indirectly through other rules, are not supported. The SRGS specification does not require conforming grammar processors to accept recursive grammars. Submitting a recursive grammar to the ASR library for compilation causes an error.
DTMF mode (SRGS, section 4.6). Grammars in dual-tone multi-frequency (DTMF) mode are not supported.
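Because recursive grammars cause a compilation error, it may be convenient to check a grammar's rule references for cycles before submitting it. The Python sketch below is a minimal illustration and is not part of the ASR library: it assumes the references have already been extracted into a dictionary that maps each rule name to the names of the rules its body references, and the function name `find_rule_cycle` is hypothetical. It performs a depth-first search and reports the first cycle it finds. The same check can be applied to the inclusion graph of grammar documents, since circular inclusions are also unsupported.

```python
from typing import Dict, List, Optional


def find_rule_cycle(refs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of rule references (e.g. ['a', 'b', 'a']), or None.

    `refs` maps each rule name to the rules referenced in its body (hypothetical
    input format). Rules missing from `refs` are treated as referencing nothing.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(rule: str) -> Optional[List[str]]:
        color[rule] = GRAY
        path.append(rule)
        for target in refs.get(rule, []):
            state = color.get(target, WHITE)
            if state == GRAY:
                # The target is already on the current path: a cycle exists.
                start = path.index(target)
                return path[start:] + [target]
            if state == WHITE:
                cycle = visit(target)
                if cycle is not None:
                    return cycle
        path.pop()
        color[rule] = BLACK
        return None

    for rule in refs:
        if color.get(rule, WHITE) == WHITE:
            cycle = visit(rule)
            if cycle is not None:
                return cycle
    return None


# Indirect recursion: $a references $b, which references $a.
print(find_rule_cycle({"a": ["b"], "b": ["a"]}))  # ['a', 'b', 'a']
```

A rule that is referenced several times without forming a cycle (for example, a `$digit` rule used twice by `$main`) is not reported, because finished rules are marked and skipped.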
The following sections describe implementation details that are relevant to grammar development.
Special rules
The special rules NULL, VOID, and GARBAGE are supported.
XML metadata
XML metadata provided through the <metadata> element does not cause processing errors; it is ignored.
XML element <token>
In XML format, <token> elements are allowed, but have no effect.
Double quotation marks
In XML format, all the lexical content delimited by double quotation marks is treated as if it were in the body of a <token> element (in other words, the double quotation marks have no effect).
In ABNF format, content delimited by double quotation marks is treated as an atomic sequence of words; unlike unquoted content, it may contain the reserved characters of the ABNF format (such as ?, +, *, and $). For example, the following two constructions are equivalent:
$rule1 = isto é " um exemplo de $regra "<1-2> com aspas;
$rule2 = isto é (um exemplo de "$regra")<1-2> com aspas;
Note that the quotation marks make “$regra” a word that includes the “$” character, rather than a reference to the rule regra. Double quotation marks can therefore be used to include special characters in the grammar body; even so, we recommend avoiding such characters.
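To make the quoting rule concrete, the Python sketch below splits the body of an ABNF rule into words and rule references. It is a deliberately simplified, hypothetical tokenizer, not the ASR library's parser: it recognizes only bare words, double-quoted spans, `$name` references, and `<m-n>` repeat counts, and it silently skips grouping and alternative operators. Applied to the two rule bodies above, it yields the same word sequence, with “$regra” coming out as a word rather than a reference.

```python
import re
from typing import List, Tuple

# Simplified ABNF body tokenizer (illustrative sketch only).
TOKEN_RE = re.compile(
    r'"(?P<quoted>[^"]*)"'            # double-quoted span: literal words
    r'|\$(?P<ref>\w+)'                # unquoted $name: rule reference
    r'|<[^>]*>'                       # repeat count such as <1-2>: skipped
    r'|(?P<word>[^\s"()\[\]<>|;$]+)'  # bare word
)


def tokenize(body: str) -> List[Tuple[str, str]]:
    """Return ('word', text) and ('ref', rule_name) tokens in order."""
    tokens: List[Tuple[str, str]] = []
    for m in TOKEN_RE.finditer(body):
        if m.group("quoted") is not None:
            # Inside quotes, reserved characters such as $ are ordinary text;
            # the span is still split on whitespace into separate words.
            tokens.extend(("word", w) for w in m.group("quoted").split())
        elif m.group("ref") is not None:
            tokens.append(("ref", m.group("ref")))
        elif m.group("word") is not None:
            tokens.append(("word", m.group("word")))
        # A repeat count matches no named group and produces no token.
    return tokens


body1 = 'isto é " um exemplo de $regra "<1-2> com aspas'
body2 = 'isto é (um exemplo de "$regra")<1-2> com aspas'
print(tokenize(body1) == tokenize(body2))  # True
print(tokenize("de $regra"))  # [('word', 'de'), ('ref', 'regra')]
```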
References to external grammars
References to external grammars are supported, and ABNF documents can reference XML documents and vice versa. However, circular inclusions of grammar documents are not supported. A circular inclusion occurs, for example, when grammar A references a rule from grammar B and grammar B references a rule from grammar A. Grammar documents with circular references cause errors.
User-defined lexicons
User-defined lexicons, which map words to their pronunciations, are supported. See the chapter Working with lexicons for details on how to use them.
Words and rule names
Rule names can contain any combination of alphanumeric characters and underscores; no other symbols are allowed. Rule names have no length limit, but long names are not advisable. Words in the grammar body also have no length limit. The section Treating words describes how blank spaces in words are normalized, and the section Lexicons and word pronunciation explains how words are phonetically transcribed.
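The restriction on rule names can be checked mechanically before a grammar is submitted. The Python sketch below assumes that “alphanumeric” means ASCII letters and digits; this section does not say whether accented letters are accepted, so the pattern may need widening. The name `is_valid_rule_name` is hypothetical and not part of the ASR library.

```python
import re

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits.
RULE_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_rule_name(name: str) -> bool:
    """True if `name` is a non-empty run of ASCII letters, digits, or underscores."""
    return RULE_NAME_RE.fullmatch(name) is not None


for candidate in ["regra", "rule_1", "minha-regra", "regra.x", ""]:
    print(repr(candidate), is_valid_rule_name(candidate))
```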
File extensions and media types
A grammar file is a text file containing an SRGS grammar in ABNF or XML format. A file cannot contain more than one grammar. According to the SRGS specification, the media type for ABNF grammars is application/srgs, and the media type for XML grammars is application/srgs+xml.
When a grammar is referenced, either in the body of another grammar (as an external reference) or as a command-line argument to a tool, its media type can be specified as part of the URI (SRGS, section 2.2). If no media type is given, the following conventions apply:
A file with a .gram extension will be interpreted as a grammar in ABNF format.
A file with a .grxml extension will be interpreted as a grammar in XML format.
A file with another extension will have its content inspected, and the ASR library will try to determine the format based on its content.
When a URI contains a media type declaration, the ASR library assumes that the file is in the corresponding format, regardless of its extension.
The following table summarizes the information presented above.
Format | File Extension | Media Type |
---|---|---|
ABNF | gram | application/srgs |
XML | grxml | application/srgs+xml |
Note
Although the ASR library inspects the content of grammar files to try to determine their format, we recommend always following the conventions above.
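The precedence described above (an explicit media type first, then the file extension, then content inspection) can be summarized in a short Python sketch. The functions `resolve_grammar_format` and `sniff_content` are hypothetical; in particular, the content heuristic used here (an `#ABNF` header versus an XML declaration or `<grammar` element) is an assumption for illustration and is not necessarily how the ASR library inspects files.

```python
from pathlib import PurePosixPath
from typing import Optional

# Conventions from the table above.
MEDIA_TYPES = {"application/srgs": "abnf", "application/srgs+xml": "xml"}
EXTENSIONS = {".gram": "abnf", ".grxml": "xml"}


def sniff_content(text: str) -> Optional[str]:
    """Guess the format from the document text (assumed heuristic)."""
    head = text.lstrip("\ufeff \t\r\n")
    if head.startswith("#ABNF"):
        return "abnf"  # SRGS ABNF documents start with a self-identifying header
    if head.startswith("<?xml") or head.startswith("<grammar"):
        return "xml"
    return None  # format could not be determined


def resolve_grammar_format(path: str, media_type: Optional[str] = None,
                           text: str = "") -> Optional[str]:
    """Return "abnf", "xml", or None, applying the conventions in order."""
    if media_type is not None:
        # 1. A declared media type wins, regardless of the file extension.
        #    Media type names are compared case-insensitively.
        return MEDIA_TYPES.get(media_type.strip().lower())
    # 2. Otherwise, a known extension decides the format.
    suffix = PurePosixPath(path).suffix
    if suffix in EXTENSIONS:
        return EXTENSIONS[suffix]
    # 3. Otherwise, fall back to inspecting the content.
    return sniff_content(text)


print(resolve_grammar_format("menu.gram"))                                     # abnf
print(resolve_grammar_format("menu.gram", media_type="application/srgs+xml"))  # xml
print(resolve_grammar_format("menu.txt", text='<?xml version="1.0"?>'))        # xml
```

Returning `None` for an unknown media type is a choice made for this sketch; how the ASR library reacts in that case is not described here.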
Character encoding
A grammar file can specify its character encoding. In the ABNF format, the encoding is declared in the ABNF header. For example, the following header is valid for a file encoded in UTF-8:
#ABNF 1.0 UTF-8;
In the XML format, the encoding is specified by means of the encoding attribute of the XML declaration:
<?xml version="1.0" encoding="utf-8"?>
Encoding names are case-insensitive. When the ASR library processes a grammar file, it converts the file internally to UTF-8 using the ICU library (International Components for Unicode). Although this means that any encoding supported by ICU can be used in grammar files, UTF-8 is the recommended encoding and should be used whenever possible. When no encoding is declared, the ASR library assumes UTF-8 and makes no attempt to infer the file's encoding. Keep this in mind when developing grammars on a platform whose default encoding is not UTF-8 (e.g. Windows). We recommend always declaring the encoding in the grammar file header, even when the file is encoded in UTF-8.
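The Python sketch below shows how the declared encoding can be read from either header form, falling back to UTF-8 when nothing is declared, as described above. It is an illustration only: the regular expressions cover just the two header shapes shown in this section, the header is assumed to be in an ASCII-compatible encoding (true for UTF-8 and ISO-8859-1, not for UTF-16), and the names `declared_encoding` and `read_grammar` are hypothetical. The ASR library uses ICU for the conversion; the sketch uses Python's built-in codecs instead.

```python
import re

# Header shapes shown above; anything else counts as "no declaration".
ABNF_HEADER_RE = re.compile(rb"#ABNF\s+1\.0(?:\s+([A-Za-z0-9._:-]+))?\s*;")
XML_DECL_RE = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding (lower-cased), or "utf-8" if none is declared.

    No attempt is made to guess an undeclared encoding.
    """
    head = data[:1024]  # both headers appear at the very start of the file
    for pattern in (ABNF_HEADER_RE, XML_DECL_RE):
        match = pattern.search(head)
        if match and match.group(1):
            # Encoding names are case-insensitive, so normalize them.
            return match.group(1).decode("ascii").lower()
    return "utf-8"


def read_grammar(data: bytes) -> str:
    # Python's codecs stand in for ICU in this sketch.
    return data.decode(declared_encoding(data))


source = "#ABNF 1.0 ISO-8859-1;\n$r = ação;"
raw = source.encode("iso-8859-1")
print(declared_encoding(raw))       # iso-8859-1
print(read_grammar(raw) == source)  # True
```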
Treating words
Sentences recognized by a grammar are sequences of words. In SRGS files, blank spaces are used to delimit these words. For example, consider the following ABNF grammar:
$regra = o livro que eu comprei
em " Nova
York" era bom;
After being processed by the library, this rule will produce a sequence of ten words: “o”, “livro”, “que”, “eu”, “comprei”, “em”, “Nova”, “York”, “era”, “bom”.
For a sentence to be recognized, a pronunciation must be assigned to each word. The chapter Working with lexicons explains how this is done and how users can define custom pronunciations. It also describes grammar-g2p, which can be used to evaluate the phonetic transcription assigned to each word.
The following list contains some good practices recommended for grammar developers. Following them helps avoid common errors that often lead to poor recognition results.
Avoid words containing non-alphanumeric symbols, such as “?”, “!”, “#”, and “$”. Depending on the target language, these symbols can be pronounced in different ways, and the recognition library may generate the wrong pronunciation for a given context. It is preferable to write them out: for example, instead of “$”, write “dollar sign” or whatever pronunciation you intend.
Avoid numbers larger than 9 in numeric format (e.g. 123). The SRGS standard does not require conforming grammar processors to be able to convert numbers to their written-out form (e.g. transform “123” into “cento e vinte e três”). Furthermore, a sequence such as “1234” can be pronounced in several ways in some languages; in Portuguese, for example, it can be pronounced as a cardinal number (“mil duzentos e trinta e quatro”), as a sequence of digits (“um dois três quatro”), or as a sequence of two numbers (“doze trinta e quatro”). In such cases, the best practice is to write the number out. The digits 0 to 9 can be used, since the SRGS specification requires grammar processors to handle them.
Avoid abbreviations and acronyms. Their pronunciation can be ambiguous (for example, in a system that does not distinguish upper from lower case, “IT” can be pronounced as the pronoun “it” or spelled out letter by letter as an acronym) or non-obvious (for example, “jason” for JSON, or “skuzzy” for SCSI). In such cases, we recommend writing the acronym the way it should be pronounced (for example, “i e, double-e” instead of “IEEE”). Isolated letters are not allowed. It is also possible to add an entry for the acronym to the user lexicon; see the chapter Working with lexicons for details.
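These recommendations lend themselves to a simple automated check. The Python sketch below flags words that contain non-alphanumeric symbols, multi-digit numbers, and all-uppercase words that may be acronyms. The function `lint_words` and its heuristics are hypothetical and not part of the ASR library; it assumes the grammar's words have already been extracted into a list, and treating every all-uppercase word of two or more letters as a possible acronym is an assumption that will produce false positives. It is also strict about symbols, so hyphenated spellings are flagged too.

```python
from typing import List, Tuple


def lint_words(words: List[str]) -> List[Tuple[str, str]]:
    """Flag grammar words that break the practices above (heuristic sketch)."""
    issues: List[Tuple[str, str]] = []
    for word in words:
        if not word.isalnum():
            # Symbols such as "?", "!", "#", "$" have context-dependent pronunciations.
            issues.append((word, "contains symbols; write it out"))
        elif word.isdecimal() and len(word) > 1:
            # Multi-digit numbers have several possible readings.
            issues.append((word, "multi-digit number; write it out"))
        elif word.isalpha() and word.isupper() and len(word) > 1:
            # Assumption: an all-uppercase word is treated as a possible acronym.
            issues.append((word, "possible acronym; spell out its pronunciation"))
    return issues


for word, reason in lint_words(["o", "preço", "$regra", "123", "IEEE", "7"]):
    print(f"{word}: {reason}")
```

Single digits such as “7” and ordinary lowercase words such as “o” or “preço” pass the check, in line with the recommendations above.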