Using Free Speech

A language model defines how words can be combined to form sentences. It can be either a free speech model or a grammar model; the two types are built differently and work differently.

A free speech model is a statistical model trained on large amounts of text. Its main characteristics are:

  • Users can speak more freely;

  • A large, albeit fixed, vocabulary.

Free speech recognition produces better results when the usage context is close to the training context of the model.

When should you use a free speech model?

  • Message dictation (emails, SMS, etc.);

  • Open-ended answers (e.g. ‘explain what you want’);

  • Text search.

Using the free speech model

The speech recognition engine identifies each free speech model by its own URI. The model must already be installed on the CPQD ASR machine before it can be used.

Each ASR API has its own way of specifying the language model. For example, the URI of the general-purpose free speech model is ‘builtin:slm/general’. The following examples show how to use it with each API.

In the REST API, the model is used as shown in the following example:

POST /asr-server/rest/recognize?lm=builtin:slm/general HTTP/1.1
Host: 127.0.0.1:8025
Accept: */*
Content-Type: audio/wav
Content-Length: ...

<binary audio content>

With the REST API, you can also use the curl command, as in the following example:

curl --header "Content-Type: audio/wav" \
    --data-binary @/audio/pizza.wav \
    "http://127.0.0.1:8025/asr-server/rest/recognize?lm=builtin:slm/general"
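The same REST request can also be built with Python's standard library. This is a minimal sketch that assumes the server address and audio path from the curl example. The helper names build_recognize_request and recognize are hypothetical. This section does not describe the response format, so recognize simply returns the raw response body as text.

```python
import urllib.parse
import urllib.request


def build_recognize_request(audio: bytes,
                            lm: str = "builtin:slm/general",
                            base_url: str = "http://127.0.0.1:8025") -> urllib.request.Request:
    """Build (but do not send) a POST to the REST recognize endpoint.

    Hypothetical helper: the path and the lm query parameter come from the
    examples in this section; base_url is assumed from the curl example.
    """
    # safe=":/" keeps the model URI unescaped, as in the curl example
    query = urllib.parse.urlencode({"lm": lm}, safe=":/")
    url = f"{base_url}/asr-server/rest/recognize?{query}"
    return urllib.request.Request(
        url,
        data=audio,  # raw WAV bytes sent as the request body
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def recognize(path: str) -> str:
    """Send a WAV file and return the raw response body as text."""
    with open(path, "rb") as f:
        req = build_recognize_request(f.read())
    # urlopen adds Content-Length automatically for a bytes body
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the server running, print(recognize("/audio/pizza.wav")) sends the same request as the curl command. Keeping request construction separate from sending makes it easy to inspect the URL and headers without contacting the server.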

In the WebSocket API, the model is used as shown in the following example:

C->S: ASR 2.2 START_RECOGNITION
      Content-Type: text/uri-list
      Content-ID: yes_no
      Content-Length: ...

      builtin:slm/general
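The WebSocket message above can also be assembled in code. This minimal sketch relies on framing details that this section does not state: header lines end with CRLF, a blank line separates the headers from the body, and Content-Length counts the UTF-8 bytes of the body. Check these details against the WebSocket API reference. The helper name build_start_recognition is hypothetical. Sending the result requires a WebSocket client library, which is not shown here.

```python
def build_start_recognition(lm_uri: str, content_id: str) -> bytes:
    """Build an ASR 2.2 START_RECOGNITION message that references a model URI.

    Hypothetical helper. Assumed framing: CRLF line endings, an empty line
    before the body, and Content-Length measured in body bytes.
    """
    body = lm_uri.encode("utf-8")
    lines = [
        "ASR 2.2 START_RECOGNITION",
        "Content-Type: text/uri-list",
        f"Content-ID: {content_id}",
        f"Content-Length: {len(body)}",
    ]
    # Headers, then a blank line, then the URI list as the body
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body
```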

In the MRCP v2 API, the model is used as shown in the following example:

C->S: MRCP/2.0 ... RECOGNIZE 543260
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:text/uri-list
      Content-Length:...

      builtin:slm/general
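In MRCPv2 (RFC 6787), the number after MRCP/2.0 in the start line is the message length. It gives the size in octets of the whole message, start line included. Because the digits of that number are part of the length being measured, one way to compute it is to iterate until the value stops changing. This is a minimal sketch; the helper name build_recognize is hypothetical. The channel identifier and request ID in the usage are copied from the example above. In practice, an MRCP client stack usually fills these fields for you.

```python
def build_recognize(lm_uri: str, channel_id: str, request_id: int) -> bytes:
    """Build an MRCPv2 RECOGNIZE request whose body is a text/uri-list.

    Hypothetical helper. message-length counts every octet of the message,
    including the start line itself.
    """
    body = lm_uri.encode("utf-8")
    rest = (
        f"Channel-Identifier:{channel_id}\r\n"
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    ).encode("utf-8") + body

    # The length field measures the message that contains it, so repeat
    # until the start line's digit count no longer changes the total.
    length = len(rest)
    while True:
        start_line = f"MRCP/2.0 {length} RECOGNIZE {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total
```

For example, build_recognize("builtin:slm/general", "32AECB23433801@speechrecog", 543260) produces a message whose start line states its own total size.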