Speech Detection

Parameters for configuring the speech segment detection.

endpointer.useToneDetectors

Description: Enables suppression of telephone tones during recognition.

Values: “true” or “false”. Default value: “true”.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.useToneDetectors=true

endpointer.enabled

Description: Enables speech segment detection. If enabled, only the segment containing speech is processed and any surrounding silence is ignored. Otherwise, all the recorded audio is processed, which increases recognition time. The start and end points of the speech are generated only when this parameter is enabled.

Values: “true” or “false”. Default value: “true”.

Location: /opt/cpqd/asr/config/engine/engine.conf

Example:

--endpointer.enabled=true

endpointer.headMargin

Description: Period of silence placed at the beginning of the speech segment.

Values: Integer number in milliseconds. Default value: 200.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.headMargin=200

endpointer.tailMargin

Description: Period of silence placed at the end of the speech segment.

Values: Integer number in milliseconds. Default value: 400.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.tailMargin=400
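
The effect of the two margins can be illustrated with a minimal Python sketch. It assumes that the margins widen the detected speech interval and are clamped to the bounds of the audio; the function name and the clamping behavior are hypothetical and are not taken from the engine.

```python
def apply_margins(speech_start_ms, speech_end_ms, audio_len_ms,
                  head_margin_ms=200, tail_margin_ms=400):
    """Widen a detected speech interval by the head and tail margins.

    Assumption: the margins are clamped so the segment never extends
    beyond the start or the end of the audio.
    """
    start = max(0, speech_start_ms - head_margin_ms)
    end = min(audio_len_ms, speech_end_ms + tail_margin_ms)
    return start, end


# Speech detected from 1500 ms to 3200 ms in a 5000 ms recording.
print(apply_margins(1500, 3200, 5000))  # (1300, 3600)
```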

endpointer.waitEnd

Description: Duration of continuous silence after which the end of speech is assumed.

Values: Integer number in milliseconds. Default value: 1000.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.waitEnd=1000
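
As a rough illustration of how such a timeout can work, the sketch below counts consecutive silent frames after the last speech frame and reports the end of speech once the silence reaches the configured duration. The frame-based input, the 10 ms frame length, and the function name are assumptions made for the example; the engine's internal logic may differ.

```python
def find_end_of_speech(frame_is_speech, frame_ms=10, wait_end_ms=1000):
    """Return the time (ms) at which speech is assumed to have ended.

    frame_is_speech: sequence of booleans, one per fixed-length frame.
    Returns None if the required run of silence never occurs.
    """
    silence_ms = 0
    last_speech_end_ms = None
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            # Any speech frame resets the silence counter.
            silence_ms = 0
            last_speech_end_ms = (i + 1) * frame_ms
        elif last_speech_end_ms is not None:
            # Count silence only after speech has started.
            silence_ms += frame_ms
            if silence_ms >= wait_end_ms:
                return last_speech_end_ms
    return None


# 200 ms of silence, 1500 ms of speech, then 1000 ms of silence.
frames = [False] * 20 + [True] * 150 + [False] * 100
print(find_end_of_speech(frames))  # 1700
```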

endpointer.levelMode

Description: Method used to calculate the amplitude threshold below which the signal is interpreted as silence.

Values: Number (0, 1 or 2). Default value: 2.

  0. Off. Ignores amplitude.

  1. Automatic. Uses mean amplitude at the beginning of the audio, with a duration of “endpointer.autoLevelLen”, added to the fixed percentage defined by “endpointer.levelThreshold”.

  2. Fixed. Percentage threshold defined by “endpointer.levelThreshold”.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.levelMode=2

endpointer.levelThreshold

Description: Percentage of the maximum signal amplitude below which the signal is considered silence. Used only when endpointer.levelMode = 1 or endpointer.levelMode = 2. For example, with endpointer.levelMode = 2 and endpointer.levelThreshold = 10, speech is detected only when the signal exceeds 10% of the maximum amplitude. With endpointer.levelMode = 1, the mean amplitude of the initial audio segment is added to that 10%.

Values: Integer number between 0 and 100. Default value: 5.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.levelThreshold=5
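
The arithmetic described for the level modes can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and it assumes that the mean amplitude of the initial segment is expressed as a percentage of the maximum amplitude so that it can be added directly to the threshold.

```python
def silence_threshold(level_mode, level_threshold, initial_mean_pct=0.0):
    """Return the silence threshold as a percentage of the maximum
    amplitude, or None when amplitude is ignored (mode 0)."""
    if level_mode == 0:
        # Off: amplitude plays no part in the decision.
        return None
    if level_mode == 1:
        # Automatic: mean of the initial segment plus the fixed percentage.
        return initial_mean_pct + level_threshold
    if level_mode == 2:
        # Fixed: the configured percentage alone.
        return float(level_threshold)
    raise ValueError("level_mode must be 0, 1 or 2")


print(silence_threshold(2, 10))       # 10.0
print(silence_threshold(1, 10, 3.0))  # 13.0
```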

endpointer.autoLevelLen

Description: Length of the initial audio segment used to calculate the silence threshold. Used only when endpointer.levelMode = 1.

Values: Integer number in milliseconds. Default value: 300.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:

--endpointer.autoLevelLen=300
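
A minimal sketch of how a mean amplitude over the initial segment might be computed is given below. It assumes 16-bit PCM samples, a mean of absolute sample values, and a result expressed as a percentage of full scale; none of these details are confirmed by this reference, and the function name is hypothetical.

```python
def initial_mean_amplitude_pct(samples, sample_rate_hz,
                               auto_level_len_ms=300, full_scale=32768):
    """Mean absolute amplitude of the first auto_level_len_ms of audio,
    as a percentage of full scale (assumed 16-bit PCM)."""
    # Number of samples covered by the initial segment.
    n = min(len(samples), sample_rate_hz * auto_level_len_ms // 1000)
    if n == 0:
        return 0.0
    mean_abs = sum(abs(s) for s in samples[:n]) / n
    return 100.0 * mean_abs / full_scale


# At 8000 Hz, 300 ms corresponds to the first 2400 samples.
audio = [1024] * 2400 + [16000] * 5600
print(initial_mean_amplitude_pct(audio, 8000))  # 3.125
```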