Speech Detection

Parameters for configuring the speech segment detection.

endpointer.useToneDetectors

Description: Enables suppression of telephone tones during recognition.

Values: “true” or “false.” Default value: “true.”

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.useToneDetectors=true

endpointer.enabled

Description: Enables speech segment detection. When enabled, only the segment containing speech is processed and any surrounding silence is ignored. When disabled, all the recorded audio is processed, which increases recognition time. The start and end points of the speech are generated only when detection is enabled.

Values: “true” or “false.” Default value: “true.”

Location: /opt/cpqd/asr/config/engine/engine.conf

Example:
--endpointer.enabled=true

endpointer.headMargin

Description: Period of silence placed at the beginning of the speech segment.

Values: Integer number in milliseconds. Default value: 200.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.headMargin=200

endpointer.tailMargin

Description: Period of silence placed at the end of the speech segment.

Values: Integer number in milliseconds. Default value: 400.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.tailMargin=400
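
The effect of the two margins can be illustrated with a minimal sketch. It assumes that each margin widens the detected speech span by keeping that much surrounding audio, clamped to the audio boundaries. The function name and its arguments are hypothetical and are not part of the engine or its API.

```python
def apply_margins(speech_start_ms, speech_end_ms, audio_len_ms,
                  head_margin_ms=200, tail_margin_ms=400):
    """Widen a detected speech span by the head and tail margins.

    Illustrative only: assumes the margins keep extra audio before and
    after the detected speech, and that the result is clamped to the
    limits of the recorded audio.
    """
    start = max(0, speech_start_ms - head_margin_ms)
    end = min(audio_len_ms, speech_end_ms + tail_margin_ms)
    return start, end


# Speech detected from 1000 ms to 2500 ms in a 5000 ms recording.
print(apply_margins(1000, 2500, 5000))  # (800, 2900)
```

With the default values, the segment sent to recognition would start 200 ms before the detected speech and end 400 ms after it, under the assumption stated above.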

endpointer.waitEnd

Description: Duration of continuous silence after which the end of speech is assumed.

Values: Integer number in milliseconds. Default value: 1000.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.waitEnd=1000
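
As a rough illustration of this rule, the sketch below scans a sequence of per-frame speech/silence decisions and reports the end of speech once the accumulated silence reaches the waitEnd value. The frame length of 10 ms, the function name, and the boolean frame representation are assumptions made for the example. The engine's actual detection logic is not described here.

```python
def find_end_of_speech(frame_is_speech, frame_ms=10, wait_end_ms=1000):
    """Return the time (ms) at which speech ended, or None if no end was found.

    frame_is_speech: iterable of booleans, one per fixed-length frame
    (hypothetical input; True means the frame was classified as speech).
    """
    started = False
    silence_ms = 0
    last_speech_end_ms = None
    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            started = True
            silence_ms = 0  # any speech frame resets the silence counter
            last_speech_end_ms = (i + 1) * frame_ms
        elif started:
            silence_ms += frame_ms
            if silence_ms >= wait_end_ms:
                return last_speech_end_ms
    return None


# 200 ms of silence, 500 ms of speech, then 1000 ms of silence.
frames = [False] * 20 + [True] * 50 + [False] * 100
print(find_end_of_speech(frames))  # 700
```

A pause shorter than waitEnd does not end the segment in this sketch, so a larger value tolerates longer hesitations at the cost of a slower response after the speaker stops.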

endpointer.levelMode

Description: Method used to calculate the amplitude threshold below which the signal is interpreted as silence.

Values: Number (0, 1 or 2). Default value: 2.

0: Off. Ignores amplitude.

1: Automatic. Uses the mean amplitude of the initial audio segment, whose duration is set by “endpointer.autoLevelLen”, added to the fixed percentage defined by “endpointer.levelThreshold”.

2: Fixed. Uses the percentage threshold defined by “endpointer.levelThreshold”.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.levelMode=2

endpointer.levelThreshold

Description: Amplitude threshold, expressed as a percentage of the maximum amplitude, below which the signal is considered silence. Used only when endpointer.levelMode is 1 or 2. For example, with endpointer.levelMode=2 and endpointer.levelThreshold=10, speech is detected only when the signal exceeds 10% of the maximum amplitude. With endpointer.levelMode=1, the mean amplitude of the initial audio segment is added to that 10%.

Values: Integer number between 0 and 100. Default value: 5.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.levelThreshold=5
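
The relationship between endpointer.levelMode and endpointer.levelThreshold can be sketched as follows. This is an illustration of the description above, not the engine's implementation. The function name is hypothetical, and the maximum amplitude of 32767 assumes 16-bit PCM audio, which is not stated in this document.

```python
def silence_threshold(level_mode, level_threshold,
                      max_amplitude=32767.0, initial_mean_amplitude=0.0):
    """Return the amplitude below which the signal counts as silence.

    Returns None for level_mode 0, where amplitude is ignored.
    max_amplitude is an assumption (full scale of 16-bit PCM).
    """
    if level_mode == 0:
        return None  # Off: amplitude is not used
    fixed_part = max_amplitude * level_threshold / 100.0
    if level_mode == 1:
        # Automatic: mean amplitude of the initial segment plus the fixed percentage
        return initial_mean_amplitude + fixed_part
    if level_mode == 2:
        # Fixed: the percentage threshold alone
        return fixed_part
    raise ValueError("level_mode must be 0, 1 or 2")


# levelMode=2, levelThreshold=10 on a signal with maximum amplitude 1000
print(silence_threshold(2, 10, max_amplitude=1000.0))  # 100.0
# levelMode=1 with an initial mean amplitude of 50
print(silence_threshold(1, 10, max_amplitude=1000.0,
                        initial_mean_amplitude=50.0))  # 150.0
```

In the automatic mode, a noisier start of the recording raises the threshold, which is why the same levelThreshold value can behave differently in modes 1 and 2.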

endpointer.autoLevelLen

Description: Length of the initial audio segment used to calculate the silence threshold. Used if levelMode = 1.

Values: Integer number in milliseconds. Default value: 300.

Location: /opt/cpqd/asr/config/engine/engine.conf, API

Example:
--endpointer.autoLevelLen=300
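
A minimal sketch of how the initial segment might be measured is shown below. It converts the length in milliseconds to a number of samples and takes the mean absolute amplitude over that window. The function name, the use of the absolute value, and the 8000 Hz sample rate in the usage example are assumptions for illustration only.

```python
def initial_mean_amplitude(samples, sample_rate_hz, auto_level_len_ms=300):
    """Mean absolute amplitude of the first auto_level_len_ms of audio.

    samples: sequence of numeric PCM sample values (hypothetical input).
    Returns 0.0 when there are no samples in the window.
    """
    window = sample_rate_hz * auto_level_len_ms // 1000
    n = min(len(samples), window)
    if n == 0:
        return 0.0
    return sum(abs(s) for s in samples[:n]) / n


# 300 ms of low-level noise followed by louder audio, at an assumed 8000 Hz.
audio = [10, -10] * 1200 + [1000] * 8000
print(initial_mean_amplitude(audio, 8000))  # 10.0
```

Only the first 2400 samples (300 ms at 8000 Hz) contribute here, so the louder audio that follows does not affect the estimate.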