Custom Entities
Custom entities are defined by the developer and may be specific to the application. Two types of custom entity are supported:
Synonym List
Regular Expressions
Synonym List
To use a synonym list, put in the ‘synonyms’ field the words or phrases that are synonyms of the term given in the ‘value’ field. In the example below, the PHONE_NUMBER entity uses this feature:
{
  "name": "PHONE_NUMBER",
  "synonym_values": [
    {
      "value": "casa",
      "synonyms": [
        "telefone de casa",
        "minha casa"
      ]
    }
  ]
}
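The mapping from synonyms to a canonical value can be sketched in Python as below. This is only an illustration of the idea, not the platform's implementation: the helper names `build_synonym_index` and `resolve` are hypothetical, and the sketch assumes exact, case-insensitive matching and that the ‘value’ term itself also resolves to the value, neither of which is stated above.

```python
# Hypothetical entity definition mirroring the PHONE_NUMBER example.
entity = {
    "name": "PHONE_NUMBER",
    "synonym_values": [
        {"value": "casa", "synonyms": ["telefone de casa", "minha casa"]}
    ],
}


def build_synonym_index(entity):
    """Map each synonym (and the value itself) to its canonical value."""
    index = {}
    for item in entity["synonym_values"]:
        # Assumption: the canonical term also matches itself.
        index[item["value"].lower()] = item["value"]
        for synonym in item["synonyms"]:
            index[synonym.lower()] = item["value"]
    return index


def resolve(text, index):
    """Return the canonical value for an exact, case-insensitive match, else None."""
    return index.get(text.strip().lower())


index = build_synonym_index(entity)
print(resolve("Minha casa", index))  # casa
```

Under these assumptions, both ‘telefone de casa’ and ‘minha casa’ resolve to ‘casa’, while text that is not listed resolves to nothing.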
Regular Expressions
To use regular expressions, specify the pattern in the ‘regex’ field and the desired resulting value in the ‘value’ field.
The ‘regex’ field must follow Python's regular expression syntax.
The ‘value’ field accepts two special values:
‘@text’ = original normalized text
‘@clean’ = original normalized text containing only alphanumeric characters.
If neither special value is used, the value is treated as a substitution template that is expanded against the regular expression match (see Python's Match.expand), so it can reference captured groups such as \1. The template can also be plain text.
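The three ways of interpreting ‘value’ can be sketched as follows. The function `resolve_value` and the sample pattern are hypothetical, and the sketch makes two assumptions that the text does not confirm: that ‘@text’ refers to the matched span, and that no further normalization is applied. Only `re.search` and `Match.expand` are real Python calls.

```python
import re


def resolve_value(value, match):
    """Interpret a 'value' field against a regex match (illustrative only)."""
    matched = match.group(0)
    if value == "@text":
        # Assumption: '@text' is the matched text, with no extra normalization.
        return matched
    if value == "@clean":
        # Keep only alphanumeric characters of the matched text.
        return "".join(ch for ch in matched if ch.isalnum())
    # Otherwise treat the value as a template, as Match.expand does.
    return match.expand(value)


match = re.search(r"([A-Z]{3})-(\d{4})", "ABC-1234")
results = {v: resolve_value(v, match) for v in ("@text", "@clean", r"\2/\1", "PLATE")}
for value, result in results.items():
    print(value, "->", result)
```

A template without group references, such as ‘PLATE’ here, is returned unchanged, which is why the value can also be plain text.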
Below is an example of this feature for the custom entity CEP (a Brazilian postal code):
{
  "name": "CEP",
  "regex_values": [
    {
      "value": "\\1\\2",
      "regex": "(\\d{5})-(\\d{3})"
    }
  ]
}
The ‘value’ field is defined as the concatenation of the two groups captured by the regular expression ‘(\d{5})-(\d{3})’, where:
(\d{5}) ==> \1
(\d{3}) ==> \2
In this case, the input text ‘13060-432’ yields the value ‘13060432’.
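The CEP example can be reproduced with Python's standard library, as a rough check of the template before registering the entity. The sketch assumes the pattern is applied with `re.search`; how the platform actually applies it is not stated here. Note that the backslashes are doubled only because of JSON string escaping: after decoding, the pattern is ‘(\d{5})-(\d{3})’ and the template is ‘\1\2’.

```python
import json
import re

# The CEP definition as JSON text; a raw string keeps the doubled backslashes.
definition = json.loads(r'''
{
  "name": "CEP",
  "regex_values": [
    {"value": "\\1\\2", "regex": "(\\d{5})-(\\d{3})"}
  ]
}
''')

rule = definition["regex_values"][0]
print(rule["regex"])  # (\d{5})-(\d{3})
print(rule["value"])  # \1\2

# Assumption: the pattern is searched for anywhere in the input text.
match = re.search(rule["regex"], "13060-432")
print(match.expand(rule["value"]))  # 13060432
```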
For more information about regular expressions in Python, refer to the Regular Expression HOWTO.