Standalone Installation Guide

Installation Procedure

1. Installing Docker

Follow the Docker installation steps. For example, for the CentOS Linux operating system, the installation procedure can be found here.

2. Installing Docker Compose

Follow the Docker Compose installation steps.
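
Before moving on, it can help to confirm that both tools are reachable on the PATH. The following is a minimal sketch, not part of the product; the helper name require_cmd is hypothetical.

```shell
# Sketch: report whether a command is available on the PATH.
# "require_cmd" is a hypothetical helper introduced only for this example.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}
```

Calling require_cmd docker and then require_cmd docker-compose prints one status line per tool and returns a non-zero status for any tool that is not installed.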

Warning

Obtaining the Docker images requires access to the CPqD corporate network and to the cpqd-docker-dev repository. The images are downloaded in the steps that follow.

3. Creating the directory structure

Note

This procedure assumes that the disk /l/disk0 is used for MongoDB and InfluxDB persistence.

  • Proposed folder layout.

    l
    └──disk0
       ├──influxdb
       ├──mongodb
       └──trd
          ├──docker-compose.all.yml
          └──init_all.sh
    
  • Create the MongoDB persistence folder at /l/disk0/mongodb.

    $ mkdir -p /l/disk0/mongodb && \
      chmod 775 -R /l/disk0/mongodb
    
  • Create the InfluxDB persistence folder at /l/disk0/influxdb.

    $ mkdir -p /l/disk0/influxdb && \
      chmod 775 -R /l/disk0/influxdb
    
  • Create the trd folder at /l/disk0/trd, which will hold the initialization script and the project's docker-compose file.

    $ mkdir -p /l/disk0/trd && \
      chmod 775 -R /l/disk0/trd
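
The three commands above follow the same pattern, so they can be folded into one helper when the base disk differs from /l/disk0. This is a sketch under that assumption; the function name create_layout is hypothetical.

```shell
# Sketch: create the mongodb, influxdb and trd folders under a given base path
# and apply the same 775 mode used in the guide.
# "create_layout" is a hypothetical helper introduced only for this example.
create_layout() {
    for dir in mongodb influxdb trd; do
        mkdir -p "$1/$dir" || return 1
        chmod -R 775 "$1/$dir" || return 1
    done
}
```

With the layout proposed in this guide, the call would be create_layout /l/disk0.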
    

4. Creating the initialization script

  • Create the initialization script init_all.sh in /l/disk0/trd and set the following environment variables in it:

    • TRD_MONGODB: MongoDB persistence directory (/l/disk0/mongodb).
    • TRD_INFLUXDB: InfluxDB persistence directory (/l/disk0/influxdb).
    • DIARIZATION_LICENSE_ID: Diarizer license ID, provided by CPqD.
    • SL_TAG: Identification tag for the instance, used by the license system.
    • INFLUXDB_PASSWORD: Password used when creating the default InfluxDB user.
    • INFLUXDB_ADMIN_PASSWORD: Password used when creating the InfluxDB administrator.

    #!/bin/bash
    
    ### Version ###
    export TRD_VERSION=latest
    
    ### Parameters for the volumes mapped on the host ###
    export TRD_MONGODB=/l/disk0/mongodb
    export TRD_INFLUXDB=/l/disk0/influxdb
    export TRD_LOG=/var/log/cpqd/trd/
    
    ### License parameters ###
    export LICENSE_HOST=sl.cpqd.com.br
    export DIARIZATION_LICENSE_ID=dummydiarizationid
    export ASR_LICENSE_ID=dummyasrid
    export ASR_LICENSE_CHANNELS=5
    export SL_TAG="" # Machine description
    export SL_HOSTNAME=$(hostname)
    
    ### Parameters that change only if the default setup is modified ###
    export PROFILE="prod"
    export MONGODB=mongodb://mongo:27017
    export RABBIT_HOST=rabbitmq
    export RABBIT_PORT=5672
    export RABBIT_MANAGEMENT_PORT=15672
    export RABBIT_USER=guest
    export RABBIT_PWD=guest
    export TIMEZONE="America/Sao_Paulo"
    export VALID_EXTENSION_FORMATS='[".wav", ".mp3", ".flac", ".wma", ".mp4", ".avi", ".flv", ".mkv", ".mov", ".mpg", ".wmv"]'
    export MAX_AUDIO_DURATION_SECS=126000   # 35 h
    export MAX_AUDIO_SIZE_BYTES=2147483648  # 2 GiB
    export MONGODB_CACHE_GB="0.1"
    export EMPTY_QUEUE_TIMEOUT=43200
    export JOB_TTL_IN_MINUTES=7200 # 5 days
    export ASR_LM_URI=builtin:slm/callcenter-small
    export CLUSTER_TH=-.4
    export BACKEND_CORS_ORIGINS='["http://localhost", "http://localhost:8000", "https://localhost", "https://localhost:8000"]'
    export EMPTY_QUEUE_INTERVAL=60  # 60 seconds
    
    ### Processing Node configuration ###
    export ASR_SERVER_URL=ws://asr-server:8025/asr-server/asr
    export ASR_WORD_HINTS=""
    export ASR_REMOVE_UNK=false
    
    ### Webhook ###
    export ALLOW_HTTP_WEBHOOKS=false  # Disables plain-HTTP webhooks
    
    ### InfluxDB ###
    export INFLUX_DATABASE="metrics"
    export INFLUXDB_USER="trd"
    export INFLUXDB_PASSWORD="dummypassword"
    export INFLUXDB_ADMIN_USER="admin"
    export INFLUXDB_ADMIN_PASSWORD="dummypassword"
    export INFLUX_INIT='{"host":"metrics", "port": 8086, "db":"'${INFLUX_DATABASE}'", "active": "true"}'
    
    docker-compose -f docker-compose.all.yml up -d
    
  • Create the docker-compose file docker-compose.all.yml in /l/disk0/trd with the following content:

    version: '3'
    
    services:
      mongo:
        image: mongo:4.2
        restart: always
        environment:
          TZ: ${TIMEZONE}
        ports:
          - "27017:27017"
        volumes:
          - $TRD_MONGODB:/data/db
        command: --wiredTigerCacheSizeGB ${MONGODB_CACHE_GB}
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
    
      rabbitmq:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/rabbitmq:${TRD_VERSION:-latest}
        restart: always
        environment:
          TZ: ${TIMEZONE}
        ports:
          - "8072:15672"
          - "5672:5672"
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
    
      metrics:
        image: influxdb:1.8.3
        environment:
          INFLUXDB_DB: ${INFLUX_DATABASE}
          INFLUXDB_HTTP_AUTH_ENABLED: "true"
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          INFLUXDB_ADMIN_USER: ${INFLUXDB_ADMIN_USER}
          INFLUXDB_ADMIN_PASSWORD: ${INFLUXDB_ADMIN_PASSWORD}
        ports:
          - "8086:8086"
        volumes:
          - $TRD_INFLUXDB:/var/lib/influxdb
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
    
      transcription-api:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/transcription-api:${TRD_VERSION:-latest}
        restart: always
        environment:
          INFLUX_INIT: ${INFLUX_INIT}
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          INFLUX_DATABASE: ${INFLUX_DATABASE}
          PROFILE: ${PROFILE}
          MONGODB: ${MONGODB}
          RABBIT_HOST: ${RABBIT_HOST}
          RABBIT_PORT: ${RABBIT_PORT}
          VALID_EXTENSION_FORMATS: ${VALID_EXTENSION_FORMATS}
          MAX_AUDIO_DURATION_SECS: ${MAX_AUDIO_DURATION_SECS}
          MAX_AUDIO_SIZE_BYTES: ${MAX_AUDIO_SIZE_BYTES}
          TZ: ${TIMEZONE}
          ALLOW_HTTP_WEBHOOKS: ${ALLOW_HTTP_WEBHOOKS}
          DIARIZATION_LICENSE_ID: ${DIARIZATION_LICENSE_ID}
          ASR_LM_URI: ${ASR_LM_URI}
          CLUSTER_TH: ${CLUSTER_TH}
          DIAR_CHUNK_MAX_SIL: 1200.0
          DIAR_CHUNK_MAX_LEN: 3600.0
          BACKEND_CORS_ORIGINS: ${BACKEND_CORS_ORIGINS}
          PORT: "8000"
        ports:
          - "8000:8000"
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
    
      transcription-pipeline:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/transcription-pipeline:${TRD_VERSION:-latest}
        restart: always
        environment:
          INFLUX_INIT: ${INFLUX_INIT}
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          PROFILE: ${PROFILE}
          MONGODB: ${MONGODB}
          RABBIT_HOST: ${RABBIT_HOST}
          RABBIT_PORT: ${RABBIT_PORT}
          RABBIT_MANAGEMENT_PORT: ${RABBIT_MANAGEMENT_PORT}
          RABBIT_USER: ${RABBIT_USER}
          RABBIT_PWD: ${RABBIT_PWD}
          TZ: ${TIMEZONE}
          DIARIZATION_LICENSE_HOST: "https://${LICENSE_HOST}"
          DIARIZATION_LICENSE_ID: ${DIARIZATION_LICENSE_ID}
          SL_HOSTNAME: ${SL_HOSTNAME}
          SL_TAG: ${SL_TAG}
          ALLOW_HTTP_WEBHOOKS: ${ALLOW_HTTP_WEBHOOKS}
          EMPTY_QUEUE_TIMEOUT: ${EMPTY_QUEUE_TIMEOUT}
          EMPTY_QUEUE_INTERVAL: ${EMPTY_QUEUE_INTERVAL}
          JOB_TTL_IN_MINUTES: ${JOB_TTL_IN_MINUTES}
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
        volumes:
          - cache:/var/tmp
    
      normalization-executor:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/normalization-executor:${TRD_VERSION:-latest}
        restart: always
        environment:
          INFLUX_INIT: ${INFLUX_INIT}
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          SL_HOSTNAME: ${SL_HOSTNAME}
          PROFILE: ${PROFILE}
          MONGODB: ${MONGODB}
          RABBIT_HOST: ${RABBIT_HOST}
          RABBIT_PORT: ${RABBIT_PORT}
          TZ: ${TIMEZONE}
          DIARIZATION_LICENSE_ID: ${DIARIZATION_LICENSE_ID}
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
        volumes:
          - ${TRD_LOG}:/var/log/cpqd/trd/
    
      vad-executor:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/vad-executor:${TRD_VERSION:-latest}
        restart: always
        environment:
          INFLUX_INIT: ${INFLUX_INIT}
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          PROFILE: ${PROFILE}
          MONGODB: ${MONGODB}
          RABBIT_HOST: ${RABBIT_HOST}
          RABBIT_PORT: ${RABBIT_PORT}
          DIARIZATION_LICENSE_HOST: "https://${LICENSE_HOST}"
          DIARIZATION_LICENSE_ID: ${DIARIZATION_LICENSE_ID}
          SL_HOSTNAME: ${SL_HOSTNAME}
          SL_TAG: ${SL_TAG}
          TZ: ${TIMEZONE}
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
        command: "./tini -- ./pyexec --name vad-worker --module /opt/modules.dat \
                  --file queuer_vad --func executor"
        volumes:
          - cache:/var/tmp
    
      clustering-executor:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/clustering-executor:${TRD_VERSION:-latest}
        restart: always
        environment:
          INFLUX_INIT: ${INFLUX_INIT}
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          PROFILE: ${PROFILE}
          MONGODB: ${MONGODB}
          RABBIT_HOST: ${RABBIT_HOST}
          RABBIT_PORT: ${RABBIT_PORT}
          DIARIZATION_LICENSE_HOST: "https://${LICENSE_HOST}"
          DIARIZATION_LICENSE_ID: ${DIARIZATION_LICENSE_ID}
          SL_HOSTNAME: ${SL_HOSTNAME}
          SL_TAG: ${SL_TAG}
          TZ: ${TIMEZONE}
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
        command: "./tini -- ./pyexec --name clustering-worker --module /opt/modules.dat \
                  --file queuer_clustering --func executor"
        volumes:
          - cache:/var/tmp
    
      recognition-executor:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/recognition-executor:${TRD_VERSION:-latest}
        restart: always
        environment:
          INFLUX_INIT: ${INFLUX_INIT}
          INFLUXDB_USER: ${INFLUXDB_USER}
          INFLUXDB_USER_PASSWORD: ${INFLUXDB_PASSWORD}
          SL_HOSTNAME: ${SL_HOSTNAME}
          PROFILE: ${PROFILE}
          MONGODB: ${MONGODB}
          RABBIT_HOST: ${RABBIT_HOST}
          RABBIT_PORT: ${RABBIT_PORT}
          ASR_SERVER_URL: ${ASR_SERVER_URL}
          ASR_LICENSE_CHANNELS: ${ASR_LICENSE_CHANNELS}
          TZ: ${TIMEZONE}
          DIARIZATION_LICENSE_ID: ${DIARIZATION_LICENSE_ID}
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
    
      asr-server:
        image: artifactory.cpqd.com.br:8443/docker-dev/cpqd/trd/asr-server:${TRD_VERSION:-latest}
        restart: always
        environment:
          ASR_LICENSE_HOST: ${LICENSE_HOST}
          ASR_LICENSE_ID: ${ASR_LICENSE_ID}
          ASR_LICENSE_CHANNELS: ${ASR_LICENSE_CHANNELS}
          ASR_WORD_HINTS: ${ASR_WORD_HINTS}
          ASR_REMOVE_UNK: ${ASR_REMOVE_UNK}
          TZ: ${TIMEZONE}
          CHECK_CONNECTION_WITH_SL: "true"
          LICENSE_HOST_PING: https://${LICENSE_HOST}/ping
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "5"
        volumes:
          - ${TRD_LOG}:/var/log/cpqd
          - cache:/opt/cpqd/asr/license
    
    networks:
      default:
        driver: bridge
    
    volumes:
      cache:
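
The initialization script ships with placeholder values such as dummypassword, and the Compose file substitutes whatever is exported. A pre-flight check can catch values that were never replaced. The following is a minimal sketch, not part of the product; the helper name check_vars is hypothetical, and the rule "a value starting with dummy is a placeholder" is an assumption based on the sample values in this guide.

```shell
# Sketch: report variables that are empty or still hold a "dummy..." placeholder.
# "check_vars" is a hypothetical helper; it is not part of init_all.sh as shipped.
# The body runs in a subshell, so its working variables do not leak to the caller.
check_vars() (
    status=0
    for name in "$@"; do
        eval "value=\${$name-}"   # read the variable whose name is stored in $name
        case "$value" in
            "")     echo "unset or empty: $name"; status=1 ;;
            dummy*) echo "placeholder value: $name"; status=1 ;;
        esac
    done
    exit "$status"
)
```

One possible use is to call check_vars DIARIZATION_LICENSE_ID ASR_LICENSE_ID INFLUXDB_PASSWORD INFLUXDB_ADMIN_PASSWORD just before the docker-compose line of init_all.sh and abort when it returns a non-zero status. SL_TAG is left out of that list because the sample script sets it to an empty string. Separately, docker-compose -f docker-compose.all.yml config -q validates the Compose file without starting anything.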
    

Starting and stopping the services

To start the Control Node services, run the script from /l/disk0/trd (it must be executable):

$ ./init_all.sh

To stop all services:

$ docker-compose -f docker-compose.all.yml down

Viewing the logs

The logs are stored by Docker, normally at: /var/lib/docker/containers/<container id>/<container id>-json.log
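
The path pattern above can be expanded for a specific container, or the real location can be read back from Docker. The sketch below shows both; the function names are hypothetical, and the second function assumes a running Docker daemon.

```shell
# Sketch: build the usual json-file log path for a full container ID,
# following the pattern given above.
# "default_log_path" is a hypothetical helper introduced only for this example.
default_log_path() {
    printf '/var/lib/docker/containers/%s/%s-json.log\n' "$1" "$1"
}

# Sketch: ask Docker where the log file of a container actually is.
# Requires a running Docker daemon; reading the file usually requires root.
actual_log_path() {
    docker inspect --format '{{.LogPath}}' "$1"
}
```

default_log_path only formats a string, so it works without Docker; actual_log_path is the safer choice when the Docker data directory has been moved.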

To view the logs of the Docker containers, use the command below.

On the machine running the transcriber:

$ docker-compose -f docker-compose.all.yml logs -f --tail 50