Capacity Private Cloud Containerized Microservices

Capacity Private Cloud is built on a cloud-native, containerized microservices architecture. Each core capability—speech recognition, text-to-speech, voice biometrics, and natural language understanding—runs as an independent service, enabling granular scaling, resilient deployments, and simplified maintenance.

This section provides a detailed reference for each microservice in the platform. Understanding these services and their responsibilities is valuable for architecture planning, operational troubleshooting, and capacity management.

The platform comprises the following pods (containers):

  • Admin-portal
  • Archive
  • Asr (e.g. asr-en)
  • Audit
  • Binary Storage
  • Biometric-active
  • Biometric-api
  • Biometric-identity
  • Configuration
  • Deployment
  • Deployment-portal
  • Diarization
  • File-store
  • Grammar
  • ITN
  • Language Identification (lid)
  • License
  • LumenVox-api
  • Management-api
  • Neural-tts (e.g. neural-tts-en-us)
  • Neuron
  • NLU
  • Persistent-volume-directory-setup
  • Reporting
  • Reporting-api
  • Reporting-bio-api
  • Resource
  • Session
  • Transaction
  • TTS (e.g. tts-en-us)
  • VAD

Admin-portal

This service hosts the web portal for the cluster administrator. The portal is used to set up each licensed tenant, including its connection strings, encryption settings, and passwords for services and databases.

Archive

This is the archive manager. It takes live session data from Redis and saves it to the PostgreSQL database. It also supplies data from the database to the reporting API for retrieval by the client or by the admin and deployment portals.

Asr

This is the ASR DNN engine. It is linked with the session manager and processes all ASR and Transcription transactions. It receives messages from RabbitMQ that identify the audio to process, and it places results in Redis for retrieval. The engine is designed for high performance and accuracy, and offers aliases, dialect-specific processing, an improved scoring algorithm, enhanced transcription, phrase lists, and weighting. It is also optimized for efficient resource utilization, which makes scaling more manageable.
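The message flow described for this engine (requests arrive on a queue, results land in a key-value store) can be sketched as follows. This is a minimal illustration only: Python's queue.Queue stands in for RabbitMQ, a dict stands in for Redis, and the message fields (interaction_id, audio) and the fake_recognize function are hypothetical names, not the platform's actual message schema or engine API.

```python
import queue


def fake_recognize(audio: bytes) -> str:
    """Hypothetical placeholder for the DNN engine; returns a dummy transcript."""
    return f"<{len(audio)} bytes of audio>"


def run_asr_worker(pending: queue.Queue, results: dict) -> int:
    """Drain the pending requests and store one result per interaction ID."""
    processed = 0
    while True:
        try:
            message = pending.get_nowait()
        except queue.Empty:
            return processed
        # Results are keyed by interaction so the caller can fetch them later.
        results[message["interaction_id"]] = fake_recognize(message["audio"])
        processed += 1


pending = queue.Queue()  # stand-in for the RabbitMQ queue
pending.put({"interaction_id": "int-1", "audio": b"\x00" * 320})
pending.put({"interaction_id": "int-2", "audio": b"\x00" * 640})

results = {}  # stand-in for Redis
count = run_asr_worker(pending, results)
```

Decoupling the request queue from the result store in this way is what lets several engine instances consume from the same queue, which is consistent with the scaling behavior described above.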

Audit

This service connects to the audit database in PostgreSQL. It is used only by the voice biometric services and audits modifications to the configuration, deployment, and biometric identity tables.

Binary Storage

This service connects to MongoDB and handles all reads and writes. It is also responsible for encrypting and decrypting the data stored in MongoDB, which includes audio files and grammars.

Biometric-active

This is the Voice Biometric active DNN engine and is responsible for all active voice biometrics processing, receiving requests from the biometric APIs.

Biometric-api

This is the REST interface for active voice biometrics. It provides APIs for enrollment, verification, and identity management. This service communicates with the deployment and configuration services, the binary service, the active verifier, and the lumenvox-api service (for text validation).

Biometric-identity

This service manages the identities used by the active voice biometrics engine. It also maintains the links between identities and enrollments.

Configuration

This service manages the configuration of the various deployments and is used for voice biometrics only. It receives input as JSON and manages the retrieval of enrollment and verification configurations from the PostgreSQL database.

Deployment

This service manages deployments within the cluster. For each deployment, it maintains the list of configurations and connection strings, along with any other deployment-specific data. It also handles encryption, encryption keys, and key rotation. The service is also used for deployment administration, for example creating a new deployment ID and exporting or importing deployment information.

The cluster master key is created when the deployment service starts for the first time and is stored in the database. The cluster master key is encrypted with the cluster GUID. Customer keys are then created for each deployment and encrypted using the cluster master key.
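The key hierarchy described above (cluster GUID protects the master key, master key protects each customer key) can be sketched structurally as follows. This is an illustration of the layering only, under loud assumptions: toy_wrap is a hypothetical stand-in that XORs the key with a SHA-256 pad so the example needs no third-party library. It is not secure and is not the cipher the deployment service uses; the source does not state which algorithm is used.

```python
import hashlib
import secrets
import uuid


def toy_wrap(key: bytes, wrapping_secret: bytes) -> bytes:
    """XOR a 32-byte key with a SHA-256 pad derived from the wrapping secret.

    Toy stand-in for real key wrapping. XOR is its own inverse, so calling
    this again with the same secret recovers the original key.
    """
    pad = hashlib.sha256(wrapping_secret).digest()
    return bytes(a ^ b for a, b in zip(key, pad))


# First start: create the master key and store it wrapped with the cluster GUID.
cluster_guid = uuid.uuid4().bytes
master_key = secrets.token_bytes(32)
stored_master = toy_wrap(master_key, cluster_guid)

# Per deployment: create a customer key and store it wrapped with the master key.
customer_key = secrets.token_bytes(32)
stored_customer = toy_wrap(customer_key, master_key)

# Recovery walks the chain in the same order: GUID -> master key -> customer key.
recovered_master = toy_wrap(stored_master, cluster_guid)
recovered_customer = toy_wrap(stored_customer, recovered_master)
```

The point of the layering is that only wrapped keys are persisted, and a customer key cannot be recovered without first recovering the master key.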

Deployment-portal

This service hosts the web portal for tenant administrators and gives each tenant access to its own deployment. It supports the customization of configurations and manages the diagnostic tests.

Diarization

This pod provides the Speaker Diarization service. It processes mono audio that contains multiple speakers and identifies which speaker spoke each portion of the audio. The output can then be used for further transcription or for post-processing such as sentiment analysis.

File-store

This service allows clients to store and retrieve files locally, such as grammars, phrase lists, SSML, and lexicons. The files are managed within the deployment portal.

Grammar

The Grammar management service coordinates grammar compilation, caching, parsing, and DTMF processing for ASR and Transcription transactions. It is also instrumental in CPA and AMD transactions, grammar storage, retrieval, housekeeping, and handling of global grammars and phrase lists.

ITN

This service manages all text normalization functions including inverse text normalization, punctuation and capitalization, and redaction on Transcription interactions. This service also includes the Sentiment Analysis service.

Language Identification (lid)

Also known as language detection, this service allows users to send audio for processing to determine which language is being spoken. This differs from the language detection function in NLU, which is text-based.

License

This service manages the licensing of the deployments. It validates each deployment against the licensing server. The licensing service is responsible for marking a deployment as valid; if not valid, the various APIs will not function.

It also collects counters for transactions for all products and communicates this to the centralized licensing service for billing and product usage monitoring.
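The two responsibilities described above (gating API use on license validity, and counting transactions per product) can be sketched together. This is a minimal in-memory model under assumptions: the LicenseGate class, its method names, and the deployment and product identifiers are hypothetical, and the real service validates against a licensing server rather than a local set.

```python
from collections import Counter


class LicenseGate:
    """Hypothetical in-memory model of license validation and usage counting."""

    def __init__(self):
        self.valid_deployments = set()
        self.usage = Counter()

    def mark_valid(self, deployment_id: str) -> None:
        """Record that a deployment passed license validation."""
        self.valid_deployments.add(deployment_id)

    def record_transaction(self, deployment_id: str, product: str) -> None:
        """Count one transaction, refusing deployments that are not valid."""
        if deployment_id not in self.valid_deployments:
            raise PermissionError(f"deployment {deployment_id} is not licensed")
        self.usage[(deployment_id, product)] += 1

    def usage_report(self) -> dict:
        """Return the counters that would be sent on for billing."""
        return dict(self.usage)


gate = LicenseGate()
gate.mark_valid("dep-a")
gate.record_transaction("dep-a", "asr")
gate.record_transaction("dep-a", "asr")
gate.record_transaction("dep-a", "tts")

try:
    gate.record_transaction("dep-b", "asr")  # never marked valid
    rejected = False
except PermissionError:
    rejected = True

report = gate.usage_report()
```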

LumenVox-api

This service supports the Speech APIs made available to clients to process transactions for speech products including ASR, TTS, CPA, and AMD using gRPC.

Management-api

This service exposes the REST management APIs. It is a wrapper around the deployment and configuration services, allowing customers to manage both.

Neural-tts

This is the Neural DNN TTS engine. Over 80 voices across 30 languages/dialects are available for consumption.

Neuron

This is Capacity's NLU intent engine, which uses transcribed text to determine intent and perform slot filling.

NLU

The Natural Language Understanding service provides the following text input-based products:

  • Call Summarization including topic detection and outcome detection
  • Language Detection (this differs from the audio-based Language Identification)
  • Language Translation (including auto language detection)

Persistent-volume-directory-setup

This service creates the connection to the persistent volume and is used to update resources in persistent storage, such as DNN language models. It does not run continuously.

Reporting

This service is used for voice biometrics to generate reports and make the extracted data available to the reporting APIs.

Reporting-api

This service exposes the speech reporting APIs over gRPC and supports the Analysis portal.

Reporting-bio-api

This REST interface is used to connect to the reporting services and expose reports to the client for voice biometrics.

Resource

This service downloads the public language packs to the cluster for ASR, TTS, and voice biometrics. It stores the language packs in the cluster's shared volume and makes resources available to each service as needed.

Session

This service is currently used for speech products only. It manages sessions and interactions and acts as a dispatcher between the lumenvox-api service and the other services.

Transaction

This service writes voice biometric transactions to the database. Voice biometric sessions are stored in Redis until the session completes. At that point the transaction service receives a message telling it to retrieve the data and write the session and transaction data to the PostgreSQL database. This service is also used for operations reporting, where each API call to the biometric interface is logged as an operation and stored for audit purposes.
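The session-completion flow described above can be sketched as follows. A dict stands in for Redis and a list stands in for the PostgreSQL tables. The function names and record fields are hypothetical, and the sketch assumes the live copy is dropped once the session has been written, which the source implies but does not state outright.

```python
live_sessions = {}  # stand-in for Redis: session_id -> list of transactions
database = []       # stand-in for the PostgreSQL session/transaction tables


def record_transaction(session_id: str, transaction: dict) -> None:
    """Hold a transaction in the live store while the session is open."""
    live_sessions.setdefault(session_id, []).append(transaction)


def on_session_completed(session_id: str) -> int:
    """Handle a 'session completed' message: move the session to the database."""
    transactions = live_sessions.pop(session_id, [])
    database.append({"session_id": session_id, "transactions": transactions})
    return len(transactions)


record_transaction("s1", {"type": "enrollment"})
record_transaction("s1", {"type": "verification"})
record_transaction("s2", {"type": "verification"})  # session s2 is still open

written = on_session_completed("s1")
```

Only the completed session is written, so open sessions such as s2 stay in the live store until their own completion message arrives.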

TTS

This is the legacy TTS engine.

VAD

This is the media service that manages the Voice Activity Detection (VAD) algorithm and assists with transcription audio streaming.

Capacity Private Cloud Microservices Diagram

The following is an architecture diagram that includes all services.

