Capacity Private Cloud Containerized Microservices
Capacity Private Cloud is built on a cloud-native, containerized microservices architecture. Each core capability—speech recognition, text-to-speech, voice biometrics, and natural language understanding—runs as an independent service, enabling granular scaling, resilient deployments, and simplified maintenance.
This section provides a detailed reference for each microservice in the platform. Understanding these services and their responsibilities is valuable for architecture planning, operational troubleshooting, and capacity management.
The platform comprises the following pods (containers):
- Admin-portal
- Archive
- Asr (e.g. asr-en)
- Audit
- Binary Storage
- Biometric-active
- Biometric-api
- Biometric-identity
- Configuration
- Deployment
- Deployment-portal
- Diarization
- File-store
- Grammar
- ITN
- Language Identification (lid)
- License
- LumenVox-api
- Management-api
- Neural-tts (e.g. neural-tts-en-us)
- Neuron
- NLU
- Persistent-volume-directory-setup
- Reporting
- Reporting-api
- Reporting-bio-api
- Resource
- Session
- Transaction
- TTS (e.g. tts-en-us)
- VAD
Admin-portal
This service manages the web portal for the cluster administrator. It supports the setup of each licensed tenant, including connection strings, encryption, and passwords for services and databases.
Archive
This is the archive manager. It is responsible for taking live session data from Redis and saving it to the PostgreSQL database. It also supplies data from the database to the reporting API for retrieval by the client or by the administration and deployment portals.
Asr
This is the ASR DNN engine. It is linked with the session manager and processes all ASR and Transcription transactions. It receives messages from RabbitMQ indicating which audio needs to be processed and places results in Redis for retrieval. The DNN ASR engine provides high performance and accuracy, with features such as aliases, dialect-specific processing, an improved scoring algorithm, enhanced transcription, phrase lists, and weighting. The engine is optimized for efficient resource utilization, which makes scaling more manageable.
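The message flow described for the ASR service (a work item arrives on a queue, the engine processes it, and the result is parked in a key-value store for retrieval) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `queue.Queue` stands in for RabbitMQ, a plain dict stands in for Redis, and the message fields, key format, and `fake_recognize` function are hypothetical rather than the platform's actual schema.

```python
import json
import queue

# Stand-ins (assumptions): queue.Queue plays the role of RabbitMQ,
# and a dict plays the role of Redis.
work_queue = queue.Queue()
result_store = {}


def fake_recognize(audio_ref):
    """Hypothetical placeholder for the ASR engine; returns a canned result."""
    return {"audio_ref": audio_ref, "transcript": "<recognized text>"}


def process_next_message():
    """Take one message off the queue, process it, and store the result."""
    message = json.loads(work_queue.get_nowait())
    result = fake_recognize(message["audio_ref"])
    # Key format is illustrative only.
    key = "result:" + message["interaction_id"]
    result_store[key] = json.dumps(result)
    return key


# A hypothetical dispatcher publishes a work item...
work_queue.put(json.dumps({"interaction_id": "int-1", "audio_ref": "audio-42"}))
# ...and the engine worker consumes it and stores the result.
stored_key = process_next_message()
```

Decoupling the producer from the engine through a queue is what allows additional engine pods to be added without changing the caller, which is consistent with the independent scaling described at the start of this section.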
Audit
This service connects to the audit database in PostgreSQL. It is used by the voice biometric services only, auditing modifications to the configuration, deployment, and biometric identity tables.
Binary Storage
This service connects to MongoDB and handles all reads from and writes to it. It is also responsible for encrypting and decrypting the data stored in MongoDB, including audio files and grammars.
Biometric-active
This is the Voice Biometric active DNN engine and is responsible for all active voice biometrics processing, receiving requests from the biometric APIs.
Biometric-api
This is the REST interface for active voice biometrics. It provides APIs for enrollment, verification, and identity management. This service communicates with the deployment, configuration, and binary storage services, the active verifier, and the lumenvox-api service (for text validation).
Biometric-identity
This service manages the identities for the active voice biometrics engine. It also maintains the links between identities and enrollments.
Configuration
This service manages the configuration of deployments for voice biometrics only. It receives input as JSON and manages the retrieval of enrollment and verification configurations from the PostgreSQL database.
Deployment
This service manages deployments within the cluster. For each deployment, it maintains the list of configurations and connection strings along with any other data specific to a deployment. It also handles encryption and encryption keys along with key rotation. The service is also used to manage the deployment, for example creating a new deployment ID and exporting/importing deployment information.
The cluster master key is created when the deployment service starts for the first time and is stored in the database. The cluster master key is encrypted with the cluster GUID. Customer keys are then created for each deployment and encrypted using the cluster master key.
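The key hierarchy described above (cluster GUID protects the master key, master key protects each customer key) can be pictured with a small sketch. This is a toy model under stated assumptions, not the platform's implementation: the source does not name the cipher, so `toy_wrap` uses a deliberately insecure XOR pad purely to make the chain of dependencies visible, and all names and key sizes are hypothetical.

```python
import hashlib
import secrets
import uuid


def toy_wrap(key, wrapping_secret):
    """Toy reversible wrap: XOR a 32-byte key with SHA-256(wrapping_secret).

    NOT secure; it only stands in for a real cipher so the hierarchy is visible.
    """
    pad = hashlib.sha256(wrapping_secret).digest()
    return bytes(a ^ b for a, b in zip(key, pad))


# XOR is its own inverse, so unwrapping is the same operation.
toy_unwrap = toy_wrap

cluster_guid = uuid.uuid4().bytes                    # hypothetical cluster GUID
master_key = secrets.token_bytes(32)                 # created on first start
stored_master = toy_wrap(master_key, cluster_guid)   # wrapped form kept in the database

# One customer key per deployment, wrapped with the master key.
customer_key = secrets.token_bytes(32)
stored_customer = toy_wrap(customer_key, master_key)

# Recovering a customer key walks the chain: GUID -> master key -> customer key.
recovered_master = toy_unwrap(stored_master, cluster_guid)
recovered_customer = toy_unwrap(stored_customer, recovered_master)
```

One consequence of a layered scheme like this is that a customer key can be replaced by re-wrapping only that deployment's key, without touching the keys of other deployments.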
Deployment-portal
This service manages the web portal for the tenant admins. It facilitates tenant access to their own deployment. The service supports the customization of configurations and manages the diagnostic tests.
Diarization
This pod provides the Speaker Diarization service, which processes mono audio containing multiple speakers to identify which audio was spoken by which speaker. The output can then be used for further transcription or post-processing, such as sentiment analysis.
File-store
This service allows clients to store and retrieve files locally, such as grammars, phrase lists, SSML, and lexicons. These files are managed within the deployment portal.
Grammar manager
The Grammar management service coordinates grammar compilation, caching, parsing, and DTMF processing for ASR and Transcription transactions. It is also used in Call Progress Analysis (CPA) and Answering Machine Detection (AMD) transactions, and it handles grammar storage, retrieval, and housekeeping, as well as global grammars and phrase lists.
ITN
This service manages all text normalization functions including inverse text normalization, punctuation and capitalization, and redaction on Transcription interactions. This service also includes the Sentiment Analysis service.
Language Identification (lid)
Also known as language detection, this service allows users to send audio for processing to determine which language is being spoken. This differs from the language detection function in NLU, which is text-based.
License
This service manages the licensing of the deployments. It validates each deployment against the licensing server. The licensing service is responsible for marking a deployment as valid; if not valid, the various APIs will not function.
It also collects counters for transactions for all products and communicates this to the centralized licensing service for billing and product usage monitoring.
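The two responsibilities described above, gating API calls on license validity and counting transactions per product, can be modeled in a few lines. This is a minimal in-memory sketch under assumptions: the `LicenseGate` class, its method names, and the deployment and product identifiers are hypothetical, and reporting the counters to the centralized licensing service is not modeled.

```python
from collections import Counter


class LicenseGate:
    """Hypothetical in-memory model of license gating and usage counting."""

    def __init__(self):
        self.valid_deployments = set()
        self.usage = Counter()

    def mark_valid(self, deployment_id):
        """Record that a deployment passed license validation."""
        self.valid_deployments.add(deployment_id)

    def handle(self, deployment_id, product):
        """Reject calls for unlicensed deployments; count the rest."""
        if deployment_id not in self.valid_deployments:
            raise PermissionError("deployment is not licensed: " + deployment_id)
        self.usage[(deployment_id, product)] += 1
        return "ok"


gate = LicenseGate()
gate.mark_valid("dep-a")
gate.handle("dep-a", "asr")
gate.handle("dep-a", "asr")

# A deployment that was never marked valid is refused and not counted.
try:
    gate.handle("dep-b", "tts")
    rejected = False
except PermissionError:
    rejected = True
```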
LumenVox-api
This service supports the Speech APIs made available to clients to process transactions for speech products including ASR, TTS, CPA, and AMD using gRPC.
Management-api
This service exposes the REST management APIs. It is a wrapper around the deployment and configuration services that allows customers to manage them.
Neural-tts
This is the Neural DNN TTS engine. Over 80 voices across 30 languages/dialects are available for consumption.
Neuron
This is Capacity's NLU intent engine, which uses transcribed text to determine intent and perform slot filling.
NLU
The Natural Language Understanding service provides the following text input-based products:
- Call Summarization including topic detection and outcome detection
- Language Detection (this differs from the audio-based Language Identification)
- Language Translation (including auto language detection)
Persistent-volume-directory-setup
This service creates the connection to the persistent volume. It does not run continuously; it is used to update resources such as DNN language models in persistent storage.
Reporting
This service is used for voice biometrics to generate reports and make the extracted data available to the reporting APIs.
Reporting-api
This service exposes the speech reporting APIs over gRPC and is used by the Analysis portal.
Reporting-bio-api
This REST interface is used to connect to the reporting services and expose reports to the client for voice biometrics.
Resource
This service downloads the public language packs to the cluster for ASR, TTS, and voice biometrics. It stores the language packs in the shared volume for the cluster and makes resources available to each service as needed.
Session
Currently used for speech products only. It manages the sessions and interactions and acts as a dispatcher between the lumenvox-api service and other services.
Transaction
This service is responsible for writing voice biometric transactions into the database. Voice biometric sessions are stored in Redis until the session is completed. At that point, the transaction service receives a message to retrieve the data and write the session and transaction data to the PostgreSQL database. This service is also used for operations reporting, where each API call to the biometric interface is logged as an operation and stored for audit purposes.
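The hand-off described for the transaction service (session data held in a fast store while the session is live, then persisted once a completion message arrives) might look roughly like the sketch below. It rests on several assumptions: a dict stands in for Redis, an in-memory SQLite database stands in for PostgreSQL, and the table layout, function names, and the choice to remove the session from the store after writing are all hypothetical.

```python
import json
import sqlite3

redis_stand_in = {}                   # assumption: dict in place of Redis
db = sqlite3.connect(":memory:")      # assumption: SQLite in place of PostgreSQL
db.execute("CREATE TABLE transactions (session_id TEXT, payload TEXT)")


def record_event(session_id, event):
    """Accumulate events for a live session in the key-value store."""
    redis_stand_in.setdefault(session_id, []).append(event)


def on_session_completed(session_id):
    """Handle the 'session completed' message: move data to the database."""
    events = redis_stand_in.pop(session_id)
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO transactions (session_id, payload) VALUES (?, ?)",
            [(session_id, json.dumps(e)) for e in events],
        )
    return len(events)


record_event("sess-1", {"type": "enrollment"})
record_event("sess-1", {"type": "verification"})
written = on_session_completed("sess-1")
```

Deferring the database write until the session completes keeps per-request latency dependent only on the in-memory store, at the cost of the session data living outside the durable database until the completion message is processed.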
TTS
This is the legacy TTS engine.
VAD
This is the media service that manages the Voice Activity Detection (VAD) algorithm and assists with transcription audio streaming.
Capacity Private Cloud Microservices Diagram
The following is an architecture diagram that includes all services.
