CPA & AMD Integration - gRPC
This article covers all gRPC-based integration scenarios for Capacity Private Cloud CPA and AMD. gRPC does not use grammars — settings are configured directly in protobuf messages. Message playback is handled separately by the application's telephony platform or via a separate TTS request to the gRPC API. For core concepts, see CPA & AMD Overview. For MRCP integration, see CPA & AMD: MRCP Integration.
gRPC session architecture: Audio is streamed into a session representing the phone call. CPA and AMD run as two parallel interactions on the same session, both processing the same audio stream. AudioPush begins before InteractionCreate requests are sent.
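The ordering described above can be sketched with a small in-memory model. This is only an illustration, not the real client: `FakeSession` and its methods are hypothetical stand-ins for the generated gRPC stubs, and the guard that rejects an early `InteractionCreate` exists only to make the ordering visible. It does not describe how the service itself behaves.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Hypothetical in-memory stand-in for a gRPC session; records call order."""
    calls: list = field(default_factory=list)
    audio_started: bool = False

    def session_create(self) -> None:
        self.calls.append("SessionCreate")

    def audio_push(self, chunk: bytes) -> None:
        self.audio_started = True
        self.calls.append("AudioPush")

    def interaction_create(self, kind: str) -> None:
        # Illustrative guard only: the article states that AudioPush
        # begins before InteractionCreate requests are sent.
        if not self.audio_started:
            raise RuntimeError("start AudioPush before InteractionCreate")
        self.calls.append(f"InteractionCreate({kind})")


def start_detection(session: FakeSession, first_chunk: bytes) -> list:
    """Open the session, start audio, then create CPA and AMD interactions."""
    session.session_create()
    session.audio_push(first_chunk)
    for kind in ("CPA", "AMD"):  # two interactions on the same audio stream
        session.interaction_create(kind)
    return session.calls
```

In a real integration the two interactions would then process the same stream in parallel, and results would arrive as InteractionResult messages on the gRPC stream.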
gRPC vs MRCP — Key Differences
While MRCP and gRPC achieve the same detection outcomes, they differ significantly in session management, settings configuration, and how results are returned.
| Aspect | MRCP | gRPC |
| Grammar required | Yes — grammar <meta> tags configure CPA and AMD | No — 2 parallel interactions configured in protobuf fields |
| Message playback | MRCP SPEAK command on the same session | Separate — via telephony platform or a TTS request to the gRPC API |
| Session management | SIP session lifecycle | gRPC call session lifecycle (SessionCreate / SessionClose) |
| Parallel CPA + AMD | Both grammars in a single RECOGNIZE request | 2 separate InteractionCreate requests run in parallel on the same session |
| Real-time results | RECOGNITION-COMPLETE event | InteractionResult on gRPC stream |
Message Delivery
Purpose: Same as MRCP message delivery, but using the gRPC Interaction API. No grammars needed — settings configured directly in protobuf messages.
Requires: CPA + AMD
Protocol: gRPC (Capacity Private Cloud Interaction API)
Process Flow
gRPC Message Sequence
The following gRPC message sequence illustrates the interaction between the client application and the LumenVox gRPC service during a message delivery workflow. Unlike MRCP, audio streaming begins before interactions are created, and CPA and AMD run as separate parallel interactions rather than grammars within a single request.
gRPC Agent Connection
Purpose: Same as MRCP agent connection, but using the gRPC Interaction API. Connect a live agent to a human caller; hang up on machines.
Requires: CPA + AMD
Protocol: gRPC
This use case assumes an agent is available when the call is answered. It is the call centre's responsibility to manage agent availability and pacing to ensure agents are ready when human calls connect.
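As a rough sketch of the routing rule for this use case (connect humans, hang up on machines), the function below maps detection results to an action. The result labels are taken from the tables in this article, but treating them as exact result strings is an assumption, as are the action names and the choice to hang up on any unrecognised result.

```python
from typing import Optional

# Result labels as they appear in this article; exact wire values are assumed.
HUMAN_RESULTS = {"HUMAN RESIDENCE", "HUMAN BUSINESS"}
MACHINE_TONES = {"BEEP", "FAX", "SIT", "BUSY"}


def agent_connection_action(cpa_result: Optional[str],
                            amd_result: Optional[str] = None) -> str:
    """Return a hypothetical action name for the agent connection use case."""
    if amd_result in MACHINE_TONES:
        return "HANG_UP"        # a machine tone overrides any CPA result
    if cpa_result in HUMAN_RESULTS:
        return "CONNECT_AGENT"  # live human: bridge to the waiting agent
    return "HANG_UP"            # assumption: anything else is not a human
```

For example, `agent_connection_action("HUMAN RESIDENCE")` yields `"CONNECT_AGENT"`, while a `BEEP` from AMD yields `"HANG_UP"` regardless of the CPA result.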
Process Flow
gRPC Message Sequence — Agent Connection
The following gRPC message sequence illustrates the interaction between the client application and the LumenVox gRPC service during an agent connection workflow. Unlike MRCP, audio streaming begins before interactions are created, and CPA and AMD run as separate parallel interactions rather than grammars within a single request.
Apple Call Screening
Minimum version: Capacity Private Cloud 7.0
The logical flow for Apple Call Screening is identical to the MRCP version. The implementation uses the gRPC Interaction API with sessions and interactions instead of MRCP grammars.
Purpose: Detect Apple Call Screening on outbound calls using the gRPC API, deliver a screening payload, and handle the three possible outcomes: call dropped, voicemail, or human answered.
Requires: CPA + AMD (sequential interactions on a gRPC session)
Protocol: gRPC (Capacity Private Cloud Interaction API)
Audio is streamed into a session via AudioPush before interactions are created. Settings are configured directly in protobuf fields rather than grammar meta tags. Message playback is handled separately by the application's telephony platform or via a separate TTS request to the gRPC API.
gRPC vs MRCP — Apple Call Screening Differences
While MRCP and gRPC achieve the same detection outcomes, they differ significantly in how sessions are managed, how settings are configured, and how results are returned. The table below shows how each Apple Call Screening detection step is implemented in the two protocols.
| Aspect | MRCP | gRPC |
| Screening announcement detection | CPA grammar returns UNKNOWN SPEECH | InteractionCreateRequest with detection_mode=CPA returns UNKNOWN SPEECH |
| Prompt End detection | cpa_prompt_end.grxml grammar | InteractionCreateRequest with cpa_prompt_end_enable=true. If enabled, all other CPA settings are ignored. |
| Screening tone detection | amd_screening_tone.grxml grammar | InteractionCreateRequest with screening_tone_enable=true |
| Wait for answer | cpa_unknown_silence_timeout.grxml grammar | InteractionCreateRequest with cpa_unknown_silence_timeout=30000 |
| Message playback | MRCP SPEAK command on same session | Separate — via telephony platform or TTS request to gRPC API |
| Session management | SIP session lifecycle | gRPC call session lifecycle (SessionCreate / SessionClose) |
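To make the gRPC column easier to scan, the sketch below collects the per-stage settings from the table into plain Python dictionaries. These are not real protobuf messages: the stage names and the dictionary layout are hypothetical, and `detection_mode` set to `"AMD"` is an assumed value, since the table only shows `detection_mode=CPA`.

```python
def stage_settings(wait_timeout_ms: int = 30000) -> dict:
    """Map each Apple Call Screening stage to its interaction settings.

    Field names come from the comparison table; the dict shape is illustrative.
    """
    return {
        # First interaction: classify the screening announcement.
        "announcement": [{"detection_mode": "CPA"}],
        # Per the table, other CPA settings are ignored when this is enabled.
        "prompt_end": [{"detection_mode": "CPA", "cpa_prompt_end_enable": True}],
        # "AMD" as a detection_mode value is an assumption.
        "screening_tone": [{"detection_mode": "AMD", "screening_tone_enable": True}],
        # CPA and AMD run in parallel while waiting for the user to answer.
        "wait_for_answer": [
            {"detection_mode": "CPA", "cpa_unknown_silence_timeout": wait_timeout_ms},
            {"detection_mode": "AMD"},
        ],
    }
```

Keeping the settings in one place like this makes it straightforward to adjust the wait timeout per campaign without touching the call-handling logic.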
Process Flow
gRPC Message Sequence — Apple Call Screening
The following gRPC message sequence illustrates the interaction between the client application and the LumenVox gRPC service during an Apple Call Screening workflow. Unlike MRCP, audio streaming begins before interactions are created, and each detection step is a separate interaction created in sequence on the same session rather than a grammar within a RECOGNIZE request. CPA and AMD run in parallel only while waiting for the user to answer.
Decision Points
| Decision | Condition | Action |
| UNKNOWN SPEECH detected | CPA InteractionResult returns UNKNOWN SPEECH | Apple screening announcement detected. Create new CPA interaction with prompt end detection enabled. |
| PROMPT END detected | CPA InteractionResult returns PROMPT END | Announcement complete. Deliver call screening payload via telephony platform or TTS gRPC request. |
| SCREENING tone detected | AMD InteractionResult returns SCREENING | Apple warble tone confirmed. Create CPA interaction with 30000ms silence timeout and AMD tone detection interaction in parallel. |
| Call Dropped (Result A) | VAD_EVENT_TYPE_BARGE_IN_TIMEOUT returned | Apple user declined. Close interactions and session. End call. |
| Voicemail (Result B) | CPA returns UNKNOWN SPEECH, then AMD returns BEEP | Call went to voicemail. Deliver message via telephony platform. Close interactions and session. |
| Human Answered (Result C) | CPA returns HUMAN RESIDENCE or HUMAN BUSINESS | Apple user accepted. Deliver message or bridge to agent via telephony platform. Close interactions and session. |
| Error | CPA returns UNKNOWN SILENCE or other unexpected result | Close interactions and session. Log error. End call. |
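The decision points above can be read as a small state machine. The sketch below is one possible encoding, not the product's implementation: the stage names and action names are hypothetical, the result labels are copied from the table, and routing every unexpected result at any stage to the error action is an assumption that generalises the Error row.

```python
# (stage, result) -> (next stage, action). Stage and action names are hypothetical.
TRANSITIONS = {
    ("ANNOUNCEMENT", "UNKNOWN SPEECH"): ("PROMPT_END", "create_cpa_prompt_end"),
    ("PROMPT_END", "PROMPT END"): ("SCREENING_TONE", "deliver_screening_payload"),
    ("SCREENING_TONE", "SCREENING"): ("WAIT_FOR_ANSWER", "create_cpa_30000ms_and_amd"),
    ("WAIT_FOR_ANSWER", "VAD_EVENT_TYPE_BARGE_IN_TIMEOUT"): ("DONE", "call_dropped_end_call"),
    ("WAIT_FOR_ANSWER", "UNKNOWN SPEECH"): ("VOICEMAIL_GREETING", "wait_for_beep"),
    ("VOICEMAIL_GREETING", "BEEP"): ("DONE", "voicemail_deliver_message"),
    ("WAIT_FOR_ANSWER", "HUMAN RESIDENCE"): ("DONE", "human_deliver_or_bridge"),
    ("WAIT_FOR_ANSWER", "HUMAN BUSINESS"): ("DONE", "human_deliver_or_bridge"),
}


def next_step(stage: str, result: str) -> tuple:
    """Look up the transition; unexpected results fall through to the error action."""
    return TRANSITIONS.get((stage, result), ("DONE", "error_close_and_log"))


def run(results: list) -> list:
    """Feed a sequence of interaction results through the flow; return the actions."""
    stage, actions = "ANNOUNCEMENT", []
    for result in results:
        stage, action = next_step(stage, result)
        actions.append(action)
        if stage == "DONE":
            break  # interactions and session are closed at this point
    return actions
```

For instance, the voicemail path (Result B) corresponds to the result sequence UNKNOWN SPEECH, PROMPT END, SCREENING, UNKNOWN SPEECH, BEEP, and ends with the `voicemail_deliver_message` action.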
Timing Considerations
- Initial CPA/AMD detection: The first interaction typically returns UNKNOWN SPEECH within 3–5 seconds, while the Apple screening announcement plays.
- Prompt End detection: Returned promptly once the Apple announcement speech concludes.
- Screening tone detection: The warble tone is typically detected within a few seconds of the payload completing.
- Wait for answer: A cpa_unknown_silence_timeout of 30000ms (30 seconds) is recommended to allow sufficient time for the Apple user to decide. Adjust based on observed user response times for your campaign.
End-to-end campaign flow
This diagram combines all use cases into a single comprehensive flow showing the complete lifecycle of an outbound call — from pre-answer tone detection through post-answer classification to final call disposition.
Timing and Configuration Quick Reference
The table below provides a quick reference for how each detection feature behaves at runtime, including what triggers it to stop listening and when results are returned. This is useful for estimating detection latency and configuring appropriate timeouts.
| Feature | Trigger to Stop | Returns When |
| CPA | Speech ends + VAD_EOS_DELAY, or silence timeout | Speech classified OR timeout |
| AMD (Beep) | BEEP tone detected | Beep detected OR BARGE_IN_TIMEOUT |
| AMD (Fax) | FAX tone detected | Immediately on detection |
| AMD (SIT) | SIT tones detected | Immediately on detection (1–2 seconds) |
| AMD (Busy) | BUSY tone detected | Immediately on detection |
