CPA & AMD Integration - MRCP
This article covers all MRCP-based integration scenarios for Capacity Private Cloud CPA and AMD. MRCP (Media Resource Control Protocol) supports both MRCPv2 via SIP and MRCPv1 via RTSP. Grammar-based configuration provides fine-grained control over detection parameters. For core concepts and architecture, see CPA & AMD Overview. For gRPC integration, see CPA & AMD: gRPC Integration.
Pre-Answer Tone Detection (SIT / Busy)
Purpose: Detect network tones before a call is answered to filter out invalid numbers and busy lines early — before consuming resources on CPA analysis.
Requires: AMD only (Tone detection mode)
Platform requirement: Must deliver early media (audio before answer)
Process Flow
Decision Points
| Decision | Condition | Action |
| SIT detected | Any of 7 SIT subtypes | End call immediately. Number is invalid or disconnected. Remove from dial list. |
| BUSY detected | Busy tone detected | End call. Schedule retry after backoff period. |
| FAX detected | Fax tone after call answered | End call. Flag number as fax in CRM. |
| No tone, call answered | Timeout with no tone + answer event | Proceed to post-answer CPA/AMD analysis. |
| No tone, no answer | Timeout with no tone + no answer event | Dialer handles as "no answer" — retry later. |
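The decision table above can be read as a small dispatch function. The following is a minimal Python sketch, not part of the product: the tone labels (`"SIT"`, `"BUSY"`, `"FAX"`), the `Action` enum, and `pre_answer_action` are hypothetical names chosen for illustration. The actual result strings returned by the media server (including the seven SIT subtypes) may differ.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Dialer actions taken from the pre-answer decision table."""
    END_REMOVE_FROM_LIST = "end call, remove number from dial list"
    END_RETRY_AFTER_BACKOFF = "end call, schedule retry after backoff"
    END_FLAG_FAX = "end call, flag number as fax in CRM"
    PROCEED_TO_CPA = "proceed to post-answer CPA/AMD analysis"
    NO_ANSWER_RETRY = "treat as no answer, retry later"


def pre_answer_action(tone: Optional[str], answered: bool) -> Action:
    """Map a tone result to an action.

    `tone` is an assumed label ("SIT", "BUSY", "FAX"), or None when the
    detection window timed out without any tone. `answered` says whether
    the dialer saw an answer event.
    """
    if tone == "SIT":
        return Action.END_REMOVE_FROM_LIST
    if tone == "BUSY":
        return Action.END_RETRY_AFTER_BACKOFF
    if tone == "FAX":
        return Action.END_FLAG_FAX
    if tone is None:
        # No tone: the answer event decides between CPA and "no answer".
        return Action.PROCEED_TO_CPA if answered else Action.NO_ANSWER_RETRY
    raise ValueError(f"unexpected tone label: {tone!r}")
```

Keeping the mapping in one function makes it easier to audit dispositions against the table when the campaign rules change.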
Timing
- Busy detection: Returned as soon as a busy tone is detected.
- SIT detection: Immediate — SIT tones appear within the first 1–2 seconds.
- Fax detection: Usually immediate after the call is answered.
- BARGE_IN_TIMEOUT: Configurable, depending on how long the campaign is prepared to wait.
Message Delivery (using grammars)
Purpose: Deliver a pre-recorded or TTS message to the called party. If a human answers, play the message immediately. If an answering machine answers, wait for the beep then play the message.
Requires: CPA + AMD (parallel grammars in the same MRCP session)
Protocol: MRCP (MRCPv2 via SIP and MRCPv1 via RTSP)
Beep During Playback
In all message delivery scenarios — including Unknown Speech — always play the message immediately using MRCP SPEAK while running MRCP RECOGNIZE in parallel to listen for a BEEP. SPEAK and RECOGNIZE run in parallel on the same MRCP session. Set the BEEP timeout to be longer than the message length. If a beep is detected during message playback, stop the current SPEAK and restart the message from the beginning.
This approach means the message starts playing straight away and simply restarts if a beep is detected — there is no idle wait time.
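The restart-on-beep rule can be sketched as a small event loop. This is an illustrative Python model only: it does not talk to an MRCP server, and `plan_delivery`, `beep_timeout_ms`, the event labels, and the 2000 ms margin are assumptions made for the example. It also assumes the application stops listening once the first beep has been handled.

```python
from typing import Iterable, List


def beep_timeout_ms(message_ms: int, margin_ms: int = 2000) -> int:
    """Return a beep timeout longer than the message (margin is assumed)."""
    return message_ms + margin_ms


def plan_delivery(events: Iterable[str]) -> List[str]:
    """Turn a stream of session events into the MRCP requests to issue.

    Assumed event labels: "BEEP" (tone detected by RECOGNIZE) and
    "SPEAK-COMPLETE" (message playback finished).
    """
    # SPEAK and RECOGNIZE both start immediately, in parallel.
    actions = ["SPEAK", "RECOGNIZE"]
    beep_seen = False
    for event in events:
        if event == "BEEP" and not beep_seen:
            # Beep during playback: stop and restart the message.
            beep_seen = True
            actions += ["STOP", "SPEAK"]
        elif event == "SPEAK-COMPLETE":
            # Playback finished: nothing more to deliver.
            actions.append("END_CALL")
            break
    return actions
```

For a 20-second message, `beep_timeout_ms(20000)` gives 22000 ms under the assumed margin, so RECOGNIZE keeps listening for the whole playback.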
Process Flow
MRCP Message Sequence — Human Detected
The following MRCP message sequence illustrates the interaction between the client application and the LumenVox media server when CPA detects a live human (HUMAN RESIDENCE or HUMAN BUSINESS). The message is played immediately using SPEAK while RECOGNIZE runs in parallel to detect any BEEP that would indicate the call was actually answered by a machine.
MRCP Message Sequence — Unknown Speech (Machine Detected)
When CPA returns UNKNOWN SPEECH, the application treats the call as a likely answering machine. The sequence below shows how the message is played immediately while listening for a BEEP. If a BEEP is detected during playback, the message is stopped and restarted from the beginning to ensure full delivery after the tone.
Key Decisions and Timeouts
The following table summarizes the key decision points and their associated timeouts for MRCP message delivery scenarios. These timing constraints are critical for ensuring messages are delivered promptly and in compliance with regulatory requirements.
| Decision Point | Timeout / Trigger | Action |
| How long to wait for CPA result? | CPA_UNKNOWN_SILENCE_TIMEOUT (5000ms) | CPA returns within ~5000ms maximum. |
| When to start playing message to a human? | Immediately on HUMAN RESIDENCE / HUMAN BUSINESS result | Message must play promptly after greeting completion. Consult legal counsel regarding FCC TCPA timing requirements for your use case. |
| When to start playing message on UNKNOWN SPEECH? | Immediately — do not wait for beep | Start SPEAK + RECOGNIZE in parallel; RECOGNIZE listens for BEEP during playback. |
| When to restart message? | On BEEP detection during playback | Stop current SPEAK, restart message from beginning. |
| When to give up on the call? | UNKNOWN SILENCE or SPEAK-COMPLETE with no BEEP | End call, schedule retry. |
Messages should be longer than typical voicemail greetings to ensure proper delivery after the beep.
MRCP Agent Connection (using grammars)
Purpose: Connect a live agent to a human caller as quickly as possible. Use CPA to determine if a human answered, then bridge the call to an available agent. If a machine answers, hang up — agents should only handle live humans.
Requires: CPA + AMD (parallel grammars)
Protocol: MRCP
This use case assumes an agent is available when the call is answered. It is the call center's responsibility to manage agent availability and pacing to ensure agents are ready when human calls connect.
Process Flow
MRCP Message Sequence — Agent Connection
The following MRCP message sequence shows the three-party interaction between the predictive dialer, the client application, and the LumenVox media server during an agent connection workflow. The sequence illustrates how CPA classification triggers the bridge to a live agent.
Predictive Dialer Optimization
In a predictive dialing environment, multiple calls are placed simultaneously across available call slots. Each call independently runs CPA analysis to classify the answering party. Only calls identified as live humans are bridged to available agents, while machine-answered calls are terminated immediately to maximize agent utilization.
Key Timing Decisions for Agent Connection
Agent connection workflows are time-sensitive due to regulatory requirements. The table below outlines the critical timing constraints that must be met to ensure compliance and optimal agent utilization.
| Decision Point | Timing Budget | Notes |
| CPA detection complete | ~1800ms (residence) to ~6200ms (silence timeout) | Faster is better for agent utilization. |
| Bridge to agent after CPA | Must be within 2 seconds of greeting completion | FCC TCPA compliance requirement. |
| Time budget after VAD_EOS_DELAY | 2000ms – 1200ms = 800ms | This 800ms window is all that is available to bridge the call. |
| Agent queue wait time | Application-dependent | Must track abandonment rate (max 3% over 30 days). |
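The arithmetic in the table can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the abandonment calculation assumes the rate is measured against calls answered by a live person over the 30-day window. Confirm the exact definition with legal counsel.

```python
def bridge_budget_ms(limit_ms: int = 2000, vad_eos_delay_ms: int = 1200) -> int:
    """Time left to bridge the call once VAD_EOS_DELAY has elapsed."""
    return limit_ms - vad_eos_delay_ms


def within_abandonment_cap(abandoned: int, live_answers: int,
                           cap_percent: int = 3) -> bool:
    """True when abandoned calls stay at or below the cap.

    Integer arithmetic avoids floating-point edge cases at exactly 3%.
    The denominator (live-answered calls) is an assumption.
    """
    if live_answers <= 0:
        return abandoned == 0
    return abandoned * 100 <= cap_percent * live_answers
```

With the default values, `bridge_budget_ms()` returns 800, matching the 800ms window in the table. Raising VAD_EOS_DELAY shrinks that window one-for-one.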
Apple Call Screening
Minimum version: Capacity Private Cloud 7.0
Apple Call Screening is a feature on iOS devices that intercepts incoming calls from unknown numbers. When an outbound call reaches an Apple device with call screening enabled, the device plays an automated announcement asking the caller to identify themselves. The caller's response is transcribed and displayed to the Apple user, who can then choose to accept the call, send it to voicemail, or drop it.
Purpose: Detect Apple Call Screening on outbound calls, deliver a screening payload for the Apple user to review, then handle the three possible outcomes: call dropped, voicemail, or human answered.
Requires: CPA + AMD (sequential grammar interactions)
Protocol: MRCP (MRCPv2 via SIP and MRCPv1 via RTSP)
How Apple Call Screening Works
When an outbound call reaches an iPhone with call screening enabled, the following sequence occurs on the Apple device side:
- The iPhone answers the call automatically and plays an announcement to the caller (e.g. "The person you are calling is screening calls. Please state your name and why you are calling.").
- The caller's response is transcribed using on-device speech recognition and displayed to the Apple user on screen.
- The Apple device plays a distinctive warble tone to signal that transcription is complete and the user is reviewing the message.
- The Apple user decides to: accept the call (human answers), decline (call dropped), or ignore (call goes to voicemail after timeout).
The Capacity CPA/AMD system detects each of these stages using dedicated grammars, enabling the client application to respond appropriately at each step.
Grammars Used
This workflow uses two specialized grammars in addition to the standard CPA and AMD grammars:
- cpa_prompt_end.grxml: Detects the end of the Apple screening announcement using the CPA Prompt End feature. This setting does not work with other CPA settings and must be used on its own.
- amd_screening_tone.grxml: Detects the Apple warble tone. When AMD returns SCREENING, it confirms that Apple call screening is active and the user is reviewing the transcription.
Process Flow
MRCP Message Sequence — Apple Call Screening
Decision Points
| Decision | Condition | Action |
| UNKNOWN SPEECH detected | CPA returns UNKNOWN SPEECH after initial grammars | Apple screening announcement detected. Invoke cpa_prompt_end.grxml to detect end of announcement. |
| PROMPT END detected | CPA returns PROMPT END | Announcement complete. Deliver call screening payload (message/TTS). |
| SCREENING tone detected | AMD returns SCREENING after payload delivered | Apple warble tone confirmed. Create CPA interaction with 30000ms silence timeout and AMD tone detection to wait for outcome. |
| Call Dropped (Result A) | VAD_EVENT_TYPE_BARGE_IN_TIMEOUT returned | Apple user declined. End call. |
| Voicemail (Result B) | CPA returns UNKNOWN SPEECH, then AMD returns BEEP | Call went to voicemail. Deliver message after BEEP. End call. |
| Human Answered (Result C) | CPA returns HUMAN RESIDENCE or HUMAN BUSINESS | Apple user accepted. Deliver message or transfer to agent. End call. |
| Error | CPA returns UNKNOWN SILENCE or other unexpected result | Log error. End call. |
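The decision points above form a short state machine, sketched here in Python. The state names (`INITIAL`, `WAIT_PROMPT_END`, and so on), the `step` and `run` helpers, and the action strings are hypothetical. Only the result labels come from the table. Any result not listed falls through to the error row, so a real application would route ordinary, non-screened outcomes at the initial stage to its normal flows before reaching this logic.

```python
from typing import Dict, Iterable, List, Tuple

HUMAN_ACTION = "deliver message or transfer to agent, end call"

# (current state, result) -> (next state, action to take)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("INITIAL", "UNKNOWN SPEECH"):
        ("WAIT_PROMPT_END", "invoke cpa_prompt_end.grxml"),
    ("WAIT_PROMPT_END", "PROMPT END"):
        ("WAIT_SCREENING_TONE", "deliver screening payload"),
    ("WAIT_SCREENING_TONE", "SCREENING"):
        ("WAIT_OUTCOME", "start CPA (30000ms silence timeout) + AMD"),
    ("WAIT_OUTCOME", "VAD_EVENT_TYPE_BARGE_IN_TIMEOUT"):
        ("DONE", "end call (declined)"),
    ("WAIT_OUTCOME", "UNKNOWN SPEECH"):
        ("WAIT_BEEP", "wait for voicemail beep"),
    ("WAIT_BEEP", "BEEP"):
        ("DONE", "deliver message after beep, end call"),
    ("WAIT_OUTCOME", "HUMAN RESIDENCE"): ("DONE", HUMAN_ACTION),
    ("WAIT_OUTCOME", "HUMAN BUSINESS"): ("DONE", HUMAN_ACTION),
}


def step(state: str, result: str) -> Tuple[str, str]:
    """Return (next_state, action); unlisted pairs hit the error row."""
    return TRANSITIONS.get((state, result), ("DONE", "log error, end call"))


def run(results: Iterable[str]) -> List[str]:
    """Feed results in order and collect the actions taken."""
    state, actions = "INITIAL", []
    for result in results:
        state, action = step(state, result)
        actions.append(action)
        if state == "DONE":
            break
    return actions
```

For example, feeding `["UNKNOWN SPEECH", "PROMPT END", "SCREENING", "UNKNOWN SPEECH", "BEEP"]` to `run` walks the voicemail path (Result B) and ends with the message being delivered after the beep.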
Timing Considerations
- Initial CPA/AMD detection: The first RECOGNIZE request will return UNKNOWN SPEECH typically within 3–5 seconds as the Apple screening announcement plays.
- Prompt End detection: Returned promptly once the Apple announcement speech concludes.
- Screening tone detection: The warble tone is typically detected within a few seconds of the payload completing.
- Wait for answer: A CPA_UNKNOWN_SILENCE_TIMEOUT of 30000ms (30 seconds) is recommended to allow sufficient time for the Apple user to review the transcription and decide. This timeout should be adjusted based on expected user response time.
End-to-end campaign flow
This diagram combines all use cases into a single comprehensive flow showing the complete lifecycle of an outbound call — from pre-answer tone detection through post-answer classification to final call disposition.
Reference
Timing and configuration quick reference
The table below provides a quick reference for how each detection feature behaves at runtime, including what triggers it to stop listening and when results are returned. This is useful for estimating detection latency and configuring appropriate timeouts.
| Feature | Trigger to Stop | Returns When |
| CPA | Speech ends + VAD_EOS_DELAY, or silence timeout | Speech classified OR timeout |
| AMD (Beep) | BEEP tone detected | Beep detected OR BARGE_IN_TIMEOUT |
| AMD (Fax) | FAX tone detected | Immediately on detection |
| AMD (SIT) | SIT tones detected | Immediately on detection (1–2 seconds) |
| AMD (Busy) | BUSY tone detected | Immediately on detection |
Grammar Configuration Templates
The following sample grammars are used with MRCP integration. Each is available for download — adjust the <meta> tag values to tune detection thresholds for your campaign. gRPC users configure equivalent settings directly in protobuf fields — see CPA & AMD: gRPC Integration.
CPA Grammar (CallProgressAnalysis.grxml)
Switches the detection mode to Call Progress Analysis and configures the speech-duration thresholds that classify each call as Human Residence, Human Business, Unknown Speech, or Unknown Silence.
Download: CallProgressAnalysis.grxml
AMD Grammar (ToneDetection.grxml)
Switches the detection mode to Tone and enables AMD, FAX, SIT, and BUSY tone detection — covering the seven SIT subtypes that indicate disconnected or invalid numbers. Use in parallel with the CPA grammar to detect tones across the full call lifecycle.
Download: ToneDetection.grxml
CPA Prompt End Grammar (cpa_prompt_end.grxml)
Detects the end of a recorded prompt — such as the Apple Call Screening announcement — by listening for the moment speech concludes. Used after an initial CPA result of Unknown Speech to signal when the screening payload should be delivered.
Download: cpa_prompt_end.grxml
AMD Screening Tone Grammar (amd_screening_tone.grxml)
Detects the distinctive Apple warble tone that signals call screening is active and the Apple user is reviewing the transcribed message. Returns SCREENING when the tone is detected.
Download: amd_screening_tone.grxml
CPA Unknown Silence Timeout Grammar (cpa_unknown_silence_timeout.grxml)
A variant of the standard CPA grammar with an extended CPA_UNKNOWN_SILENCE_TIMEOUT — used after the Apple screening tone is detected to give the Apple user enough time to review the transcription and decide whether to accept, decline, or send to voicemail.
Download: cpa_unknown_silence_timeout.grxml
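Adjusting a `<meta>` value can also be scripted. The Python sketch below uses only the standard library and runs against a stand-in grammar, not the downloadable files: the `name` attribute values, the placeholder rule, and the `set_meta` helper are assumptions for illustration. The real grammars may name their `<meta>` entries differently (for example with a prefix), so check the downloaded file before adapting this.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # W3C SRGS namespace

# Stand-in grammar; meta names here are assumed, not copied from the product.
SAMPLE_GRAMMAR = """<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="root">
  <meta name="CPA_UNKNOWN_SILENCE_TIMEOUT" content="5000"/>
  <meta name="VAD_EOS_DELAY" content="1200"/>
  <rule id="root"><item>placeholder</item></rule>
</grammar>"""


def set_meta(grammar_xml: str, name: str, value: int) -> str:
    """Return the grammar text with the named <meta> content replaced."""
    # Keep the SRGS namespace as the default namespace on output.
    ET.register_namespace("", SRGS_NS)
    root = ET.fromstring(grammar_xml)
    for meta in root.iter(f"{{{SRGS_NS}}}meta"):
        if meta.get("name") == name:
            meta.set("content", str(value))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no <meta> named {name!r} in grammar")


# Extend the silence timeout to 30 seconds for the post-screening wait.
extended = set_meta(SAMPLE_GRAMMAR, "CPA_UNKNOWN_SILENCE_TIMEOUT", 30000)
```

Raising an error for an unknown name, rather than silently adding a new `<meta>` element, helps catch typos in parameter names before a grammar reaches production.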
