CPA & AMD Integration - MRCP

This article covers all MRCP-based integration scenarios for Capacity Private Cloud CPA and AMD. Both versions of MRCP (Media Resource Control Protocol) are supported: MRCPv2 via SIP and MRCPv1 via RTSP. Grammar-based configuration provides fine-grained control over detection parameters. For core concepts and architecture, see CPA & AMD Overview. For gRPC integration, see CPA & AMD: gRPC Integration.

Pre-Answer Tone Detection (SIT / Busy)

Purpose: Detect network tones before a call is answered to filter out invalid numbers and busy lines early — before consuming resources on CPA analysis.

Requires: AMD only (Tone detection mode)
Platform requirement: Must deliver early media (audio before answer)

Process Flow

Decision Points

| Decision | Condition | Action |
| --- | --- | --- |
| SIT detected | Any of 7 SIT subtypes | End call immediately. Number is invalid or disconnected. Remove from dial list. |
| BUSY detected | Busy tone detected | End call. Schedule retry after backoff period. |
| FAX detected | Fax tone after call answered | End call. Flag number as fax in CRM. |
| No tone, call answered | Timeout with no tone + answer event | Proceed to post-answer CPA/AMD analysis. |
| No tone, no answer | Timeout with no tone + no answer event | Dialer handles as "no answer"; retry later. |
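
The decision table above can be sketched as a small dispatch function. This is a minimal illustration, not part of the product API: the `Action` names, the `pre_answer_action` function, and the way the tone result is passed in (a string, or `None` on timeout) are all assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action names chosen for this sketch.
    END_REMOVE_FROM_LIST = "end call, remove number from dial list"
    END_RETRY_AFTER_BACKOFF = "end call, retry after backoff"
    END_FLAG_FAX = "end call, flag number as fax in CRM"
    PROCEED_TO_CPA = "proceed to post-answer CPA/AMD analysis"
    NO_ANSWER_RETRY = "treat as no answer, retry later"


def pre_answer_action(tone, answered):
    """Map a pre-answer tone result to a dialer action.

    tone: "SIT", "BUSY", "FAX", or None when detection timed out with no tone.
    answered: True if an answer event arrived before the timeout.
    """
    if tone == "SIT":
        return Action.END_REMOVE_FROM_LIST
    if tone == "BUSY":
        return Action.END_RETRY_AFTER_BACKOFF
    if tone == "FAX":
        return Action.END_FLAG_FAX
    if tone is None:
        return Action.PROCEED_TO_CPA if answered else Action.NO_ANSWER_RETRY
    raise ValueError(f"unexpected tone result: {tone!r}")
```

For example, `pre_answer_action(None, answered=True)` selects the post-answer CPA/AMD path, while any SIT subtype would be passed in as `"SIT"` after the application has collapsed the subtypes.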

Timing

  • Busy detection: Returned as soon as a busy tone is detected.
  • SIT detection: Immediate — SIT tones appear within the first 1–2 seconds.
  • Fax detection: Usually immediate after the call is answered.
  • BARGE_IN_TIMEOUT: Configurable depending on how long the campaign is prepared to wait.

Message Delivery (using grammars)

Purpose: Deliver a pre-recorded or TTS message to the called party. If a human answers, play the message immediately. If an answering machine answers, wait for the beep then play the message.

Requires: CPA + AMD (parallel grammars in the same MRCP session)
Protocol: MRCP (MRCPv2 via SIP and MRCPv1 via RTSP)

Beep During Playback

In all message delivery scenarios — including Unknown Speech — always play the message immediately using MRCP SPEAK while running MRCP RECOGNIZE in parallel to listen for a BEEP. SPEAK and RECOGNIZE run in parallel on the same MRCP session. Set the BEEP timeout to be longer than the message length. If a beep is detected during message playback, stop the current SPEAK and restart the message from the beginning.

This approach means the message starts playing straight away and simply restarts if a beep is detected — there is no idle wait time.
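
The restart-on-beep behavior can be modeled as a simple loop over incoming events. The sketch below is an assumption-laden simulation rather than a real MRCP client: the event names `"BEEP"` and `"SPEAK-COMPLETE"` stand in for the corresponding MRCP events, the action strings are placeholders, and the 2000ms margin used for the BEEP timeout is an arbitrary example value.

```python
def beep_timeout_ms(message_ms, margin_ms=2000):
    """Return a BEEP timeout longer than the message (margin is an assumed value)."""
    if message_ms <= 0 or margin_ms <= 0:
        raise ValueError("durations must be positive")
    return message_ms + margin_ms


def playback_actions(events):
    """Replay a recorded event stream and list the client's actions in order.

    events: iterable of "BEEP" or "SPEAK-COMPLETE" strings in arrival order.
    """
    # SPEAK and RECOGNIZE start together on the same session.
    actions = ["SPEAK", "RECOGNIZE"]
    for event in events:
        if event == "BEEP":
            # Beep heard mid-message: stop playback and restart from the beginning.
            actions += ["STOP", "SPEAK"]
        elif event == "SPEAK-COMPLETE":
            actions.append("DONE")
            break
    return actions
```

With a 15-second message, `beep_timeout_ms(15000)` gives a 17000ms listening window, and a stream of `["BEEP", "SPEAK-COMPLETE"]` produces one stop and one restart before completion.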

Process Flow


MRCP Message Sequence — Human Detected

The following MRCP message sequence illustrates the interaction between the client application and the LumenVox media server when CPA detects a live human (HUMAN RESIDENCE or HUMAN BUSINESS). The message is played immediately using SPEAK while RECOGNIZE runs in parallel to detect any BEEP that would indicate the call was actually answered by a machine.
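
As a rough illustration of what a RECOGNIZE request in this sequence might look like on the wire, the sketch below serializes an MRCPv2 request that references two grammars through a `text/uri-list` body. The grammar URLs, channel identifier, and the `build_mrcp_request` helper are hypothetical; only the general MRCPv2 framing (start line with a message-length field, `Channel-Identifier` header, CRLF line endings) follows the protocol specification.

```python
def build_mrcp_request(method, request_id, channel_id, headers=None, body=""):
    """Serialize an MRCPv2 request, computing the message-length field."""
    body_bytes = body.encode("utf-8")
    lines = [f"Channel-Identifier: {channel_id}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    if body_bytes:
        lines.append(f"Content-Length: {len(body_bytes)}")
    rest = ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body_bytes

    # message-length counts the entire message, including the start line that
    # contains it, so grow the guess until it matches the real size.
    length = 0
    while True:
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("utf-8")
        total = len(start) + len(rest)
        if total == length:
            return start + rest
        length = total


# Hypothetical grammar locations; substitute wherever the grammars are hosted.
GRAMMARS = [
    "http://grammars.example.com/CallProgressAnalysis.grxml",
    "http://grammars.example.com/ToneDetection.grxml",
]

recognize = build_mrcp_request(
    "RECOGNIZE",
    1,
    "32AECB23433801@speechrecog",  # example channel id, assigned by the server in practice
    headers={"Content-Type": "text/uri-list"},
    body="\r\n".join(GRAMMARS) + "\r\n",
)
```

Printing `recognize.decode()` shows the full request. In a real deployment the MRCP client library normally handles this framing, so the sketch is mainly useful for reading packet captures.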

MRCP Message Sequence — Unknown Speech (Machine Detected)

When CPA returns UNKNOWN SPEECH, the application treats the call as a likely answering machine. The sequence below shows how the message is played immediately while listening for a BEEP. If a BEEP is detected during playback, the message is stopped and restarted from the beginning to ensure full delivery after the tone.

Key Decisions and Timeouts

The following table summarizes the key decision points and their associated timeouts for MRCP message delivery scenarios. These timing constraints are critical for ensuring messages are delivered promptly and in compliance with regulatory requirements.

| Decision Point | Timeout / Trigger | Action |
| --- | --- | --- |
| How long to wait for CPA result? | CPA_UNKNOWN_SILENCE_TIMEOUT (5000ms) | CPA returns within ~5000ms maximum. |
| When to start playing message to a human? | Immediately on HUMAN RESIDENCE / HUMAN BUSINESS result | Message must play promptly after greeting completion. Consult legal counsel regarding FCC TCPA timing requirements for your use case. |
| When to start playing message on UNKNOWN SPEECH? | Immediately; do not wait for beep | Start SPEAK + RECOGNIZE in parallel; RECOGNIZE listens for BEEP during playback. |
| When to restart message? | On BEEP detection during playback | Stop current SPEAK, restart message from beginning. |
| When to give up on the call? | UNKNOWN SILENCE or SPEAK-COMPLETE with no BEEP | End call, schedule retry. |
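
One way to read the first rows of this table is as a mapping from the CPA result to the next step. The function below is a minimal sketch under that reading; the return strings and the `delivery_action` name are placeholders invented for the example.

```python
HUMAN_RESULTS = {"HUMAN RESIDENCE", "HUMAN BUSINESS"}


def delivery_action(cpa_result):
    """Choose the next step for message delivery from a CPA result."""
    if cpa_result in HUMAN_RESULTS or cpa_result == "UNKNOWN SPEECH":
        # Humans and likely machines are handled the same way: play now,
        # and keep listening for a BEEP while the message plays.
        return "START_SPEAK_AND_RECOGNIZE"
    if cpa_result == "UNKNOWN SILENCE":
        return "END_CALL_SCHEDULE_RETRY"
    raise ValueError(f"unexpected CPA result: {cpa_result!r}")
```

Treating the human and unknown-speech cases identically keeps the client logic small, since the BEEP listener covers the case where a presumed human turns out to be a machine.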

MRCP Agent Connection (using grammars)

Purpose: Connect a live agent to a human caller as quickly as possible. Use CPA to determine if a human answered, then bridge the call to an available agent. If a machine answers, hang up — agents should only handle live humans.

Requires: CPA + AMD (parallel grammars)
Protocol: MRCP

This use case assumes an agent is available when the call is answered. It is the call center's responsibility to manage agent availability and pacing to ensure agents are ready when human calls connect.

Process Flow


MRCP Message Sequence — Agent Connection

The following MRCP message sequence shows the three-party interaction between the predictive dialer, the client application, and the LumenVox media server during an agent connection workflow. The sequence illustrates how CPA classification triggers the bridge to a live agent.

Predictive Dialer Optimization

In a predictive dialing environment, multiple calls are placed simultaneously across available call slots. Each call independently runs CPA analysis to classify the answering party. Only calls identified as live humans are bridged to available agents, while machine-answered calls are terminated immediately to maximize agent utilization.
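
The routing rule described above can be sketched as follows. This is a simplified illustration: the `route_calls` function, the call and agent identifiers, and the decision to treat every non-human result as a call to terminate are assumptions for the example, and a real dialer would also handle pacing and queueing.

```python
from collections import deque

HUMAN_RESULTS = {"HUMAN RESIDENCE", "HUMAN BUSINESS"}


def route_calls(cpa_results, agents):
    """Split classified calls into bridged, terminated, and waiting groups.

    cpa_results: dict mapping call id -> CPA result string.
    agents: iterable of currently available agent ids.
    """
    free_agents = deque(agents)
    bridges, terminated, waiting = {}, [], []
    for call_id, result in cpa_results.items():
        if result not in HUMAN_RESULTS:
            terminated.append(call_id)  # not a live human: hang up
        elif free_agents:
            bridges[call_id] = free_agents.popleft()
        else:
            waiting.append(call_id)  # human answered but no agent is free
    return bridges, terminated, waiting
```

Calls that land in the waiting group are the ones at risk of being abandoned, which is why agent availability and pacing matter for this use case.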


Key Timing Decisions for Agent Connection

Agent connection workflows are time-sensitive due to regulatory requirements. The table below outlines the critical timing constraints that must be met to ensure compliance and optimal agent utilization.

| Decision Point | Timing Budget | Notes |
| --- | --- | --- |
| CPA detection complete | ~1800ms (residence) to ~6200ms (silence timeout) | Faster is better for agent utilization. |
| Bridge to agent after CPA | Must be within 2 seconds of greeting completion | FCC TCPA compliance requirement. |
| Time budget after VAD_EOS_DELAY | 2000ms – 1200ms = 800ms | This 800ms window is all that is available to bridge the call. |
| Agent queue wait time | Application-dependent | Must track abandonment rate (max 3% over 30 days). |
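
The arithmetic behind the bridge window and the abandonment threshold can be written out as a short sketch. The function names are illustrative, and the way the abandonment rate is computed here (abandoned calls divided by live-answered calls over the tracking period) is an assumption; confirm the exact definition with legal counsel.

```python
def bridge_budget_ms(limit_ms=2000, vad_eos_delay_ms=1200):
    """Time left to bridge the call once VAD_EOS_DELAY has elapsed."""
    return limit_ms - vad_eos_delay_ms


def abandonment_rate(abandoned_calls, live_answered_calls):
    """Fraction of live-answered calls that were abandoned (assumed definition)."""
    if live_answered_calls <= 0:
        return 0.0
    return abandoned_calls / live_answered_calls


def within_abandonment_limit(abandoned_calls, live_answered_calls, max_rate=0.03):
    """Check the tracked rate against the 3% ceiling from the table."""
    return abandonment_rate(abandoned_calls, live_answered_calls) <= max_rate
```

With the default values, `bridge_budget_ms()` gives the 800ms window from the table. Lowering `vad_eos_delay_ms` would widen the window, at the cost of a shorter end-of-speech delay.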

Apple Call Screening

Minimum version: Capacity Private Cloud 7.0

Apple Call Screening is a feature on iOS devices that intercepts incoming calls from unknown numbers. When an outbound call reaches an Apple device with call screening enabled, the device plays an automated announcement asking the caller to identify themselves. The caller's response is transcribed and displayed to the Apple user, who can then choose to accept the call, send it to voicemail, or drop it.

Purpose: Detect Apple Call Screening on outbound calls, deliver a screening payload for the Apple user to review, then handle the three possible outcomes: call dropped, voicemail, or human answered.

Requires: CPA + AMD (sequential grammar interactions)
Protocol: MRCP (MRCPv2 via SIP and MRCPv1 via RTSP)

How Apple Call Screening Works

When an outbound call reaches an iPhone with call screening enabled, the following sequence occurs on the Apple device side:

  1. The iPhone answers the call automatically and plays an announcement to the caller (e.g. "The person you are calling is screening calls. Please state your name and why you are calling.").
  2. The caller's response is transcribed using on-device speech recognition and displayed to the Apple user on screen.
  3. The Apple device plays a distinctive warble tone to signal that transcription is complete and the user is reviewing the message.
  4. The Apple user decides to: accept the call (human answers), decline (call dropped), or ignore (call goes to voicemail after timeout).

The Capacity CPA/AMD system detects each of these stages using dedicated grammars, enabling the client application to respond appropriately at each step.

Grammars Used

This workflow uses two specialized grammars in addition to the standard CPA and AMD grammars:

  • cpa_prompt_end.grxml: Detects the end of the Apple screening announcement using the CPA Prompt End feature. This setting does not work with other CPA settings; it must be used on its own.
  • amd_screening_tone.grxml: Detects the Apple warble tone. When AMD returns SCREENING, it confirms that Apple call screening is active and the user is reviewing the transcription.

Process Flow

MRCP Message Sequence — Apple Call Screening

Decision Points

| Decision | Condition | Action |
| --- | --- | --- |
| UNKNOWN SPEECH detected | CPA returns UNKNOWN SPEECH after initial grammars | Apple screening announcement detected. Invoke cpa_prompt_end.grxml to detect end of announcement. |
| PROMPT END detected | CPA returns PROMPT END | Announcement complete. Deliver call screening payload (message/TTS). |
| SCREENING tone detected | AMD returns SCREENING after payload delivered | Apple warble tone confirmed. Create CPA interaction with 30000ms silence timeout and AMD tone detection to wait for outcome. |
| Call Dropped (Result A) | VAD_EVENT_TYPE_BARGE_IN_TIMEOUT returned | Apple user declined. End call. |
| Voicemail (Result B) | CPA returns UNKNOWN SPEECH, then AMD returns BEEP | Call went to voicemail. Deliver message after BEEP. End call. |
| Human Answered (Result C) | CPA returns HUMAN RESIDENCE or HUMAN BUSINESS | Apple user accepted. Deliver message or transfer to agent. End call. |
| Error | CPA returns UNKNOWN SILENCE or other unexpected result | Log error. End call. |
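
The decision points above form a small state machine, which can be sketched as a transition table. The state names, action strings, and `run_screening_flow` function are invented for this illustration, and the sketch covers only the screening path. Treating any unexpected result at any stage as an error is an assumption that goes slightly beyond the table.

```python
# (current state, detection result) -> (next state, client action)
TRANSITIONS = {
    ("INITIAL", "UNKNOWN SPEECH"): ("WAIT_PROMPT_END", "run cpa_prompt_end.grxml"),
    ("WAIT_PROMPT_END", "PROMPT END"): ("WAIT_TONE", "deliver screening payload"),
    ("WAIT_TONE", "SCREENING"): (
        "WAIT_OUTCOME",
        "run CPA with 30000ms silence timeout plus AMD tone detection",
    ),
    ("WAIT_OUTCOME", "VAD_EVENT_TYPE_BARGE_IN_TIMEOUT"): ("DONE", "end call (dropped)"),
    ("WAIT_OUTCOME", "UNKNOWN SPEECH"): ("WAIT_BEEP", "wait for BEEP"),
    ("WAIT_BEEP", "BEEP"): ("DONE", "deliver message, end call (voicemail)"),
    ("WAIT_OUTCOME", "HUMAN RESIDENCE"): ("DONE", "deliver message or transfer to agent"),
    ("WAIT_OUTCOME", "HUMAN BUSINESS"): ("DONE", "deliver message or transfer to agent"),
}


def run_screening_flow(results):
    """Feed detection results through the table; return (final state, actions)."""
    state, actions = "INITIAL", []
    for result in results:
        # Anything not listed for the current state is handled as an error.
        state, action = TRANSITIONS.get((state, result), ("DONE", "log error, end call"))
        actions.append(action)
        if state == "DONE":
            break
    return state, actions
```

Feeding in `["UNKNOWN SPEECH", "PROMPT END", "SCREENING", "UNKNOWN SPEECH", "BEEP"]` walks the voicemail path (Result B) and ends with the message being delivered after the beep.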

Timing Considerations

  • Initial CPA/AMD detection: The first RECOGNIZE request typically returns UNKNOWN SPEECH within 3–5 seconds, while the Apple screening announcement plays.
  • Prompt End detection: Returned promptly once the Apple announcement speech concludes.
  • Screening tone detection: The warble tone is typically detected within a few seconds of the payload completing.
  • Wait for answer: A CPA_UNKNOWN_SILENCE_TIMEOUT of 30000ms (30 seconds) is recommended to allow sufficient time for the Apple user to review the transcription and decide. This timeout should be adjusted based on expected user response time.

End-to-End Campaign Flow

This diagram combines all use cases into a single comprehensive flow showing the complete lifecycle of an outbound call — from pre-answer tone detection through post-answer classification to final call disposition.



Reference

Timing and Configuration Quick Reference

The table below provides a quick reference for how each detection feature behaves at runtime, including what triggers it to stop listening and when results are returned. This is useful for estimating detection latency and configuring appropriate timeouts.

| Feature | Trigger to Stop | Returns When |
| --- | --- | --- |
| CPA | Speech ends + VAD_EOS_DELAY, or silence timeout | Speech classified OR timeout |
| AMD (Beep) | BEEP tone detected | Beep detected OR BARGE_IN_TIMEOUT |
| AMD (Fax) | FAX tone detected | Immediately on detection |
| AMD (SIT) | SIT tones detected | Immediately on detection (1–2 seconds) |
| AMD (Busy) | BUSY tone detected | Immediately on detection |

Grammar Configuration Templates

The following sample grammars are used with MRCP integration. Each is available for download — adjust the <meta> tag values to tune detection thresholds for your campaign. gRPC users configure equivalent settings directly in protobuf fields — see CPA & AMD: gRPC Integration.
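
As an illustration of adjusting a <meta> value programmatically, the sketch below edits a simplified stand-in grammar with Python's standard XML library. The sample grammar content, the exact meta name spelling, and the `set_meta` helper are assumptions made for this example; check the downloaded .grxml files for the real names and values before applying anything like this.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Simplified, hypothetical grammar; the real downloads contain more settings.
SAMPLE_GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" mode="voice" root="root">
  <meta name="CPA_UNKNOWN_SILENCE_TIMEOUT" content="5000"/>
  <rule id="root" scope="public">
    <one-of><item>UNKNOWN SILENCE</item></one-of>
  </rule>
</grammar>"""


def set_meta(grammar_xml, name, value):
    """Return the grammar text with the named <meta> content replaced."""
    ET.register_namespace("", SRGS_NS)  # keep the default namespace on output
    root = ET.fromstring(grammar_xml)
    for meta in root.iter(f"{{{SRGS_NS}}}meta"):
        if meta.get("name") == name:
            meta.set("content", str(value))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no <meta> named {name!r} in grammar")


# Example: extend the silence timeout to 30 seconds.
extended = set_meta(SAMPLE_GRAMMAR, "CPA_UNKNOWN_SILENCE_TIMEOUT", 30000)
```

Raising an error when the meta name is missing, rather than silently adding a new tag, helps catch misspelled setting names early.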

CPA Grammar (CallProgressAnalysis.grxml)

Switches the detection mode to Call Progress Analysis and configures the speech-duration thresholds that classify each call as Human Residence, Human Business, Unknown Speech, or Unknown Silence.

Download: CallProgressAnalysis.grxml

AMD Grammar (ToneDetection.grxml)

Switches the detection mode to Tone and enables AMD, FAX, SIT, and BUSY tone detection — covering the seven SIT subtypes that indicate disconnected or invalid numbers. Use in parallel with the CPA grammar to detect tones across the full call lifecycle.

Download: ToneDetection.grxml

CPA Prompt End Grammar (cpa_prompt_end.grxml)

Detects the end of a recorded prompt — such as the Apple Call Screening announcement — by listening for the moment speech concludes. Used after an initial CPA result of Unknown Speech to signal when the screening payload should be delivered.

Download: cpa_prompt_end.grxml

AMD Screening Tone Grammar (amd_screening_tone.grxml)

Detects the distinctive Apple warble tone that signals call screening is active and the Apple user is reviewing the transcribed message. Returns SCREENING when the tone is detected.

Download: amd_screening_tone.grxml

CPA Unknown Silence Timeout Grammar (cpa_unknown_silence_timeout.grxml)

A variant of the standard CPA grammar with an extended CPA_UNKNOWN_SILENCE_TIMEOUT — used after the Apple screening tone is detected to give the Apple user enough time to review the transcription and decide whether to accept, decline, or send to voicemail.

Download: cpa_unknown_silence_timeout.grxml

