CPA & AMD Integration - MRCP

This article covers all MRCP-based integration scenarios for Capacity Private Cloud CPA and AMD. MRCP (Media Resource Control Protocol) supports both MRCPv2 via SIP and MRCPv1 via RTSP. Grammar-based configuration provides fine-grained control over detection parameters. For core concepts and architecture, see CPA & AMD Overview. For gRPC integration, see CPA & AMD: gRPC Integration.

Pre-Answer Tone Detection (SIT / Busy)

Purpose: Detect network tones before a call is answered to filter out invalid numbers and busy lines early — before consuming resources on CPA analysis.

Requires: AMD only (Tone detection mode)
Platform requirement: Must deliver early media (audio before answer)

Process Flow

Decision Points

Decision	Condition	Action
SIT detected	Any of 7 SIT subtypes	End call immediately. Number is invalid or disconnected. Remove from dial list.
BUSY detected	Busy tone detected	End call. Schedule retry after backoff period.
FAX detected	Fax tone after call answered	End call. Flag number as fax in CRM.
No tone, call answered	Timeout with no tone + answer event	Proceed to post-answer CPA/AMD analysis.
No tone, no answer	Timeout with no tone + no answer event	Dialer handles as "no answer" — retry later.
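
The decision table above can be sketched as a small lookup. This is a minimal illustration only: the `PreAnswerAction` enum, the `decide_pre_answer` function, and the string labels for tone results are assumptions made for the example, not names from the Capacity API.

```python
from enum import Enum
from typing import Optional


class PreAnswerAction(Enum):
    REMOVE_FROM_LIST = "end call, remove number from dial list"
    RETRY_AFTER_BACKOFF = "end call, retry after backoff"
    FLAG_AS_FAX = "end call, flag number as fax in CRM"
    RUN_POST_ANSWER_CPA = "proceed to post-answer CPA/AMD analysis"
    RETRY_NO_ANSWER = "treat as no answer, retry later"


def decide_pre_answer(tone: Optional[str], answered: bool) -> PreAnswerAction:
    """Map a tone result plus the answer state to a dialer action.

    `tone` is a hypothetical label: "SIT", "BUSY", "FAX", or None when the
    tone detector timed out without detecting anything.
    """
    if tone == "SIT":
        return PreAnswerAction.REMOVE_FROM_LIST
    if tone == "BUSY":
        return PreAnswerAction.RETRY_AFTER_BACKOFF
    if tone == "FAX":
        return PreAnswerAction.FLAG_AS_FAX
    if tone is None:
        # Timeout with no tone: the answer event decides the next step.
        return (PreAnswerAction.RUN_POST_ANSWER_CPA if answered
                else PreAnswerAction.RETRY_NO_ANSWER)
    raise ValueError(f"unexpected tone result: {tone!r}")
```

Keeping the mapping in one function makes it easy to log every disposition and to change the retry policy without touching the call-handling code.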

Timing

Busy detection: Returned as soon as a busy tone is detected.
SIT detection: Immediate — SIT tones appear within the first 1–2 seconds.
Fax detection: Usually immediate after the call is answered.
BARGE_IN_TIMEOUT: Configurable depending on how long the campaign is prepared to wait.

Message Delivery (using grammars)

Purpose: Deliver a pre-recorded or TTS message to the called party. If a human answers, play the message immediately. If an answering machine answers, wait for the beep then play the message.

Requires: CPA + AMD (parallel grammars in the same MRCP session)
Protocol: MRCP (MRCPv2 via SIP and MRCPv1 via RTSP)
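
For MRCPv2, one plausible way to run the CPA and AMD grammars in parallel is to reference both in a single RECOGNIZE request with a `text/uri-list` body. The sketch below builds such a request using the MRCPv2 message framing from RFC 6787. The grammar URLs, the channel identifier, and the function name `build_recognize` are hypothetical; check the product documentation for how your deployment expects grammars to be supplied.

```python
from typing import List


def build_recognize(channel_id: str, request_id: int,
                    grammar_uris: List[str]) -> bytes:
    """Build an MRCPv2 RECOGNIZE request that references grammars by URI."""
    body = ("\r\n".join(grammar_uris) + "\r\n").encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        "Content-Type: text/uri-list\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    rest = headers + body

    # The message-length field counts the whole message, including the start
    # line that contains it, so iterate until the value is self-consistent.
    length = len(rest)
    while True:
        start = f"MRCP/2.0 {length} RECOGNIZE {request_id}\r\n".encode("utf-8")
        total = len(start) + len(rest)
        if total == length:
            return start + rest
        length = total


# Hypothetical grammar locations, for illustration only.
request = build_recognize(
    "32AECB23433801@speechrecog",
    1,
    ["http://grammars.example.com/CallProgressAnalysis.grxml",
     "http://grammars.example.com/ToneDetection.grxml"],
)
```

MRCPv1 over RTSP uses a different framing, so this sketch applies to MRCPv2 only.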

Beep During Playback

In all message delivery scenarios, including Unknown Speech, play the message immediately using MRCP SPEAK while MRCP RECOGNIZE runs in parallel on the same MRCP session to listen for a BEEP. Set the BEEP timeout longer than the message length so that detection stays active for the whole playback. If a beep is detected during playback, stop the current SPEAK and restart the message from the beginning.

This approach means the message starts playing straight away and simply restarts if a beep is detected — there is no idle wait time.
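
The restart-on-beep behavior can be sketched as a small event loop. This is a simulation rather than a client implementation: the `deliver_message` function, the event labels, and the `END-CALL` command are hypothetical, and the sketch assumes RECOGNIZE is not re-armed after the first beep, so the message restarts at most once.

```python
from typing import Iterable, List


def deliver_message(events: Iterable[str], message_ms: int,
                    beep_timeout_ms: int) -> List[str]:
    """Return the ordered commands issued for one delivery attempt.

    `events` is a simulated stream of server events: "BEEP" (RECOGNIZE
    matched a beep) or "SPEAK-COMPLETE" (playback finished).
    """
    if beep_timeout_ms <= message_ms:
        raise ValueError("BEEP timeout must be longer than the message")

    # Start playback and beep detection together on the same session.
    commands = ["SPEAK", "RECOGNIZE"]
    restarted = False
    for event in events:
        if event == "BEEP" and not restarted:
            # A beep means a machine answered: stop and replay from the start.
            commands += ["STOP", "SPEAK"]
            restarted = True
        elif event == "SPEAK-COMPLETE":
            commands.append("END-CALL")
            break
    return commands
```

For example, `deliver_message(["BEEP", "SPEAK-COMPLETE"], 20000, 30000)` yields a STOP followed by a second SPEAK, while a stream containing only `"SPEAK-COMPLETE"` plays the message once.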

Process Flow

MRCP Message Sequence — Human Detected

The following MRCP message sequence illustrates the interaction between the client application and the LumenVox media server when CPA detects a live human (HUMAN RESIDENCE or HUMAN BUSINESS). The message is played immediately using SPEAK while RECOGNIZE runs in parallel to detect any BEEP that would indicate the call was actually answered by a machine.

MRCP Message Sequence — Unknown Speech (Machine Detected)

When CPA returns UNKNOWN SPEECH, the application treats the call as a likely answering machine. The sequence below shows how the message is played immediately while listening for a BEEP. If a BEEP is detected during playback, the message is stopped and restarted from the beginning to ensure full delivery after the tone.

Key Decisions and Timeouts

The following table summarizes the key decision points and their associated timeouts for MRCP message delivery scenarios. These timing constraints are critical for ensuring messages are delivered promptly and in compliance with regulatory requirements.

Decision Point	Timeout / Trigger	Action
How long to wait for CPA result?	`CPA_UNKNOWN_SILENCE_TIMEOUT` (5000ms)	CPA returns within ~5000ms maximum.
When to start playing message to a human?	Immediately on HUMAN RESIDENCE / HUMAN BUSINESS result	Message must play promptly after greeting completion. Consult legal counsel regarding FCC TCPA timing requirements for your use case.
When to start playing message on UNKNOWN SPEECH?	Immediately — do not wait for beep	Start `SPEAK` + `RECOGNIZE` in parallel; `RECOGNIZE` listens for BEEP during playback.
When to restart message?	On BEEP detection during playback	Stop current `SPEAK`, restart message from beginning.
When to give up on the call?	UNKNOWN SILENCE or `SPEAK-COMPLETE` with no BEEP	End call, schedule retry.

Messages should be longer than typical voicemail greetings to ensure proper delivery after the beep.

MRCP Agent Connection (using grammars)

Purpose: Connect a live agent to a human caller as quickly as possible. Use CPA to determine if a human answered, then bridge the call to an available agent. If a machine answers, hang up — agents should only handle live humans.

Requires: CPA + AMD (parallel grammars)
Protocol: MRCP

This use case assumes an agent is available when the call is answered. It is the call center's responsibility to manage agent availability and pacing to ensure agents are ready when human calls connect.

Process Flow

MRCP Message Sequence — Agent Connection

The following MRCP message sequence shows the three-party interaction between the predictive dialer, the client application, and the LumenVox media server during an agent connection workflow. The sequence illustrates how CPA classification triggers the bridge to a live agent.

Predictive Dialer Optimization

In a predictive dialing environment, multiple calls are placed simultaneously across available call slots. Each call independently runs CPA analysis to classify the answering party. Only calls identified as live humans are bridged to available agents, while machine-answered calls are terminated immediately to maximize agent utilization.
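
As a rough illustration of this fan-out, the sketch below classifies several calls concurrently with `asyncio` and splits them into calls to bridge and calls to hang up. The CPA step is simulated from a dictionary; `classify_call`, `run_batch`, and the `simulated` mapping are hypothetical names standing in for real MRCP sessions.

```python
import asyncio
from typing import Dict, List, Tuple

HUMAN_RESULTS = {"HUMAN RESIDENCE", "HUMAN BUSINESS"}


async def classify_call(number: str,
                        simulated: Dict[str, str]) -> Tuple[str, str]:
    """Stand-in for one call's CPA analysis; returns (number, CPA result)."""
    await asyncio.sleep(0)  # yield control, as a real network round trip would
    return number, simulated[number]


async def run_batch(numbers: List[str],
                    simulated: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify all calls concurrently, then split them by disposition."""
    results = await asyncio.gather(
        *(classify_call(n, simulated) for n in numbers)
    )
    outcome: Dict[str, List[str]] = {"bridge": [], "hang_up": []}
    for number, cpa in results:
        key = "bridge" if cpa in HUMAN_RESULTS else "hang_up"
        outcome[key].append(number)
    return outcome
```

In a real dialer each call would be bridged as soon as its own result arrives instead of waiting for the whole batch; the batch form is used here only to keep the example short.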

Key Timing Decisions for Agent Connection

Agent connection workflows are time-sensitive due to regulatory requirements. The table below outlines the critical timing constraints that must be met to ensure compliance and optimal agent utilization.

Decision Point	Timing Budget	Notes
CPA detection complete	~1800ms (residence) to ~6200ms (silence timeout)	Faster is better for agent utilization.
Bridge to agent after CPA	Must be within 2 seconds of greeting completion	FCC TCPA compliance requirement.
Time budget after `VAD_EOS_DELAY`	2000ms – 1200ms = 800ms	This 800ms window is all that is available to bridge the call.
Agent queue wait time	Application-dependent	Must track abandonment rate (max 3% over 30 days).
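
The arithmetic in the table can be checked with a few lines of code. The constants come from the table above; the function names are illustrative, and the abandonment rate is assumed here to be abandoned calls divided by calls answered by a live person over the measurement window. Confirm the exact definition with legal counsel.

```python
BRIDGE_LIMIT_MS = 2000    # bridge within 2 seconds of greeting completion
VAD_EOS_DELAY_MS = 1200   # end-of-speech delay from the table
MAX_ABANDON_RATE = 0.03   # 3% over 30 days


def bridge_budget_ms(limit_ms: int = BRIDGE_LIMIT_MS,
                     eos_delay_ms: int = VAD_EOS_DELAY_MS) -> int:
    """Time left to bridge the call once VAD_EOS_DELAY has elapsed."""
    return limit_ms - eos_delay_ms


def abandonment_rate(abandoned: int, live_answered: int) -> float:
    """Share of live-answered calls that were abandoned (assumed definition)."""
    if live_answered == 0:
        return 0.0
    return abandoned / live_answered


def within_abandonment_limit(abandoned: int, live_answered: int) -> bool:
    """True when the rate does not exceed the 3% ceiling."""
    return abandonment_rate(abandoned, live_answered) <= MAX_ABANDON_RATE
```

With the default values, `bridge_budget_ms()` gives the 800ms window from the table. Raising `VAD_EOS_DELAY` shrinks that window by the same amount.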

Apple Call Screening

Minimum version: Capacity Private Cloud 7.0

Apple Call Screening is a feature on iOS devices that intercepts incoming calls from unknown numbers. When an outbound call reaches an Apple device with call screening enabled, the device plays an automated announcement asking the caller to identify themselves. The caller's response is transcribed and displayed to the Apple user, who can then choose to accept the call, send it to voicemail, or drop it.

Purpose: Detect Apple Call Screening on outbound calls, deliver a screening payload for the Apple user to review, then handle the three possible outcomes: call dropped, voicemail, or human answered.

Requires: CPA + AMD (sequential grammar interactions)
Protocol: MRCP (MRCPv2 via SIP and MRCPv1 via RTSP)

How Apple Call Screening Works

When an outbound call reaches an iPhone with call screening enabled, the following sequence occurs on the Apple device side:

The iPhone answers the call automatically and plays an announcement to the caller (e.g. "The person you are calling is screening calls. Please state your name and why you are calling.").
The caller's response is transcribed using on-device speech recognition and displayed to the Apple user on screen.
The Apple device plays a distinctive warble tone to signal that transcription is complete and the user is reviewing the message.
The Apple user decides to: accept the call (human answers), decline (call dropped), or ignore (call goes to voicemail after timeout).

The Capacity CPA/AMD system detects each of these stages using dedicated grammars, enabling the client application to respond appropriately at each step.

Grammars Used

This workflow uses two specialized grammars in addition to the standard CPA and AMD grammars:

cpa_prompt_end.grxml: Detects the end of the Apple screening announcement using the CPA Prompt End feature. This setting does not work with other CPA settings and must be used on its own.
amd_screening_tone.grxml: Detects the Apple warble tone. When AMD returns SCREENING, it confirms that Apple call screening is active and the user is reviewing the transcription.

Process Flow

MRCP Message Sequence — Apple Call Screening

Decision Points

Decision	Condition	Action
UNKNOWN SPEECH detected	CPA returns UNKNOWN SPEECH after initial grammars	Apple screening announcement detected. Invoke `cpa_prompt_end.grxml` to detect end of announcement.
PROMPT END detected	CPA returns PROMPT END	Announcement complete. Deliver call screening payload (message/TTS).
SCREENING tone detected	AMD returns SCREENING after payload delivered	Apple warble tone confirmed. Create CPA interaction with 30000ms silence timeout and AMD tone detection to wait for outcome.
Call Dropped (Result A)	`VAD_EVENT_TYPE_BARGE_IN_TIMEOUT` returned	Apple user declined. End call.
Voicemail (Result B)	CPA returns UNKNOWN SPEECH, then AMD returns BEEP	Call went to voicemail. Deliver message after BEEP. End call.
Human Answered (Result C)	CPA returns HUMAN RESIDENCE or HUMAN BUSINESS	Apple user accepted. Deliver message or transfer to agent. End call.
Error	CPA returns UNKNOWN SILENCE or other unexpected result	Log error. End call.
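
The decision points above form a small state machine, sketched below as a transition table. The state names, the `run_screening` function, and the action strings are illustrative assumptions; the result labels are the ones used in the table. The sketch models only the screening path and treats any result not listed as the Error row.

```python
from typing import Dict, List, Tuple

# (current state, result) -> (next state, action)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("INITIAL", "UNKNOWN SPEECH"):
        ("AWAIT_PROMPT_END", "run cpa_prompt_end.grxml"),
    ("AWAIT_PROMPT_END", "PROMPT END"):
        ("AWAIT_SCREENING_TONE", "deliver screening payload"),
    ("AWAIT_SCREENING_TONE", "SCREENING"):
        ("AWAIT_OUTCOME", "run CPA with 30000ms silence timeout and AMD"),
    ("AWAIT_OUTCOME", "VAD_EVENT_TYPE_BARGE_IN_TIMEOUT"):
        ("DONE", "end call (declined)"),
    ("AWAIT_OUTCOME", "UNKNOWN SPEECH"):
        ("AWAIT_BEEP", "wait for BEEP"),
    ("AWAIT_BEEP", "BEEP"):
        ("DONE", "deliver message after beep, end call"),
    ("AWAIT_OUTCOME", "HUMAN RESIDENCE"):
        ("DONE", "deliver message or transfer to agent, end call"),
    ("AWAIT_OUTCOME", "HUMAN BUSINESS"):
        ("DONE", "deliver message or transfer to agent, end call"),
}

ERROR = ("DONE", "log error, end call")


def run_screening(results: List[str]) -> List[str]:
    """Replay a sequence of CPA/AMD results and collect the actions taken."""
    state = "INITIAL"
    actions: List[str] = []
    for result in results:
        state, action = TRANSITIONS.get((state, result), ERROR)
        actions.append(action)
        if state == "DONE":
            break
    return actions
```

A table-driven layout keeps each row of the decision table visible as one entry, which makes it straightforward to audit against the documented flow.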

Timing Considerations

Initial CPA/AMD detection: The first RECOGNIZE request typically returns UNKNOWN SPEECH within 3–5 seconds, while the Apple screening announcement is playing.
Prompt End detection: Returned promptly once the Apple announcement speech concludes.
Screening tone detection: The warble tone is typically detected within a few seconds of the payload completing.
Wait for answer: A CPA_UNKNOWN_SILENCE_TIMEOUT of 30000ms (30 seconds) is recommended to allow sufficient time for the Apple user to review the transcription and decide. This timeout should be adjusted based on expected user response time.

End-to-end campaign flow

This diagram combines all use cases into a single comprehensive flow showing the complete lifecycle of an outbound call — from pre-answer tone detection through post-answer classification to final call disposition.

Reference

Timing and configuration quick reference

The table below provides a quick reference for how each detection feature behaves at runtime, including what triggers it to stop listening and when results are returned. This is useful for estimating detection latency and configuring appropriate timeouts.

Feature	Trigger to Stop	Returns When
CPA	Speech ends + `VAD_EOS_DELAY`, or silence timeout	Speech classified OR timeout
AMD (Beep)	BEEP tone detected	Beep detected OR `BARGE_IN_TIMEOUT`
AMD (Fax)	FAX tone detected	Immediately on detection
AMD (SIT)	SIT tones detected	Immediately on detection (1–2 seconds)
AMD (Busy)	BUSY tone detected	Immediately on detection

Grammar Configuration Templates

The following sample grammars are used with MRCP integration. Each is available for download — adjust the <meta> tag values to tune detection thresholds for your campaign. gRPC users configure equivalent settings directly in protobuf fields — see CPA & AMD: gRPC Integration.
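
As one way to adjust a `<meta>` value programmatically, the sketch below rewrites the `content` attribute of a named `<meta>` element using Python's standard `xml.etree.ElementTree`. The `SAMPLE` grammar is a hypothetical stand-in, and it assumes the `<meta>` name matches the setting name; the downloadable grammars remain the authoritative reference for the actual names and layout.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
# Serialize with SRGS as the default namespace instead of an ns0: prefix.
ET.register_namespace("", SRGS_NS)

# Hypothetical grammar fragment, for illustration only.
SAMPLE = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" mode="voice">
  <meta name="CPA_UNKNOWN_SILENCE_TIMEOUT" content="5000"/>
</grammar>"""


def set_meta(grxml: str, name: str, value: str) -> str:
    """Return the grammar text with the named <meta> content replaced."""
    root = ET.fromstring(grxml)
    for meta in root.iter(f"{{{SRGS_NS}}}meta"):
        if meta.get("name") == name:
            meta.set("content", value)
            break
    else:
        # No matching <meta> element: fail loudly rather than add one silently.
        raise KeyError(name)
    return ET.tostring(root, encoding="unicode")


extended = set_meta(SAMPLE, "CPA_UNKNOWN_SILENCE_TIMEOUT", "30000")
```

Raising on a missing name, rather than appending a new element, guards against typos in setting names going unnoticed.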

CPA Grammar (`CallProgressAnalysis.grxml`)

Switches the detection mode to Call Progress Analysis and configures the speech-duration thresholds that classify each call as Human Residence, Human Business, Unknown Speech, or Unknown Silence.

Download: CallProgressAnalysis.grxml

AMD Grammar (`ToneDetection.grxml`)

Switches the detection mode to Tone and enables AMD, FAX, SIT, and BUSY tone detection — covering the seven SIT subtypes that indicate disconnected or invalid numbers. Use in parallel with the CPA grammar to detect tones across the full call lifecycle.

Download: ToneDetection.grxml

CPA Prompt End Grammar (`cpa_prompt_end.grxml`)

Detects the end of a recorded prompt — such as the Apple Call Screening announcement — by listening for the moment speech concludes. Used after an initial CPA result of Unknown Speech to signal when the screening payload should be delivered.

Download: cpa_prompt_end.grxml

AMD Screening Tone Grammar (`amd_screening_tone.grxml`)

Detects the distinctive Apple warble tone that signals call screening is active and the Apple user is reviewing the transcribed message. Returns SCREENING when the tone is detected.

Download: amd_screening_tone.grxml

CPA Unknown Silence Timeout Grammar (`cpa_unknown_silence_timeout.grxml`)

A variant of the standard CPA grammar with an extended CPA_UNKNOWN_SILENCE_TIMEOUT — used after the Apple screening tone is detected to give the Apple user enough time to review the transcription and decide whether to accept, decline, or send to voicemail.

Download: cpa_unknown_silence_timeout.grxml
