CPA & AMD Integration Overview

This article introduces the core concepts behind Capacity Private Cloud Call Progress Analysis (CPA) and Answering Machine Detection (AMD) — two complementary technologies that work together to classify outbound call outcomes and optimize campaign performance. For implementation details, see the companion articles on MRCP Integration and gRPC Integration.

CPA vs AMD — What Each Does

CPA and AMD serve different but complementary roles in outbound call processing. CPA uses Voice Activity Detection (VAD) to measure speech duration and classify who answered the call. AMD uses Digital Signal Processing (DSP) to detect specific tones such as beeps, fax signals, and network errors.

Feature	CPA (Call Progress Analysis)	AMD (Answering Machine Detection)
Detection Mode	`DETECTION_MODE=CPA`	`DETECTION_MODE=Tone`
Technology	VAD — measures speech duration	DSP — detects specific tones
What it detects	Human Residence, Human Business, Unknown Speech, Unknown Silence	Beep, Fax, SIT, Busy
When it's useful	Classifying who or what answered the call	Detecting tones before answer (SIT/Busy) and after answer (Beep/Fax)
Grammar example (MRCP)	`CallProgressAnalysis.grxml`	`ToneDetection.grxml`

CPA Classifications

CPA classifies call outcomes based on how long the answering party speaks before pausing. The thresholds below determine which classification is returned. These values are configurable via grammar meta tags (MRCP) or protobuf fields (gRPC).

Classification	Speech Duration	Meaning
HUMAN RESIDENCE	< 1800ms	Short greeting — "Hello?"
HUMAN BUSINESS	1800ms – 3000ms	Longer greeting — "Thanks for calling XYZ, how may I help you?"
UNKNOWN SPEECH	> 3000ms	Likely answering machine or voicemail greeting
UNKNOWN SILENCE	No speech within 5000ms	No one speaking — possibly ringing with no answer
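
The thresholds above can be read as a simple decision rule. The sketch below is illustrative only and is not the engine's implementation: the function name `classify_cpa` is hypothetical, and treating both ends of the 1800ms to 3000ms range as Human Business is an assumption about boundary handling.

```python
from typing import Optional

# Defaults taken from the table above; both are configurable in the product.
HUMAN_RESIDENCE_MS = 1800
HUMAN_BUSINESS_MS = 3000


def classify_cpa(speech_ms: Optional[int],
                 residence_ms: int = HUMAN_RESIDENCE_MS,
                 business_ms: int = HUMAN_BUSINESS_MS) -> str:
    """Map a measured speech duration (ms) to a CPA classification.

    Pass None when no speech started before the silence timeout.
    """
    if speech_ms is None:
        return "UNKNOWN SILENCE"
    if speech_ms < residence_ms:
        return "HUMAN RESIDENCE"
    if speech_ms <= business_ms:  # assumed inclusive upper bound
        return "HUMAN BUSINESS"
    return "UNKNOWN SPEECH"
```

For example, `classify_cpa(900)` yields `HUMAN RESIDENCE`, while `classify_cpa(4200)` yields `UNKNOWN SPEECH`.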

AMD Tone Types

AMD detects specific audio tones using digital signal processing. Each tone type indicates a different call outcome and determines the appropriate next action.

Tone	Meaning	Timing
BEEP	Answering machine ready for message	Post-answer, after voicemail greeting ends
FAX	Fax machine tones	Post-answer — usually immediate after answer
SIT	Special Information Tones (7 subtypes)	Pre-answer — network error or disconnected number
BUSY	Busy signal (400–680 Hz)	Pre-answer — configurable timeout (`BARGE_IN_TIMEOUT`)

Key Timing Parameters

The timing parameters below control how CPA and AMD behave during detection. These values can be tuned per campaign to balance detection speed against accuracy. All parameters are configurable via grammar meta tags (MRCP) or protobuf fields (gRPC).

Parameter	Default	Purpose
`VAD_EOS_DELAY`	1200ms	Silence after speech before end-of-speech is triggered
`CPA_HUMAN_RESIDENCE_TIME`	1800ms	Speech duration threshold for Human Residence classification
`CPA_HUMAN_BUSINESS_TIME`	3000ms	Upper speech-duration limit for Human Business; longer speech is classified as Unknown Speech (likely a machine)
`CPA_UNKNOWN_SILENCE_TIMEOUT`	5000ms	No-speech timeout before returning Unknown Silence
`BARGE_IN_TIMEOUT`	Auto (CPA)	Maximum listening duration before forced timeout
`VAD_STREAM_INIT_DELAY`	0ms	Background noise calibration period
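
One way to keep per-campaign tuning manageable is to hold the parameters in a single validated object. The sketch below assumes a hypothetical `DetectionTiming` class. The field names mirror the parameter names in the table, `None` stands in for the "Auto" value of `BARGE_IN_TIMEOUT`, and how the values are finally written into grammar meta tags or protobuf fields is left out because it depends on the integration.

```python
from dataclasses import asdict, dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DetectionTiming:
    """Timing parameters in milliseconds, using the defaults from the table."""
    vad_eos_delay: int = 1200
    cpa_human_residence_time: int = 1800
    cpa_human_business_time: int = 3000
    cpa_unknown_silence_timeout: int = 5000
    barge_in_timeout: Optional[int] = None  # None stands in for "Auto"
    vad_stream_init_delay: int = 0

    def __post_init__(self) -> None:
        # The residence threshold must sit below the business threshold,
        # otherwise the Human Business range would be empty.
        if not 0 < self.cpa_human_residence_time < self.cpa_human_business_time:
            raise ValueError("residence threshold must be below business threshold")

    def as_parameters(self) -> Dict[str, int]:
        """Return NAME -> value pairs, omitting parameters left on Auto."""
        values = {name.upper(): value for name, value in asdict(self).items()}
        return {name: value for name, value in values.items() if value is not None}


# A campaign that favors speed: shorter end-of-speech delay than the default.
fast_campaign = replace(DetectionTiming(), vad_eos_delay=800)
```

Lowering `vad_eos_delay` returns results sooner but raises the chance that a natural pause inside a greeting is treated as the end of speech.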

Pre-Answer vs Post-Answer Detection

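Understanding the difference between pre-answer and post-answer detection is fundamental to designing effective outbound call flows, because different features apply at each stage of the call lifecycle. Before the call is answered, only AMD tone detection applies (SIT and Busy). After the call is answered, CPA classifies the answering party while AMD listens for Beep and Fax tones.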
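
The split between the two stages can be written down as a small lookup, based on the timing information given elsewhere in this article (SIT and Busy before answer, Beep, Fax, and CPA after answer). The names `Phase`, `FEATURE_PHASE`, and `features_for` are hypothetical and exist only for this sketch.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    PRE_ANSWER = "pre-answer"
    POST_ANSWER = "post-answer"


# Call stage in which each detection feature applies, per this article.
FEATURE_PHASE = {
    "AMD:SIT": Phase.PRE_ANSWER,
    "AMD:BUSY": Phase.PRE_ANSWER,
    "AMD:BEEP": Phase.POST_ANSWER,
    "AMD:FAX": Phase.POST_ANSWER,
    "CPA": Phase.POST_ANSWER,  # CPA needs speech, so it only applies after answer
}


def features_for(phase: Phase) -> List[str]:
    """List the detection features that apply during the given call stage."""
    return sorted(name for name, p in FEATURE_PHASE.items() if p is phase)
```

Calling `features_for(Phase.PRE_ANSWER)` returns only the two tone features, which is why pre-answer filtering is an AMD-only task.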

Parallel CPA + AMD Architecture

CPA and AMD are designed to run in parallel on the same audio stream. Running them sequentially creates detection gaps where tones or speech could be missed. Both MRCP and gRPC support parallel processing.

Best practice: Always run CPA and AMD in parallel to ensure no detection gaps between tone analysis and speech classification.

MRCP: Single Session with Both Grammars

In MRCP, both the CPA and AMD grammars are included in a single RECOGNIZE request, and the media server analyzes the audio against both grammars simultaneously. Whichever grammar matches first produces the result, which is delivered as a RECOGNITION-COMPLETE event.
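
To make the single-request idea concrete, the sketch below assembles an MRCPv2 RECOGNIZE message that references both grammars in one `text/uri-list` body, which is the generic MRCPv2 mechanism for listing several grammars. It is an assumption that the grammars are referenced by URI in this way. The host `grammars.example.com`, the channel identifier, and the function name `build_recognize` are hypothetical, and the companion MRCP Integration article remains the authority on the exact request format.

```python
from typing import List


def build_recognize(channel_id: str, request_id: int, grammar_uris: List[str]) -> bytes:
    """Build an MRCPv2 RECOGNIZE request that lists several grammar URIs."""
    body = "".join(uri + "\r\n" for uri in grammar_uris).encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        "Content-Type: text/uri-list\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # The start line carries the total message length, which includes the
    # start line itself, so recompute until the value is self-consistent.
    total = 0
    while True:
        start = f"MRCP/2.0 {total} RECOGNIZE {request_id}\r\n".encode("utf-8")
        actual = len(start) + len(headers) + len(body)
        if actual == total:
            return start + headers + body
        total = actual


message = build_recognize(
    "32AECB23433801@speechrecog",  # placeholder channel identifier
    1,
    [
        "http://grammars.example.com/CallProgressAnalysis.grxml",  # hypothetical host
        "http://grammars.example.com/ToneDetection.grxml",
    ],
)
```

Because both grammars travel in one request, the server receives them together and there is no window in which only one detector is listening.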

gRPC: Single Session with Parallel Interactions

In gRPC, the architecture uses a session-based model. Audio is streamed into a session representing the phone call. Two parallel interactions — one for CPA and one for AMD — are created on the same session, both processing the same audio stream simultaneously. Audio streaming (AudioPush) begins before the interaction requests are sent.
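
The session model can be illustrated without any real gRPC calls. The sketch below is a pure `asyncio` simulation: the `Session` class, its `audio_push` and `interaction` methods, and the two toy detectors are all hypothetical stand-ins, not the product's API. It also assumes that audio pushed before an interaction is created is still visible to that interaction. The point it shows is the fan-out: one audio stream, two interactions consuming it concurrently.

```python
import asyncio
from typing import Callable, Dict, List, Optional, Tuple

Detector = Callable[[List[bytes]], Optional[str]]


class Session:
    """Stand-in for one call: fans every pushed chunk out to all interactions."""

    def __init__(self) -> None:
        self._buffered: List[Optional[bytes]] = []
        self._queues: List[asyncio.Queue] = []

    def audio_push(self, chunk: Optional[bytes]) -> None:
        """Add a chunk to the session; None marks the end of the audio."""
        self._buffered.append(chunk)
        for queue in self._queues:
            queue.put_nowait(chunk)

    async def interaction(self, name: str, detect: Detector) -> Tuple[str, Optional[str]]:
        """Consume the session audio until the detector reports or audio ends."""
        queue: asyncio.Queue = asyncio.Queue()
        for chunk in self._buffered:  # replay audio pushed before creation
            queue.put_nowait(chunk)
        self._queues.append(queue)
        seen: List[bytes] = []
        while True:
            chunk = await queue.get()
            if chunk is None:
                return name, None
            seen.append(chunk)
            result = detect(seen)
            if result is not None:
                return name, result


def toy_cpa(seen: List[bytes]) -> Optional[str]:
    # Placeholder logic: two speech chunks count as a long greeting.
    return "UNKNOWN SPEECH" if seen.count(b"speech") >= 2 else None


def toy_amd(seen: List[bytes]) -> Optional[str]:
    # Placeholder logic: any chunk labelled "beep" counts as a beep tone.
    return "BEEP" if b"beep" in seen else None


async def main() -> Dict[str, Optional[str]]:
    session = Session()
    session.audio_push(b"ring")  # audio streaming starts first
    cpa = asyncio.create_task(session.interaction("CPA", toy_cpa))
    amd = asyncio.create_task(session.interaction("AMD", toy_amd))
    for chunk in (b"speech", b"speech", b"beep", None):
        session.audio_push(chunk)
        await asyncio.sleep(0)  # let both interactions process the chunk
    return dict(await asyncio.gather(cpa, amd))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both interactions report independently, so a voicemail greeting followed by a beep yields a CPA result and an AMD result from the same call.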

When to Use CPA Only, AMD Only, or Both

Not every outbound call scenario requires both CPA and AMD. The table below provides guidance on which detection mode to use based on common campaign objectives.

Scenario	Recommended Mode	Reason
Pre-answer tone filtering	AMD only	Only tone detection works pre-answer. CPA requires speech.
Human vs machine classification	CPA only	CPA handles speech duration analysis. AMD detects tones, not speech patterns.
Message delivery to humans and machines	Both	CPA classifies who answered. AMD detects beep for voicemail message timing.
Agent connection (humans only)	Both	CPA identifies humans. AMD catches fax/SIT to avoid wasting agent time.
Beep detection only (call already classified as machine)	AMD only	If another system already classified the call as a machine, just wait for the beep.
Full outbound dialling pipeline	Both	Pre-answer: AMD for SIT/Busy. Post-answer: CPA for classification + AMD for beep.
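
The recommendations in the table reduce to two questions: does the campaign need speech classification, and does it need tone detection? The sketch below encodes that reading. The function `detection_modes` and the `SCENARIOS` mapping are hypothetical helpers; the returned strings are the `DETECTION_MODE` values listed earlier in this article.

```python
from typing import List


def detection_modes(classify_speech: bool, detect_tones: bool) -> List[str]:
    """Return the DETECTION_MODE values to run for a campaign's needs."""
    modes = []
    if classify_speech:
        modes.append("CPA")   # speech-duration classification
    if detect_tones:
        modes.append("Tone")  # Beep, Fax, SIT, Busy
    if not modes:
        raise ValueError("select at least one detection need")
    return modes


# The six scenarios from the table as (classify_speech, detect_tones).
SCENARIOS = {
    "Pre-answer tone filtering": (False, True),
    "Human vs machine classification": (True, False),
    "Message delivery to humans and machines": (True, True),
    "Agent connection (humans only)": (True, True),
    "Beep detection only": (False, True),
    "Full outbound dialling pipeline": (True, True),
}
```

For instance, `detection_modes(*SCENARIOS["Agent connection (humans only)"])` returns both modes, matching the "Both" recommendation.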

Detection Trigger Reference

Each detection feature has a specific trigger that causes it to return a result. The table below summarizes what causes each feature to stop listening and report its finding.

Feature	Trigger to Stop	Returns When
CPA	Speech ends + `VAD_EOS_DELAY`, or silence timeout	Speech classified or timeout
AMD (Beep)	BEEP tone detected	Beep detected or `BARGE_IN_TIMEOUT`
AMD (Fax)	FAX tone detected	Immediately on detection
AMD (SIT)	SIT tones detected	Immediately on detection (1–2 seconds)
AMD (Busy)	BUSY tone detected	Immediately on detection
