CPA & AMD Integration Overview

This article introduces the core concepts behind Capacity Private Cloud Call Progress Analysis (CPA) and Answering Machine Detection (AMD) — two complementary technologies that work together to classify outbound call outcomes and optimize campaign performance. For implementation details, see the companion articles on MRCP Integration and gRPC Integration.

CPA vs AMD — What Each Does

CPA and AMD serve different but complementary roles in outbound call processing. CPA uses Voice Activity Detection (VAD) to measure speech duration and classify who answered the call. AMD uses Digital Signal Processing (DSP) to detect specific tones such as beeps, fax signals, and network errors.

| Feature | CPA (Call Progress Analysis) | AMD (Answering Machine Detection) |
| --- | --- | --- |
| Detection Mode | DETECTION_MODE=CPA | DETECTION_MODE=Tone |
| Technology | VAD: measures speech duration | DSP: detects specific tones |
| What it detects | Human Residence, Human Business, Unknown Speech, Unknown Silence | Beep, Fax, SIT, Busy |
| When it's useful | Classifying who or what answered the call | Detecting tones before answer (SIT/Busy) and after answer (Beep/Fax) |
| Grammar example (MRCP) | CallProgressAnalysis.grxml | ToneDetection.grxml |

CPA Classifications

CPA classifies call outcomes based on how long the answering party speaks before pausing. The thresholds below determine which classification is returned. These values are configurable via grammar meta tags (MRCP) or protobuf fields (gRPC).

| Classification | Speech Duration | Meaning |
| --- | --- | --- |
| HUMAN RESIDENCE | < 1800ms | Short greeting: "Hello?" |
| HUMAN BUSINESS | 1800ms – 3000ms | Longer greeting: "Thanks for calling XYZ, how may I help you?" |
| UNKNOWN SPEECH | > 3000ms | Likely answering machine or voicemail greeting |
| UNKNOWN SILENCE | No speech within 5000ms | No one speaking, possibly ringing with no answer |
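
The thresholds in the table can be read as a simple decision rule. The sketch below is illustrative only: `classify_cpa` is a hypothetical helper, not part of the product API, and the handling of durations that fall exactly on 1800ms or 3000ms is an assumption, since the table does not state which side the boundaries belong to.

```python
from typing import Optional


def classify_cpa(
    speech_ms: Optional[int],
    residence_ms: int = 1800,
    business_ms: int = 3000,
) -> str:
    """Map a measured speech duration to a CPA classification label.

    speech_ms is None when no speech was heard before the silence timeout.
    Assumption: 1800ms and 3000ms exactly both count as HUMAN BUSINESS.
    """
    if speech_ms is None:
        return "UNKNOWN SILENCE"
    if speech_ms < residence_ms:
        return "HUMAN RESIDENCE"
    if speech_ms <= business_ms:
        return "HUMAN BUSINESS"
    return "UNKNOWN SPEECH"


print(classify_cpa(600))   # short "Hello?"
print(classify_cpa(2400))  # longer business greeting
print(classify_cpa(4500))  # long greeting, likely a machine
print(classify_cpa(None))  # nobody spoke
```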

AMD Tone Types

AMD detects specific audio tones using digital signal processing. Each tone type indicates a different call outcome and determines the appropriate next action.

| Tone | Meaning | Timing |
| --- | --- | --- |
| BEEP | Answering machine ready for message | Post-answer, after voicemail greeting ends |
| FAX | Fax machine tones | Post-answer, usually immediately after answer |
| SIT | Special Information Tones (7 subtypes) | Pre-answer: network error or disconnected number |
| BUSY | Busy signal (400–680 Hz) | Pre-answer: configurable timeout (BARGE_IN_TIMEOUT) |

Key Timing Parameters

The timing parameters below control how CPA and AMD behave during detection. These values can be tuned per campaign to balance detection speed against accuracy. All parameters are configurable via grammar meta tags (MRCP) or protobuf fields (gRPC).

| Parameter | Default | Purpose |
| --- | --- | --- |
| VAD_EOS_DELAY | 1200ms | Silence after speech before end-of-speech is triggered |
| CPA_HUMAN_RESIDENCE_TIME | 1800ms | Speech duration below which the call is classified as Human Residence |
| CPA_HUMAN_BUSINESS_TIME | 3000ms | Upper speech duration for Human Business; longer speech is classified as Unknown Speech (likely a machine) |
| CPA_UNKNOWN_SILENCE_TIMEOUT | 5000ms | No-speech timeout before returning Unknown Silence |
| BARGE_IN_TIMEOUT | Auto (CPA) | Maximum listening duration before forced timeout |
| VAD_STREAM_INIT_DELAY | 0ms | Background noise calibration period |
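
When tuning these values per campaign, it can help to keep them in one validated structure before rendering them into grammar meta tags or protobuf fields. The following is a minimal sketch under stated assumptions: the `CpaTiming` class and its `as_parameters` method are hypothetical, the actual meta tag and protobuf syntax is covered in the companion articles, and omitting BARGE_IN_TIMEOUT to represent "Auto" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CpaTiming:
    """Timing parameters in milliseconds, using the documented defaults."""

    vad_eos_delay: int = 1200
    cpa_human_residence_time: int = 1800
    cpa_human_business_time: int = 3000
    cpa_unknown_silence_timeout: int = 5000
    barge_in_timeout: Optional[int] = None  # None stands for "Auto" (assumption)
    vad_stream_init_delay: int = 0

    def __post_init__(self) -> None:
        # The residence threshold must sit below the business threshold,
        # otherwise the HUMAN BUSINESS band would be empty.
        if self.cpa_human_residence_time >= self.cpa_human_business_time:
            raise ValueError("CPA_HUMAN_RESIDENCE_TIME must be below CPA_HUMAN_BUSINESS_TIME")
        for name, value in self.as_parameters().items():
            if value < 0:
                raise ValueError(f"{name} must not be negative")

    def as_parameters(self) -> Dict[str, int]:
        """Return a mapping from documented parameter names to values."""
        params = {
            "VAD_EOS_DELAY": self.vad_eos_delay,
            "CPA_HUMAN_RESIDENCE_TIME": self.cpa_human_residence_time,
            "CPA_HUMAN_BUSINESS_TIME": self.cpa_human_business_time,
            "CPA_UNKNOWN_SILENCE_TIMEOUT": self.cpa_unknown_silence_timeout,
            "VAD_STREAM_INIT_DELAY": self.vad_stream_init_delay,
        }
        if self.barge_in_timeout is not None:
            params["BARGE_IN_TIMEOUT"] = self.barge_in_timeout
        return params


# A campaign that waits slightly longer for end-of-speech.
print(CpaTiming(vad_eos_delay=1500).as_parameters())
```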

Pre-Answer vs Post-Answer Detection

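Understanding the difference between pre-answer and post-answer detection is fundamental to designing effective outbound call flows. Before the call is answered, only tone detection applies: AMD listens for SIT and Busy tones. After the call is answered, CPA classifies the answering party by speech duration while AMD listens for Beep and Fax tones.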
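
The split between the two stages can be summarized as a lookup built from the Timing column of the AMD tone table. This is an illustrative sketch only; `TONE_STAGE` and `tones_for_stage` are hypothetical names and not part of any product interface.

```python
from typing import List

# Stage at which each AMD tone is expected, per the AMD Tone Types table.
TONE_STAGE = {
    "SIT": "pre-answer",
    "BUSY": "pre-answer",
    "BEEP": "post-answer",
    "FAX": "post-answer",
}


def tones_for_stage(stage: str) -> List[str]:
    """Return the AMD tones expected at the given call stage, sorted by name."""
    return sorted(tone for tone, s in TONE_STAGE.items() if s == stage)


print(tones_for_stage("pre-answer"))
print(tones_for_stage("post-answer"))
```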

Parallel CPA + AMD Architecture

CPA and AMD are designed to run in parallel on the same audio stream. Running them sequentially creates detection gaps where tones or speech could be missed. Both MRCP and gRPC support parallel processing.

MRCP: Single Session with Both Grammars

In MRCP, both CPA and AMD grammars are included in a single RECOGNIZE request. The media server analyses the audio with both grammars simultaneously. The first match from either grammar returns the result as a RECOGNITION-COMPLETE event.
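
The "first match wins" behavior can be pictured with a small simulation. The sketch below is not MRCP code: it models the two grammars as hypothetical coroutines (`fake_detector`) with made-up delays, and shows only the pattern of running both and returning whichever finishes first.

```python
import asyncio
from typing import Awaitable, Iterable, Tuple


async def fake_detector(name: str, result: str, delay_s: float) -> Tuple[str, str]:
    """Stand-in for one grammar: reports its result after a fixed delay."""
    await asyncio.sleep(delay_s)
    return name, result


async def first_match(detectors: Iterable[Awaitable[Tuple[str, str]]]) -> Tuple[str, str]:
    """Run all detectors concurrently and return the first result."""
    tasks = [asyncio.ensure_future(d) for d in detectors]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # The losing detector is no longer needed once one grammar has matched.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


winner = asyncio.run(
    first_match(
        [
            fake_detector("CPA", "HUMAN RESIDENCE", 0.05),
            fake_detector("AMD", "SIT", 0.01),
        ]
    )
)
print(winner)
```

In this simulated run the AMD detector finishes sooner, so its result is the one reported, which mirrors a SIT tone being detected before any speech is classified.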

gRPC: Single Session with Parallel Interactions

In gRPC, the architecture uses a session-based model. Audio is streamed into a session representing the phone call. Two parallel interactions — one for CPA and one for AMD — are created on the same session, both processing the same audio stream simultaneously. Audio streaming (AudioPush) begins before the interaction requests are sent.

When to Use CPA Only, AMD Only, or Both

Not every outbound call scenario requires both CPA and AMD. The table below provides guidance on which detection mode to use based on common campaign objectives.

| Scenario | Recommended Mode | Reason |
| --- | --- | --- |
| Pre-answer tone filtering | AMD only | Only tone detection works pre-answer. CPA requires speech. |
| Human vs machine classification | CPA only | CPA handles speech duration analysis. AMD detects tones, not speech patterns. |
| Message delivery to humans and machines | Both | CPA classifies who answered. AMD detects beep for voicemail message timing. |
| Agent connection (humans only) | Both | CPA identifies humans. AMD catches fax/SIT to avoid wasting agent time. |
| Beep detection only (call already classified as machine) | AMD only | If another system already classified the call as a machine, just wait for the beep. |
| Full outbound dialling pipeline | Both | Pre-answer: AMD for SIT/Busy. Post-answer: CPA for classification + AMD for beep. |
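
The recommendations above reduce to two questions: does the campaign need to classify speech, and does it need to detect tones? A minimal sketch of that rule follows; `recommend_mode` is a hypothetical helper written for illustration, not a product function.

```python
def recommend_mode(needs_speech_classification: bool, needs_tone_detection: bool) -> str:
    """Pick a detection mode from the two needs a campaign can have."""
    if needs_speech_classification and needs_tone_detection:
        return "Both"
    if needs_speech_classification:
        return "CPA only"
    if needs_tone_detection:
        return "AMD only"
    raise ValueError("the campaign needs at least one kind of detection")


# Agent connection: identify humans (speech) and filter fax/SIT (tones).
print(recommend_mode(needs_speech_classification=True, needs_tone_detection=True))
# Pre-answer tone filtering: tones only.
print(recommend_mode(needs_speech_classification=False, needs_tone_detection=True))
```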

Detection Trigger Reference

Each detection feature has a specific trigger that causes it to return a result. The table below summarizes what causes each feature to stop listening and report its finding.

| Feature | Trigger to Stop | Returns When |
| --- | --- | --- |
| CPA | Speech ends + VAD_EOS_DELAY, or silence timeout | Speech classified or timeout |
| AMD (Beep) | BEEP tone detected | Beep detected or BARGE_IN_TIMEOUT |
| AMD (Fax) | FAX tone detected | Immediately on detection |
| AMD (SIT) | SIT tones detected | Immediately on detection (1–2 seconds) |
| AMD (Busy) | BUSY tone detected | Immediately on detection |
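
The CPA trigger implies a simple piece of arithmetic for when a result comes back. The sketch below works through it using the documented defaults. `cpa_return_time_ms` is a hypothetical helper, times are measured from the start of listening, and treating BARGE_IN_TIMEOUT as a hard cap on the result time is an assumption based on its description as the maximum listening duration.

```python
from typing import Optional


def cpa_return_time_ms(
    speech_end_ms: Optional[int],
    vad_eos_delay_ms: int = 1200,
    silence_timeout_ms: int = 5000,
    barge_in_timeout_ms: Optional[int] = None,
) -> int:
    """Estimate when CPA reports, in ms from the start of listening.

    speech_end_ms is None when no speech was detected at all.
    """
    if speech_end_ms is None:
        # No speech: CPA waits for the silence timeout.
        result_ms = silence_timeout_ms
    else:
        # Speech heard: CPA waits for VAD_EOS_DELAY of trailing silence.
        result_ms = speech_end_ms + vad_eos_delay_ms
    if barge_in_timeout_ms is not None:
        # Assumption: the barge-in timeout caps the listening duration.
        result_ms = min(result_ms, barge_in_timeout_ms)
    return result_ms


# A "Hello?" that ends 1000ms into the call is reported at 1000 + 1200 ms.
print(cpa_return_time_ms(1000))
# Silence is reported when the 5000ms no-speech timeout expires.
print(cpa_return_time_ms(None))
```

This also shows the trade-off behind tuning VAD_EOS_DELAY: lowering it returns results sooner, but a natural pause in a greeting is then more likely to be treated as the end of speech.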
