How to Track NPS and CSAT from Call Conversations Using AI Voice Agents


Your post-call email survey has a 9% response rate, and the responses you do get skew toward your angriest and happiest customers. The silent majority, the 80% who feel "fine" or "mostly okay," never click through. You are making CX decisions based on a distorted sliver of feedback that misses the customers most likely to churn without warning.
This guide walks you through building AI voice agents that collect NPS and CSAT scores during live phone conversations, analyze sentiment from every call automatically, and route feedback to your CRM in real time. By the end, you will have a working feedback system built on Retell AI that captures scored and qualitative customer insight from 100% of your calls.
You'll build a phone-based AI feedback system that captures NPS and CSAT data within live customer conversations, then routes structured insights to your analytics stack without follow-up surveys or manual QA.
By the end of this tutorial, your system will:
- Collect NPS and CSAT scores during live phone conversations
- Analyze sentiment from 100% of calls automatically
- Route structured feedback to your CRM in real time
- Escalate detractor responses with full context for human follow-up
Before you start, you'll need:
- A Retell AI account (every account includes $10 in free usage credits)
- A CRM or BI tool that can receive webhooks
- A phone number or SIP trunk for handling live calls
Before adding feedback logic, you need a working agent that can hold a natural conversation. Sign up at retellai.com and create a new AI voice agent from the dashboard. Choose a voice that fits your brand (the platform supports ultra-realistic ElevenLabs v3 voices with emotional expression). Configure a simple greeting and run a test call from the built-in phone simulator.
You should now hear your agent answer, speak naturally, and complete a basic interaction. This confirms your audio pipeline, latency (expect around 600ms end-to-end), and voice quality before layering on survey logic.
Your feedback questions need to arrive at the right moment: after the primary task is resolved but before the caller disengages. Use the agentic framework's drag-and-drop builder to create a conversation flow that handles the caller's core request first, then transitions into a feedback prompt.
Structure the flow as: greeting, task resolution, satisfaction check, NPS question, optional open-ended follow-up, closing. For CSAT, have the agent ask a single satisfaction question on a 1-to-5 scale after resolving the caller's issue. For NPS, ask the "how likely are you to recommend" question on a 0-to-10 scale. Keep both questions conversational: "Before I let you go, on a scale of 1 to 5, how satisfied were you with the help you received today?" works better than robotic survey language. Store both scores as variables in the call state for extraction later.
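The flow described above can be sketched as an ordered list of states. This is an illustrative sketch in Python, not Retell's actual flow schema; the state names, prompt fields, and `store_as` variable keys are assumptions for clarity.

```python
# Illustrative sketch of the conversation flow described above.
# State names and keys are hypothetical, not Retell's flow schema.
FEEDBACK_FLOW = [
    {"state": "greeting", "prompt": "Hi, thanks for calling! How can I help today?"},
    {"state": "task_resolution", "prompt": "<handle the caller's core request>"},
    {"state": "satisfaction_check", "prompt": "Is there anything else I can help with?"},
    {"state": "csat_question",
     "prompt": ("Before I let you go, on a scale of 1 to 5, how satisfied "
                "were you with the help you received today?"),
     "store_as": "csat_score", "scale": (1, 5)},
    {"state": "nps_question",
     "prompt": ("One last question. If a friend asked about us, how likely "
                "would you be to recommend us? Zero means not at all, "
                "ten means absolutely."),
     "store_as": "nps_score", "scale": (0, 10)},
    {"state": "open_followup", "prompt": "What could we improve?", "optional": True},
    {"state": "closing", "prompt": "Thanks for your time. Have a great day!"},
]

def validate_flow(flow):
    """Check that every scored state declares a variable name and a valid scale."""
    for state in flow:
        if "store_as" in state:
            lo, hi = state["scale"]
            assert lo < hi, f"bad scale in {state['state']}"
    return [s["state"] for s in flow]
```

Whatever builder you use, the key property to preserve is the ordering: the scored questions come only after the resolution check.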
Explicit survey questions capture what customers say they feel. Post-call analysis captures what they actually felt, through tone, pacing, word choice, and hesitation patterns. In the agent settings, enable the built-in analysis categories: sentiment (positive, neutral, negative), resolution status (resolved, unresolved, escalated), and caller intent. Then add custom categories specific to your feedback use case: "satisfaction score" (numerical extraction), "NPS score" (numerical extraction), "verbatim feedback" (text extraction), and "product mention" (Boolean flag).
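As a sketch, the built-in and custom categories above might be represented like this. The field names and types are hypothetical, chosen to mirror the list in the text, not the platform's actual API schema.

```python
# Hypothetical representation of the analysis categories described above.
# Field names and types are illustrative, not Retell's actual API schema.
ANALYSIS_CATEGORIES = [
    {"name": "sentiment",          "type": "enum",   "values": ["positive", "neutral", "negative"]},
    {"name": "resolution_status",  "type": "enum",   "values": ["resolved", "unresolved", "escalated"]},
    {"name": "caller_intent",      "type": "text"},
    {"name": "satisfaction_score", "type": "number", "range": (1, 5)},   # custom: CSAT
    {"name": "nps_score",          "type": "number", "range": (0, 10)},  # custom: NPS
    {"name": "verbatim_feedback",  "type": "text"},                      # custom
    {"name": "product_mention",    "type": "boolean"},                   # custom
]

def category_names(categories):
    """Return the category names, e.g. to check a payload covers them all."""
    return [c["name"] for c in categories]
```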
The platform processes these after every call and makes them available via API and dashboard. You should now see structured analysis data appearing in your call logs within seconds of each test call ending.
Scores sitting in a dashboard do not improve customer experience. Configure webhook endpoints to push structured feedback data to your CRM and BI tools after every call. In the agent settings, set up a POST webhook that fires on call completion. The payload should include: caller phone number, call duration, NPS score, CSAT score, sentiment classification, resolution status, and the verbatim transcript excerpt containing qualitative feedback.
For teams using a Make or n8n integration, build a workflow that receives the webhook, parses the JSON payload, and writes the data to your CRM contact record. Set the webhook timeout to at least 5 seconds, as CRM APIs can be slow during peak hours. You should now see feedback data appearing in your CRM within seconds of a test call ending.
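A minimal sketch of the receiving side, assuming a JSON payload with the fields listed above. The key names here are assumptions about that shape, not Retell's documented webhook schema; map them to your actual payload.

```python
import json

def parse_call_webhook(raw_body: bytes) -> dict:
    """Parse a post-call webhook payload into a flat record for the CRM.

    Payload key names are assumptions mirroring the fields described in
    the text, not Retell's documented schema.
    """
    event = json.loads(raw_body)
    return {
        "phone": event["caller_phone"],
        "duration_sec": event["call_duration"],
        "nps": event.get("nps_score"),        # may be absent if the caller refused
        "csat": event.get("csat_score"),
        "sentiment": event.get("sentiment", "neutral"),
        "resolved": event.get("resolution_status") == "resolved",
        "feedback": event.get("verbatim_feedback", ""),
    }
```

In a Make or n8n workflow the same mapping happens in a JSON-parse step; the point is to flatten the payload into whatever fields your CRM contact record expects.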
A detractor score without follow-up is a wasted signal. Configure your conversation flow so that any NPS score of 0-6 or CSAT score of 1-2 triggers an immediate call transfer to a customer success rep with full conversation context. For calls outside business hours, have the webhook trigger an urgent ticket in your support system with the caller's details, their score, the verbatim reason, and a suggested response based on the issue category.
Set the escalation threshold carefully. Transferring after every low score overwhelms your team. Start by routing only scores of 0-4 on NPS to live reps, and flag scores of 5-6 for next-business-day outreach. You should now see detractor calls routing correctly to your team with complete context attached.
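The thresholds above reduce to a small routing function. A sketch, assuming scores arrive as integers (or None when the caller declined to answer):

```python
def route_feedback(nps=None, csat=None):
    """Map scores to a follow-up action using the thresholds described above."""
    if (nps is not None and nps <= 4) or (csat is not None and csat <= 2):
        return "transfer_to_rep"      # hot detractor: live transfer with context
    if nps is not None and 5 <= nps <= 6:
        return "next_day_outreach"    # soft detractor: flag for follow-up
    return "no_action"                # passive or promoter
```

Keeping the thresholds in one function makes them easy to tune during the two-week review period without touching the rest of the pipeline.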
Callers share more honest feedback when the conversation feels informed and relevant. Connect a knowledge base that auto-syncs from your FAQ pages, product documentation, and policy guides. This gives the agent enough context to resolve the caller's primary issue well, which directly influences the satisfaction score they give afterward.
Upload your top 50 caller questions and answers as a starting set. The streaming RAG system ensures the agent pulls from current information during the call. You should now see the agent answering product and service questions accurately during test calls.
Run simulation tests covering: a satisfied caller who gives a 9 NPS and 5 CSAT, a neutral caller who gives a 7 NPS, a frustrated caller who gives a 3 NPS and triggers escalation, a caller who refuses to answer the survey, and a caller who provides detailed open-ended feedback. For each scenario, verify that scores are captured correctly in the call log, sentiment analysis matches the simulated tone, webhook payloads arrive in your CRM with correct values, and escalation triggers fire at the right thresholds.
Review every test call transcript for awkward transitions between the service portion and the survey portion. The handoff should feel like a natural extension of the conversation, not a mode switch. Fix any transitions that feel robotic before deploying.
Connect your phone system via SIP trunking or assign a Retell number to start handling live calls. During the first two weeks, review call transcripts daily, watching for survey question phrasing that confuses callers, high skip rates on specific questions, and mismatches between sentiment analysis and explicit scores. Set up a weekly review using the analytics dashboard to track feedback volume, average scores, and trend lines.
Plan for a 2-week tuning period. Most teams see 70-80% survey completion rates in week one (compared to under 10% for email surveys), improving as you refine question timing and phrasing. By week three, you should have a statistically significant dataset that represents your entire caller population, not a self-selected sliver.
Asking for feedback before the caller's issue is resolved produces inaccurate scores and higher abandonment. Configure your agent to confirm resolution ("Is there anything else I can help with?") before transitioning to feedback. Callers who feel heard rate more honestly.
A caller who gives a CSAT of 4 but shows frustration markers (raised voice, repeated questions, long pauses) throughout the call is a churn risk that a score alone would miss. Cross-reference post-call sentiment with explicit scores weekly. Flag mismatches for manual review. This is where the platform's acoustic sentiment detection through tone, pacing, and pitch adds a layer that traditional surveys cannot replicate.
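The weekly cross-reference can be a simple rule over each call record. The 70%/40% cutoffs below are illustrative starting points, not platform defaults; calibrate them against your own reviewed calls before acting on the flags.

```python
def flag_mismatch(score, sentiment, scale_max):
    """Flag calls whose explicit score and post-call sentiment disagree.

    Cutoffs are illustrative; tune them against manually reviewed calls.
    """
    if score >= 0.7 * scale_max and sentiment == "negative":
        return "hidden_detractor"   # e.g. CSAT 4 with frustration markers
    if score <= 0.4 * scale_max and sentiment == "positive":
        return "check_score"        # score may have been misheard or misrecorded
    return None
```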
Asking both questions back-to-back creates survey fatigue. Place the CSAT question immediately after resolution (measures the interaction). Place the NPS question at the end of the conversation as a closing (measures the relationship). Space them by at least two conversational turns.
Do not ask every caller for a detailed verbatim response. Configure the agent to ask the open-ended "What could we improve?" question on 20-30% of calls, randomized. This gives you enough qualitative data without making every caller feel interrogated.
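The randomization itself is one line; a sketch, with the sampling rate exposed so you can tune it between the 20-30% band suggested above:

```python
import random

def should_ask_open_ended(rate=0.25, rng=random):
    """Decide per call whether to ask the open-ended question.

    `rate` is the sampling fraction (0.20-0.30 per the guidance above);
    pass a seeded random.Random for reproducible tests.
    """
    return rng.random() < rate
```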
A prompt that sounds like "On a scale of zero to ten, how likely are you to recommend our company to a friend or colleague?" reads like a script from 2005. Rewrite it conversationally: "One last question. If a friend asked about us, how likely would you be to recommend us? Zero means not at all, ten means absolutely." Test multiple phrasings and compare completion rates.
The default sentiment categories (positive, neutral, negative) are a starting point, not a final configuration. A caller negotiating a price might register as "negative" even when they are a perfectly happy customer. Calibrate your sentiment thresholds using 50-100 real calls before trusting the automated scores for escalation triggers.
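One simple way to run that calibration: score the same 50-100 calls manually and measure how often the AI label agrees. A sketch:

```python
def agreement_rate(ai_labels, human_labels):
    """Fraction of calls where the AI sentiment label matches a manual review."""
    if len(ai_labels) != len(human_labels):
        raise ValueError("label lists must cover the same calls")
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)
```

If agreement on the negative class is low, adjust the thresholds (or the category definitions) before wiring sentiment to live escalation.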
If your agent handles the primary task poorly, no amount of survey optimization will fix your CSAT. Track resolution rates alongside satisfaction scores. A call center automation strategy that tracks only feedback without tracking resolution is measuring the symptom, not the cause.
NPS measures long-term loyalty (would you recommend us?). CSAT measures immediate satisfaction (was this interaction good?). Tracking both on the same dashboard without separating the analysis leads to confused action plans. Build separate trend views and separate escalation workflows for each metric.
Routing every low score to an automated "we're sorry" email destroys trust. Detractors who score 0-4 need a human call-back within 24 hours with full conversation context. Automate the routing and context delivery. Keep the recovery conversation human.
Matic Insurance deployed AI voice agents for call workflow automation and maintained an NPS of 90 after AI deployment, while automating 50% of low-value tasks and reducing claims handle time from 12.4 to 5.8 minutes (a 53% reduction). The team handled 8,000+ calls in Q1 2025 with consistent satisfaction metrics.
Pine Park Health used AI voice agents for patient scheduling and saw a 38% increase in scheduling NPS while filling previously underutilized provider capacity. The combination of faster call resolution and consistent service quality drove the satisfaction improvement.
Medical Data Systems handles 100% of inbound calls with AI, with only a 30% transfer rate and approximately $280,000 per month collected. The ability to track sentiment and resolution on every call, not a sample, gave the team visibility into the customer experience that manual QA never provided.
NPS and CSAT tracking from call conversations means collecting satisfaction scores during or immediately after a phone call, rather than through a separate follow-up survey. AI voice agents ask the rating questions naturally within the conversation and capture both the numeric score and qualitative context from the interaction.
No coding is required. The no-code agentic framework includes pre-built templates for survey collection, and you can configure feedback prompts, scoring logic, and webhook routing entirely through a drag-and-drop interface. Teams with developers can use the API for deeper customization, but it is optional.
Most teams go from signup to a live feedback collection agent in 3-5 days. Configuring the survey flow takes a few hours. Connecting webhooks to your CRM adds another day. The 2-week tuning period after launch is where you optimize question phrasing and escalation thresholds for your specific caller population.
The platform charges $0.07 per minute with no platform fees. A 3-minute call that resolves an issue and collects both NPS and CSAT costs about $0.21. Compare that to an answering service team member at $15-25 per hour doing the same work manually with a 10% survey response rate. Every account includes $10 in free credits to test the entire workflow.
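The per-call arithmetic from the quoted rate:

```python
def call_cost(minutes, per_minute=0.07):
    """Per-call cost at the quoted $0.07/minute rate (no platform fees)."""
    return round(minutes * per_minute, 2)

# A 3-minute call that resolves the issue and collects both scores:
# call_cost(3) -> 0.21
```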
AI sentiment analysis scores 100% of calls consistently, while manual QA typically reviews 2-5% of calls with inter-rater variability. The platform detects acoustic signals (tone, pacing, pitch) and contextual language patterns that human reviewers often miss at scale. Use the first 50 calls to validate AI sentiment against manual scores and calibrate thresholds.
Yes, both call directions are supported. For inbound calls handled by an AI IVR or support agent, embed the survey at the end of the resolved interaction. For outbound campaigns using batch call, add a CSAT question after the primary call objective (appointment confirmation, follow-up, or lead qualification). Both call types route feedback through the same post-call analysis pipeline.
When a caller declines, the agent accepts the refusal gracefully and closes the call without pressing. Refusal rates are tracked as a separate metric. High refusal rates (above 30%) usually indicate poor question timing or phrasing, not caller unwillingness. Sentiment analysis still extracts satisfaction signals from the rest of the conversation, so you get partial insight even without an explicit score.
Retell AI is SOC 2 Type II certified and offers HIPAA compliance with a self-service BAA. For healthcare deployments, PII redaction can be configured to strip personally identifiable information from stored transcripts while retaining the feedback scores and anonymized sentiment data.
Traditional IVR surveys play after the agent hangs up, catching callers already headed for the exit; their response rates typically fall below 10%. AI voice agents collect feedback during the conversation, when the caller is still engaged. Teams using this approach consistently report completion rates of 60-80%, capturing the "silent majority" that traditional surveys miss entirely.
You now have an AI voice agent system that collects NPS and CSAT during live phone conversations, analyzes sentiment from 100% of calls, routes structured feedback to your CRM in real time, and escalates detractor responses with full context for human follow-up.
To expand from here, consider deploying the same feedback flow across outbound campaigns, building predictive churn models using the sentiment data, or connecting feedback trends to product and service changes to measure impact over time.
Start building free with $10 in usage credits at retellai.com.


