Stronger, Smarter Call Reliability: ASR + LLM Fallbacks


Retell's voice system works like a live chain of three core steps, each one depending on the step before it.

ASR → LLM → TTS

ASR (automatic speech recognition) listens to the caller and turns their speech into text.

LLM (language model) reads that text, understands what was said, and decides what to say back.

TTS (text-to-speech) takes that response and turns it into spoken audio the caller hears.

So the flow is basically: caller speaks → ASR transcribes it → LLM generates a response → TTS speaks it back.
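That flow can be sketched as three stages wired together. The function bodies below are illustrative stubs, not Retell's actual implementation:

```python
def asr(audio: bytes) -> str:
    """Automatic speech recognition: caller audio in, transcript out (stubbed)."""
    return "what are your opening hours"

def llm(transcript: str) -> str:
    """Language model: read the transcript, decide what to say back (stubbed)."""
    return f"You asked: '{transcript}'. We're open 9am to 5pm."

def tts(text: str) -> bytes:
    """Text-to-speech: response text in, spoken audio out (stubbed)."""
    return text.encode("utf-8")  # stand-in for synthesized audio

def handle_turn(caller_audio: bytes) -> bytes:
    # caller speaks -> ASR transcribes -> LLM generates a response -> TTS speaks it
    return tts(llm(asr(caller_audio)))
```

The point of the chain shape: if ASR falls behind or the LLM fails, everything downstream stalls, which is why both get their own fallback machinery below.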

When someone is on a call, the ASR and LLM are both critical systems working in real time. And both can fail, slow down, or behave unpredictably. Because we need to be able to rely on both, we've built a setup that's constantly monitoring, backing up, and switching in real time. Here's how the two work together and why these updates matter:

First, we watch for lag (not failure)

We're constantly asking one simple question: "Is the system keeping up with the conversation?" Every 0.1 seconds, we compare:

  • How much audio we've sent
  • How much audio has actually been processed

If the gap grows beyond 5 seconds, that's our signal: This provider is falling behind. Not dead. Not broken. But headed there. And that's when we act.
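In pseudocode terms, the check is just a counter comparison on a 0.1-second timer. This is a minimal sketch of the idea, with assumed names, not our production monitor:

```python
LAG_THRESHOLD_SECONDS = 5.0   # gap that signals "this provider is falling behind"
CHECK_INTERVAL_SECONDS = 0.1  # how often the comparison runs

class LagMonitor:
    def __init__(self):
        self.audio_sent_seconds = 0.0       # total audio forwarded to the provider
        self.audio_processed_seconds = 0.0  # total audio the provider confirmed

    def on_audio_sent(self, duration: float) -> None:
        self.audio_sent_seconds += duration

    def on_audio_processed(self, duration: float) -> None:
        self.audio_processed_seconds += duration

    def is_falling_behind(self) -> bool:
        # Called every CHECK_INTERVAL_SECONDS by a timer loop (not shown).
        gap = self.audio_sent_seconds - self.audio_processed_seconds
        return gap > LAG_THRESHOLD_SECONDS
```

Note that the trigger is lag, not an error: a provider can be "up" and still too slow to keep a conversation in sync.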

We keep a live "safety net" of your audio

As audio is being processed, we keep a rolling backup of anything that hasn't been fully handled yet. Think of it like this:

  • If the system confirms it processed something → we discard it
  • If it hasn't yet → we hold onto it

So at any moment, we have a perfect copy of the "in-between" audio, which is the part that's most at risk of getting lost. No guessing. No gaps.
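The safety net behaves like a queue that only drains on confirmation. A minimal sketch, assuming chunks carry monotonically increasing IDs and the provider acknowledges them in order:

```python
from collections import deque

class SafetyNetBuffer:
    """Rolling backup of audio chunks not yet confirmed as processed."""

    def __init__(self):
        self._pending = deque()  # (chunk_id, audio_bytes), oldest first

    def on_audio_sent(self, chunk_id: int, audio: bytes) -> None:
        # Hold onto every chunk until the provider confirms it.
        self._pending.append((chunk_id, audio))

    def on_processed(self, confirmed_through_id: int) -> None:
        # The provider confirmed everything up to this chunk -> discard it.
        while self._pending and self._pending[0][0] <= confirmed_through_id:
            self._pending.popleft()

    def unprocessed_audio(self) -> list:
        # The "in-between" audio we'd replay to a backup provider.
        return [audio for _, audio in self._pending]
```

Because chunks are only discarded after confirmation, the buffer always holds exactly the audio a backup provider would need to catch up.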

Then we swap in a backup

The moment we detect lag, we don't wait around. We:

  • Spin up a backup provider (there's a priority order: fastest, closest, most reliable first)
  • Send over all that "in limbo" audio so it can catch up
  • Shut down the struggling provider

We also give the new provider a short grace period (~20 seconds) to stabilize before we start judging its performance.
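Putting the swap together, the controller holds a priority-ordered provider list, replays the buffered audio, and restarts the grace-period clock. Names and the transport stub are assumptions for illustration:

```python
import time

GRACE_PERIOD_SECONDS = 20.0  # don't judge a fresh provider right away

class FailoverController:
    def __init__(self, providers):
        # Priority order: fastest / closest / most reliable first.
        self.providers = list(providers)
        self.active = self.providers[0]
        self.started_at = time.monotonic()

    def in_grace_period(self) -> bool:
        return time.monotonic() - self.started_at < GRACE_PERIOD_SECONDS

    def swap(self, buffered_audio) -> str:
        """Replace the struggling provider with the next one in line."""
        failed = self.providers.pop(0)
        self.active = self.providers[0]       # spin up the backup
        for chunk in buffered_audio:          # send over the "in limbo" audio
            self._send(self.active, chunk)
        self.started_at = time.monotonic()    # restart the grace period
        return failed                         # caller shuts this one down

    def _send(self, provider, chunk) -> None:
        pass  # stand-in for the real audio transport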

The result?

The transcript just… keeps going. No jump. No rewind. No weird gaps. From the caller's perspective, nothing happened.

Why this matters

Most systems wait for a provider to fully crash before switching.

We don't. We catch the moment it starts to struggle and replace it before it ever becomes a problem.

Bottom line

  • We don't wait for failure
  • We detect slowdowns in real time
  • We preserve every second of audio
  • We switch providers seamlessly

So the conversation keeps flowing exactly the way it should.

LLM: from provider-level to deployment-level routing

How our LLM fallbacks work

Now on the response side, failure isn't so obvious. It's more subtle. For each LLM model (e.g., GPT-4.1) we have a set of "deployments" that serve that model. Think of a deployment as a physical data center. When we want to get the AI response from a model, we need to specify a particular deployment to send that request to. Sometimes requests fail, whether because of an internet connectivity issue or because a flood of traffic from other users has overloaded that same deployment.

Under the hood there are a few moving parts, and it can get a bit complex, but the core idea is simple:

We route to deployments that have lower latency

This ensures responses are generated faster and reduces lag, keeping conversations in sync in real time.

We constantly measure & monitor the error rate of each deployment

And we cut traffic to the ones that have high error rates.
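Those two rules combine into one selection step: filter out deployments whose error rate is too high, then pick the fastest of what's left. The cutoff value and data shape here are assumptions for illustration:

```python
ERROR_RATE_CUTOFF = 0.10  # assumed threshold: cut traffic above 10% errors

def pick_deployment(deployments):
    """Choose the lowest-latency deployment among those with acceptable
    error rates.

    `deployments` maps name -> {"latency_ms": float, "error_rate": float},
    both measured continuously from live traffic.
    """
    healthy = {
        name: stats
        for name, stats in deployments.items()
        if stats["error_rate"] <= ERROR_RATE_CUTOFF
    }
    if not healthy:
        return None  # every deployment is unhealthy; time to try another model
    return min(healthy, key=lambda name: healthy[name]["latency_ms"])
```

Note the ordering: a deployment with great latency but a high error rate still gets cut, because a fast failure is still a failure.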

We've got a need for speed when sending a request to a deployment

If it's slow, we don't wait. We send the request somewhere else and keep going until one responds.
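This pattern is often called a hedged request: try the primary, and if it hasn't answered within a short budget, fan out to backups and take whichever responds first. A minimal sketch with an assumed hedge budget:

```python
import concurrent.futures

HEDGE_AFTER_SECONDS = 0.5  # assumed budget before we stop waiting

def raced_request(deployments, send_request):
    """Send to the first deployment; if it's slow, race the backups too
    and return whichever response comes back first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(deployments)) as pool:
        futures = [pool.submit(send_request, deployments[0])]
        done, _ = concurrent.futures.wait(futures, timeout=HEDGE_AFTER_SECONDS)
        if not done:  # primary is slow -> send the request somewhere else too
            futures += [pool.submit(send_request, d) for d in deployments[1:]]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        return next(iter(done)).result()
```

The trade-off is a little duplicated work on slow days in exchange for never letting one sluggish deployment stall the conversation.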

May the best AI model win

If a model fails across multiple deployments, we don't keep trying it. We switch to another and keep going.
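The model-level fallback can be sketched as a loop over models, each with a small failure budget across its deployments. The budget value and callback shape are assumptions:

```python
MAX_DEPLOYMENT_FAILURES = 3  # assumed: give up on a model after this many

def get_response(models, try_deployment):
    """Try each model across its deployments; if a model keeps failing,
    switch to the next model instead of retrying it forever.

    `models` maps model name -> list of deployment names;
    `try_deployment(model, deployment)` returns a response or raises.
    """
    for model, deployments in models.items():
        failures = 0
        for deployment in deployments:
            try:
                return try_deployment(model, deployment)
            except Exception:
                failures += 1
                if failures >= MAX_DEPLOYMENT_FAILURES:
                    break  # this model is having a bad day; move on
    raise RuntimeError("all models and deployments failed")
```

The key behavior: failures are counted per model, so a model that's down everywhere gets abandoned quickly instead of burning the caller's time on retries.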

Bottom line

We're not relying on one model or one provider.

We're constantly:

  • routing to the best option
  • avoiding what's failing
  • racing for faster responses
  • and switching when needed

So even when systems have a bad day, your conversations don't.
