Pricing Changes, Latency Improvements, One-Time SMS, and More
Upcoming Pricing Adjustments (Effective June 1st)
We’re making a few changes to our pricing model to better align with usage patterns and infrastructure costs. These updates will go into effect on June 1st:
Token-Based LLM Pricing
LLM usage will now be billed based on token count. Prompts up to 3,500 tokens will follow the existing flat rate; prompts exceeding 3,500 tokens will scale proportionally in cost. For example:
A prompt with 3,200 tokens will still be billed at the base rate.
A prompt with 4,500 tokens will be billed at 4500 / 3500 × base rate.
Example with OpenAI GPT-4.1:
$0.07 + (4500 / 3500 × $0.045) = $0.1279 per minute
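In code, the rule works out roughly like this. A minimal Python sketch: the $0.07 non-LLM portion and the $0.045 GPT-4.1 LLM rate are taken from the example above, and the function name is ours, not part of any Retell API.

```python
# Sketch of the token-based pricing rule described above.
# Assumes the per-minute rate splits into a non-LLM portion (e.g. $0.07)
# and an LLM portion (e.g. $0.045 for GPT-4.1) that scales past 3,500 tokens.
FLAT_RATE_TOKENS = 3500

def per_minute_rate(prompt_tokens: int, non_llm_rate: float, llm_base_rate: float) -> float:
    """Return the effective per-minute rate for a given prompt size."""
    scale = max(1.0, prompt_tokens / FLAT_RATE_TOKENS)  # flat up to 3,500 tokens
    return non_llm_rate + scale * llm_base_rate

print(per_minute_rate(3200, 0.07, 0.045))  # 0.115   (base rate, no scaling)
print(per_minute_rate(4500, 0.07, 0.045))  # ~0.1279 (matches the example)
```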
Short Calls with Minimum Charge When Using Dynamic AI First Sentence
To account for LLM usage, calls that start with a Dynamic AI First Sentence now have a minimum billable duration of 10 seconds. A short sketch of how these rules combine follows the list below.
Calls configured with “User speaks first” or “Static AI first sentence” (which supports dynamic variables) are not affected.
Calls that last longer than 10 seconds are not affected by the minimum charge, even if you use a Dynamic AI First Sentence.
Unconnected calls (e.g., no answer, busy, or failed dial) will not be charged.
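For illustration only, here is how the three rules above might combine; the helper and its parameter names are hypothetical, not Retell's API.

```python
# Sketch of the minimum-charge rule for short calls, per the list above.
MIN_BILLABLE_SECONDS = 10

def billable_seconds(duration_s: float, connected: bool, dynamic_first_sentence: bool) -> float:
    if not connected:              # no answer, busy, or failed dial: not charged
        return 0.0
    if dynamic_first_sentence:     # minimum billable duration applies
        return max(duration_s, MIN_BILLABLE_SECONDS)
    return duration_s              # "User speaks first" / static first sentence

print(billable_seconds(4, True, True))    # 10 (rounded up to the minimum)
print(billable_seconds(42, True, True))   # 42 (calls over 10s are unaffected)
print(billable_seconds(30, False, True))  # 0  (unconnected calls are free)
```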
Further Latency and Stability Optimization
Latency and stability have always been core priorities at Retell. By leveraging the latest LLMs, we’re not only benefiting from improved performance and lower costs, but also applying smarter strategies to keep the platform fast, consistent, and reliable.
We’ve recently made additional optimizations to reduce latency and improve stability:
Cutting average response times by 100–200 ms
Proactively avoiding latency spikes during OpenAI’s peak traffic
One-Time SMS
You can now register the SMS function with your Retell number and send SMS messages during a call. Simply add a function node in your conversation flow, or use it within a single- or multi-prompt agent.
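To give a rough idea of the shape of such a node, here is a purely illustrative sketch; the field names are hypothetical and the node itself is configured in the Retell dashboard, not in code.

```python
# Illustrative SMS function node; field names are hypothetical placeholders.
sms_function_node = {
    "type": "function",
    "name": "send_confirmation_sms",
    "description": "Text the caller a confirmation after booking.",
    "from_number": "+14155550123",  # your Retell number (placeholder)
    "message": "Hi {{user_name}}, your appointment on {{date}} is confirmed.",
}
```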
More tutorials coming soon!
User Keypad Input Detection (User DTMF) 2.0
We’ve enhanced our keypad input feature to better support different data collection scenarios. You can now choose from multiple input-ending methods (a configuration sketch follows the list):
End with a special key: Let users press # or * to finish entering input. Useful for collecting variable-length information like ID numbers or notes.
Fixed digit length: Automatically end input after a specific number of digits. Perfect for fixed fields like the last 4 digits of an SSN.
Timeout-based ending: If no input is received after a short period, input collection will end automatically. This is enabled by default.
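For illustration, the three methods might be configured like this; the option names below are hypothetical, not Retell's exact schema.

```python
# Illustrative configurations for the three input-ending methods above.
collect_ticket_id = {
    "termination": "special_key",   # user presses # or * to finish
    "special_keys": ["#", "*"],     # variable-length input (IDs, notes)
}
collect_ssn_last4 = {
    "termination": "fixed_length",  # ends automatically after N digits
    "digit_count": 4,               # e.g. last 4 digits of an SSN
}
collect_with_timeout = {
    "termination": "timeout",       # default: end after a short silence
    "timeout_seconds": 3,           # hypothetical value
}
```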
These options make it easier to customize voice agent behavior to fit your exact workflow.
Two Levels of Noise Cancellation
You can now choose between two levels of noise cancellation:
Remove noise: Filters out background sounds.
Remove noise + background speech: Filters out both noise and unwanted speech, such as voices from TVs or nearby conversations.
Perfect for boosting transcription accuracy in noisy environments.
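As a rough illustration, an agent might toggle between the two levels like this; the setting name and values are hypothetical, not Retell's exact schema.

```python
# Hypothetical agent settings showing the two levels described above.
agent_settings = {
    # "remove_noise"            -> filters background sounds only
    # "remove_noise_and_speech" -> also filters TVs / nearby conversations
    "noise_cancellation": "remove_noise_and_speech",
}
```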
Equation-Based Transition for Conversation Flow
Equation-Based Transition for Conversation Flow is now live.
No need to write things like {{dynamic_variable}} === value in your transition prompt anymore.
Instead, use an equation-based transition for better results.
If you gather information during the conversation, simply select {{dynamic_variable}} = value or string = value in your equation to help the LLM understand and route properly.
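As a rough sketch of the idea, an equation-based transition replaces the free-text condition with a structured comparison the router can evaluate directly; the structure and operator names below are illustrative, not Retell's exact schema.

```python
# Old: a transition prompt containing "{{intent}} === 'billing'".
# New: a structured equation evaluated directly during routing.
transition = {
    "conditions": [
        {"left": "{{intent}}", "operator": "=", "right": "billing"},
    ],
    "destination_node": "billing_flow",
}
```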
Retell MCP Server
The Retell MCP Server is now live. Connect your favorite AI assistant (ChatGPT, Claude, Cursor, Grok, etc.) directly to Retell’s voice agent platform to:
Trigger real phone calls through your AI assistant
Automate full conversations with human-like voice agents
Make × Retell
We’ve added a native Retell integration to the Make app directory. Feel free to try it out here: https://www.make.com/en/integrations/retell-ai