Decagon Unveils Duet Autopilot, Launching Self-Improving Conversational AI Capabilities for Enterprise Customer Experience

Decagon, a pioneer in conversational AI agents optimized for high-touch customer experience concierge services, has announced the launch of Duet Autopilot. The solution represents an industry first: a customer experience (CX) agent infrastructure engineered to deliver fully automated, verifiable self-improvement capabilities over time.

The commercial rollout changes how large enterprises maintain customer-facing AI agents. Historically, refining an LLM-based customer service system has been bottlenecked by manual engineering loops: internal teams have had to sift through hundreds of transcript logs, interpret conversational feedback, develop behavioral fixes, and run regression tests by hand. Duet Autopilot removes these constraints by acting directly on production signals, converting live feedback into system-wide optimization.

“Autopilot is a shift from building agents by hand to managing agents that improve themselves,” said Alan Yiu, VP of Product at Decagon. “Teams set the direction and review the work; Autopilot handles the diagnosing, testing, and editing that used to consume their week. Every fix compounds, which ultimately empowers businesses to provide their customers with a 24/7 AI concierge that gets measurably better with every interaction.”

Setting an Evaluation Standard with DuetBench

To measure and validate the platform’s optimization capabilities, Decagon has simultaneously introduced DuetBench, the market’s first specialized benchmark designed to evaluate end-to-end agent self-improvement. Unlike traditional static benchmarks that merely assess whether a chatbot can resolve a fixed checklist of pre-programmed issues, DuetBench tests the system’s ability to safely implement verifiable behavioral changes. In initial evaluations against the benchmark, Duet Autopilot passed 93% of complex diagnostic tasks, a score that exceeds the average human baseline.

The platform executes this continuous optimization loop via three interconnected, native capabilities:

  • Automated Agent Improvement: The framework continuously transforms real-world production signals and conversation logs into recommended logic updates, automatically addressing issues ranging from immediate high-priority bottlenecks to micro-adjustments in phrasing.
  • Closed-Loop Self-Validation: Every suggested revision is automatically tested against the specific user interaction that triggered the initial error flag. Changes must also pass extensive regression testing matrices and a curated “golden set” of established corporate customer personas and user intents. If a code change creates an anomaly, the agent continues to iterate autonomously until it passes all verification metrics.
  • Enterprise Governance Guardrails: Operations teams maintain full oversight by embedding brand voice requirements, structural writing constraints, corporate compliance guidelines, and strict, off-limits rules. All agent modifications surface inside a transparent dashboard showing versioned history, identified errors, code diffs, and validation scores, requiring manual human approval before going live.
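
The loop described in these three capabilities can be illustrated with a short sketch. This is not Decagon's implementation, which the announcement does not describe. Every name here (ValidationSuite, Proposal, improve, toy_revise) is hypothetical, and the checks are toy string predicates standing in for real conversation replays. The sketch only shows the general shape: revise, validate against the triggering case plus the regression and golden sets, iterate until everything passes, then hold the result for human approval.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical illustration only; none of these names come from Decagon.
Check = Callable[[str], bool]  # takes candidate agent instructions, returns pass/fail


@dataclass
class ValidationSuite:
    triggering_case: Check                       # the interaction that raised the flag
    regression_tests: List[Check] = field(default_factory=list)
    golden_set: List[Check] = field(default_factory=list)

    def score(self, candidate: str) -> float:
        """Fraction of all checks that the candidate passes."""
        checks = [self.triggering_case, *self.regression_tests, *self.golden_set]
        passed = sum(1 for check in checks if check(candidate))
        return passed / len(checks)


@dataclass
class Proposal:
    candidate: str
    attempts: int
    score: float
    approved: bool = False  # stays False until a human reviewer signs off


def improve(current: str, revise: Callable[[str], str],
            suite: ValidationSuite, max_attempts: int = 5) -> Optional[Proposal]:
    """Revise repeatedly until every check passes or attempts run out."""
    candidate = current
    for attempt in range(1, max_attempts + 1):
        candidate = revise(candidate)
        score = suite.score(candidate)
        if score == 1.0:
            return Proposal(candidate, attempt, score)
    return None  # nothing is surfaced for review if validation never passes


# Toy usage: the reviser adds one missing rule per iteration.
def toy_revise(text: str) -> str:
    if "refund" not in text:
        return text + " Mention the refund policy."
    return text + " Keep a polite tone."


suite = ValidationSuite(
    triggering_case=lambda t: "refund" in t,
    regression_tests=[lambda t: t.startswith("Greet")],
    golden_set=[lambda t: "polite" in t],
)
proposal = improve("Greet the customer.", toy_revise, suite)
```

In this toy run the first revision fixes the triggering case but fails the golden-set check, so the loop iterates once more before producing a proposal. The proposal is returned with approved left as False, mirroring the requirement that a human sign off before any change goes live.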

Because Duet Autopilot operates natively as a Decagon agent, it remains subject to its own recursive improvement engine. Every human reviewer correction, policy change, and successful system outcome feeds directly back into its operational model, allowing its improvements to compound over its lifecycle.

Validated by Enterprise Early Adopters

The commercial release follows extensive field validation across a cohort of enterprise clients and strategic design partners operating in high-volume environments, including financial services, online retail, and consumer technology. These initial users have deployed the self-improving agent to track and optimize key contact center metrics such as resolution speed, deflection rate, and multi-channel coverage.

“At our scale, manually reviewing conversations for errors isn’t an option,” said Matt McCollum, senior manager of customer experience at Opendoor. “Decagon Autopilot frees our team to focus on decisions rather than digging through logs. It surfaces what changed, what was considered, and why. That transparency is what makes AI actually trustworthy in production.”
