
The shape of a great support team is changing. The old playbook - hire a person for every thousand new users, layer in shifts to cover nights, then bolt on overflow vendors during launches - is being quietly retired. In 2026, the teams winning on CSAT, response time, and cost-per-ticket are not the largest. They are the ones that have wired modern AI agents into the front line and kept their human roster focused on judgement-heavy work.
This shift is not theoretical anymore. The frontier closed models from OpenAI, Anthropic, and Google now sit alongside a wave of capable open-weight rivals - DeepSeek V4, Moonshot's Kimi K2.6, Z.ai's GLM-5.1, MiniMax M2, Alibaba's Qwen3.6 family, and Xiaomi's MiMo-V2-Pro. Together they have collapsed the unit economics of automated support and raised the ceiling on what an agent can actually do. A platform like Berrydesk lets you stitch any of them into a branded support agent in a single afternoon.
The cost gap is no longer marginal
A single full-time support representative in North America or Western Europe costs somewhere between $4,500 and $7,500 per month once you include salary, benefits, training, tooling licences, and a slice of office overhead. Provide round-the-clock coverage and you pay for that role two or three times over because of shift premiums and weekend rotations. None of this is wasted spend - humans are still the right answer for ambiguous, sensitive, or high-stakes conversations - but it is an expensive way to handle "where is my order?" at 2 a.m.
AI agents have moved into a different cost regime. Route routine traffic through DeepSeek V4 Flash, at roughly $0.14 per million input tokens and $0.28 per million output tokens, and a typical resolution costs a fraction of a cent. MiniMax prices M2, another open-weight model, at around eight percent of Claude Sonnet's rates and reports roughly twice the throughput. A Berrydesk-style deployment can blend these tiers: cheap, fast open-weight models for the bulk of conversations, and a frontier model - Claude Opus 4.7, GPT-5.5 Pro, or Gemini 3.1 Ultra - held in reserve for the small slice of tickets that genuinely need top-tier reasoning. Companies that adopt this routed approach commonly report support cost reductions in the 40–80 percent range without a drop in quality scores.
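To put "a fraction of a cent" in numbers, here is a back-of-envelope calculation using the DeepSeek V4 Flash prices quoted above. The per-conversation token counts are assumptions chosen for illustration, not measurements; a request that loads an entire help centre into context would use far more input tokens and cost proportionally more.

```python
"""Back-of-envelope model cost of one AI-resolved ticket.

Prices are the DeepSeek V4 Flash figures quoted in the article (USD per
million tokens). Token counts below are illustrative assumptions only.
"""

INPUT_PRICE_PER_M = 0.14   # USD per 1M input tokens (from the article)
OUTPUT_PRICE_PER_M = 0.28  # USD per 1M output tokens (from the article)


def cost_per_resolution(input_tokens: int, output_tokens: int) -> float:
    """Return the model cost in USD for one conversation."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Assumed: ~8k input tokens (system prompt, retrieved docs, chat history)
# and ~1k output tokens of agent replies across the whole conversation.
usd = cost_per_resolution(input_tokens=8_000, output_tokens=1_000)
print(f"${usd:.4f} per resolution ({usd * 100:.2f} cents)")
# prints: $0.0014 per resolution (0.14 cents)
```

Even at ten times those token counts, the result stays under two cents per conversation.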
What changed under the hood
The reason this is suddenly working is not a single model release. It is the combination of three trends that all matured at the same time.
Long context turned the knowledge base into a single prompt
Claude Opus 4.6 and Sonnet 4.6 ship with a one-million-token context window at no surcharge. Gemini 3.1 Ultra goes to two million. DeepSeek V4 Flash also gives you a million. That means your entire help centre, your refund policy, your last six months of conversation history with a specific customer, and your internal escalation playbook can all sit inside a single request. Retrieval-augmented generation (RAG) is no longer a hard requirement to make a support agent answer correctly - it becomes a tuning lever you reach for when you want sharper retrieval, not the only way to keep the agent grounded.
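A quick way to tell whether the single-prompt approach applies to your own content is to estimate the token count of everything you would load. The sketch below assumes roughly four characters per token for English text and reserves a hypothetical 50,000 tokens for conversation history and the reply; both figures are rough assumptions, and the provider's own tokenizer should replace the heuristic before you rely on the answer.

```python
"""Rough check: does the whole knowledge base fit in one long-context request?

Assumptions (not from any vendor API): ~4 characters per token for English
text, and a fixed reserve kept free for chat history and the model's reply.
"""

CHARS_PER_TOKEN = 4          # crude heuristic for English prose
CONTEXT_WINDOW = 1_000_000   # e.g. a one-million-token model
RESERVE_TOKENS = 50_000      # hypothetical headroom for history + output


def estimate_tokens(text: str) -> int:
    # Ceiling division so any non-empty string counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: dict[str, str],
                    window: int = CONTEXT_WINDOW,
                    reserve: int = RESERVE_TOKENS) -> tuple[bool, int]:
    """Return (fits, estimated_tokens) for all documents combined."""
    total = sum(estimate_tokens(body) for body in documents.values())
    return total + reserve <= window, total


# Placeholder text sized like a large help centre and a short policy page.
docs = {
    "help_centre.md": "lorem ipsum " * 200_000,
    "refund_policy.md": "refund terms " * 3_000,
}
fits, tokens = fits_in_context(docs)
print(f"~{tokens:,} tokens; fits in one request: {fits}")
```

If the estimate does not fit, that is the point where retrieval earns its place: load only the most relevant documents instead of all of them.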
Agentic tool use is finally production-grade
Until recently, "AI Actions" - booking a meeting, processing a refund, looking up an order, updating a billing record - were the part of every demo that quietly broke when real traffic showed up. The 2026 generation of models has changed that. Kimi K2.6 can run autonomous sessions of up to twelve hours and orchestrate swarms of sub-agents across thousands of coordinated steps. GLM-5.1 was built explicitly for agentic engineering and runs an eight-hour plan-execute-test-fix loop. Claude Opus 4.7 leads SWE-bench Pro at 64.3 percent, and that reasoning quality carries over into more reliable tool calls. For support, this means an AI agent can confidently issue a partial refund, change a shipping address, or rebook an appointment without the scripted, on-rails feel of earlier bots.
Open weights opened the door to regulated industries
GLM-5.1 ships under an MIT licence. Qwen3.6-27B is released under Apache 2.0. Xiaomi's MiMo weights are open. For banks, insurers, healthcare providers, and government agencies, that combination - frontier-class capability plus permissive licensing - finally makes on-prem and air-gapped deployments tractable. A regional bank that could not have considered cloud AI two years ago can now run a Berrydesk agent on hardware that never leaves its data centre.
Performance, not just cost
The economic story is the headline, but it is not the only reason teams are restructuring.
Modern AI agents reply within a second or two and they do not get tired. There is no queue, no escalation back-and-forth on a Friday afternoon, no "I will check with my team and get back to you tomorrow." During launches, sales, or outage spikes, the agent handles ten times the usual traffic without flinching - and without a frantic Slack thread to find more humans.
Resolution rates have caught up to the marketing claims. With a properly trained agent and well-defined AI Actions, 70–85 percent of routine tickets - order status, password resets, plan changes, refund eligibility, scheduling - close without a human ever touching them. Quality stays consistent because the agent does not have a bad day, does not forget the new policy you rolled out last week, and does not vary its tone depending on how busy the queue is. When you change a refund rule, the change is live for every conversation immediately.
The pattern that actually works
Companies getting outsized results are not replacing their support team with one big model. They are running a layered system.
Tier one is an open-weight workhorse - DeepSeek V4 Flash, MiniMax M2, or Qwen3.6-35B-A3B - that handles the long tail of routine traffic at near-zero marginal cost. Tier two is a frontier model - Claude Opus 4.7 for nuanced policy questions, GPT-5.5 Pro for parallel reasoning on complex cases, Gemini 3.1 Ultra when an attached video or screenshot needs to be understood - that takes anything the first tier flags as ambiguous. Tier three is a small, senior human team that picks up the ten or fifteen percent of tickets where empathy, negotiation, or legal exposure means a human voice matters.
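The layered pattern can be sketched as a small routing function. Everything in it is hypothetical: the Ticket fields, the keyword list standing in for a proper classifier, the 0.8 confidence threshold, and the route labels are placeholders, not Berrydesk or vendor APIs.

```python
"""Minimal sketch of three-tier ticket routing.

All names, keywords, and thresholds are hypothetical placeholders; a real
router would use a trained classifier rather than keyword matching.
"""
from dataclasses import dataclass, field

# Hypothetical stand-ins for topics that should always reach a person.
HUMAN_TOPICS = ("lawyer", "legal", "chargeback", "complaint", "cancel my contract")
CONFIDENCE_FLOOR = 0.8  # below this, tier one's answer counts as "ambiguous"


@dataclass
class Ticket:
    text: str
    tier1_confidence: float  # hypothetical self-assessed score from tier one
    attachments: list[str] = field(default_factory=list)


def route(ticket: Ticket) -> str:
    """Return which tier should own the ticket."""
    lowered = ticket.text.lower()
    # Tier three: legal exposure or negotiation goes straight to a person.
    if any(topic in lowered for topic in HUMAN_TOPICS):
        return "human"
    # Tier two: anything tier one is unsure about goes to a frontier model,
    # picked by what the ticket needs.
    if ticket.tier1_confidence < CONFIDENCE_FLOOR:
        if ticket.attachments:
            return "frontier:multimodal"  # e.g. Gemini 3.1 Ultra for screenshots
        return "frontier:reasoning"       # e.g. Claude Opus 4.7 for policy nuance
    # Tier one: routine traffic stays on the cheap open-weight model.
    return "open-weight"


print(route(Ticket("Where is my order?", tier1_confidence=0.95)))    # open-weight
print(route(Ticket("App crashes, see screenshot", 0.40, ["crash.png"])))  # frontier:multimodal
print(route(Ticket("My lawyer will be in touch", 0.99)))             # human
```

Checking for human-only topics first means a confident tier-one answer can never override a conversation that carries legal exposure.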
This layered model is exactly the shape Berrydesk is built for. You pick the model - GPT, Claude, Gemini, DeepSeek, Kimi, GLM, Qwen, MiniMax, and others are all available - train it on your docs, websites, Notion, Google Drive, or YouTube content, brand the chat widget, wire up AI Actions for booking and payments, and deploy to your website, Slack, Discord, WhatsApp, and beyond. There is no engineering team to assemble, no infrastructure to provision, and no lock-in to a single model family.
What to watch out for
This is not a "switch it on and walk away" technology, and the teams that struggle are usually the ones who treated it that way.
Three pitfalls show up most often. First, agents trained on stale documentation will confidently give the wrong answer - the fix is to make documentation updates the upstream forcing function: correct the source docs and let the agent pick up the change, rather than patching answers in the agent itself. Second, AI Actions need real guardrails: rate limits on refunds, monetary thresholds that require human approval, and audit logs your finance team can reconcile. Third, escalation logic deserves more attention than the happy path. The agent's job in a hard conversation is not to solve it; it is to recognise the moment to step aside, hand the human everything they need to be useful in five seconds, and stay out of the way.
Done well, this gives the human team the conversations that actually need humans, and gives the customer a faster, more consistent experience for everything else.
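The three refund guardrails named in the pitfalls above can be combined in one gate that runs before any refund Action executes. This is a minimal sketch with hypothetical policy values (a $100 human-approval threshold and two automated refunds per customer per rolling 24 hours) and an in-memory store; a production version would persist both the counters and the audit log.

```python
"""Minimal guardrail gate for a refund AI Action (hypothetical policy values)."""
import json
import time
from collections import defaultdict
from typing import Optional

APPROVAL_THRESHOLD = 100.00  # USD; hypothetical: larger refunds need a human
MAX_REFUNDS_PER_DAY = 2      # hypothetical per-customer rate limit
WINDOW_SECONDS = 24 * 60 * 60

# In-memory state for the sketch only.
_recent_refunds: dict[str, list[float]] = defaultdict(list)  # id -> timestamps
audit_log: list[str] = []  # one JSON line per decision, for reconciliation


def request_refund(customer_id: str, amount: float,
                   now: Optional[float] = None) -> str:
    """Decide whether the agent may issue this refund, and log the decision."""
    now = time.time() if now is None else now
    # Drop refunds that have aged out of the rolling 24-hour window.
    recent = [t for t in _recent_refunds[customer_id] if now - t < WINDOW_SECONDS]
    _recent_refunds[customer_id] = recent

    if amount > APPROVAL_THRESHOLD:
        decision = "needs_human_approval"
    elif len(recent) >= MAX_REFUNDS_PER_DAY:
        decision = "rate_limited"
    else:
        decision = "approved"
        recent.append(now)  # only approved refunds count toward the limit

    # Every outcome is logged, including the ones that were blocked.
    audit_log.append(json.dumps({
        "ts": now, "customer": customer_id,
        "amount": amount, "decision": decision,
    }))
    return decision
```

Anything other than "approved" is a natural escalation trigger: the human receives the request plus its audit entry instead of starting from scratch.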
The new shape of a support org
The old growth curve - every additional 1,000 customers requires another rep - is breaking. The new curve looks more like a step function: you invest once in setting up an AI-first support layer, and your headcount only scales when the complexity of conversations changes, not when the volume does. Founders who used to plan for a fifty-person support team at Series B are running on six to ten people, with an AI agent absorbing the bulk of the load.
The question for support leaders in 2026 is no longer whether AI can do the job. It is which model mix to route to, which actions to wire up first, and how to redesign the human team's day around the work that genuinely needs them. The companies moving on those questions now are quietly setting the cost and quality benchmarks the rest of the market will be measured against next year.
If you are ready to build that layered support stack, Berrydesk is the fastest path from "we should look into this" to a live, branded agent answering tickets in production.
Launch your AI agent in minutes
- Train on your docs, websites, Notion, and Drive - no code required.
- Route routine tickets to open-weight models, escalate the hard ones to Claude or GPT-5.5.
Chirag Asarpota is the founder of Strawberry Labs, the team behind Berrydesk - the AI agent platform that helps businesses deploy intelligent customer support, sales and operations agents across web, WhatsApp, Slack, Instagram, Discord and more. Chirag writes about agentic AI, frontier model selection, retrieval and 1M-token context strategy, AI Actions, and the engineering it takes to ship production-grade conversational AI that customers actually trust.



