
The bar for customer service has moved. Buyers compare your support experience to the best they've ever had - not to the average inside your category - and the gap between teams that have rebuilt their support stack around modern AI agents and teams that haven't is now visible in CSAT, deflection rates, and renewal numbers. The companies that get this right aren't bolting a chatbot onto a contact form. They're rewiring how service gets delivered.
The agents driving this shift in 2026 have very little in common with the scripted bots of five years ago. They are not menu trees with a thin language model on top. They run on frontier models like GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Ultra, or on the new wave of open-weight models from DeepSeek, Moonshot, Z.ai, Alibaba, MiniMax, and Xiaomi. They reason over million-token context windows, take actions inside your real systems through tool use, and learn the texture of your business from your docs, your help center, your Notion, your Drive, and your past tickets. The result is a support layer that handles routine work end-to-end and frees humans to do the parts that genuinely require humans.
This piece walks through what that looks like in four industries - banking and finance, healthcare, e-commerce, and travel and hospitality - and what it takes to deploy it without burning trust along the way.
Why 2026 Is The Inflection Point For AI Customer Service
A few things changed in the last twelve months that turned AI support from "interesting pilot" into "table stakes."
The first is reasoning quality. Claude Opus 4.7 leads SWE-bench Pro at 64.3% and brings the same long-horizon reasoning to support workflows; Gemini 3.1 Pro tops GPQA Diamond at 94.3% and is natively multimodal; GPT-5.5 Pro adds parallel reasoning for hard tickets. These models don't just answer questions - they plan multi-step resolutions, interrogate APIs, and decide when to escalate.
The second is context. Claude Opus 4.6 and Sonnet 4.6 ship with a 1M-token window at no surcharge. Gemini 3.1 Ultra goes to 2M tokens. DeepSeek V4 runs 1M. That means an agent can hold an entire knowledge base, a full conversation history with a customer going back two years, and your refund policy in a single prompt. Retrieval is still useful, but RAG is now a tuning lever rather than a hard architectural requirement.
The third - and the one most enterprise teams underestimate - is cost. Open-weight frontier models have collapsed the unit economics of running production support agents. DeepSeek V4 Flash, a 284B-parameter MoE with 13B active and a 1M context, is priced at $0.14 per million input tokens and $0.28 per million output. MiniMax M2 runs at roughly 8% of Claude Sonnet's price and twice its speed. Z.ai's MIT-licensed GLM-5.1 actually beats GPT-5.4 and Claude Opus 4.6 on SWE-bench Pro. The economic implication for support is direct: you can route the long tail of routine tickets - order status, password resets, refund eligibility - to a fast, cheap open-weight model, and reserve Claude Opus 4.7, GPT-5.5, or Gemini 3.1 Ultra for the escalations that actually need them. A well-routed Berrydesk deployment now resolves routine tickets at fractions of a cent each.
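Mechanically, this kind of routing can start as a lookup table plus a cost estimate. A minimal sketch in Python - the category-to-tier mapping is a hypothetical example, and every price except DeepSeek V4 Flash's is an illustrative assumption, not a published rate:

```python
# Cost-tiered model routing for support tickets (sketch).
# Category mapping is hypothetical; only DeepSeek's price comes from
# published figures - the rest are illustrative placeholders.

ROUTES = {
    # High-volume routine categories -> cheap open-weight tier
    "order_status":       "deepseek-v4-flash",
    "password_reset":     "deepseek-v4-flash",
    "refund_eligibility": "minimax-m2",
    # Harder categories -> frontier tier
    "billing_dispute":    "claude-opus-4.7",
    "policy_exception":   "gpt-5.5",
}

PRICE_PER_M_INPUT = {  # USD per million input tokens
    "deepseek-v4-flash": 0.14,   # from the pricing above
    "minimax-m2":        0.30,   # illustrative
    "claude-opus-4.7":   15.00,  # illustrative
    "gpt-5.5":           10.00,  # illustrative
}

def route(category: str, default: str = "claude-opus-4.7") -> str:
    """Pick a model for a ticket category; unknown categories escalate."""
    return ROUTES.get(category, default)

def estimated_cost(category: str, input_tokens: int) -> float:
    """Estimated input cost in USD for a ticket of the given size."""
    return PRICE_PER_M_INPUT[route(category)] * input_tokens / 1_000_000
```

Note the default: anything the router doesn't recognize escalates to the frontier tier, which is usually the safer failure mode than sending an unknown ticket to the cheapest model.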
The fourth is agentic tool use. Models like Kimi K2.6 (12-hour autonomous coding sessions, swarms of up to 300 sub-agents), GLM-5.1 (8-hour plan-execute-test-fix loops), Claude Opus 4.7, Qwen 3.6, and Xiaomi MiMo-V2-Pro have made multi-step tool calling reliable rather than demoware. AI Actions - booking a flight, processing a refund, looking up an order, scheduling a follow-up, taking a payment - work on the first try in production traffic, not just in scripted demos.
Add it up and you get something the previous generation of chatbots couldn't deliver: a 24/7 support agent that understands nuance, executes real work in your systems, runs at a price that makes blanket coverage feasible, and can be deployed on-prem or in an air-gapped environment when your compliance team requires it.
How Modern AI Agents Are Reshaping Four Customer-Facing Industries
The model layer is general; the value shows up at the seams between models, your data, and your existing systems. Here's what changes in practice in four industries that have moved fastest.
1. Banking And Finance
Financial institutions were among the earliest to see returns from AI support, and the math has only gotten better. Modeled across the sector, AI-driven support is now saving banks billions annually in handle-time costs, with the bigger gain hiding in fraud catch rates and Net Promoter scores. The lobby queue at your local branch is increasingly the long tail of a service journey that used to be the default.
Always-on transactional support. A modern agent handles the high-frequency requests that used to clog a contact center: card lock and unlock, password and PIN resets, transaction disputes, statement requests, account opening flows, KYC document upload, and form completion. With Claude Opus 4.6's 1M-token context window, the agent can hold the customer's last 18 months of transactions in memory while handling a dispute, which removes the awkward "hold while I pull up your account" moment that customers hate.
Real-time spend intelligence. AI agents plug into transaction streams to give customers a live view of where money is going - daily, weekly, monthly, and by merchant category. Push notifications fire when a charge looks anomalous compared to the customer's pattern. Customers can ask, in plain language, "how much did I spend on groceries last quarter and how does that compare to the quarter before?" and get a real answer, not a link to a report.
Personalized cross-sell that doesn't feel slimy. Banks have always had a long product shelf - credit cards, checking, savings, mortgages, auto loans, business lending, brokerage, insurance - and almost no customer knows what's on it. An agent that understands the customer's actual situation can recommend the rare product that genuinely fits, rather than carpet-bombing with generic offers. The personalization gets sharper because the model can ingest the customer's full relationship history in-context.
Goal-based financial guidance. Customers ask AI agents for help building a budget, sizing an emergency fund, or modeling whether they can afford a house in the next eighteen months. The agent isn't replacing a fiduciary advisor, but it's a credible first conversation - and it captures a moment of intent that used to be lost entirely.
Fraud detection on the conversation side. Voice and chat are now an attack surface. AI agents trained on social-engineering patterns spot the tells of an account-takeover attempt faster than a human agent juggling four tickets at once, and they trigger step-up authentication on the spot. Combined with transaction-side anomaly detection, this is one of the strongest ROI levers in the entire stack.
For regulated banks that can't send customer data to a US-hosted model, the open-weight options matter. GLM-5.1 (MIT license), Qwen3.6-27B (Apache 2.0, a dense model that punches well above its weight), and Xiaomi's MiMo-V2-Pro all support genuinely on-prem and air-gapped deployments. That isn't a theoretical option - it's the only way some of this is shippable.
2. Healthcare
Healthcare AI agents save the global system billions in administrative cost, but the more interesting effect is on access. A patient who can't get through to a triage nurse at 9pm gets a meaningful first interaction with an AI agent instead of a voicemail. A patient who feels embarrassed about a sensitive question gets to ask it without having to look someone in the eye. Anonymity, paired with real clinical knowledge in the model, lowers the floor of what counts as "available care."
Appointment scheduling end-to-end. Booking, rescheduling, and cancelling visits is the single highest-volume reason patients contact a health system. An AI agent that can read a provider's calendar, respect facility-specific scheduling rules, handle insurance pre-checks, and confirm by SMS turns a multi-touch process into a sixty-second conversation. Berrydesk customers in healthcare typically wire this up as an AI Action that calls the EHR's scheduling API directly.
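A minimal sketch of what such an action can look like - `ehr_find_slots` and the booking step are hypothetical stand-ins for whatever scheduling API your EHR actually exposes, and the SMS confirmation is left as a comment:

```python
# Scheduling AI Action (sketch). The slot source and booking call are
# stand-ins for a real EHR scheduling API; nothing here is a real endpoint.
from datetime import datetime

SLOTS = [datetime(2026, 4, 2, 9, 30), datetime(2026, 4, 2, 14, 0)]

def ehr_find_slots(provider_id: str):
    # Real version: query the EHR's open-slot endpoint for this provider.
    return SLOTS

def book_appointment(provider_id: str, preferred_hour: int) -> dict:
    """Pick the open slot closest to the patient's preferred hour."""
    slots = ehr_find_slots(provider_id)
    if not slots:
        return {"ok": False, "reason": "no_availability"}
    best = min(slots, key=lambda s: abs(s.hour - preferred_hour))
    # Real version: call the EHR booking endpoint, then confirm by SMS.
    return {"ok": True, "slot": best.isoformat()}
```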
Smarter intake and triage. With strong reasoning models, intake is no longer a flat questionnaire. The agent asks adaptive follow-ups based on what the patient just said, distinguishes a likely musculoskeletal issue from something cardiac, and decides whether the right next step is a same-day appointment, a telehealth slot, urgent care, or the ER. The handoff packet sent to the clinician is structured, complete, and saves real time on the other side.
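The handoff packet itself is just structured data. A toy sketch - the field names and the escalation rule here are illustrative only; any real routing logic belongs to your clinical team, not a code sample:

```python
# Toy triage handoff structure (sketch). Field names are illustrative,
# not a standard schema; the escalation rule is a placeholder that a
# clinical team would replace entirely.
from dataclasses import dataclass, field

@dataclass
class TriageHandoff:
    chief_complaint: str
    onset: str
    severity: int                                  # patient-reported, 1-10
    red_flags: list = field(default_factory=list)  # e.g. "pain at rest"
    recommended_path: str = "telehealth"

def recommend_path(h: TriageHandoff) -> str:
    """Placeholder rule: any red flag routes to in-person care."""
    if h.red_flags:
        return "er" if h.severity >= 8 else "urgent_care"
    return "same_day" if h.severity >= 5 else "telehealth"
```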
Mental-health first contact. Models like Claude Opus 4.7 are notably better at reading emotional context than their predecessors. AI agents now do credible first-line work for stress, anxiety, and low-mood concerns - guided breathing, cognitive reframing prompts, journaling structures - while being unambiguous about when to route to a human. The bar here is high and the failure modes are real, so the agent's escalation policy needs to be built and tested by a clinical team, not by a vendor.
Post-visit follow-up and outcomes data. AI agents handle the short check-in calls that nobody had time to make - "how's the new medication working?", "any side effects?", "are you tracking against the rehab plan?" - and feed structured signal back into the chart. That data is gold for outcomes research, but it has historically been impossible to collect at scale.
Medication adherence. Refill reminders, dosage clarifications, drug-drug interaction warnings when a new prescription is added, and a patient-friendly explanation of why a medication matters. Multimodal agents can even interpret a photo of a pill bottle to help a confused patient confirm they're taking the right thing.
Billing, coverage, and claims. Insurance is where most patient frustration actually lives. An AI agent that can read a payer's coverage rules, walk a patient through a pre-auth, file a claim, or explain an EOB in plain English removes one of the most time-consuming parts of front-desk work and one of the most demoralizing parts of the patient experience.
3. E-commerce
Online retail has been the loudest adopter of AI agents, partly because the unit economics are so easy to defend: a single agent handles thousands of concurrent shoppers, drives measurable conversion lift, and reduces support headcount needs as catalogs scale. The interesting story in 2026 is that the AI agent is no longer just a support layer - it's a sales channel.
Sales as a conversation. Modern agents run the full top-of-funnel: capturing intent, qualifying the shopper, recommending products, handling objections, applying the right promo, and walking the customer to checkout. With agentic tool use, the agent can hold inventory, apply a discount, generate a personalized bundle, and process payment without ever bouncing the shopper to another page. This collapses the friction that kills conversion in traditional flows.
Recommendations that read context, not just history. Earlier recommender systems leaned on collaborative filtering. A 2026 agent reads the actual conversation: "I'm shopping for my mom, she's 70, she has arthritis, she likes gardening" is a richer signal than any click history. The model fuses that with prior purchase data to produce recommendations that feel like a friend's, not an algorithm's.
Real-time order tracking and proactive updates. Plug the agent into your OMS and 3PL APIs and it answers "where's my order?" with a live ETA, a carrier update, and an apology if a leg has slipped. Better still, it sends a proactive message before the customer asks. Proactive shipping updates are one of the highest-NPS interventions in e-commerce and the easiest to wire up.
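Once the APIs are wired, the proactive check is a few lines. A sketch, with `get_shipment_status` standing in for a real carrier/3PL lookup and the message template purely illustrative:

```python
# Proactive shipping-update check (sketch). get_shipment_status is a stub
# for a real OMS/3PL API call; message wording is illustrative.
from datetime import date

def get_shipment_status(order_id: str) -> dict:
    # Stub: a real version queries the carrier or 3PL for this order.
    return {"promised": date(2026, 3, 10), "current_eta": date(2026, 3, 12)}

def proactive_update(order_id: str):
    """Return a customer message only when the ETA has slipped."""
    s = get_shipment_status(order_id)
    slip = (s["current_eta"] - s["promised"]).days
    if slip <= 0:
        return None  # on time: say nothing, don't spam the customer
    return (f"Heads up: order {order_id} is now expected "
            f"{s['current_eta']:%b %d}, {slip} day(s) later than promised. "
            "Sorry about that - reply here if you'd like options.")
```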
Always-on, parallel response. A single Berrydesk agent on a fast open-weight model handles thousands of concurrent shoppers. During Black Friday or a flash sale, that capacity is the difference between captured revenue and a dropped basket. Cheap open-weight inference is what makes blanket conversational coverage realistic - it would not pencil out on closed-frontier-only economics.
Returns, refunds, and exchanges. This used to be the slowest, most ticket-heavy part of e-commerce support. With AI Actions wired to the order management system, the agent looks up the order, validates the return window, prints the label, and confirms the refund - typically in under a minute, end-to-end. Berrydesk customers commonly cite returns automation as their first measurable ROI.
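The flow is mechanical enough to sketch end-to-end. Everything here - the in-memory order store, the 30-day window, the label URL - is a hypothetical stand-in for real OMS and payment-provider calls:

```python
# Returns AI Action (sketch). Order store, return window, and label URL
# are illustrative stand-ins for real OMS / carrier / PSP integrations.
from datetime import date, timedelta

ORDERS = {"ORD-1001": {"delivered": date(2026, 2, 20), "total": 48.00}}
RETURN_WINDOW = timedelta(days=30)

def process_return(order_id: str, today: date) -> dict:
    """Look up the order, validate the window, then label and refund."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"ok": False, "reason": "order_not_found"}
    if today - order["delivered"] > RETURN_WINDOW:
        return {"ok": False, "reason": "outside_return_window"}
    # Real version: create the carrier label and refund via the PSP here.
    return {"ok": True,
            "label_url": f"https://example.test/labels/{order_id}",
            "refund": order["total"]}
```

The point of the structure is that every exit path returns a machine-readable result, so the agent can explain a rejection ("outside the return window") as easily as it confirms a refund.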
Voice of customer at scale. Every conversation is a structured signal. The agent surfaces emerging issues - a sizing problem on a new product, a confusing checkout step, a recurring complaint about a shipping carrier - to the merchandising and product teams. CSAT surveys collected inside conversation context get materially higher response rates than email blasts.
4. Travel And Hospitality
Travel is the natural home for a multi-step agentic experience: planning, booking, modification, in-trip support, post-trip follow-up. It's also a category where natural language is the right interface - nobody enjoys filling out flight-search forms.
Booking flights and rooms in conversation. A traveler describes the trip - "Lisbon for four nights in early September, two adults, prefer boutique hotels under €250 a night, direct flights from JFK if possible" - and the agent does the work, returning a curated set of options, handling availability and pricing checks, and confirming the booking. Multi-leg trips, multi-traveler bookings, and loyalty-program optimizations all stay inside the conversation.
Personalized itineraries. "What should I do in Kyoto for three days?" used to be a search-engine question. Now it's a conversation that surfaces neighborhoods, seasonal events, restaurants that match the traveler's dietary preferences, and a coherent day-by-day plan that fits actual transit times. With multimodal context windows, the agent can read a screenshot of an existing itinerary and improve it.
Visa eligibility and document checks. Visa rules are notoriously opaque and a meaningful share of applications still get rejected on technicalities. An AI agent trained on current consular requirements can pre-check a traveler's eligibility, flag missing documents, and explain edge cases (transit visas, dual citizenship, recent travel history) before the traveler invests a fee in a denied application.
Upsell that actually serves the guest. A late checkout when the agent knows your flight is at 8pm. An airport-transfer offer when it sees you've never been to the city. A spa slot on the rainy day in your itinerary. Marriott and others have publicly cited material upsell-revenue gains from AI-driven concierge agents - and the gains hold because the recommendations are situationally smart, not generic.
Multilingual support without trade-offs. Frontier and open-weight models in 2026 are genuinely fluent in dozens of languages, with reasoning quality that holds up outside English. A guest from São Paulo gets the same quality of service in Brazilian Portuguese that a guest from London gets in English. For an industry whose entire business is cross-border, this is a step change.
Live travel intelligence. The agent watches flight status, weather, strikes, and route disruptions, and reaches out before the traveler notices. "Your 6:40am flight tomorrow is showing a weather risk; here are three rebooking options I've held for fifteen minutes" is the kind of message that turns a one-time guest into a loyal one.
What To Watch Out For
The technology is good enough that the failure modes have moved up the stack. The pitfalls below are the ones that show up most consistently in deployments we see.
Hallucination on policy. A frontier model that's never read your refund policy will confidently invent one. The fix is not "use a smarter model"; it's grounding the agent in your real policy documents, evaluating against a hand-built test set of edge cases, and declining gracefully when the agent isn't sure.
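A hand-built test set doesn't need tooling to get started. A minimal sketch - the cases, the grounded policy answers, and the substring pass criterion are all illustrative:

```python
# Policy eval loop (sketch). Cases, grounded answers, and the pass
# criterion are illustrative; answer_fn stands in for your real agent.

POLICY_CASES = [
    # (question, substring a grounded answer must contain)
    ("Can I return a sale item?", "final sale"),
    ("Refund window for electronics?", "14 days"),
]

def run_policy_evals(answer_fn) -> float:
    """Fraction of edge cases where the agent's answer stays on-policy."""
    passed = sum(1 for q, expected in POLICY_CASES
                 if expected in answer_fn(q).lower())
    return passed / len(POLICY_CASES)

GROUNDED = {
    "Can I return a sale item?":
        "Sale items are final sale and can't be returned.",
    "Refund window for electronics?":
        "Electronics can be refunded within 14 days of delivery.",
}

def stub_agent(question: str) -> str:
    # Answer only from grounded policy text; decline gracefully otherwise.
    return GROUNDED.get(question, "I'm not sure - let me check with a teammate.")
```

The decline branch is the important part: an agent that says "I'm not sure" on an unseen policy question is behaving correctly, and the eval set should include cases that reward exactly that.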
Tool-call sprawl. It's tempting to give the agent twenty AI Actions on day one. Don't. Start with the three or four highest-volume actions, get them tight, and add capability as you build evals. An agent with five reliable tools beats one with twenty flaky ones.
Routing without measurement. Multi-model routing is one of the largest cost levers in this stack - but only if you measure it. Decide which ticket categories go to a cheap open-weight model and which need Claude Opus 4.7 or GPT-5.5, then track resolution rates and CSAT by tier. Without that loop you'll either overspend on frontier inference or underserve hard tickets.
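The measurement loop itself is small. A sketch of per-tier tracking, assuming a hypothetical ticket log where each record carries `tier`, `resolved`, and `csat` fields:

```python
# Per-tier routing metrics (sketch). The ticket-log format is a
# hypothetical example; feed it from your real conversation records.
from collections import defaultdict

def metrics_by_tier(tickets):
    """tickets: iterable of dicts with 'tier', 'resolved' (bool), 'csat' (1-5)."""
    agg = defaultdict(lambda: {"n": 0, "resolved": 0, "csat_sum": 0})
    for t in tickets:
        a = agg[t["tier"]]
        a["n"] += 1
        a["resolved"] += t["resolved"]
        a["csat_sum"] += t["csat"]
    return {tier: {"resolution_rate": a["resolved"] / a["n"],
                   "avg_csat": a["csat_sum"] / a["n"]}
            for tier, a in agg.items()}
```

A widening gap between tiers - say, resolution rates falling on the cheap tier for a category you recently moved there - is the signal to push that category back up.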
Compliance as an afterthought. If you're in healthcare, financial services, or any regulated category, decide your data-residency and model-hosting posture before you pick a model. The MIT- and Apache-licensed open-weight families - GLM-5.1, Qwen3.6-27B, MiMo-V2-Pro - exist specifically to make on-prem and air-gapped deployments shippable. Use them when the compliance answer requires it.
Treating the agent as set-and-forget. Conversation logs are training data. Teams that review failed conversations weekly, fix the underlying knowledge or tool gap, and re-evaluate are the ones whose agents keep getting better. Teams that ship and forget watch their metrics drift.
Open-Weight Versus Closed Frontier: A Quick Trade-Off
Most production support stacks now use both. The rough rule of thumb:
- Open-weight models (DeepSeek V4 Flash, MiniMax M2, Qwen3.6-27B, GLM-5.1). Fast, cheap, easily self-hosted, strong on routine and high-volume queries. The right default for the bulk of your traffic and the only realistic option for regulated air-gapped environments.
- Closed frontier (Claude Opus 4.7, GPT-5.5 Pro, Gemini 3.1 Ultra). Strongest reasoning, the deepest tool-use reliability for complex multi-step actions, the best behavior on edge cases. Worth the higher cost on the small fraction of tickets that actually need it.
A good agent platform lets you run this routing as a configuration choice, not a re-platforming project.
Build Your AI Support Agent On Berrydesk
Berrydesk is built around the reality of the 2026 model landscape. You pick the model - GPT-5.5, Claude Opus 4.7, Gemini 3.1, DeepSeek V4, Kimi K2.6, GLM-5.1, Qwen 3.6, MiniMax M2, or others - and route between them based on the work. You train the agent on the sources your team already uses: docs, help center URLs, Notion, Google Drive, YouTube videos, and uploaded files. You brand the chat widget so it looks like part of your product. You wire AI Actions for bookings, refunds, payments, order lookups, and any other API your support touches. And you deploy to your website, Slack, Discord, WhatsApp, and the channels your customers actually use.
What you get is the support layer this article describes: 24/7 coverage, parallel conversation handling at near-zero marginal cost on the open-weight tier, multilingual fluency, real action execution, and the auditability your security team will ask for.
If your team is ready to move from a chatbot pilot to a production support agent, start building on Berrydesk. The free tier is enough to wire up a real workflow, and the model choice is yours from day one.
Launch your branded AI support agent in minutes
- Pick from GPT-5.5, Claude Opus 4.7, Gemini 3.1, DeepSeek V4, Kimi K2.6, GLM-5.1, Qwen 3.6, MiniMax M2, and more
- Train on your docs and deploy AI Actions for bookings, refunds, and payments
Set up in minutes
Chirag Asarpota is the founder of Strawberry Labs, the team behind Berrydesk - the AI agent platform that helps businesses deploy intelligent customer support, sales and operations agents across web, WhatsApp, Slack, Instagram, Discord and more. Chirag writes about agentic AI, frontier model selection, retrieval and 1M-token context strategy, AI Actions, and the engineering it takes to ship production-grade conversational AI that customers actually trust.



