Berrydesk

Berrydesk

  • Home
  • How it Works
  • Features
  • Pricing
  • Blog
Dashboard
All articles
InsightsJune 1, 2026· 12 min read

8 Real Ways to Run Customer Support with Modern LLMs (Prompts Included)

Eight battle-tested patterns for using GPT-5.5, Claude Opus 4.7, and open-weight models like DeepSeek V4 in customer support - with copyable prompts.

A support agent's screen split between a human-drafted reply and an AI-generated one, with action buttons for refunds and order lookups

There are three honest ways large language models earn their keep in customer support today: as a private copilot that drafts replies for human agents, as a customer-facing chat agent embedded in your product, and as an automation layer that takes real actions on a customer's behalf. Most articles fixate on the first one because it is the easiest to demo. The interesting wins live in the other two.

This guide walks through eight concrete patterns across all three modes, with prompts you can paste directly into a model playground or wire into a production agent. We will also be specific about which model to reach for, because in May 2026 that decision actually matters - the gap between a $0.14-per-million-token open-weight model and a frontier reasoning model is real, and so is the cost difference at support scale.

Why the model lineup changed everything in 2026

A year ago, "use ChatGPT for support" effectively meant "use one of two OpenAI models, pay retail, and hope for the best." That landscape is gone. By May 2026 the production-ready roster looks more like a stable: GPT-5.5 and GPT-5.5 Pro from OpenAI for parallel-reasoning escalations, Claude Opus 4.7 leading SWE-bench Pro at 64.3% for anything code- or workflow-heavy, Sonnet 4.6 with a 1M-token context window at no surcharge, and Gemini 3.1 Ultra reaching out to 2M tokens with native multimodal input.

The bigger shift sits underneath those flagships. Open-weight frontier models from DeepSeek, Z.ai, Moonshot, MiniMax, Alibaba, and Xiaomi have collapsed the cost of routine traffic. DeepSeek V4 Flash runs at $0.14 / $0.28 per million input/output tokens with a 1M-token context. MiniMax M2 prices at roughly 8% of Claude Sonnet at twice the speed. GLM-5.1 from Z.ai is MIT-licensed and was trained entirely on Huawei Ascend 910B hardware - meaningful if you care about export controls or air-gapped deployment. Moonshot's Kimi K2.6 is built for agentic work and can run 12-hour autonomous coding sessions with up to 300 sub-agents.

For a support team, the upshot is concrete. You no longer have to pick one model and live with the bill. A well-built agent platform routes routine FAQ lookups to a cheap open-weight model, escalates ambiguous tickets to Claude Opus 4.7 or GPT-5.5, and reserves Gemini 3.1 Ultra for the rare cases where you need to ingest a 90-page contract or a screen recording. Berrydesk does this routing as a configuration option, not a custom build.

1. Draft replies faster (the agent-assist pattern)

The most common entry point: a support agent pastes a customer message and gets a draft response back. It is unglamorous and it works. Senior agents typically report 50–70% time savings on first drafts, and onboarding agents lean on it as a coach.

Prompt:

You are a customer support agent for [Company]. The customer wrote:

[paste customer message]

Write a reply that:
- Acknowledges the specific concern (do not paraphrase generically)
- Provides a clear next step or solution
- Offers a follow-up path if the issue might recur
- Stays under 100 words and matches a [warm/neutral/formal] tone

Two practical refinements. First, give the model a short style guide as a system prompt - three or four bullets about voice, banned phrases, and how to handle compensation - and quality jumps immediately. Second, route this drafting workload to a fast, cheap model like DeepSeek V4 Flash or MiniMax M2; you do not need a flagship to draft "your package shipped late, here is a $10 credit." Save the expensive tokens for the harder cases.

2. Handle multilingual tickets without a multilingual team

Modern frontier models cover 50-plus languages with cultural awareness, not just literal translation. A Japanese customer asking about a delayed shipment gets a reply with appropriate honorifics; a Brazilian Portuguese follow-up reads like it came from a São Paulo team. For most teams, that is the difference between hiring four regional support hires and not.

Prompt:

Translate this customer service response into [language] for a customer
in [country/region]. Preserve the warmth and the specific compensation
offer. Adjust formality to local norms. Do not add or remove information.

[paste response]

A subtle point: for Asian languages, Qwen3.6-Plus and the GLM-5.1 family often produce more natural output than Western models, because they were trained with a heavier weighting on Chinese, Japanese, and Korean corpora. A routing-aware platform can send Japanese tickets to Qwen, Spanish tickets to Claude or GPT, and keep quality consistent across regions without you babysitting prompts.

3. Onboard new agents in days, not weeks

A new hire who can ask any question about return policy, escalation paths, or "how do I handle this exact scenario" - without interrupting a senior teammate - ramps faster. The trick is to feed the model your actual policies as context, not let it wing it.

Prompt:

You are an onboarding coach for [Company] customer support.

Our return policy: [paste policy]
Escalation rules: [paste process]
Tone guidelines: [paste voice doc]

A new agent asks: "[paste question]"

Explain how to handle this strictly according to our policies. If a
policy does not cover the scenario, say so and recommend who to escalate
to. Do not invent rules.

The "do not invent rules" line is doing real work. Without it, the model will confidently fill gaps with generic best practices. Inside Berrydesk you would typically wire this up by uploading your handbook and policy docs as a knowledge source so the agent answers the trainee from your real material instead of from the model's training prior.

4. Summarize long ticket threads on handoff

Tickets that bounce between agents, time zones, or specialists become unreadable fast. A 40-message thread is dead weight on the next person. A summary takes seconds.

Prompt:

Summarize this customer support thread in five sections:

1. Customer's original problem (one sentence)
2. What has been tried, in chronological order (bulleted, max 5)
3. What worked and what did not (one sentence each)
4. Current blocker
5. Recommended next action and why

Thread:
[paste conversation]

This is exactly the kind of task where a long-context model earns its keep. Sonnet 4.6 at 1M tokens or Gemini 3.1 Ultra at 2M can hold an entire multi-week ticket history plus your knowledge base in one prompt without any retrieval gymnastics. For shift handoffs, it is the difference between "I read the last three messages" and "I actually know what is going on."

5. Deploy a customer-facing agent that knows your business

This is where the value step-changes. Instead of a human agent copy-pasting prompts all day, you embed an always-on agent in your website, app, Slack, Discord, or WhatsApp that already knows your products, policies, and tone.

A platform like Berrydesk lets you:

  • Train on your sources. Upload PDFs, point it at your help center, sync Notion pages, connect a Google Drive folder, or pull a YouTube channel for transcripts. The agent answers from your material, not from a generic prior.
  • Pick a model - or several. Choose GPT-5.5, Claude Opus 4.7 or Sonnet 4.6, Gemini 3.1, DeepSeek V4, Kimi K2.6, GLM-5.1, Qwen 3.6, or MiniMax M2 depending on your accuracy, latency, and cost targets. Switch any time.
  • Brand the chat surface. Match your widget's colors, copy, avatar, and voice. The agent introduces itself as part of your team, not as a third-party bot.
  • Watch the analytics. See what customers actually ask, where the agent hesitates, what content is missing, and which conversations end in escalation.

The contrast with raw ChatGPT is stark. A vanilla model will confidently invent a return policy, recommend a feature you do not ship, or quote pricing that has not existed for two years. A grounded agent answers from your material and says "I'm not sure, let me get a teammate" when it should. Teams typically see ticket volume drop 40–60% within the first month, with most of the deflection coming from the same 20 questions a human team answered a thousand times that month.

6. Automate email and in-app support proactively

A customer-facing chat widget is one channel. The same agent, addressed via API, can do more:

  • Email auto-replies. Parse the incoming email, classify the intent, draft a response grounded in your knowledge base, and either send automatically for high-confidence categories (shipping status, password reset) or queue for human review on anything sensitive (refunds, complaints, account changes).
  • In-app contextual help. When a user lingers on a complicated screen, opens settings three times in a row, or hits an error state, trigger an inline help message that already knows what they were trying to do. The model gets the user's recent actions as context and proactively offers the right answer instead of waiting for them to type a question.
  • Proactive outreach on confusion patterns. A user who has searched the help center twice and not contacted support is a churn risk. A scheduled job can ask the agent to draft a personalized check-in.

This is the kind of multi-channel coverage no human team can run at scale. The economics work because routine traffic - easily 70–80% of inbound - gets resolved by a $0.14-per-million-token model that never sleeps.

7. Take real actions, not just answer questions

The pattern most teams underuse. With function calling, your agent can do things in your real systems instead of telling the customer where to click.

With Berrydesk's AI Actions, an agent can:

  • Cancel or pause subscriptions through Stripe, Recurly, or your billing system
  • Issue refunds within your policy thresholds, escalating anything outside them
  • Look up order status in Shopify, Magento, or a custom commerce backend
  • Update customer records in HubSpot, Salesforce, or your CRM
  • Open or update tickets in Zendesk, Freshdesk, Intercom, or Linear
  • Book appointments through your calendar provider
  • Process payments for upgrades, add-ons, or reactivations
  • Hand off to a human with a full conversation summary, the customer's account state, and any actions already taken

A representative end-to-end flow:

Customer: "I want to cancel my subscription."

Agent:
1. Verifies identity via email magic link or order number
2. Pulls account state - plan, billing cycle, recent usage
3. Asks the cancellation reason (one short multi-choice question)
4. Offers a retention path that fits the reason (pause, downgrade, credit)
5. If the customer still wants to cancel, calls the Stripe API to cancel
   at end of billing period
6. Confirms in-chat and sends a written receipt by email
7. Logs the reason in the analytics dashboard for the retention team

No human touched the conversation. The customer got an instant resolution. The retention team got a clean data point. This was demoware in 2024; in 2026, agentic models like Claude Opus 4.7, Kimi K2.6, GLM-5.1, and Qwen3.6 have made multi-step tool use reliable enough to put in production, with safeguards. The honest caveat: you still need to set policy boundaries (refund caps, escalation triggers, action whitelists) and audit a sample of action runs in the first weeks. Berrydesk surfaces every action call in the conversation log so this audit is straightforward, not a forensic exercise.

8. Move the metrics that matter

When the patterns above are deployed together, the support KPIs move in a coordinated way. Realistic ranges from teams running grounded, agentic AI support in 2026:

  • First response time: hours → seconds
  • Average resolution time: 24+ hours → minutes for routine tickets, hours for escalations
  • First contact resolution: ~65% → 80%+
  • CSAT: mid-70s → upper 80s, sometimes higher
  • Ticket volume reaching humans: baseline → 40–60% lower
  • Cost per resolution: $4–$8 → cents for AI-resolved tickets

The shape of these numbers matters more than the exact figures. AI does not replace humans; it absorbs the high-volume, low-judgment tickets so humans can focus on the conversations where their judgment actually compounds - angry escalations, billing edge cases, retention saves, and anything that needs empathy.

ChatGPT as a tool vs. an API-powered agent platform

Pasting prompts into a chat window is fine for occasional drafting. Running production support requires a platform. The difference, in practice:

Raw ChatGPT (or any chat UI)API-powered agent (Berrydesk)
TriggerManual paste, every timeAlways on, every channel
MemoryNone across sessionsFull conversation + customer history
KnowledgeGeneric training dataGrounded in your docs, sites, Notion, Drive, YouTube
ChannelsOne person, one browserWebsite, Slack, Discord, WhatsApp, email, in-app
ActionsNoneRefunds, bookings, lookups, CRM writes, payments
Model choiceOne vendorRoute across GPT-5.5, Claude, Gemini, DeepSeek, Kimi, GLM, Qwen, MiniMax
AnalyticsNoneConversations, deflection, gaps, escalations
BrandOpenAI'sYours

Think of the model as the engine. The platform is the car: chassis, dashboard, safety, and the wheel you actually steer with.

A few prompts you can copy directly

Apologize for a delay:

Write a sincere apology for a shipping delay. The order was supposed to
arrive [date] but is now expected [new date]. The reason is [reason] -
include or omit it depending on whether it reflects well on us. Offer
[compensation, if applicable]. 80 words or fewer. Do not over-apologize.

Handle an angry customer:

A customer is frustrated because [issue]. Write a reply that:
- Validates their frustration in one specific sentence (no generic
  "we're sorry to hear that")
- Takes responsibility without blame-shifting to a partner or carrier
- Provides three concrete next steps with clear ownership
- Offers compensation if the policy allows up to [limit]
Tone: empathetic, professional, never sycophantic.

Explain a technical issue to a non-technical customer:

Explain [technical issue] to a non-technical customer. No jargon. Use a
concrete analogy if it helps. If they need to take steps, list them
clearly numbered. End with what they should expect to see when it works.

Suggest an upgrade without being pushy:

A customer asked about [feature/product]. They might benefit from
[higher tier or add-on] because [reason]. Write a reply that answers
their question fully first, then mentions the upgrade in one optional
sentence at the end. If the answer is "no upgrade needed," say that.

Common pitfalls (and how to avoid them)

"The model wasn't built for support." Correct, in the same way a brand-new chef was not built for your kitchen. The fix is not to hire a different chef but to give them your menu, your tools, and your house rules. In LLM terms: ground the agent in your real content, set guardrails, restrict actions to a whitelist, and add a human handoff path. This is the platform's job, not the model's.

"What if it hallucinates?" Hallucinations almost always come from one of three places: (1) the model is answering from prior knowledge instead of your sources, (2) your sources are missing the answer and the model is improvising, or (3) the question is genuinely ambiguous. Grounded retrieval and a "say I don't know" instruction handle the first two. For the third, route to a human. Properly configured agents in 2026 sit comfortably above 95% answer accuracy on the questions they choose to answer.

"Customers prefer humans." Surveys consistently show the opposite for routine questions: most customers prefer a fast, accurate bot for "where is my order" and a human for "I need to talk about my account." Smart routing gives them both. The mistake is forcing one experience on every ticket.

"This will replace our team." It will not, and if your vendor pitches it that way, push back. The realistic outcome: your existing team handles the same headcount of complex, judgment-heavy tickets while the routine queue gets absorbed by the agent. Headcount typically holds steady or shifts toward escalation specialists, content writers (who keep the knowledge base sharp), and AI operations.

"Open-weight models are too risky for production." They were a year ago. In May 2026, GLM-5.1, Qwen3.6, DeepSeek V4, MiniMax M2, and Kimi K2.6 are sitting at or above frontier closed models on multiple benchmarks. The right question is not "open or closed" but "which model for which workload." Berrydesk's routing makes this a setting, not a re-architecture.

Open-weight vs. closed-frontier: pick a routing strategy

A simple, durable pattern that holds for most support workloads:

  • Tier 1 - high-volume, low-stakes traffic. FAQ answers, order lookups, password resets, routine status updates. Send to a cheap open-weight model: DeepSeek V4 Flash, MiniMax M2, or Qwen3.6-27B. Cents per thousand resolutions, sub-second latency.
  • Tier 2 - judgment calls and policy edges. Refund decisions, account changes, multi-step troubleshooting. Send to Claude Sonnet 4.6 or GPT-5.5 - strong reasoning, clean tool use, predictable.
  • Tier 3 - escalations and rare hard cases. Complex disputes, sensitive complaints, anything legally adjacent. Reserve for Claude Opus 4.7 or GPT-5.5 Pro, and route to a human if confidence is below threshold.
  • Specialty - long-context or multimodal. Customer sends a 60-page contract or a screen recording? Gemini 3.1 Ultra at 2M tokens, native video.

This approach typically cuts inference cost by an order of magnitude versus running everything on a frontier model, with no measurable accuracy drop on the routine tier.

Get started

You can stand up an agent on Berrydesk in four steps:

  1. Pick a model. Or pick several and let Berrydesk route. GPT-5.5, Claude Opus 4.7, Gemini 3.1 Ultra, DeepSeek V4, Kimi K2.6, GLM-5.1, Qwen 3.6, MiniMax M2 are all one click.
  2. Train on your data. Upload docs, point it at your website, sync Notion, connect Google Drive, or pull YouTube transcripts. No copy-paste, no embedding pipeline to build.
  3. Brand the widget. Colors, copy, avatar, voice. Make it feel like part of your product, not a bolt-on.
  4. Add AI Actions and deploy. Wire up refunds, bookings, lookups, payments, and CRM writes. Ship to your website, Slack, Discord, WhatsApp, or all of them at once.

Most teams have a working agent live the same afternoon. From there, the work is the same as any support program: watch the analytics, fix the gaps, and let the model take more off your team's plate every week.

If you want to try it on your own content, head to berrydesk.com and build one for free - no credit card, and you can swap models whenever the landscape moves again. Which, in 2026, it will.

#ai-customer-support#llm-prompts#support-automation#ai-agents#ai-actions

On this page

  • Why the model lineup changed everything in 2026
  • 1. Draft replies faster (the agent-assist pattern)
  • 2. Handle multilingual tickets without a multilingual team
  • 3. Onboard new agents in days, not weeks
  • 4. Summarize long ticket threads on handoff
  • 5. Deploy a customer-facing agent that knows your business
  • 6. Automate email and in-app support proactively
  • 7. Take real actions, not just answer questions
  • 8. Move the metrics that matter
  • ChatGPT as a tool vs. an API-powered agent platform
  • A few prompts you can copy directly
  • Common pitfalls (and how to avoid them)
  • Open-weight vs. closed-frontier: pick a routing strategy
  • Get started
Berrydesk logoBerrydesk

Launch your AI agent in minutes

  • Pick from GPT-5.5, Claude Opus 4.7, Gemini 3.1, DeepSeek V4, Kimi K2.6, and more - Berrydesk routes between them for you
  • Train on docs, websites, Notion, Drive, and YouTube, then ship AI Actions for refunds, bookings, and lookups
Build your agent for free

Set up in minutes

Share this article:

Chirag Asarpota

Article by

Chirag Asarpota

Founder of Strawberry Labs - creators of Berrydesk

Chirag Asarpota is the founder of Strawberry Labs, the team behind Berrydesk - the AI agent platform that helps businesses deploy intelligent customer support, sales and operations agents across web, WhatsApp, Slack, Instagram, Discord and more. Chirag writes about agentic AI, frontier model selection, retrieval and 1M-token context strategy, AI Actions, and the engineering it takes to ship production-grade conversational AI that customers actually trust.

On this page

  • Why the model lineup changed everything in 2026
  • 1. Draft replies faster (the agent-assist pattern)
  • 2. Handle multilingual tickets without a multilingual team
  • 3. Onboard new agents in days, not weeks
  • 4. Summarize long ticket threads on handoff
  • 5. Deploy a customer-facing agent that knows your business
  • 6. Automate email and in-app support proactively
  • 7. Take real actions, not just answer questions
  • 8. Move the metrics that matter
  • ChatGPT as a tool vs. an API-powered agent platform
  • A few prompts you can copy directly
  • Common pitfalls (and how to avoid them)
  • Open-weight vs. closed-frontier: pick a routing strategy
  • Get started
Berrydesk logoBerrydesk

Launch your AI agent in minutes

  • Pick from GPT-5.5, Claude Opus 4.7, Gemini 3.1, DeepSeek V4, Kimi K2.6, and more - Berrydesk routes between them for you
  • Train on docs, websites, Notion, Drive, and YouTube, then ship AI Actions for refunds, bookings, and lookups
Build your agent for free

Set up in minutes

Keep reading

Support team dashboard showing an AI agent resolving conversations across web chat, WhatsApp, and Slack with live escalation to a human teammate

Automated Customer Service in 2026: An Operator's Field Guide

A practical, end-to-end guide to automated customer support in 2026: what works, what fails, the model landscape, KPIs, and how to actually ship an AI agent that resolves tickets.

Chirag AsarpotaChirag Asarpota·May 28, 2026
A support team evaluating AI chatbot options on a dashboard with model logos and metrics

Picking the Right AI Support Agent: A 2026 Buyer's Guide

A practical 2026 framework for choosing an AI customer support agent - model choice, customization, languages, AI Actions, and what to actually evaluate.

Chirag AsarpotaChirag Asarpota·May 22, 2026
A customer support agent interface backed by an AI model picker, with chat threads resolving in real time

Choosing an AI Customer Support Agent in 2026: The Practical Guide

A practical 2026 guide to picking an AI customer support agent: what's changed with GPT-5.5, Claude Opus 4.7, DeepSeek V4, and how to evaluate vendors.

Chirag AsarpotaChirag Asarpota·May 21, 2026
Berrydesk

Berrydesk

Deploy intelligent AI agents that deliver personalized support across every channel. Transform conversations with instant, accurate responses.

  • Company
  • About
  • Contact
  • Blog
  • Product
  • Features
  • Pricing
  • ROI Calculator
  • Open in WhatsApp
  • Legal
  • Privacy Policy
  • Terms of Service
  • OIW Privacy