15 AI Builds Worth Shipping in 2026

Maybe you are an engineer who finally has a quiet weekend. Maybe you are a founder hunting for the next thing. Maybe you have been reading model release notes for six months and want to actually use one of them for something other than a chat window.

Either way, you need an idea - not a vibes-based "AI for X" pitch deck, but a concrete project you can scope, build, and put in front of a real user.

The good news is that 2026 is an absurdly forgiving moment to build. The frontier closed models - GPT-5.5 and GPT-5.5 Pro, Claude Opus 4.7 and Sonnet 4.6, Gemini 3.1 Ultra and Pro - are stronger than anything we had a year ago, and the open-weight wave from DeepSeek V4, Moonshot Kimi K2.6, Z.ai's GLM-5.1, Alibaba's Qwen 3.6 family, MiniMax M2.7, and Xiaomi's MiMo-V2 has collapsed unit economics for almost every category below. You can hold an entire product manual in a 1M-token Sonnet 4.6 context window. You can route routine traffic to DeepSeek V4 Flash at $0.14 per million input tokens. You can wire up an agent that books a meeting, refunds an order, or files a ticket and trust it to actually do the thing.

Below are fifteen AI projects worth your weekend. Each one is small enough to ship and substantive enough to charge for. Where it makes sense, we note the model that fits best and how to assemble it on Berrydesk without writing glue code.

1. Customer Support Agent for a Real Product

The most obvious build is still the most valuable one: a customer support agent that lives on your site and actually resolves tickets instead of routing them. The bar in 2026 is no longer "answers FAQs" - it is "reads the user's order, checks the policy, performs the refund, and writes back in your brand voice."

Train the agent on your help center, product docs, Notion runbooks, and a year of resolved tickets. Pin it to Claude Opus 4.7 for deep reasoning on edge cases, or split traffic so DeepSeek V4 Flash handles the 70% of conversations that are simple lookups while Opus 4.7 takes the escalations. Wire AI Actions for the three or four jobs your support team does over and over - issue refund, change shipping address, cancel subscription, look up order status. Deploy to your site, Slack, Discord, and WhatsApp from the same agent.
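Here is a minimal sketch of that split. It assumes an upstream classifier has already labeled each conversation with an intent and a confidence score. The intent names, the 0.8 threshold, and the model ID strings are hypothetical placeholders, not real provider identifiers. Sending every refund to the frontier model is an assumed policy, not a requirement.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; replace with the IDs your provider actually exposes.
CHEAP_MODEL = "deepseek-v4-flash"
FRONTIER_MODEL = "claude-opus-4.7"

# Intents assumed to be simple lookups that the cheap model handles well (illustrative).
SIMPLE_INTENTS = {"order_status", "shipping_times", "password_reset", "pricing_question"}


@dataclass
class Conversation:
    intent: str            # label from an upstream intent classifier (assumed)
    confidence: float      # classifier confidence in [0, 1]
    requests_refund: bool  # money-moving requests always escalate (assumed policy)


def route(conv: Conversation, min_confidence: float = 0.8) -> str:
    """Send confident, routine lookups to the cheap model; everything else escalates."""
    if conv.requests_refund:
        return FRONTIER_MODEL
    if conv.intent in SIMPLE_INTENTS and conv.confidence >= min_confidence:
        return CHEAP_MODEL
    return FRONTIER_MODEL


if __name__ == "__main__":
    print(route(Conversation("order_status", 0.95, False)))   # deepseek-v4-flash
    print(route(Conversation("order_status", 0.55, False)))   # claude-opus-4.7
    print(route(Conversation("billing_dispute", 0.9, True)))  # claude-opus-4.7
```

When the classifier is unsure, the router falls back to the frontier model. That keeps the cheap path confined to the conversations it handles well. The threshold is the knob you tune as you watch escalation quality.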

A SaaS company doing 4,000 tickets a month can typically take 60–80% of those off human queues with this build alone. Start one on Berrydesk.

2. Self-Updating FAQ Generator

Hand-curated FAQ pages are a dead format. The smarter version is a generator that ingests your docs, support transcripts, and product release notes, clusters the recurring questions, writes clean answers, and republishes the page on a schedule.

Use Gemini 3.1 Pro for the clustering pass - its 2M-token context lets you stuff hundreds of transcripts into a single call without chunking gymnastics - and a cheaper model like MiniMax M2.7 or Qwen3.6-35B-A3B for the answer rewrites. The interesting design choice is the cadence: weekly is usually enough for stable products, daily for anything in active beta. Add a diff view so the docs team can approve changes before they go live.
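Python's standard-library `difflib` is enough for the approval diff. This minimal sketch assumes the FAQ is stored as plain text or Markdown. The file labels are placeholders.

```python
import difflib


def faq_diff(published: str, proposed: str) -> str:
    """Unified diff of the live FAQ page versus the regenerated draft."""
    return "".join(
        difflib.unified_diff(
            published.splitlines(keepends=True),
            proposed.splitlines(keepends=True),
            fromfile="faq/published.md",  # placeholder label
            tofile="faq/proposed.md",     # placeholder label
        )
    )


def needs_review(published: str, proposed: str) -> bool:
    """A regeneration that changes nothing can skip the approval queue."""
    return published != proposed


if __name__ == "__main__":
    live = "Q: Do you ship to Canada?\nA: Not yet.\n"
    draft = "Q: Do you ship to Canada?\nA: Yes, since March.\n"
    if needs_review(live, draft):
        print(faq_diff(live, draft))
```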

This pairs naturally with a support agent: the FAQ becomes the agent's training surface, the agent's unanswered questions become next week's FAQ entries, and the loop closes itself.

3. AI Email Triage and Reply Drafter

Inbox AI has matured past "summarize this thread." The 2026 build is an agent that classifies incoming mail by intent, drafts a reply in your voice, and - for narrow categories where you trust it - sends without asking.

The mechanics are straightforward. Connect to Gmail or Outlook via their respective APIs. On each new message, run a classifier (intent, urgency, sender relationship) and a reply drafter in the same model call. Claude Sonnet 4.6 is the sweet spot here - its 1M context lets you include the full thread plus a long memory of how you've replied to this person before, which is what makes the drafts actually sound like you.

The trick to making this useful instead of annoying is graduated trust: start with "draft only," graduate categories to "auto-send" once their false-positive rate stays at zero for a couple of weeks. For teams, layer in Make.com or Zapier hooks so an AI-classified "demo request" auto-creates a Calendly link and a CRM record.
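Here is one way to encode graduated trust. It assumes you log whether each draft was sent unedited, and it counts any edit or discard as a failure. The 14-day window and the minimum sample size are illustrative assumptions. A category that has seen only a handful of emails has not proven anything yet.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DraftOutcome:
    category: str        # classifier label, e.g. "demo_request" (hypothetical)
    day: date
    sent_unedited: bool  # True if a human approved the draft without changes


def auto_send_categories(
    history: list[DraftOutcome],
    today: date,
    window_days: int = 14,
    min_samples: int = 20,
) -> set[str]:
    """Categories whose recent drafts were all approved unedited, with enough volume."""
    cutoff = today - timedelta(days=window_days)
    results: dict[str, list[bool]] = {}
    for outcome in history:
        if outcome.day >= cutoff:
            results.setdefault(outcome.category, []).append(outcome.sent_unedited)
    return {
        category
        for category, outcomes in results.items()
        if len(outcomes) >= min_samples and all(outcomes)
    }


if __name__ == "__main__":
    today = date(2026, 3, 1)
    log = [DraftOutcome("demo_request", today - timedelta(days=i % 10), True) for i in range(25)]
    log.append(DraftOutcome("billing", today, False))
    print(auto_send_categories(log, today))  # {'demo_request'}
```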

4. Personal Finance Copilot

Budgeting apps are a graveyard of well-intentioned pie charts. The category that finally works is a finance agent that talks back - one that watches your transactions, notices the things a human advisor would notice, and answers questions in plain English.

Connect via Plaid or a comparable transaction feed, categorize spend with a small fine-tuned model, then hand the categorized history to GPT-5.5 or Gemini 3.1 Pro for the conversational layer. Long context matters here: you want a year of transactions in the prompt so the agent can say "your grocery spend is up 22% since you moved in March" rather than just "you spent $640 on groceries."
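The arithmetic behind "up 22% since you moved in March" is simple enough to compute deterministically and hand to the model as a fact. That helps keep the number itself from being made up. This minimal sketch assumes transactions have already been categorized into `(date, category, amount)` tuples. The sample data and window boundaries are illustrative.

```python
from datetime import date
from typing import Optional

# Each transaction is assumed to be (date, category, amount_usd), already categorized.
Txn = tuple[date, str, float]


def months_between(start: date, end: date) -> int:
    """Whole calendar months between two first-of-month dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def avg_monthly_spend(txns: list[Txn], category: str, start: date, end: date) -> float:
    """Average monthly spend in one category for start <= day < end."""
    total = sum(amount for day, cat, amount in txns if cat == category and start <= day < end)
    months = months_between(start, end)
    return total / months if months > 0 else 0.0


def percent_change(before: float, after: float) -> Optional[float]:
    """Percent change from `before` to `after`; None when there is no baseline."""
    if before == 0:
        return None
    return (after - before) / before * 100


if __name__ == "__main__":
    txns = [
        (date(2026, 1, 10), "groceries", 500.0),
        (date(2026, 2, 10), "groceries", 500.0),
        (date(2026, 3, 10), "groceries", 610.0),
        (date(2026, 4, 10), "groceries", 610.0),
    ]
    before = avg_monthly_spend(txns, "groceries", date(2026, 1, 1), date(2026, 3, 1))
    after = avg_monthly_spend(txns, "groceries", date(2026, 3, 1), date(2026, 5, 1))
    change = percent_change(before, after)
    print(f"Grocery spend is up {change:.0f}% since March")  # Grocery spend is up 22% since March
```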

The product wedge is proactive insight, not passive Q&A. The agent should ping you when something changes, suggest a single action, and shut up otherwise.

5. Customer Feedback Synthesizer

Every growing company drowns in feedback: App Store reviews, Intercom transcripts, NPS surveys, sales call notes, Twitter mentions, the support agent's "stuck" pile. A synthesizer agent ingests all of it, deduplicates the themes, ranks them by frequency and revenue impact, and writes the weekly product-team digest you would otherwise pay an analyst to produce.

This is a long-context play. Gemini 3.1 Ultra's 2M-token window can hold a quarter of mixed feedback in one call, which means you skip a whole class of vector-search bugs. For the sentiment + theme extraction pass, GLM-5.1 or Qwen3.6-Plus are strong and substantially cheaper than the closed frontier.
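The model does the theme extraction and deduplication. The ranking step can stay deterministic. This minimal sketch assumes each deduplicated item carries a theme label from the extraction pass and the customer's annual recurring revenue from your CRM. Both fields are hypothetical. Ranking by affected revenue first and mention count second is one possible weighting, not the only sensible one.

```python
from dataclasses import dataclass


@dataclass
class FeedbackItem:
    theme: str           # theme label produced by the extraction pass (assumed)
    customer_id: str
    customer_arr: float  # annual recurring revenue of that customer (assumed CRM field)


def rank_themes(items: list[FeedbackItem], top_n: int = 3) -> list[tuple[str, int, float]]:
    """Rank themes by revenue of distinct affected customers, then by mention count."""
    mentions: dict[str, int] = {}
    arr_by_theme: dict[str, dict[str, float]] = {}
    for item in items:
        mentions[item.theme] = mentions.get(item.theme, 0) + 1
        # Count each customer's ARR once per theme, however often they complain.
        arr_by_theme.setdefault(item.theme, {})[item.customer_id] = item.customer_arr
    ranked = [(theme, mentions[theme], sum(arr_by_theme[theme].values())) for theme in mentions]
    ranked.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return ranked[:top_n]
```

The returned top three can feed the weekly digest and the knowledge-base update described next.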

The interesting upgrade is closing the loop: pipe the top three themes back into your support agent's knowledge base each week, so the bot starts proactively addressing the things customers are actually complaining about.

6. Contract Reading Agent

Lawyers charge by the hour partly because reading is slow. An agent that ingests a contract, surfaces every non-standard clause against a baseline you define, and answers questions like "what is my liability cap if the vendor breaches?" is genuinely useful to small businesses, ops teams, and freelancers - and it is now reliable enough to ship.

Use Claude Opus 4.7 as the reasoning model. It currently leads SWE-Bench Pro at 64.3%. That is a coding benchmark, but it rewards the same trait contract review demands: careful, structured reasoning over long, structured documents. Feed it your standard templates as the comparison baseline, then let users upload counterparty drafts.

The product surface is two views: a redlined document with risk flags and a chat panel where the user can ask questions. Optional add-ons: clause library, version diff, integration with DocuSign or Ironclad.

7. Resume and Cover Letter Tailor

Resume builders are commoditized. The interesting niche is a tailoring agent: paste a job description, upload your existing resume, get back a version rewritten to mirror the role's language and emphasize the right two or three projects.

GPT-5.5 is excellent at this - strong at restructuring without drifting from the source. The harder design problem is preventing invented credentials. Constrain the model with a strict "you may rephrase but never invent" instruction. Then validate the output against the source resume: every employer, title, date, degree, certification, and metric in the tailored version should trace back to the original. The cover letter is the easier half; the resume tailor is what people will actually pay for.
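A crude but useful tripwire checks two things. First, every number in the tailored resume should appear somewhere in the source. Second, every entity the model claims to have used should be present in the original. This is a minimal sketch. It assumes the entity list comes from a structured-output field or a separate extraction pass, which is a hypothetical setup. Substring matching will miss paraphrased inventions and can flag legitimate reformatting, such as "$1.2M" written from "$1,200,000". Route flags to the user for confirmation rather than silently dropping text.

```python
import re

# Matches integers and decimals such as 2019, 30, 1.2, 1,200,000.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(source: str, tailored: str) -> set[str]:
    """Numbers (years, percentages, dollar figures) in the output that never appear in the source."""
    return set(NUMBER.findall(tailored)) - set(NUMBER.findall(source))


def unsupported_entities(source: str, claimed_entities: list[str]) -> list[str]:
    """Employers, degrees, certifications the model says it used that are missing from the source."""
    haystack = source.lower()
    return [entity for entity in claimed_entities if entity.lower() not in haystack]


if __name__ == "__main__":
    src = "Software Engineer at Acme Corp, 2019-2023. Cut build times by 30%. BSc Computer Science."
    out = "Senior engineer at Acme Corp (2019-2023) who cut build times 30% and grew revenue 45%."
    print(unsupported_numbers(src, out))                         # {'45'}
    print(unsupported_entities(src, ["Acme Corp", "AWS Certified"]))  # ['AWS Certified']
```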

Plug the agent into LinkedIn job alerts so it generates tailored variants overnight while the user sleeps, and you have a product, not a feature.

8. Plain-English Legal Drafter

Same model layer as the contract reader, different product surface. Here the user describes what they need - "freelance contract for a $5k logo project, US-based client, 50% upfront" - and the agent produces a draft they can edit, sign, and send.

Use Claude Opus 4.7 for the drafting pass and a smaller model like Qwen3.6-27B or Sonnet 4.6 for the explainer pop-overs that translate every clause into a one-sentence plain-English version. The Qwen3.6-27B option is interesting if you care about on-prem deploys - it's Apache 2.0 and beats much larger MoE rivals on agentic benchmarks, which makes it viable for legal-tech customers who can't ship documents to a third-party API.

Wire the final document to e-signature via a single AI Action and you have shipped the whole loop.

9. Personal Shopping Concierge

Recommendation systems have been around forever; what's new is a recommender you can talk to. The user says "I need a winter jacket, I bike to work, under $300, no synthetic insulation," and the agent comes back with three options, the trade-offs between them, and a buy link.

The build is mostly retrieval and tool use, not raw model intelligence. Index a product catalog (yours, or one you have permission to scrape), then use Kimi K2.6 or GLM-5.1 as the agent - both are agentic-first models built to chain searches, comparisons, and writeups. Kimi can run swarms of up to 300 sub-agents, which is more than this project needs, but the underlying architecture means it handles "search, compare, narrow, present" without falling over.
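One way to make "under $300, no synthetic insulation" a guarantee rather than a hope is a split of duties. The agent parses the request into structured constraints, and a plain filter enforces them before the model writes up the trade-offs. This minimal sketch uses a hypothetical catalog schema and constraint format. Ranking by overlap with preferred tags, then by lower price, is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price_usd: float
    category: str
    insulation: str       # e.g. "down", "wool", "synthetic" (hypothetical catalog field)
    tags: frozenset[str]  # e.g. {"cycling", "reflective"} (hypothetical)


@dataclass
class Constraints:
    category: str
    max_price_usd: float
    excluded_insulation: frozenset[str]
    preferred_tags: frozenset[str]


def shortlist(catalog: list[Product], c: Constraints, k: int = 3) -> list[Product]:
    """Hard-filter on the user's constraints, then rank by preferred-tag overlap and price."""
    matches = [
        p for p in catalog
        if p.category == c.category
        and p.price_usd <= c.max_price_usd
        and p.insulation not in c.excluded_insulation
    ]
    # More matching tags first; among ties, the cheaper product first.
    matches.sort(key=lambda p: (len(p.tags & c.preferred_tags), -p.price_usd), reverse=True)
    return matches[:k]
```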

For e-commerce sites, this is also a high-converting variant of the support agent in idea #1. Same plumbing, different prompt.

10. Productivity and Calendar Agent

The calendar agent that actually works in 2026 looks less like a scheduler and more like a chief of staff. It reads your inbox, your calendar, your tasks, and your goals; it proposes a week; it defends your focus blocks; it reschedules low-priority meetings when something urgent lands.

This is a job for a long-context, agentic model - Claude Opus 4.7 if you want closed-source reliability, MiMo-V2-Pro if you want open-weight (it ships with 1M context, 42B active params, and a reasoning-first orientation). Connect Google Calendar, Notion or Linear, and the user's email. Run a planning pass each morning and a reactive pass on every new event.

The wedge is teaching the agent your defaults: "no meetings before 10am," "Friday afternoons are for writing," "I'll always say yes to meetings tagged 'sales-call.'" Capture those once and the agent earns its keep.
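Once the defaults are captured, they can be enforced as plain rules. The planner must pass them before it touches the calendar, so nothing depends on the model remembering them. This is a minimal sketch. The `Meeting` shape is hypothetical, the rules mirror the examples above, and treating noon as the start of "afternoon" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Meeting:
    title: str
    start: datetime
    tags: frozenset[str] = frozenset()


def violations(meeting: Meeting) -> list[str]:
    """Return the user's captured defaults that a proposed meeting would break."""
    if "sales-call" in meeting.tags:
        return []  # "always say yes" overrides every other rule
    problems = []
    if meeting.start.time() < time(10, 0):
        problems.append("no meetings before 10am")
    # weekday(): Monday is 0, Friday is 4; noon as the afternoon cutoff is assumed.
    if meeting.start.weekday() == 4 and meeting.start.time() >= time(12, 0):
        problems.append("Friday afternoons are for writing")
    return problems
```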

11. Interview Coach

A coach that runs you through twenty role-specific mock interviews the night before a real one is a product people will actually pay for. The build has matured because two things finally work: voice-in voice-out at low latency, and reasoning models that can score answers on substance rather than just keywords.

Use GPT-5.5 Pro's parallel reasoning for the scoring rubric - it can evaluate answer structure, evidence, and clarity in parallel rather than serially, which is what makes the feedback feel useful instead of generic. For the interviewer persona, Claude Sonnet 4.6 is warm and natural in voice. For technical interviews specifically, lean on Claude Opus 4.7 for code-review-style feedback on the candidate's approach.

The differentiator is realism: scrape job descriptions, pull common interview loops for that company from public sources, and have the agent role-play the actual interview format the candidate is about to face.

12. Mental Health Support Companion

This category requires more care than any other on the list. Done badly it is harmful; done well it is a meaningful supplement to professional care for users who can't access or afford it.

Build defensively. Use a frontier model with strong refusal behavior - Claude Opus 4.7 is the safest default - and ground every interaction in evidence-based modalities (CBT prompts, journaling structures, grounding exercises). Hard-code escalation: if the user signals crisis, the agent surfaces hotline numbers and stops trying to be the therapist. Log nothing without explicit consent and store what you do log encrypted at rest.

The honest framing matters too. This is not therapy. It is a structured journal you can talk to. Built that way, it helps. Marketed as a therapist, it gets people hurt and you sued.

13. Reading Recommender That Knows You

Goodreads-style recommenders fail because they only know what you've rated. A 2026 recommender knows what you've highlighted, what you abandoned, what you said about a book in your notes, and what you've been searching for.

Index the user's reading history (Kindle exports, Readwise, Notes). Use Gemini 3.1 Pro to build a continually updated taste model - the long context lets you pass the entire reading history as the prompt rather than embedding and searching it. Pair it with a catalog of book metadata and reviews, and recommendations stop feeling like a vending machine.

The fun extension is the conversational layer: "I want something like Project Hail Mary but more grounded, and I have a six-hour flight tomorrow."

14. Writing Assistant That Has a House Style

Generic writing assistants are everywhere; the gap is one that learns your voice. Feed it 50 of your past blog posts, emails, or essays, and have it generate drafts, suggest edits, and rewrite sections in a style indistinguishable from yours.

Sonnet 4.6 with 1M context is ideal - you can fit the entire style corpus in the system prompt rather than fine-tuning, which means iteration is cheap. For the actual generation, A/B with GPT-5.5; some voices come out better on one model than the other and the only way to know is to try.

The product surface that wins is the "rewrite this paragraph in my voice" inline edit, not the blank-page generator. Most writers don't want a ghostwriter; they want a copy editor who has internalized their tics.

15. Adaptive Personal Tutor

Tutoring is a perfect fit for 2026 models because the job is exactly what they're now good at: explain a concept, watch the student attempt a problem, diagnose the misunderstanding, adjust the explanation, repeat.

The model layer depends on the subject. For math and code, Claude Opus 4.7 or GPT-5.5 Pro. For language learning, Sonnet 4.6 with voice. For broad K-12 across subjects, Gemini 3.1 Pro is excellent and cheap enough at the scale a real ed-tech product needs. Open-weight option for districts that need on-prem: Qwen3.6-27B for the dense variant or GLM-5.1 if you can run a larger MoE.

What makes a tutor agent good versus mediocre is memory. The agent should remember every concept the student has stumbled on, every analogy that worked, and every mistake pattern that recurs - and pull that context into every session. Don't ship an amnesiac.
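Here is a minimal sketch of that memory, using a hypothetical in-memory schema. In a real product it would be persisted per student ID. The session summary is what gets pulled into the prompt at the start of each session.

```python
from collections import Counter, defaultdict


class StudentMemory:
    """Per-student record of recurring mistakes and analogies that landed (hypothetical schema)."""

    def __init__(self) -> None:
        # (concept, mistake pattern) -> how many times it has recurred
        self.mistakes: Counter = Counter()
        # concept -> analogies that worked, in the order they were first recorded
        self.analogies: defaultdict = defaultdict(list)

    def record_mistake(self, concept: str, pattern: str) -> None:
        self.mistakes[(concept, pattern)] += 1

    def record_analogy(self, concept: str, analogy: str) -> None:
        if analogy not in self.analogies[concept]:
            self.analogies[concept].append(analogy)

    def session_context(self, top_n: int = 3) -> str:
        """Summarize the most frequent mistakes and working analogies for the next session's prompt."""
        lines = ["Recurring mistakes:"]
        for (concept, pattern), count in self.mistakes.most_common(top_n):
            lines.append(f"- {concept}: {pattern} (x{count})")
        lines.append("Analogies that worked:")
        for concept, items in self.analogies.items():
            lines.append(f"- {concept}: {'; '.join(items)}")
        return "\n".join(lines)
```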

A Few Things to Watch Out For

A handful of pitfalls show up in nearly every project on this list, and they are worth naming up front.

Don't pin to a single model. The frontier moves every six weeks. A new DeepSeek release or a Claude point bump can halve your cost or double your quality overnight. Build with model-routing in mind from day one - production traffic on a cheap open-weight model, hard cases on a frontier closed model, and the ability to swap either side without a rewrite. Berrydesk does this routing for you out of the box.

Long context is not a substitute for retrieval, but it changes the calculus. With 1M–2M token windows now standard, the question shifts from "how do I chunk my docs?" to "do I need to chunk at all?" For a knowledge base under a few hundred thousand tokens, just stuff the whole thing into the prompt and skip the vector store entirely. Reserve RAG for catalogs that genuinely don't fit.
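The stuff-or-retrieve decision can start as a simple budget check. This minimal sketch estimates tokens with the common rule of thumb of about four characters per English token. It is an assumption, so use your provider's tokenizer for real numbers. It reserves headroom for the conversation and output, and caps stuffing at 300,000 tokens to match "a few hundred thousand". All of these figures are illustrative.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is a rule of thumb, not a tokenizer."""
    return int(len(text) / chars_per_token) + 1


def choose_strategy(
    docs: list[str],
    context_window: int,
    reserve: int = 50_000,            # headroom for conversation history and output (assumed)
    max_stuff_tokens: int = 300_000,  # "a few hundred thousand tokens" (illustrative cap)
) -> str:
    """Return "stuff" if the whole knowledge base fits the budget, else "rag"."""
    total = sum(estimate_tokens(doc) for doc in docs)
    budget = min(context_window - reserve, max_stuff_tokens)
    return "stuff" if total <= budget else "rag"
```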

AI Actions are where projects become products. A chatbot that talks is a demo; an agent that acts is a product. Every idea above gets meaningfully better when you wire in a real tool - Stripe for payments, Calendly for booking, your CRM for record creation. The tool-use models from late 2025 onward (Kimi K2.6, GLM-5.1, Opus 4.7, MiMo-V2) are reliable enough to put on the critical path for real workflows.
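Putting a tool on the critical path is safer when the application, not the model, has the final say. This minimal sketch shows a dispatcher that checks a model-proposed call against a declared schema before running it. The tool names, stub handlers, and refund limit are all hypothetical. The handlers stand in for real Stripe or order-system calls and do not use any real SDK.

```python
from typing import Any, Callable


# Hypothetical stub handlers; a real build would call your payment and order systems here.
def issue_refund(order_id: str, amount_cents: int) -> str:
    return f"refunded {amount_cents} cents on {order_id}"


def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


# Tool name -> (handler, expected argument types).
TOOLS: dict[str, tuple[Callable[..., str], dict[str, type]]] = {
    "issue_refund": (issue_refund, {"order_id": str, "amount_cents": int}),
    "lookup_order": (lookup_order, {"order_id": str}),
}

MAX_AUTO_REFUND_CENTS = 10_000  # assumed policy: larger refunds go to a human


def dispatch(name: str, args: dict[str, Any]) -> str:
    """Validate a model-proposed tool call, apply policy limits, then execute it."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    handler, schema = TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"{name} expects {sorted(schema)}, got {sorted(args)}")
    for key, expected in schema.items():
        # Exact type match, so a bool is not accepted where an int is expected.
        if type(args[key]) is not expected:
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    if name == "issue_refund" and args["amount_cents"] > MAX_AUTO_REFUND_CENTS:
        return "escalated: refund exceeds auto-approval limit"
    return handler(**args)
```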

Hallucination is now an architecture problem, not a model problem. Modern models hallucinate less than their predecessors, but the failure mode has shifted from "makes things up" to "confidently summarizes incorrectly." The fix is structural: ground generation in retrieved sources, validate outputs against the source on extraction tasks, and require citations on anything that would harm a user if it were wrong.
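Requiring citations only helps if something checks them. This minimal sketch assumes the model is prompted to return each claim with a verbatim supporting quote and a source ID, which is a hypothetical output format. Any claim whose quote cannot be found in the cited source is flagged for regeneration or human review. Whitespace and case are normalized, but a paraphrased "quote" still fails, which is the intent.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    quote: str      # verbatim span the model cites as support (assumed output format)
    source_id: str


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so line breaks and spacing don't cause false alarms."""
    return " ".join(s.lower().split())


def unsupported_claims(claims: list[Claim], sources: dict[str, str]) -> list[Claim]:
    """Return claims with an empty quote, an unknown source, or a quote not found in the source."""
    flagged = []
    for claim in claims:
        source = sources.get(claim.source_id)
        quote = normalize(claim.quote)
        if not quote or source is None or quote not in normalize(source):
            flagged.append(claim)
    return flagged
```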

Open-Weight vs Closed Frontier: Pick Per Use Case

A meta-decision worth making before you start any of these projects: where do you sit on the open-weight versus closed-frontier spectrum?

The closed frontier - GPT-5.5, Claude Opus 4.7, Gemini 3.1 Ultra - is still the ceiling on the hardest tasks: deep reasoning, multi-step agentic work, anything where the cost of being wrong is high. Use it where quality dominates cost.

The open-weight frontier has caught up to within a hair on many benchmarks while costing 5–20× less. DeepSeek V4 Flash at $0.14 per million input tokens makes ideas like the FAQ generator and feedback synthesizer profitable at scales where closed-model pricing would have killed them. GLM-5.1 (MIT license, 58.4% on SWE-Bench Pro, ahead of GPT-5.4 and Opus 4.6) and Qwen3.6-27B (Apache 2.0, dense) make on-prem and air-gapped deploys viable for healthcare, finance, and government use cases that simply cannot send data to a US API. MiniMax M2.7 scores 56.22% on SWE-Bench Pro at 8% of Sonnet's price and twice the speed.
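Per-token prices turn into a monthly budget with a few lines of arithmetic. This back-of-the-envelope sketch plugs in the $0.14 per million input tokens quoted above for DeepSeek V4 Flash. The average conversation size is an assumption, and output tokens, which are billed separately, are left out. The blended function takes the frontier price as a parameter rather than assuming one.

```python
def monthly_input_cost(conversations: int, avg_input_tokens: int, usd_per_million_input: float) -> float:
    """Input-token spend per month; output tokens are billed separately and ignored here."""
    total_tokens = conversations * avg_input_tokens
    return total_tokens / 1_000_000 * usd_per_million_input


def blended_input_cost(
    conversations: int,
    avg_input_tokens: int,
    cheap_usd_per_million: float,
    frontier_usd_per_million: float,
    cheap_share: float,  # fraction of traffic the router sends to the cheap model
) -> float:
    """Input spend when a router splits traffic between a cheap and a frontier model."""
    cheap_convs = round(conversations * cheap_share)
    frontier_convs = conversations - cheap_convs
    return (
        monthly_input_cost(cheap_convs, avg_input_tokens, cheap_usd_per_million)
        + monthly_input_cost(frontier_convs, avg_input_tokens, frontier_usd_per_million)
    )
```

Suppose a month has 4,000 conversations averaging an assumed 3,000 input tokens each. That is 12 million input tokens, or about $1.68 of input spend on DeepSeek V4 Flash.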

For most projects above, the right answer is hybrid: cheap open-weight for the high-volume routine path, frontier closed for the rare hard cases, and a router in front of both.

Pick One. Ship It This Week.

The most common reason an AI side project dies is not technical - it is that the builder spends three weeks comparing eight models, four embedding strategies, and twelve frameworks, and never ships. Don't do that.

Pick one idea from the list. Pick the model that's good enough, not the one that's optimal. Get a v1 in front of a real user inside a week. Iterate based on what they actually do, not what you imagined they would.

If you want to skip the plumbing - model routing, knowledge ingestion from docs and Notion and Drive and YouTube, a chat widget that doesn't look like 2019, AI Actions for booking and refunds and payments, and deployment to your site, Slack, Discord, and WhatsApp from the same agent - that is what Berrydesk is for. Spin up your first agent in a few minutes and spend your week on the part that's actually yours: the idea.
