I'll be honest with you — when I first heard "AI in the browser," my eyes glazed over. Another marketing stunt. Another "we put AI in it" checkbox on a feature matrix somewhere. I've seen enough AI hype to last several lifetimes, and I wrote a whole article about being sick of it.
But then I actually used Chrome's built-in AI APIs. And they're genuinely interesting.
The reason is simple: everything runs on the user's device. No API keys. No server calls. No data leaving the browser. Nothing. It's a fundamentally different privacy model.
What Are We Actually Talking About?
Chrome has been quietly shipping a suite of AI APIs that run Gemini Nano (Google's smallest model) directly in the browser. It handles downloading the model, managing updates, and optimising for whatever hardware the user has: CPU, GPU, or NPU.
There are currently seven APIs, and they fall into two categories:
The Task APIs (these do one specific thing well):
- Summarizer: condenses text into TL;DRs, key points, teasers, or headlines
- Writer: generates content from a prompt
- Rewriter: rephrases existing content
- Translator: translates between languages
- Language Detector: identifies what language text is in
- Proofreader: checks grammar and correctness
The General Purpose API:
- Prompt API: the classic "send a prompt, get a response" LLM interaction, with session management and streaming
The task-specific APIs are the ones that impressed me most. They're not trying to be everything to everyone. A Summarizer summarises. A Translator translates. They've got constrained, well-defined jobs and they do them reliably. The Prompt API is more freeform and, as you'd expect from a tiny on-device model, more prone to wandering off into confident nonsense.
The Summarizer, Writer, Rewriter, Translator, and Language Detector APIs have been adopted by the W3C WebML Working Group for cross-browser standardisation. So this isn't just a Chrome thing. It's heading towards becoming a web standard.
The Privacy Angle Is The Real Story
Here's what makes this genuinely compelling: the data never leaves the device.
Think about what that means for a moment. You could build a medical form that validates and summarises patient notes without any of that data touching a server. You could build a translation tool for sensitive legal documents that works entirely offline. You could add AI features to a progressive web app that works on a plane.
No GDPR headaches about where your AI provider is processing data. No API bills. No latency from round-trips to a server. No "oops, we accidentally trained on your data" incidents.
The browser handles the model lifecycle (downloading, caching, updating) so you don't need to worry about model hosting or distribution. It's like the browser became your ML infrastructure team, and they work for free.
I Built Something With It
Because I'm incapable of reading about technology without immediately wanting to build something, I added an AI reading companion to the blog articles on this very site. If you're reading this in Chrome with the right flags enabled, you might have noticed the sparkly button in the bottom-right corner.
It does three things:
- Summarise: uses the Summarizer API to generate a TL;DR of the article you're reading
- Key Points: the Summarizer API again, but configured for bullet-point extraction
- Free-form questions: uses the Prompt API to let you ask anything about the article
The Summarizer API is the star of the show. Here's all it takes:
```javascript
const summarizer = await Summarizer.create({
  type: 'tldr',
  format: 'plain-text',
  length: 'short'
});

const summary = await summarizer.summarize(articleText);
```
That's it. No API keys. No fetch calls. No environment variables. The model is already on the user's machine. You just... use it.
The Prompt API is more involved because you need to give it context. Gemini Nano has a small context window, so I feed the article text directly into each prompt rather than relying on the system prompt (which it tends to cheerfully ignore for longer content):
```javascript
const session = await LanguageModel.create({
  systemPrompt: 'You are a helpful reading companion. Be concise.'
});

const stream = session.promptStreaming(
  `${articleText}\n\n---\nQuestion: ${userQuestion}`
);
```
The streaming API returns delta chunks. Each read() gives you just the new text, so you accumulate them yourself.
The Honest Bit: Limitations
Let's not pretend this is all sunshine and rainbows. Gemini Nano is a small model. It's around 1.7 GB and it's designed to run on consumer hardware. That means:
It hallucinates. Confidently. I asked it about WebMCP and it cheerfully invented "Web Managed Cloud Platform" as the acronym expansion, complete with a fake URL to a hosting company that doesn't exist. The Summarizer API doesn't have this problem because it's constrained to extracting from the input text, but the Prompt API will absolutely make things up.
The first use triggers a model download. 1.7 GB isn't nothing. Your users will stare at a loading spinner for a few minutes the first time. After that it's cached and instant, but that first experience needs managing with good UX.
It requires recent Chrome. The APIs need Chrome 138+ with flags enabled, or Chrome stable for the APIs that have graduated. That rules out a good chunk of your audience right now. You need graceful degradation.
The system prompt is unreliable. I initially put the entire article text in the system prompt and the model just... ignored it. Moving the context into each user message fixed it, but it means every prompt is heavier than it should be.
Hardware requirements are real. You need 22 GB of free storage and either a GPU with 4+ GB VRAM or a CPU with 16+ GB RAM. That's most modern laptops and desktops, but it's not everyone.
When Would You Actually Use This?
The sweet spot is clear: privacy-sensitive features where good-enough AI beats no AI.
Spell-checking and proofreading. Auto-summarisation of long content. Language detection and translation for international users. Form validation and assistance. Content tagging and categorisation. Accessibility enhancements like simplified language alternatives.
For anything requiring serious reasoning, factual accuracy, or large context windows, you still want a proper server-side model. Chrome's built-in AI is competing with "we didn't add AI at all because the API costs and privacy implications weren't worth it."
And honestly? That's a much more interesting competition.
Getting Started
If you want to have a play, you'll need Chrome 138+ and to enable the flag:
chrome://flags/#prompt-api-for-gemini-nano
Then check availability in your code:
```javascript
// For the Prompt API
if (typeof LanguageModel !== 'undefined') {
  const availability = await LanguageModel.availability();
  // 'available', 'downloadable', 'downloading', or 'unavailable'
}

// For the Summarizer
if (typeof Summarizer !== 'undefined') {
  const availability = await Summarizer.availability();
  // 'available', 'downloadable', 'downloading', or 'unavailable'
}
```
The Chrome developer docs are thorough and well-written. The APIs themselves are refreshingly simple. Most of them follow the same create() then doTheThing() pattern.
The Bigger Picture
What I find most interesting about all this is the distribution model. The browser has become the runtime for on-device ML, and Google is pushing these APIs through W3C standardisation so other browsers can implement them too.
We've seen this pattern before. WebGL gave us GPU access. WebRTC gave us peer-to-peer communication. WebAssembly gave us near-native performance. Now WebML is giving us on-device inference. Each time, the browser absorbs a capability that previously required native apps or external services.
Whether Gemini Nano is the right model for the job today is almost beside the point. The infrastructure is being built. The APIs are being standardised. And when the models get better (and they will) the plumbing will already be there.
For now, have a play with the reading companion on this site. Ask it something daft. Watch it confidently get things wrong. And then try the Summarizer and be mildly impressed that a 1.7 GB model running on your laptop can actually do a decent job.
The future of AI on the web goes well beyond cloud APIs and server farms. Sometimes the best place to run a model is exactly where the data already lives: on the user's device, in the browser, with the door firmly shut to the outside world.
