You know when you're trying to explain to your nan how to use your website? You say things like "click the blue button, no the other blue button, no scroll down a bit, there, yes, that one." That's essentially how AI agents have been interacting with websites up until now. They take a screenshot, squint at it through a vision model, and hope for the best.
It's about as elegant as trying to eat soup with a fork.
Enter WebMCP
Google dropped an early preview of WebMCP back in February and honestly, it's one of those things that makes you go "why didn't we have this already?" The concept is beautifully simple: instead of AI agents guessing what your website can do by looking at it, your website tells the agent exactly what's available.
It's the difference between handing someone a menu and making them sniff the kitchen.
WebMCP stands for Web Model Context Protocol and it's been published as a W3C Draft Community Group Report. It's currently available in Chrome 146 Canary, so we're firmly in "early days but the foundations are solid" territory.
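Since the API is behind a flag in Canary, any real code should feature-detect before registering anything. A minimal sketch — the helper name `supportsWebMCP` is mine, and it's written as a pure function over a navigator-like object so it's easy to test outside the browser:

```javascript
// Feature-detect WebMCP support on a navigator-like object.
function supportsWebMCP(nav) {
  return Boolean(
    nav &&
    nav.modelContext &&
    typeof nav.modelContext.registerTool === 'function'
  )
}

// In the browser:
// if (supportsWebMCP(navigator)) { /* register your tools */ }
```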
How It Actually Works
The API lives on navigator.modelContext and it's refreshingly straightforward. You register "tools" — which are just JavaScript functions with a name, a description, and optionally a JSON Schema for the inputs. Here's what that looks like:
```javascript
navigator.modelContext.registerTool({
  name: 'searchProducts',
  description: 'Search the product catalogue by keyword, category, or price range',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'Search keywords' },
      category: { type: 'string', description: 'Product category' },
      maxPrice: { type: 'number', description: 'Maximum price in GBP' }
    },
    required: ['query']
  },
  async execute(input, client) {
    const results = await fetchProducts(input)
    return { products: results, count: results.length }
  }
})
```
That's it. You've just told any AI agent that visits your site: "Hey, I can search products for you. Here's what I need from you, and here's what you'll get back." No screen scraping. No guessing which div is a search box.
Why This Is a Big Deal
I've been building enterprise AI systems for a while now, and one of the most frustrating aspects has been the fragility of agent-to-web interactions. You build a beautiful agentic workflow and then it falls over because a website redesigned their checkout page and moved a button 20 pixels to the left.
WebMCP fixes this in three ways that actually matter:
Lower latency. No more uploading screenshots to a vision model and waiting for it to figure out what it's looking at. The agent gets structured JSON. It's like the difference between OCR-ing a PDF and just... reading the text.
Higher accuracy. When you're passing structured schemas instead of pixel data, the error rate drops dramatically. The agent knows exactly what parameters a tool expects and what it'll get back, so there's no misreading a button label or clicking the wrong element.
Reduced cost. Sending a JSON schema is astronomically cheaper than sending high-resolution screenshots through a multimodal model. If you're running agents at scale, this is the difference between a reasonable cloud bill and needing to remortgage your house.
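That accuracy gain is easy to make concrete: with a schema in hand, a malformed call can be rejected before it ever runs. Here's a deliberately tiny validation sketch — a real agent would use a full JSON Schema validator, and the helper name is mine:

```javascript
// Minimal check of tool input against a JSON-Schema-like shape.
// Illustrates why structured inputs beat pixel guessing: bad calls
// fail loudly and early instead of silently doing the wrong thing.
function validateInput(schema, input) {
  const errors = []
  for (const field of schema.required || []) {
    if (!(field in input)) errors.push(`missing required field: ${field}`)
  }
  for (const [key, value] of Object.entries(input)) {
    const prop = (schema.properties || {})[key]
    if (prop && typeof value !== prop.type) {
      errors.push(`${key} should be ${prop.type}, got ${typeof value}`)
    }
  }
  return errors
}

const schema = {
  type: 'object',
  properties: {
    query: { type: 'string' },
    maxPrice: { type: 'number' }
  },
  required: ['query']
}

validateInput(schema, { query: 'kettle', maxPrice: 30 }) // → []
validateInput(schema, { maxPrice: 'cheap' })
// → ['missing required field: query', 'maxPrice should be number, got string']
```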
The Two Flavours
WebMCP offers two complementary approaches:
Declarative — for standard actions that map cleanly to HTML forms. Think login, search, contact forms. Things where the structure is already there in the markup.
Imperative — for the complex, dynamic stuff that needs JavaScript. Think multi-step booking flows, interactive configurators, or anything where the next action depends on the result of the previous one.
The imperative API is where it gets interesting. You can use the ModelContextClient to request user interaction mid-flow:
```javascript
navigator.modelContext.registerTool({
  name: 'bookAppointment',
  description: 'Book a consultation appointment',
  inputSchema: {
    type: 'object',
    properties: {
      date: { type: 'string', description: 'Preferred date (ISO 8601)' },
      service: { type: 'string', description: 'Service type' }
    },
    required: ['date', 'service']
  },
  async execute(input, client) {
    const slots = await getAvailableSlots(input.date, input.service)
    // Ask the user to confirm a slot
    const chosen = await client.requestUserInteraction(async () => {
      return await showSlotPicker(slots)
    })
    return await confirmBooking(chosen)
  }
})
```
See that requestUserInteraction call? The agent can hand control back to the user for confirmation or selection, then carry on with the result. It's collaborative — the agent does the heavy lifting, the human makes the decisions. That's how it should work.
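To make that control flow concrete, here's a stand-in for the client with a toy tool driving it. Everything here is mock scaffolding of mine — the real client object is supplied by the browser — but the shape of the hand-off is the point:

```javascript
// Mock of the hand-off: the real browser-supplied client pauses the
// agent here and resumes with whatever the user chose.
const mockClient = {
  async requestUserInteraction(callback) {
    return await callback()
  }
}

// Toy tool body: agent fetches options, human picks one.
async function bookDemo(client) {
  const slots = ['09:00', '11:30', '14:00']
  // Simulate the user picking the second slot.
  const chosen = await client.requestUserInteraction(async () => slots[1])
  return `booked ${chosen}`
}

bookDemo(mockClient).then(console.log) // → "booked 11:30"
```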
What This Means for Design Systems
I'll put my design systems hat on for a moment. WebMCP is going to require new thinking about how we structure our component APIs.
Right now, we design components for human consumption. A button has visual states, hover effects, focus rings. But with WebMCP, we're also designing for agent consumption. Your component library needs to think about what tools it exposes, what schemas those tools use, and how they compose together.
I can already see a future where design system teams ship not just component documentation but also a set of WebMCP tool definitions that map to their component interactions. A DatePicker component wouldn't just render a calendar — it'd register a selectDate tool with the right schema.
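A hypothetical sketch of that idea — the tool name `selectDate`, the wrapper function, and the picker interface are all mine, not from any spec or library. The key design choice is that the tool drives the same code path the UI does, so agent and human interactions stay in sync:

```javascript
// Hypothetical: a design-system DatePicker ships this registration
// helper alongside its component documentation.
function registerDatePickerTool(modelContext, picker) {
  modelContext.registerTool({
    name: 'selectDate',
    description: 'Select a date in the visible date picker',
    inputSchema: {
      type: 'object',
      properties: {
        date: { type: 'string', description: 'ISO 8601 date, e.g. 2026-03-14' }
      },
      required: ['date']
    },
    async execute(input) {
      picker.setDate(input.date) // same code path the click handler uses
      return { selected: input.date }
    }
  })
}
```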
The Read-Only Hint
One small but clever detail: tools can be annotated with a readOnlyHint boolean. This tells agents "this tool just reads data, it doesn't change anything." It's a safety valve that lets agents be more confident about invoking certain tools without explicit user confirmation.
```javascript
navigator.modelContext.registerTool({
  name: 'getAccountBalance',
  description: 'Retrieve the current account balance',
  annotations: { readOnlyHint: true },
  async execute() {
    return { balance: await fetchBalance(), currency: 'GBP' }
  }
})
```
An agent could freely check your balance but would know to pause and confirm before executing a transfer. Sensible defaults.
The Elephant in the Room
Security. Obviously.
The API requires HTTPS (secure context only) and the spec has clearly thought about the trust model. But there are open questions. If my website registers a transferMoney tool, what stops a rogue agent extension from invoking it? The requestUserInteraction pattern is part of the answer, but we'll need robust permission models as this matures.
I expect we'll see something similar to the permissions API emerge specifically for WebMCP tools — letting users grant or deny agent access to specific tool categories.
Should You Care Right Now?
If you're building anything where AI agents might interact with your product — and let's be honest, that's increasingly everything — yes, you should be paying attention.
You don't need to ship WebMCP support tomorrow. It's in Canary, it's a draft spec, and the API surface will probably evolve. But the pattern is clear and the direction is set. The web is getting a structured API for AI agent interaction, and it's going to make the current generation of screen-scraping agent tools look like cave paintings.
Start thinking about what tools your application would expose. What are the core actions? What inputs do they need? What do they return? Even if you don't write a line of WebMCP code today, that exercise will make your application better for humans too.
Because at the end of the day, if you can describe what your app does in a clean JSON schema, you probably understand your own product better than most.
