If an agent gets a copy of the screen using browser_screenshot and then wants to click somewhere on that screen, how is it meant to find the right CSS selector to pass to browser_click?
There's a browser_find method, but that assumes you already know what type of element it is. But I can't always tell what type of element something is just by looking at a screenshot.
Right now, the MCP server doesn’t expose quite enough for an agent to navigate on its own.
I’ve added a browser_evaluate tool in my fork, though I haven’t committed it or opened a PR yet. With that, the agent can run JavaScript in the page to get the accessibility tree and then use that to navigate via browser_find.
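To make that concrete, here is a rough sketch of the kind of script an agent might run through browser_evaluate. It is not the code from the fork. The names `ElementLike`, `OutlineEntry`, `IMPLICIT_ROLES`, `outline`, and `fakeElement` are all made up for illustration. It also does not read the browser's real accessibility tree. It only approximates one from the DOM (explicit `role` attributes, a small and incomplete tag-to-role table, `aria-label`, and text content), and it pairs each entry with a CSS selector that could then be handed to browser_click.

```typescript
// Structural subset of the DOM Element API, so the sketch also runs outside a browser.
interface ElementLike {
  tagName: string;
  id: string;
  textContent: string | null;
  children: ArrayLike<ElementLike>;
  getAttribute(name: string): string | null;
}

interface OutlineEntry {
  selector: string; // CSS selector for the element
  role: string;     // explicit or guessed role
  name: string;     // aria-label or trimmed text content
}

// Assumption: a tiny, deliberately incomplete tag -> role table.
// Real implicit-role rules are more involved (e.g. <a> without href is not a link).
const IMPLICIT_ROLES: Record<string, string> = {
  a: "link",
  button: "button",
  select: "combobox",
  textarea: "textbox",
};

// Walk the tree depth-first and list every element that has a role.
function outline(el: ElementLike, parentSelector = "", index = 1): OutlineEntry[] {
  const tag = el.tagName.toLowerCase();
  // Prefer the id; otherwise build a child path with :nth-of-type.
  // Note: ids containing special characters would need escaping.
  const selector = el.id
    ? `#${el.id}`
    : parentSelector
      ? `${parentSelector} > ${tag}:nth-of-type(${index})`
      : tag;

  const role = el.getAttribute("role") ?? IMPLICIT_ROLES[tag];
  const entries: OutlineEntry[] = [];
  if (role) {
    const name = (el.getAttribute("aria-label") ?? el.textContent ?? "").trim();
    entries.push({ selector, role, name });
  }

  // Count siblings per tag so :nth-of-type indexes are correct.
  const seen: Record<string, number> = {};
  for (const child of Array.from(el.children)) {
    const childTag = child.tagName.toLowerCase();
    const n = (seen[childTag] ?? 0) + 1;
    seen[childTag] = n;
    entries.push(...outline(child, selector, n));
  }
  return entries;
}

// Hypothetical helper for trying the sketch without a browser.
function fakeElement(
  tag: string,
  opts: { id?: string; attrs?: Record<string, string>; text?: string; children?: ElementLike[] } = {},
): ElementLike {
  const attrs = opts.attrs ?? {};
  return {
    tagName: tag.toUpperCase(),
    id: opts.id ?? "",
    textContent: opts.text ?? null,
    children: opts.children ?? [],
    getAttribute: (name) => attrs[name] ?? null,
  };
}
```

In a real page, `outline(document.body)` should type-check because `Element` already has these members. How the script and its result travel through browser_evaluate depends on the fork, so that part is left open here. The idea is that the agent matches what it sees in the screenshot against the `role` and `name` fields and then clicks using the matching `selector`.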
one of the wild things about vibe coding is... i want to add that feature, but i'm slightly more interested in using the prompt/spec you might have used to create it, not the patch itself.
Yeah. Let me see if I can find or reconstitute that prompt. Ultimately I wanted to have a system for automagically keeping Java up-to-date with JavaScript.
What have I missed or misunderstood?