So I built a CLI that lets the agent open a browser, interact with the page, record what happens, and collect any errors. Then it bundles everything — video, screenshots, logs — into a self-contained HTML file I can review in seconds.
    proofshot start --run "npm run dev" --port 3000
    # agent navigates, clicks, takes screenshots
    proofshot stop
It works with whatever agent you use (Claude Code, Cursor, Codex, etc.) — it’s just shell commands. It's packaged as a skill so your AI coding agent knows exactly how it works. It's built on agent-browser from Vercel Labs which is far better and faster than Playwright MCP.
It’s not a testing framework. The agent doesn’t decide pass/fail. It just gives me the evidence so I don’t have to open the browser myself every time.
Open source and completely free.
but it's great to see some other open source alternatives within this space as well.
My Claude drives its own Brave autonomously, even for UI?
From the OP, I don't think this is meant for what you are describing.
Tools like Claude and the like can, and do. This is just a utility to make the process easier.
If coding agents are given Playwright access, they can actually do it better: using the Chrome DevTools Protocol, they can interact with the browser and experiment with things without having to wait for all of this to complete before making moves. For instance, I've seen Claude Code capture console messages from a running Chrome instance and use them to debug things...
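To make the console-capture idea concrete, here is a rough sketch of filtering DevTools Protocol events for problems. It assumes the raw JSON messages have already been read from Chrome's DevTools websocket with the Runtime domain enabled; the function name collect_console_problems is made up for illustration, and this is not how Claude Code or any particular tool does it.

```python
import json


def collect_console_problems(raw_messages):
    """Pick console errors/warnings and uncaught exceptions out of CDP messages.

    raw_messages: iterable of JSON strings as received from the DevTools
    websocket (assumed to be collected elsewhere).
    """
    problems = []
    for raw in raw_messages:
        msg = json.loads(raw)
        method = msg.get("method")  # command replies have no "method"
        params = msg.get("params", {})

        if method == "Runtime.consoleAPICalled":
            # "type" is the console call that fired: log, warning, error, ...
            if params.get("type") in ("error", "warning"):
                # Each arg is a RemoteObject: primitives carry "value",
                # objects usually only carry "description".
                text = " ".join(
                    str(arg.get("value", arg.get("description", "")))
                    for arg in params.get("args", [])
                )
                problems.append((params["type"], text))

        elif method == "Runtime.exceptionThrown":
            details = params.get("exceptionDetails", {})
            exc = details.get("exception", {})
            text = exc.get("description") or details.get("text", "")
            problems.append(("exception", text))

    return problems
```

An agent could run something like this after each interaction and only look closer when the returned list is non-empty.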
I'd love to see an agent doing work, then launching app on iOS sim or Android emu to visually "use" the app to inspect whether things work as expected or not.
That's very different from scripting together what is effectively a whitebox test against document ids which is what people do with things like playwright. Replacing manual QA like that could be valuable.
This is sick, OP. Based on what's in the document, it looks really useful when you need to quickly fix something and validate that nothing has changed in the UI/workflow except what you asked for.
Also looks useful for PRs: include a before and after of what changed.
A few days ago I had an interaction with Codex that roughly went as follows, "this chat window is scrolling off screen, fix", "I've fixed it", "No you didn't", "You are totally right, I'm fixing it now", "still broken", "please use a headless browser to look at the thing and then fix it", "....", "I see the problem now, I'm implementing a fix and verifying the fix with the browser", etc. This took a few tries and it eventually nailed it. And added the e2e test of course.
I usually prompt Codex with screenshots for layout issues as well. One of the nice things about their desktop app relative to the CLI is that pasting screenshots works.
A lot of our QA practices are still rooted in us checking stuff manually. We need to get ourselves out of the loop as much as possible. Tools like this make that easier.
I think I recall Mozilla pioneering regression testing of their layout engine using screenshots about a quarter century ago. They had a lot of stuff landing in their browser that could trigger all sorts of weird regressions. If screenshots changed without good reason, that was a bug. Very simple mechanism and very effective. We can do better these days.
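For anyone who hasn't seen the screenshot-regression idea in code, a minimal sketch follows. It assumes both screenshots are already decoded into rows of (R, G, B) tuples (for example with an image library, which is not shown), and the names changed_fraction and is_regression are made up for illustration. This is not Mozilla's actual harness, just the basic mechanism.

```python
def changed_fraction(before, after, tolerance=0):
    """Return the fraction of pixels that differ between two screenshots.

    before/after: lists of rows, each row a list of (R, G, B) tuples.
    tolerance: per-channel difference that is still treated as "same",
    useful for ignoring antialiasing noise.
    """
    if len(before) != len(after) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(before, after)
    ):
        raise ValueError("screenshots must have the same dimensions")

    total = 0
    changed = 0
    for row_a, row_b in zip(before, after):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if any(abs(ca - cb) > tolerance for ca, cb in zip(px_a, px_b)):
                changed += 1
    return changed / total if total else 0.0


def is_regression(before, after, max_changed=0.0, tolerance=0):
    """Flag a regression when more pixels changed than max_changed allows."""
    return changed_fraction(before, after, tolerance) > max_changed
```

With max_changed=0.0 any pixel change is flagged, which matches the "if screenshots changed without good reason, that was a bug" rule; raising it trades sensitivity for fewer false alarms.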
I give the agent either a simple browser or Playwright access to proper browsers to do this. It works quite well, to the point where I can ask Claude to debug GLSL shaders running in WebGL with it.