I think a good abstractions design and good test suite will make it break success of future coding projects.
Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.
The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
Taffy a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.
I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.
I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc? I would imagine truly scaling long-running autonomous coding would have an emphasis on this.
Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.
I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.
Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away.
But how much will it cost to maintain those features in the future? So far the answer appears to be a whole lot less than I would previously budget for, but I don't have any code more than a few months old that was built ~100% by coding agents, so it's way too early to judge how maintenance is going to work over a longer time period.
Essentially a bet that the rate of model improvement is going to be faster than the rate of decay from bad coding.
Now this hurts me personally to see as someone who actually enjoys having quality code but I don't see why it doesn't have a decent chance of holding
It's hard to imagine a human developer misses something so obvious.
AI makes it cheap (eventually almost free) to traverse the already-discovered and reach the edge of uncharted territory. If we think of a sphere, where we start at the center, and the surface is the edge of uncharted territory, then AI lets you move instantly to the surface.
If anything solved becomes cheap to re-instantiate, does R&D reach a point where it can’t ever pay off? Why would one pay for the long-researched thing when they can get it for free tomorrow? There will be some value in having it today, just like having knowledge about a stock today is more valuable than the same knowledge learned tomorrow. But does value itself go away for anything digital, and only remain for anything non-copyable?
The volume of a sphere grows faster than the surface area. But if traversing the interior is instant and frictionless, what does that imply?
In a stage interview (a bit after the "sparks of agi in gpt4" paper came out) he made 3 statemets:
a) llms can't do math. They can trick us with poems and subjective prose, but at objective math they fail.
b) they can't plan
c) by the nature of their autoregressive architecture, errors compound. so a wrong token will make their output irreversibly wrong, and spiral out of control.
I think we can safely say that all of these turned out to be wrong. It's very possible that he meant something more abstract, and technical at its core, but in the real life all of these things were overcome. So, not a luddite, but also not a seer.
The harnesses have helped in training the models themselves (i.e. every good trace was "baked in" the model) and have improved in enabling test time compute. But at the end of the day this is all put back into the models, and they become better.
The simplest proof of this is on benchmarks like terminalbench and swe-bench with simple agents. The current top models are much better than their previous versions, when put in a loop with just a "bash tool". There's a ~100LoC harness called mini-swe-agent [1] that does just that.
So current models + minimal loop >> previous gen models with human written harnesses + lots of glue.
> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!
You know this is a false dichotomy right? You can treat and consider LLMs statistical parrots and at the same time take advantage of them.
It's nearly frictionless, not frictionless because someone has to use the output (or at least verify it works). Also, why do you think the "shape" of the knowledge is spherical? I don't assume to know the shape but whatever it is, it has to be a fractal-like, branching, repeating pattern.
What helped for me was forcing the agent into short, explicitly scoped steps. Each step declares what it can read, what it can do, and what it’s allowed to output, then that context gets torn down before the next step.
I’ve been using GTWY for this kind of setup and it made long-running coding agents much more boring and predictable, which is exactly what you want at scale.
Curious how you’re handling state reset and permission drift as runtimes get longer.