TL;DR: this is, intentionally or not, an industry puff piece that completely misunderstands the problem.
Also, even if everyone is effectively running a dishwasher cycle every day, that's still a massive increase in ecological impact, not a problem we can just ignore.
Most of the innovation happening today is in post-training rather than pre-training, which is good news for anyone concerned with energy use, because post-training is relatively cheap (I was able to post-train a ~2B model in under 6 hours on a rented cluster[2]).
[1]: https://github.com/lino-levan/wubus-1
[2]: https://huggingface.co/lino-levan/qwen3-1.7b-smoltalk
It is true that there are always more training runs in flight, and I don't think we'll ever find out how much energy was spent on experimental or failed runs.
Constant until the next release? The battle for the benchmark-winning model is driving release cadence up, and that competition probably drives up the cost of training and evaluation too.
Per token, training costs roughly three times as much as inference: a forward pass plus a backward pass that costs about twice the forward (the usual ~6N vs. ~2N FLOPs-per-token approximation). But because it's offline and predictable, it can be done fully batched with very high utilization, i.e. efficiently.
Training is maybe 100 trillion total tokens (a guesstimate), but these companies apparently do inference at quadrillion-token-per-month scales.
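A back-of-envelope sketch of that comparison, using the common approximations of ~6N FLOPs per training token and ~2N per inference token. The parameter count and token totals below are illustrative guesses mirroring the numbers above, not measured figures:

```python
# Compare total training compute vs. monthly inference compute using
# the standard FLOPs-per-token approximations: ~6N for training
# (forward + backward), ~2N for inference (forward only).
# All inputs are rough guesses, not sourced data.

N = 1e12                    # assumed parameter count (illustrative)
train_tokens = 100e12       # ~100 trillion training tokens (guesstimate)
infer_tokens_month = 1e15   # ~1 quadrillion inference tokens/month (guesstimate)

train_flops = 6 * N * train_tokens
infer_flops_month = 2 * N * infer_tokens_month

# N cancels out of the ratio, so only the token counts matter here.
ratio = infer_flops_month / train_flops
print(f"One month of inference ~= {ratio:.1f}x the full training run")  # ~3.3x
```

Under these (very rough) numbers, a single month of inference burns several times the compute of the entire training run, which is the point: at this scale, inference, not training, dominates.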
> So, if I wanted to analogize the energy usage of my use of coding agents, it’s something like running the dishwasher an extra time each day, keeping an extra refrigerator, or skipping one drive to the grocery store in favor of biking there.
That's for someone spending about $15-$20 in a day on Claude Code, estimated at the equivalent of 4,400 "typical queries" to an LLM.
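As a sanity check on that analogy, here is a rough calculation. Both per-unit figures are my assumptions, not from the article: ~0.3 Wh per "typical query" (in the range of recently published per-prompt estimates) and ~1.2 kWh per dishwasher cycle:

```python
# Rough check of the "extra dishwasher cycle per day" analogy.
# wh_per_query and kwh_per_cycle are assumed values, not sourced
# from the quoted article.

queries_per_day = 4400   # "typical queries" equivalent, from the comment
wh_per_query = 0.3       # assumed energy per query, in watt-hours
kwh_per_cycle = 1.2      # assumed energy for one dishwasher cycle

daily_kwh = queries_per_day * wh_per_query / 1000
cycles = daily_kwh / kwh_per_cycle
print(f"{daily_kwh:.2f} kWh/day ~= {cycles:.1f} dishwasher cycles")  # ~1.1 cycles
```

So under these assumptions the analogy roughly holds: a heavy day of coding-agent use lands near one extra dishwasher cycle, though the per-query energy figure is the big unknown and could shift the result by an order of magnitude either way.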
Electricity and cooling also incur wider costs and consequences (water use, grid strain, local emissions).
I'm all for regulation that makes businesses pay for their externalities; I'd argue that's a key economic role that a government should play.
If that's so common, then what's your theory as to why Anthropic isn't price-competitive with GPT-5.2?
We've only launched to friends and family, but I'll share this here since it's relevant: we have a service that optimizes and measures the energy use of your AI workloads: https://portal.neuralwatt.com if you want to check it out. We also have a tools repo with some demonstrations of surfacing energy metadata into your tools: https://github.com/neuralwatt/neuralwatt-tools/
Our underlying technology is really about OS-level energy optimization and datacenter grid flexibility, so if you're on the pay-per-kWh plan you get additional value as we continue to roll out new optimizations.
DM me with your email and I'd be happy to add some additional credits to you.
At some point, we might end up in a steady state where the models are as good as they can be and the training arms race is over, but we're not there yet.