Right now I mostly have it work off markdown files for project context/rules, plus get-shit-done and frontend skill occasionally. It works, but token usage climbs fast.
How are people actually using it without wasting tokens?
Are you keeping instructions tiny, using cheaper models, implementing specific skills/plugins, or doing something smarter?
What made the biggest difference for you in real projects?
If I need to refactor, I still use the excellent refactoring facilities of my IDE. If I am going to generate a lot of repetitive code, I prefer to ask the agent to write a small generator rather than having it generate the code itself. I already have dozens of such small scripts, since boring work like visiting the AST and referencing stuff is the kind of code that LLMs love to generate, and generate pretty well.
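To make the generator idea concrete, here is a minimal sketch of the kind of script I mean. It is not any particular tool: the `SOURCE` classes and the `annotated_fields` and `generate_to_dict` names are made up for illustration, and it only assumes Python's standard `ast` module. The agent writes something like this once, and after that the repetitive output costs zero tokens.

```python
import ast

# Hypothetical input: in a real project this would be read from a file.
SOURCE = '''
class User:
    name: str
    age: int

class Order:
    id: int
    total: float
'''


def annotated_fields(class_node: ast.ClassDef) -> list[str]:
    """Return the names of annotated class-level fields, in source order."""
    return [
        stmt.target.id
        for stmt in class_node.body
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
    ]


def generate_to_dict(source: str) -> str:
    """Emit one `<class>_to_dict` helper per top-level class in `source`."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if not isinstance(node, ast.ClassDef):
            continue
        lines = [f"def {node.name.lower()}_to_dict(obj):", "    return {"]
        lines += [f'        "{name}": obj.{name},' for name in annotated_fields(node)]
        lines.append("    }")
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    # Print the generated helpers; redirect to a file to keep them.
    print(generate_to_dict(SOURCE))
```

Rerunning the script after a class changes regenerates the helpers deterministically, so there is nothing for the model to get subtly wrong on the hundredth copy.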
Instead, I'm using the APIs and paying per token directly. I built a custom agent stack and can optimize in ways they will not. Limits are also much higher, especially if you spend more. Anecdotally, it doesn't seem like they serve quant'd models during busy times when you use the API directly.
2. Staying in the loop.
The underlying LLMs still make too many mistakes and churn too much to be left to their own devices yet. You have to think and evaluate deeply whether your current setup is actually producing benefits while saving time.