How does it compare to popular local inference engines, e.g. Ollama, LM Studio, or hand-rolled llama.cpp? I saw a brief benchmark in the readme but wasn't sure if there was more.
speu a day ago |
I've been trying deepseek-v4-flash in OpenCode (via OpenRouter) and I'm blown away. It's no Opus, obviously, but it had zero issues with any regular coding task I threw at it. v4-flash is remarkably "good enough" for what I needed. The whole evening of coding cost me $0.52 in API credits.
jiehong a day ago |
Using it in Kagi Assistant is stupidly slow. I get something like 10 t/s, while it's pretty fast in the official app, for example.
Kagi Assistant is also somewhat broken when using Qwen 3.6 Plus.