Between all the em-dashes, this:
> Zero API costs, full data privacy, all local.
and the way your comments have completely different voices, it's pretty clear that you're letting AI write some of your HN comments, too.
Is there some place we can quickly go to see what's actually being tested? The landing page has non-clickable entries for the categories.
https://github.com/SharpAI/DeepCamera/tree/master/skills/ana...
Are you running the GPU at full throttle 24x7? Have you encountered silicon failures over time?
Why would you run this on your M5 instead of a dedicated machine for it? A Jetson Orin would be faster at prefill and decode, as well as cheaper for home installation.
That's why most professional inference solutions reach for GPU-heavy hardware like the Jetson. Apple Silicon seems like a strange and overly expensive fit for this use case.
The Jetson hardware is targeted at low-power robotics applications.
The Jetson Orin is currently marketed as a prototyping platform, and I believe it does not generally challenge recent Apple Silicon for inference performance, even considering prefill.
In the latest Blackwell-based Jetson Thor, the key advantage over Apple Silicon is its capable FP4 tensor cores, which do indeed help with prefill. However, it also has half the memory bandwidth of an M4 Max, which puts a big bottleneck on token generation with large context. If your use case did some kind of RAG lookup with very short responses then you might come out ahead using an optimized model, but for straightforward inference you are likely to lag behind Apple Silicon.
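To put rough numbers on the bandwidth point: decode is approximately memory-bandwidth-bound, since each generated token has to read all the weights once. A back-of-the-envelope sketch in Python (the bandwidth and model-size figures are illustrative assumptions, not measurements):

    # Upper bound on decode speed: tokens/sec <= bandwidth / bytes per token,
    # where each token reads the full set of weights once.
    def max_decode_tps(bandwidth_gb_s: float, model_gb: float) -> float:
        return bandwidth_gb_s / model_gb

    # Illustrative numbers: ~546 GB/s for a top-spec M4 Max, roughly half
    # that for Jetson Thor, and an ~18 GB quantized model.
    print(f"M4 Max ceiling: {max_decode_tps(546, 18):.0f} tok/s")  # ~30
    print(f"Thor ceiling:   {max_decode_tps(273, 18):.0f} tok/s")  # ~15

Prefill is compute-bound and flips the comparison, which is why the FP4 tensor cores matter there.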
At this stage, professional inference solutions ideally use discrete GPUs that are far more capable than either, but those are a different class of monetary expense.
I've got a 3060 myself, which is nice for playing around with the smaller models for free (minus electricity) and with 100% uptime, but I haven't yet been able to program anything with them that I didn't want to rewrite completely. A heavily quantized Qwen3.5-27B model is getting close, though. Maybe in a few months.
Benchmarks: https://old.reddit.com/r/LocalLLaMA/comments/1rpw17y/ryzen_a...
The price hike has been crazy. The Bosgame M5 Mini is $2400 now. I didn't get one last year when they were $1500 because I thought the memory bandwidth was mediocre. However, it doesn't look like we'll get anything better for that price anytime soon.
That was bargain basement for that era. IBMs, Compaqs and the like were ~$5k similarly configured, and the first 486s were in the $7-9k area.
https://images.prismic.io/frameworkmarketplace/Z7aVJZ7c43Q3f...
Look, this isn't an ad. I've been building my own desktops since I was 14. It's always been a separate CPU, motherboard, and memory type of deal, but this thing has it all integrated. Look how small it is. I use Gentoo. I compile all the things. I know exactly how long it takes to compile gcc because I do it all the time.
This thing compiles the Linux kernel in 62 seconds. And it uses less power than my current machine to do it. I am jealous. The computer age is not slowing down; it's in fact speeding up. Am I the only one excited as fuck about what's coming?
You don't even need a GPU because it handles gaming tasks like it's nothing.
In 1984 he bought a TRS-80 for almost a thousand dollars. 32kB RAM, around 1 MHz 8 bit CPU.
I bought a Pentium 90 in the late 90's for several thousand dollars. It had the FDIV bug in it.
After experiencing a lifetime of high depreciation in electronics, I'm extremely price-sensitive when buying them. I feel that if I wait a few years everything will become much cheaper. Maybe that's not the case anymore, with the slowdown of Moore's law and the AI datacenter build-out.
9B = 9 billion parameters. Q4_K_M is the quantization, which comes in somewhere around 4.5 bits per weight.
It will run well on a $500 Mac Mini.
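If it helps, the arithmetic is simple; here's a quick sketch (the only inputs are the parameter count and the ~4.5 bits/weight figure for Q4_K_M):

    # Rough weight footprint of a quantized model.
    def model_size_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    print(f"9B @ Q4_K_M: ~{model_size_gb(9):.1f} GB")  # ~5.1 GB of weights

Add a couple of GB for KV cache and runtime overhead, and it still fits comfortably in a base 16GB machine alongside the OS.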
Especially if you want other apps to run at the same time, I think it's safer to stick with something more like 9b. You can see a table with quantized sizes here [0] -- yes, there are smaller quants than Q4_K_XL, but then you're down in the weeds with nickel-and-diming things, and if you want to even keep something like a (memory-hungry) instance of VSCode running, good luck.
IMO -- if 9b is doing the job, stick with 9b.
My intuition is that OpenClaw-like systems still make too many mistakes to be trusted with security, and that it will take months or years more until the models and harnesses are truly ready.
“Hey, my mother-in-law is coming today. She drives a blue Ford pickup. Let her in and record the car plate for future use.”
“There are servicemen coming today around noon. They should check the electricity box and leave in a few minutes. Let me know if they do something else.”
https://news.ycombinator.com/item?id=47438675
Edit: and while the parent comment and this one are made at least partly in jest, the discovery of bugs and the emergence of adversarial and secondary uses will be interesting.
For example, imagine being able to run gait analysis for neurological disorders against yourself from your own security cameras.
It also helps to download video clips from Blink/Ring cameras, so you have persistent memory of all your video clips locally.
- This is a benchmark for "home security" workflows. I.e., extremely simple tasks that even open weight models from a year ago could handle.
- They're only comparing recent Qwen models to SOTA. Recent Qwen models are actually significantly slower than both older Qwen models and other open-weight model families.
- Specific tasks do better with specific models. Are you doing VL? There are lots of tiny VL models now that will be faster and more accurate than small Qwen models. Are you doing multiple languages? Qwen supports many languages, but none of them well. Need deep knowledge? Any really big model today will do, or you can use RAG. Need reasoning? Qwen (and some others) love to reason, often too much. They mention Qwen taking 435ms to first token, which is slow compared to some other models (see the measurement sketch after this comment).
Yes, Qwen 3.5 is very capable. But there will never be one model that does everything the best. You get better results by picking specific models for specific tasks, designing good prompts, and using a good harness.
And you definitely do not need an M5 mac for all of this. Even a capable PC laptop from 2 years ago can do all this. Everyone's really excited for the latest toys, and that's fine, but please don't let people trick you into thinking you need the latest toys. Even a smartphone can do a lot of these tasks with local AI.
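On the time-to-first-token point above: it's easy to measure yourself rather than trust a benchmark table. A minimal sketch against a local Ollama server (assumes Ollama on its default port with the model already pulled; the model name is just an example):

    # Measure time-to-first-token (TTFT) by streaming from Ollama's
    # /api/generate endpoint and timing the first non-empty chunk.
    import json
    import time
    import urllib.request

    def ttft(model: str, prompt: str) -> float:
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps({"model": model, "prompt": prompt, "stream": True}).encode(),
            headers={"Content-Type": "application/json"},
        )
        start = time.time()
        with urllib.request.urlopen(req) as resp:
            for line in resp:  # newline-delimited JSON chunks
                if json.loads(line).get("response"):
                    return time.time() - start
        return float("nan")

    print(f"TTFT: {ttft('qwen3:8b', 'Is someone at the front door?') * 1000:.0f} ms")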
You are very correct. I've only had the MacBook Pro (64GB) on hand for 2 days, so the test just covers the LLM part -- the logic handling.
For VLM, LFM is the best; even the 450M model works. I'll update soon :) Thanks again for your deep understanding of the LLM/VLM domain and your suggestions.
Will extend the test to more models, and thanks again for your insight.
Machine hardware evolution is slowing down; pretty soon you'll be able to buy one big-ass server that could last decades, since it would be purpose-built for AI.
Things like "context-based home security"? Yeah, that's just automatic, free, part of the AI system.
Everyone will talk to the AI through their phones and it'll be connected to the house. It'll have lineage info for the family, maybe passed down through generations, etc., and it'll all be 100% owned, offline, for the family; a forever assistant just there.
I mean, I envision analog/custom/bespoke AI hardware that is fundamentally "good enough". As the market's need for these systems increases and time progresses, at some point it'll be like Warhammer 30k, where these "standard template constructs" are smart enough to basically teach you anything.
This feels like a very, very weak prediction (though certainly possible).
Since at least the 640kb quip, betting against progress or the appetite for progress has been a losing bet.
In the late 90s and early 2000s the mantra was "why waste time optimizing your software? By the time you're done the next gen of CPUs will have made up the difference."
Now the increase is more about moving to GPUs and power efficiency etc. We still have increases, but the rate of speedup has slowed down a lot.
- 6× faster CPU/GPU performance
- 6× faster AI performance
- 7.7× faster AI video processing
- 6.8× faster 3D rendering
- 2.6× faster gaming performance
- 2.1× faster code compiling
Over the span of 5 years.
Plus, realistically, what makes an "AI" server different from a computer? This "lineage info of the family may be passed down through generations" idea sounds nice, but do you know anyone passing down a Commodore 64 or Apple II that remains in daily use? I fail to see how "AI" would protect something from obsolescence.
The GPUs have become much larger, so 6.8x is believable there, as is the inclusion of a matmul unit boosting AI.
The 2.x numbers are the most realistic, especially because they represent actual workloads.
That being said, I feel like we're gonna get to that point for most other stuff way sooner than for AI (and already have for many pieces of software).
I have a good analogy. 10 years ago, I was convinced that a 24-inch 1080p monitor at arm's length was perfection. There could never be any reason to improve over it. I could do everything I ever wanted to, to a standard I would never need to improve upon.
Yet here we are. The simplest and most obvious improvement is a 24" 4k monitor at 200% scaling. Basically, better in every way.
There's a discussion to be had about whether you need the better setup, which I think is your point, but there's no denying you'd want it (all other variables the same).
All I care about is: do they work, are they ‘safe’, are they comfortable, etc.
Overall system performance is better, at about a 2x improvement, thanks to extra cores and other changes. I could see more specialized benchmarks improving by more, thanks to power, size, and core improvements in other components (GPU/NPU/etc.).
In the case of an AI server, a home appliance (like a toaster) would be a ready-to-go device that's preloaded and self-contained, connects to everything in your home, and helps you manage it, likely through voice chat or some minimal interface.
In a way, it already exists at the equipment level: a Mac Mini or Mac Studio is very power efficient, and adding capabilities to it happens at the app level.
Since a solution like this would be at the level of a group of apps, that might be the gap to bridge.
My elderly parents have asked me about "local backups" of their cloud stuff, their Facebook history, etc.
If they're thinking about the risks/tradeoffs of being in the cloud...
I think people use the cloud because there's no better/easier option today.
But at some point there might be. A home appliance (which may be similar to a homelab under the hood but the user experience is where things change) that provides a bunch of automation and home services could be quite attractive if it got to a point of being very turnkey for the average family.
Just like a TV or a gaming console is today.
My Raspberry Pi Pi-hole is a Pi 2B that has been running for over 5 years and it's totally fine. It has automatic security upgrades turned on but nothing else, and it doesn't need any time or attention. It just does its job.
I have a homelab mini-PC that's quiet, doesn't draw much power, and is tucked away neatly in a closet.
I think it would be completely possible to provide an appliance-like machine that would not have the problems you're outlining.
Impossible is absolutely the wrong qualifier.
Maybe even subsidized by the government. This will be a fundamental need.
I'm not sure that gives much confidence that hardware has slowed down enough to invest in it for decades. Single-core CPU performance has, but that's not really what new things are using.
Like the PC in the 80s starting to eat up "get a mainframe" or "rent time on a mainframe" uses.
Of course, similar to a 10 year old car or appliance, you will be missing any new features or bells and whistles that have become available in the meantime.
My NAS is about 13 years old, the network switches it connects through are even older, and while 2.5GbE now exists I have no need to throw out my "good enough" equipment to replace it with something marginally faster or more power efficient. I don't even really need to expand the storage of that NAS anytime soon, because my music collection could never come close to filling it, my movie/TV collection isn't growing much anymore due to the shift to streaming, and the volume of other stuff that I need to back up from my other computers just isn't growing much over the years.
AI models are changing every other day. I have to rebuild llama.cpp from source regularly. We are nowhere close to a personal "AI mainframe."
Of course one can always upgrade components piecewise as requirements change, but I don't see why you need to invest in a big-ass server to do that. It'd be cheaper to go the route everyone has for decades at this point: upgrade with normal-sized stuff as needed rather than trying to make an up-front, multi-decade home investment out of it.
On the flip-side, if you intentionally plan to lock in the capabilities to the kinds of things one can run today, and you know you'll therefore never need to upgrade, then you can get whatever sized system makes sense for today's needs. You just need to be really sure you won't be interested in "the next big thing" when it comes, too.
Another data point? All the GPUs I'm looking at buying for my home LLM explorations are 5+ years old.
and an Oxide rack
Most people don't even think about running network cables or mesh WiFi when building a house; no one will buy a server to run AI in their physical home.
10 years ago I couldn't do Alexa at my house; now I'm pretty close with a Qwen3:8b / Ollama LLM. (I mean, I never really wanted Alexa to do anything other than play music, automate stuff, etc. Zero interest in it teaching me how to code.)
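For the "automate stuff" part, the glue can be tiny. A minimal sketch using the ollama Python package (the intent schema is my own made-up example, not anyone's product):

    # Map a spoken/typed command to a structured home-automation intent
    # using a local model via Ollama.
    import json
    import ollama

    SYSTEM = (
        "Turn the user's request into JSON: "
        '{"action": "play_music" | "lights" | "none", "args": {...}}. '
        "Reply with JSON only."
    )

    def parse_command(text: str) -> dict:
        resp = ollama.chat(
            model="qwen3:8b",
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": text}],
            format="json",  # ask Ollama to constrain output to valid JSON
        )
        return json.loads(resp["message"]["content"])

    print(parse_command("play some jazz in the living room"))

From there it's just a dispatch table from "action" to whatever actually talks to your speakers and lights.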
I'm even thinking that at some point we'll consider access to AI a fundamental human right, since without it you're inherently at a disadvantage, in terms of wealth prospects, compared to those who do have access.
i.e., something like this fake future Apple device page: https://speculate-mai.pages.dev/
Seems like trying to manufacture a need from the tools. My security system's front page already shows me every event that happened at my house without me having to interrogate it about every happenstance, so I don't see what the value of this is.
It is still incredibly impressive, of course! I just wish it were jailbroken.
https://github.com/SharpAI/DeepCamera/releases/download/v202...
This is the classic issue in tech right now: it's becoming easier to build the systems, but the compliance/legal hurdles are still real, slow, and human. Even if the monitoring is best in class (which I'd argue it likely is -- this is a fantastic application of AI), if the compliance isn't there it won't be a real product.
Do you want it to connect to your existing HA instance, or are you okay with a new Docker instance? I was planning to support both, but I'd like to know which one makes more sense.
I don't know if that's why other people are interested. I'm probably weird. But that's what drives my interest.
Look at how much Google has changed over the years in the pursuit of profit. What will ChatGPT and Claude look like when they are pushed further down the profit maximization path?
the analysis is very suspicious: “gpt 5 mini had api failures due to wrong temp setting”? wtf?
whatever you used to slop your benchmark together didn't even take the time to set the temp to 1 (which the docs say is required)
https://github.com/SharpAI/DeepCamera/blob/c7e9ddda012ad3f8e...
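For reference, the fix is a one-liner wherever the benchmark builds its API calls. A sketch with the OpenAI Python SDK (model name from the quoted analysis; this is an illustration, not the repo's actual code):

    # Some models reject non-default sampling params; set temperature
    # explicitly per-model instead of inheriting a benchmark-wide value.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-5-mini",
        temperature=1,  # per the docs mentioned above, this model requires 1
        messages=[{"role": "user", "content": "Summarize this camera event."}],
    )
    print(resp.choices[0].message.content)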