If this works, Google and OpenAI will be copying their homework in six months just to save billions on their own clusters
What call my attention here more than anything is that mHC isn't simply "doing more with less". I would say MoE would better fit this description. Instead, it introduces a geometric constraint that changes the optimization landscape itself. I find this to be more profound.
If the approach survives independent replication, it won't just be China's workaround to sanctions. It would potentially be an architectural contribution the whole field adopts for cost reasons. Efficiency breakthroughs tend to travel fast.
500 Internal Server Error