• stared a day ago |
    While I do admire Unsloth (especially their https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF quantizations), the linked blog post looks like it was written by AI from notes (unless a human author acquired this taste from interactions with chatbots).
    • adityamwagh a day ago |
      What’s with all the hate for AI-assisted writing on Hacker News? It’s a tool, and people use tools all the time. It saves TIME and helps improve the coherence of one’s articles.
      • saberience a day ago |
        Because AI writing is lazy, and moreover, I don’t want to know the AI’s opinion on something; I can get that myself. If I want to read someone’s article, I want to hear that person’s words and that person’s opinions.

        If someone has no opinions or unique insight, then why would I listen to them or read their content?

        Again, if I want the AI’s view on something, I can open up Claude and ask it myself, so why bother reading generated articles that took someone else 10 seconds to prompt?

        • danielhanchen a day ago |
          Update - Just got rid of the spiced up intro
      • stingraycharles a day ago |
        It destroys the previously implicit contract that the writer actually put a decent amount of thought and time into the writing, and that the ideas expressed are theirs and original.

        I don’t mind good usage of LLM-assisted writing, but if the author can’t even be bothered to identify and remove the most obvious AI tells, I use that as a proxy that the author probably put very little effort into the article.

        It’s also often a horribly verbose style, where the same ideas could be presented with 20% of the prose.

        It’s also ruining the entire experience on web communities (although here on HN the moderation team seems to be keeping them at bay at this point, much appreciated).

        All in all, it’s objectively a net negative for the readers, and serves only the author.

        I prefer original, less coherent articles that are genuine and where I know the ideas expressed are really the author’s and not the LLM’s inference.

        Last but not least, I don’t think the grandparent you’re replying to was particularly hateful in the grand scheme of things.

        • vardalab a day ago |
          Why would you prefer a less coherent article? If an article has utility, I will read it, no matter what the source is.
          • stingraycharles 18 hours ago |
            For the same reason people prefer authenticity over mass-produced, generic stuff.

            But authentic writing takes a lot of effort, and nobody wants to do that anymore in 2026, so the status quo is more mass-produced, generic content, which is frustrating and (to me) a regression.

          • toraway 12 hours ago |
            The problem with AI-written articles is still feeling uncertain whether there's actually any utility: after reading 2000 words you realize it's been 90% filler so far, but think maybe it will lead somewhere soon? It doesn't, and you've wasted ten minutes reading glorified blog spam that was micro-targeted at whatever niche you were researching.

            After a while you pick up on the warning signs and just bail early without any guilt about false positives. It's really the only sustainable strategy in a world where it takes 5 seconds to absorb 5 minutes of your attention span.

      • embedding-shape a day ago |
        > What’s with the all the hate for AI assisted writing on HackerNews?

        I don't think it's specifically for "AI assisted writing"; any lazy writing gets hate on HN. The bar for quality just sits higher, for better or worse.

        > It saves TIME and helps in improving coherence of one’s articles.

        I agree that it saves time for the author, but for the reader it has the opposite effect. And if you're unable to write coherent articles without the use of LLMs, maybe solve that first instead of patching over the problem.

      • wat10000 a day ago |
        When used well, it's not noticeable and nobody complains. The problem is only when it's used badly.
      • SwellJoe a day ago |
        LLM prose is unfocused and extremely verbose. It wastes the reader's time and is insulting. If you don't care enough about something to write about it yourself, I certainly can't be expected to care enough to read it.

        LLMs don't want anything, and thus they have no taste. It's not merely a style question: it wastes readers' time trying to find the point the author was trying to make; a fruitless search, as the LLM wasn't trying to make a point, it was completing one probable sentence after another.

    • danielhanchen a day ago |
      Oh thanks :) We're also going to add MTP support soon for Qwen3.6!

      95% of it is fully human done - the maths, algos, code snippets, screenshots & benchmarks are done / conducted by us and NVIDIA :)

      We did use AI to fix spelling errors + made some nice plots using Chat (ours would look horrible lol)

      Update - Just got rid of the spiced up intro

      • stared a day ago |
        Thanks!

        To be clear, I use AI for editing all the time. Actually, diagrams are nice.

        It's just that some pieces like the one below look like copy-paste (I mean: empty lines before them, the code gets no special typography, etc.):

          If we write the boundary information for a packed batch as:
          
          B = { lengths, cu_seqlens, max_seqlen, mask structure }
          
          then every transformer layer in that forward pass consumes the same B.
          
          If the model has L layers, rebuilding or re-synchronizing on B once per layer is not new work. It is the same information being reconstructed again and again.
          
          In other words, the useful work is:
          
          build B once, use it L times.
          
          The wasteful version is:
          
          build B + build B + ⋯ + build B (L times)
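
        For what it's worth, the "build B once, use it L times" idea itself is sound. A minimal sketch of what that boundary metadata looks like (plain Python; the names are illustrative, not Unsloth's actual API):

```python
from dataclasses import dataclass
from itertools import accumulate

@dataclass
class PackedBatchInfo:
    """Boundary information B for one packed batch."""
    lengths: list[int]     # length of each packed sequence
    cu_seqlens: list[int]  # cumulative offsets, len == num_seqs + 1
    max_seqlen: int        # longest sequence in the batch

def build_batch_info(lengths: list[int]) -> PackedBatchInfo:
    # Build B once per forward pass: O(num_seqs) work.
    cu = [0] + list(accumulate(lengths))
    return PackedBatchInfo(lengths, cu, max(lengths))

def attention_layer(info: PackedBatchInfo) -> int:
    # A real varlen attention kernel would consume info.cu_seqlens and
    # info.max_seqlen; here we just read them to show the reuse pattern.
    return info.max_seqlen

B = build_batch_info([5, 3, 7])                   # build B once...
outputs = [attention_layer(B) for _ in range(4)]  # ...use it L = 4 times
print(B.cu_seqlens)  # [0, 5, 8, 15]
```

        The wasteful version would call build_batch_info inside every layer, redoing the same cumulative sum L times.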
        • giancarlostoro a day ago |
          > Actually, diagrams are nice.

          I especially use AI to generate code for things like Mermaid[0]. It's just easier to describe the flow I want to outline than to remember all the nuances of Mermaid or similar code -> graph / diagram tooling. The output still looks nice too.

          [0]: https://mermaid.js.org/

  • electroglyph a day ago |
    nice writeup! looking forward to doing some more training as soon as i get some more data sorted. it'll be a custom arch, but i'll probably shoehorn it into unsloth for a speed boost.
    • danielhanchen a day ago |
      Thank you!
  • wiradikusuma a day ago |
    Quick question: for the average joe, do we still need to "train" LLMs, or can we just use an off-the-shelf model ("inference"?) for normal use cases like business process augmentation (e.g. helping read paper receipts, or generating cat videos)?
    • minimaxir a day ago |
      You can use modern off-the-shelf models for those types of tasks, however a smaller-but-bespoke model will usually be more cost-efficient if used at scale.
      • najarvg a day ago |
        And smaller bespoke models running locally are better for regulated workflows (healthcare, banking etc) as well
    • magicalhippo a day ago |
      Modern smaller LLMs like Qwen3.6 27B are quite good at visual tasks like describing images. I wouldn't trust one on receipts unless you're fine with a bit less than 100% accuracy, say 90-ish%. For descriptions of images and such I've found they do quite well indeed. A key change was the introduction of more, or even dynamic, visual tokens, which really helped the model "see" more details.

      Generating cat videos is the domain of diffusion models. If you have at least a 16GB GPU and a fair bit of patience you can get quite good results; check out the ComfyUI subreddit for example.

      • magicalhippo a day ago |
        Just as an example, here's what Qwen3.6 27B Q5_K_XL can do given this[1] image. I didn't do any prompt engineering here, just a dead simple prompt: "Transcribe the following receipt. Put line items in a separate section, each line item separated by a double newline". Temperature set to 0.5.

        Here's the output:

          Publix.
          Bradenton Commons Shopping Center
          4651 Cortez Rd. W.
          Bradenton, FL 34210
          Store Manager: Joe Galati
          941-792-7195
          
          N/O LF WHEAT BREAD 3.99 F
          
          PBX THCK L/S BACON 7.82 F
          
          PUBLIX BROWN GRAVY 0.83 F
          
          TOP SIRLOIN STEAK 11.74 F
          You Saved 3.92
          
          VITA PRTY SNK WINE 6.99 F
          You Saved 3.00
          
          ORGANIC CARROTS 1.69 F
          
          BRC FLRT EAT SMART 3.34 F
          1 @ 3 FOR 10.00
          You Saved 0.15
          
          GINGER ROOT 0.65 F
          0.13 lb @ 4.99/ lb
          
          POTATOES RUSSET 0.84 F
          0.65 lb @ 1.29/ lb
          
          POTATOES SWEET 0.49 F
          0.49 lb @ 0.99/ lb
          
          DELECT BSQUE CK/TN 10.99 T
          
          FS OUTSTRETCH UNSC 15.99 T
          
          Order Total 65.36
          Sales Tax 1.89
          Grand Total 67.25
          Credit Payment 67.25
          Change 0.00
          
          Savings Summary
          Special Price Savings 7.07
          ************************************************************
          * Your Savings at Publix *
          * 7.07 *
          ************************************************************
          
          Receipt ID: 5957 6249 2191 1277 712
          - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
          PRESTO!
          Trace #: 766630
          Reference #: 0098440513
          Acct #: XXXXXXXXXXXX2034
          Purchase VISA
        
        [1]: https://i.pinimg.com/originals/41/08/dc/4108dcf51f15af464bb6...
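
        For anyone wanting to reproduce this, a sketch of the request payload in OpenAI-style chat format, as served by OpenAI-compatible local servers (the model name, endpoint URL, and image type are assumptions; the prompt and temperature match what I used above):

```python
import base64

def build_vlm_request(image_path: str, model: str = "qwen3.6-27b") -> dict:
    # Builds an OpenAI-style chat-completions payload pairing the
    # receipt image (base64 data URL) with the transcription prompt.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,      # assumed name; use whatever your server loaded
        "temperature": 0.5,  # same setting as above
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Transcribe the following receipt. Put line items "
                          "in a separate section, each line item separated "
                          "by a double newline")},
                {"type": "image_url",
                 "image_url": {"url": "data:image/jpeg;base64," + b64}},
            ],
        }],
    }

# POST this as JSON to your server's /v1/chat/completions endpoint.
```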
        • sebazzz 8 hours ago |
          What is the difference between this and using normal OCR and then running that output through an LLM? Using a model like Qwen seems like such a bazooka way to kill a fly to me.
          • magicalhippo 7 hours ago |
            For most tasks I agree. However, once you've done your OCR you've already lost a lot of positional and contextual information, so for some tasks it might not be good enough.

            If you have scanned PDFs that follow a template, like an invoice from a repeat supplier, then yeah OCR is definitely the way to go.

    • jiehong a day ago |
      I think nowadays a lot of smaller models are trained more for doing tasks like these than for knowing things. So I’d say yes!

      At least that’s my impression.