Show HN: Lance – image/video generation and understanding in one model

62 points by cleardusk 2 days ago | 15 comments

The model has 3B active parameters. We put the code, homepage, paper and model links here:

- Code: https://github.com/bytedance/Lance

- Homepage: https://lance-project.github.io/

- Paper: https://arxiv.org/abs/2605.18678

- Model: https://huggingface.co/bytedance-research/Lance

p.s. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.

Tsarp 2 days ago |
Nice work. Wish they had picked another name given how popular lance/lancedb is.
asadm 2 days ago |
last dance for lance vance!
cleardusk 2 days ago |
:D
popalchemist 2 days ago |
Seems like the video output is crippled. Resolution is low (720 or so), as is the frame rate. The samples are shown up-scaled and frame-interpolated.
Why do that? Seems strange to be building sub-hd resolution video models in 2026.
jadbox 2 days ago |
Sure, but again, it's a micro 3B model. Perhaps it can't be used for general video work, but it might be able to do basic edits like removing an object from a table in a shot.
MattRix 2 days ago |
It’s not a micro model at all: it requires 40 GB of VRAM. The 3B is just the active parameters.
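To make the active-vs-total distinction concrete, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per parameter (bf16/fp16) and counts weights only, ignoring activations, caches and any auxiliary encoders or decoders. The 10B total used below is a hypothetical number for illustration, not Lance's actual parameter count, and `weight_memory_gb` is a made-up helper name.

```python
def weight_memory_gb(total_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to hold the weights alone, in GB (1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


# Weights touched per forward pass: the 3B active parameters, in bf16.
active_gb = weight_memory_gb(3e9)

# Weights that must be resident in memory: ALL parameters, active or not.
# HYPOTHETICAL total of 10B parameters, chosen only for illustration.
resident_gb = weight_memory_gb(10e9)

print(f"active weights:   {active_gb:.1f} GB")
print(f"resident weights: {resident_gb:.1f} GB")
```

The point is only that VRAM needs scale with everything that has to be loaded, so "3B active" says more about compute per step than about how big a GPU you need.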
nkvdev 2 days ago |
Great quality. Forked it and going to try it out.
bguberfain 2 days ago |
Any plans to port to sglang or vLLM?
cleardusk a day ago |
vllm-omni support is on the way : )
embedding-shape 2 days ago |
Video understanding is still fairly new, especially when done well, and it would be great if it worked well for UI and UX. Current agents already struggle a bit with 2D space in ordinary screenshots of unconventional UIs, so I wonder if this model would do better with actual recordings of someone navigating and using applications. It feels like it could help a lot with understanding UX, at least. Will be fun to play around with :)
wxw 2 days ago |
What’s SOTA for video understanding? AFAIK most video search is powered by transcription and not the actual video. This seems impressive.