The future of version control

647 points by c17r 2 days ago | 369 comments

bos 2 days ago |
This is sort of a revival and elaboration of some of Bram’s ideas from Codeville, an earlier effort that dates back to the early 2000s Cambrian explosion of DVCS.
Codeville also used a weave for storage and merge, a concept that originated with SCCS (and thence into Teamware and BitKeeper).
Codeville predates the introduction of CRDTs by almost a decade, and at least on the face of it the two concepts seem like a natural fit.
It was always kind of difficult to argue that weaves produced unambiguously better merge results (and more limited conflicts) than the more heuristically driven approaches of git, Mercurial, et al, because the edit histories required to produce test cases were difficult (at least for me) to reason about.
I like that Bram hasn’t let go of the problem, and is still trying out new ideas in the space.
dboreham 2 days ago |
Note that CRDT isn't "a thing". The CRDT paper provides a way to think about and analyze eventually consistent replication mechanisms. So CRDTs weren't "introduced", only the "CRDT way of discussing replication". Every concrete mechanism described in the CRDT paper is very old, widely used for decades beforehand.
This means that everything that implements eventual consistency (including Git) is using "a CRDT".
hrmtst93837 2 days ago |
If you stretch "CRDT" to mean any old eventually consistent thing, almost every Unix tool morphs into one under a loose enough definition. That makes the term much less useful, because practical CRDTs in 2024 usually mean opaque merge semantics, awkward failure modes, and operational complexity that has very little in common with the ancient algorithms people point at when they say "Git is a CRDT too". "Just Git" is doing a lot of work there.
mweidner 2 days ago |
While this is technically correct, folks discussing CRDTs in the context of text editing are typically thinking of a fairly specific family of algorithms, in which each character (or line) is assigned an immutable ID drawn from some abstract total order. That is the sense in which the original post uses the term (without mentioning a specific total order).
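A minimal Python sketch of that family of algorithms, under stated assumptions: IDs here are `(Fraction, replica name)` pairs, which is just one illustrative choice of total order, and `make_doc`, `merge` and `render` are hypothetical helper names, not taken from the original post or any library. Each line keeps its ID forever, deletions are tombstones, and merging is a plain set union, so the result does not depend on merge order.

```python
from fractions import Fraction

def make_doc(lines=None, deleted=None):
    """A replica's state: immutable ID -> text, plus a set of tombstoned IDs."""
    return {"lines": dict(lines or {}), "deleted": set(deleted or ())}

def merge(a, b):
    """Merge two replicas by set union; IDs never change, so order is irrelevant."""
    return make_doc({**a["lines"], **b["lines"]}, a["deleted"] | b["deleted"])

def render(doc):
    """Visible text: sort by ID (the total order), skipping tombstoned lines."""
    return [text for line_id, text in sorted(doc["lines"].items())
            if line_id not in doc["deleted"]]

# IDs are (position, replica name); both parts are illustrative assumptions.
base = make_doc({
    (Fraction(1, 4), "base"): "def f():",
    (Fraction(1, 2), "base"): "    return 1",
})

# Alice inserts a line between the two base lines.
alice = merge(base, make_doc({(Fraction(3, 8), "alice"): "    x = 1"}))

# Bob concurrently deletes the return line and appends a replacement.
bob = merge(base, make_doc({(Fraction(3, 4), "bob"): "    return 2"},
                           deleted={(Fraction(1, 2), "base")}))

merged = merge(alice, bob)
print(render(merged))
```

Real algorithms in this family generate the IDs themselves and handle cases this sketch ignores, such as two replicas inserting at exactly the same position.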
gritzko 2 days ago |
In 2007 Bram said to me that my Causal Tree algorithm is a variant of weave. Which is broadly correct. In these 20 years, the family of weave-class algos grew quite big. In my 2020 article, I devoted the intro to making their family portrait https://arxiv.org/abs/2002.09511 Could have been a separate article.
bramcohen a day ago |
The whole point of using a proper CRDT is that it's easy to reason about what it does. It took me a while to figure out the details of how to build one.
logicprog 2 days ago |
This seems like an excellent idea. I'm sure a lot of us have been idly wondering why CRDTs aren't used for VCS for some time, so it's really cool to see someone take a stab at it! We really do need an improvement over git; the question is how to overcome network effects.
righthand 2 days ago |
Well over half of all people can’t tell you the difference between git and Github. The latter being owned by a corporation that needs the network effect to keep existing.
vishvananda 2 days ago |
This is actually a very interesting moment to potentially overcome network effects, because more and more code is going to be written by agents. If a CRDT approach is measurably better for merging by agent swarms then there is incentive to make the switch. It is also much easier to get an agent to change its workflow than a human. The only tricky part is how much git usage is in the training set, so some careful thought would need to be given to creating a compatibility layer in the tooling to help agents along.
NetOpWibby 2 days ago |
Overcoming network effects cannot be the goal; otherwise, work will never get done.
The goal should be to build a full spec and then build a code forge and ecosystem around this. If it’s truly great, adoption will come. Microsoft doing a terrible job with GitHub is great for new solutions.
ZoomZoomZoom 2 days ago |
The key insight in the third sentence?
> ... CRDTs for version control, which is long overdue but hasn’t happened yet
Pijul happened and it has hundreds - perhaps thousands - of hours of real expert developer's toil put in it.
Not that Bram is not one of those, but the post reads like you all know what.
simonw 2 days ago |
I hadn't heard of Pijul. My first search took me to https://github.com/8l/pijul which hasn't been updated in 11 years, but it turns out that's misleading and the official repo at https://nest.pijul.com/pijul/pijul had a commit last month.
... and of course it is, because Pijul uses Pijul for development, not Git and GitHub!
idoubtit 2 days ago |
The canonical website is https://pijul.org. The homepage has a link to the pijul source repository.
ozten 2 days ago |
They should mirror on GitHub for marketing purposes
nicoty 2 days ago |
How would they do that if they don't use git for version control? Does GitHub allow other forms of version control other than git?
simonw a day ago |
SQLite does it despite using Fossil - their mirror is at https://github.com/sqlite/sqlite
Git is so established now that it's sensible for alternative VCS to have a mode where they can imitate the Git protocol - or even without that you can still check out the latest version of your repo and git push that on a periodic basis.
verdverm a day ago |
Similarly, CUE uses Gerrit and has two way sync. If you are building a VCS today, git interop is a must.
adastra22 a day ago |
What if the whole point of your VCS is that its core data structure is nothing like git's at all?
verdverm 18 hours ago |
As a user, why do I care how the internals work?
What I do care about is an easy path to progressive adoption and migration. Without that, I cannot convince my team / org to force everyone over.
adastra22 8 hours ago |
It solves problems that you don't encounter if you are asking that question. I’ve lost a literal year or more of my life, in aggregate, to rebasing changes against upstream that could have been handled automatically by a sufficiently smart VCS.
adastra22 a day ago |
Git is not a protocol, it is a data format. That only makes sense when your VCS system is similar enough to git to easily allow converting between the two representations.
simonw a day ago |
I mean things like git-svn, hg-git, git-p4, git-remote-fossil, git-tfs, jj.
adastra22 19 hours ago |
Every single one of those is following variations on the exact same data structure, or is actually git in a trenchcoat.
ozten a day ago |
```
pijul clone https://nest.pijul.org/pijul/pijul
pijul log --hash-only > all_changes.txt
pijul unrecord --all
git init
for HASH in $(cat all_changes.txt); do
  pijul apply "$HASH"
  pijul reset  # sync working copy to channel state
  git add -A
  git commit -m "pijul change: $HASH"
done
git remote add origin git@github.com:you/pijul-mirror.git
git push -u origin main
```
codethief 2 days ago |
> I hadn't heard of Pijul
I'm surprised! Pijul has been discussed here on HN many, many times. My impression is that many people here were hoping that Pijul might eventually become a serious Git contender but these days people seem to be more excited about Jujutsu, likely because migration is much easier.
simonw 2 days ago |
Looks like it makes the homepage only once or twice a year (using points>50 as a proxy for that), had more buzz around five years ago: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
jedberg 2 days ago |
I too am here all the time and have never heard of it. But it looks interesting.
vova_hn2 2 days ago |
I have a weird hobby: about once a year I go to the theory page [0] in pijul manual and see if they have fixed the TeX formatting yet.
You would think that if a better, more sound model of storing patches is your whole selling point, you would want to make as easy as possible for people who are interested in the project to actually understand it. It is really weird not to care about the first impression that your manual makes on a curious reader.
Currently, I'm about 6 years into the experiment.
Approximately 2 years in (about 4 years ago), I actually went to the Pijul Nest and reported [1] the issue. I got an explanation on fixing this issue locally, but weirdly enough, the fix still wasn't actually implemented on the public version.
I'll report back in about a year with an update on the experiment.
[0] https://pijul.org/manual/theory.html
[1] https://nest.pijul.com/pijul/manual/discussions/46
AceJohnny2 2 days ago |
> It is really weird not to care about the first impression that your manual makes on a curious reader.
On the contrary, I think this is an all-too-familiar pitfall for the, er... technically minded.
"I've implemented it in the code. My work here is done. The rest is window dressing."
oofbey a day ago |
Hilarious. I agree that it says a lot how a project handles reports like these.
onoesworkacct a day ago |
I mean, I personally just saw some stuff inside dollar signs and went "huh, weird choice of delimiter"
rbsmith 2 days ago |
Do you use Pijul?
From time to time, I do a 'pijul pull -a' into the pijul source tree, and I get a conflict (no local work on my part). Is there a way to do a tracking update pull? I didn't see one, so I toss the repo and reclone. What works for you in tracking what's going on there?
adastra22 a day ago |
From time to time I get curious about Pijul, attempt to pull the Pijul repo from the nest, and encounter a no-workaround-possible bug in the network sync. I have never been able to do a fresh clone of Pijul.
It is very hard to take a project like this seriously.
mkj a day ago |
Are you saying Bram hasn't worked on VCS problems much? https://web.archive.org/web/20071213090008/http://codeville.... is 20 years.
zamalek a day ago |
Pijul isn't a CRDT is it? It's theory of patches (i.e. DARCS++) alongside native conflicts.
gnufx 13 hours ago |
Its author says it implements a CRDT in its theory documentation.
radarsat1 2 days ago |
Is it a good thing to have merges that never fail? Often a merge failure indicates a semantic conflict, not just "two changes in the same place". You want to be aware of and forced to manually deal with such cases.
I assume the proposed system addresses it somehow but I don't see it in my quick read of this.
mikey-k 2 days ago |
This
recursivecaveat 2 days ago |
It says that merges that involve overlap get flagged to the user. I don't think that's much more than a defaults difference to git really. You could have a version of git that just warns on conflict and blindly concats the sides.
hungryhobbit 2 days ago |
They address this; it's not that they don't fail, in practice...
the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails.
bigfishrunning 2 days ago |
Isn't that how the current systems work though? Git inserts conflict markers in the file, and then emacs (or whatever editor) highlights them
The big red block seems the same as "flagged", unless I'm misunderstanding something
computably a day ago |
With git, conflicts interrupt the merge/rebase. And if you end up in a situation with multiple rebases/merges/both, it's easy to get a "bad" state, or be forced to resolve redundant conflict(s) over and over.
In Jujutsu and Pijul, for example, conflicts are recorded by default but marked as conflict commits/changes. You can continue to make commits/changes on top. Once you resolve the conflict of A+B, no future merges or rebases would cause the same conflict again.
bigfishrunning 21 hours ago |
Thank you for the explanation, i did misunderstand! seems like a neat feature
rectang 2 days ago |
I agree. Nevertheless I wonder if this approach can help with certain other places where Git sometimes struggles, such as whether or not two commits which have identical diffs but different parents should be considered equivalent.
In the general case, such commits cannot be considered the same — consider a commit which flips a boolean that one branch had flipped in another file. But there are common cases where the commits should be considered equivalent, such as many rebased branches. Can the CRDT approach help with e.g. deciding that `git branch -d BRANCH` should succeed when a rebased version of BRANCH has been merged?
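To make "identical diffs but different parents" concrete, here is a rough Python sketch that fingerprints a change by its diff content alone, similar in spirit to what `git patch-id` does. It is not a CRDT and does not answer the question above; `patch_id` is a hypothetical helper, and the one line of context (`n=1`) is an arbitrary assumption.

```python
import difflib
import hashlib
from itertools import islice

def patch_id(before, after):
    """Hash the diff body only, ignoring file headers and hunk line numbers."""
    diff = difflib.unified_diff(before, after, lineterm="", n=1)
    # Drop the two header lines ("--- " and "+++ "), then the "@@ ... @@" hunk markers.
    body = [line for line in islice(diff, 2, None) if not line.startswith("@@")]
    return hashlib.sha256("\n".join(body).encode("utf-8")).hexdigest()

# The same edit ("d" -> "D") made on two different parents.
parent_1 = ["a", "b", "c", "d", "e", "f"]
parent_2 = ["X", "a", "b", "c", "d", "e", "f"]  # extra line added upstream
edit_1 = ["a", "b", "c", "D", "e", "f"]
edit_2 = ["X", "a", "b", "c", "D", "e", "f"]

same = patch_id(parent_1, edit_1) == patch_id(parent_2, edit_2)
other = patch_id(parent_1, ["a", "b", "c", "d", "E", "f"])
print(same)
```

As the boolean-flip example above shows, equal fingerprints only say the textual change is the same, not that the resulting programs behave the same.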
gojomo 2 days ago |
Should you be counting on confusion of an underpowered text-merge to catch such problems?
It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.
Post-merge syntax checks are better for that purpose.
And imminently: agent-based sanity-checks of preserved intent – operating on a logically-whole result file, without merge-tool cruft. Perhaps at higher intensity when line-overlaps – or even more-meaningful hints of cross-purposes – are present.
skydhash 2 days ago |
> It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.
That has not been my experience at all. The changes you introduce are your responsibility. If you synchronize your working tree to the source of truth, you need to evaluate your patch again, whether it introduces a conflict or not. In this case a conflict is a nice signal of where someone has interacted with files you've touched and possibly changed their semantics. The pros are substantial, and it's quite easy to resolve conflicts that are only due to syntactic changes (whitespace, formatting, equivalent statements, ...).
gojomo 2 days ago |
If you're relying on a serialized 'source of truth', against which everyone must independently ensure their changes sanely apply in isolation, then you've already resigned yourself to a single-threaded process that's slower than what improved merges aim to enable.
Sure, that works – like having one (rare, expensive) savant engineer apply & review everything in a linear canonical order. But that's not as competitive & scalable as flows more tolerant of many independent coders/agents.
yammosk 2 days ago |
And yet after all these years of git supporting no source of truth we still fall back on it. As long as you have an authoritative version and authoritative release then you have one source of truth. Linus imagined everyone contributing with no central authority and yet we look to GitHub and GitLab to centralize our code. Git is already decentralized and generally we find it impractical.
skydhash 2 days ago |
Decentralization in this case means one can secede easily from the central authority. So anyone working on a project can easily split away from the main group at any time. But every project has clear governance, where the main direction is set and the canonical version of the thing under version control is stored.
That canonical version is altered following a process, and almost every project agrees that changes should be proposed against it. Even with independent agents, there should be a way to ensure consensus and decide the final version. And that problem is a very hard one.
IshKebab 2 days ago |
He's not saying you shouldn't have conflicts; just that it's better to have syntax-aware conflict detection. For example if two people add a new function to the end of the same file, Git will always say that's a conflict. A syntax-aware system could say that they don't conflict.
radarsat1 19 hours ago |
> Should you be counting on confusion of an underpowered text-merge to catch such problems?
This does not really follow from my statement.
I said that underpowered text merge should not silently accept such situations, not that it is the only way to catch them. It doesn't replace knowing something about what you are merging, but it is certainly a good hint that something may be wrong or unexpected.
> Post-merge syntax checks are better for that purpose.
Better, yes, but I was addressing semantic issues, not syntactical. I have seen syntactically valid merges result in semantic inconsistency, it does happen.
I do agree with your last statement.. unit & integration tests, agent checks or whathaveyou, these all contribute to semantic checking, which is a good thing.
Can they be relied on here? Maybe? I guess the jury is still out. My testing philosophy is "you can only test for what you think of testing". And tests and agent checks have a signal to noise ratio, and are only as useful as their SNR allows.
There is no guaranteed way to stop bugs from happening, if there were it likely would have been discovered by now. All we can do is take a layered approach to provide opportunities for them to get caught early. Removing one of those layers (merge conflicts) is not clearly a good thing, imho, but who knows.. if agent checks can replace it, then sure, I'm all for it.
jwilliams 2 days ago |
Indeed. And plenty of successful merges end up with code that won't compile.
FWIW I've struggled to get AI tools to handle merge conflicts well (especially rebase) for the same underlying reason.
layer8 2 days ago |
Code not compiling is still the good case, because you’ll notice before deployment. The dangerous cases are when it does compile.
jwilliams 2 days ago |
Very true.
I realized recently that I've subconsciously routed around merge conflicts as much as possible. My process has just subtly altered to make them less likely, to the point where seeing a 3-way merge feels jarring. It's really only taking on AI tools that brought this to my attention.
Gibbon1 2 days ago |
I've seen merged code where the memory barriers were misplaced.
skydhash 2 days ago |
I'm surprised to see that some people sync their working tree and do not evaluate their patch again (testing and reviewing the assumptions they have made for their changes).
conradludgate 2 days ago |
My understanding of the way this is presented is that merges don't _block_ the workflow. In git, a merge conflict is a failure to merge, but in this idea a merge conflict is still present but the merge still succeeds. You can commit with conflicts unresolved. This allows you to defer conflict resolution to later. I believe jj does this as well?
Technically you could include conflict markers in your commits but I don't think people like that very much
rightbyte 2 days ago |
> You can commit with conflicts unresolved.
True but it is not valid syntax. Like, you mean with the conflict lines?
ericpauley 2 days ago |
Yeah this seems silly. You can do the same thing in git (add and commit with the conflict still there)! Why you would want to is a real mystery.
fweimer 2 days ago |
It allows review of the way the merge conflict has been resolved (assuming those changes are tracked and presented in a useful way). This can be quite helpful when backporting select fixes to older branches.
furyofantares 2 days ago |
The conflict lines shown in the article are not present in the file, they are a display of what has already been merged. The merge had changes that were too near each other and so the algorithm determined that someone needs to review it, and the conflict lines are the result of displaying the relevant history due to that determination.
In the example in the article, the inserted line from the right change is floating because the function it was in from the left has been deleted. That's the state of the file, it has the line that has been inserted and it does not have the lines that were deleted, it contains both conflicting changes.
So in that example you indeed must resolve it if you want your program to compile, because the changes together produce something that does not function. But there is no state about the conflict being stored in the file.
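A rough Python sketch of that "merge succeeds, nearness gets flagged" idea, under stated assumptions: it uses `difflib` against a common base rather than the weave the article describes, and `touched_ranges`, `near`, `flagged` and the `gap` threshold are hypothetical names chosen for illustration.

```python
import difflib

def touched_ranges(base, edited):
    """Half-open ranges of base lines that one side changed (empty range = pure insert)."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]

def near(r1, r2, gap=0):
    """True if two ranges overlap, touch, or lie within `gap` lines of each other."""
    return r1[0] <= r2[1] + gap and r2[0] <= r1[1] + gap

def flagged(base, left, right, gap=0):
    """Pairs of concurrent edits that should be surfaced for review."""
    return [(l, r)
            for l in touched_ranges(base, left)
            for r in touched_ranges(base, right)
            if near(l, r, gap)]

base = ["def calc():", "    a = 1", "    return a", "", "def other():", "    pass"]
left = ["def other():", "    pass"]                      # deletes calc()
right = base[:2] + ["    log(a)"] + base[2:]             # inserts a line inside calc()
far = base[:5] + ["    return 0"]                        # edits other() only

print(flagged(base, left, right))  # the insert sits inside the deleted function
print(flagged(base, left, far))    # edits are far apart, nothing to review
```

Nothing here blocks a merge; the function only reports which pairs of edits a tool might choose to display as a conflict.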
theK a day ago |
Isn't that a bit dangerous in its own right? If the merge process can complete without conflicts being resolved, doesn't it just push the problem down the road? All of a sudden you have to deal with failing CI or ghost features that involve multiple people, when you should have just resolved your conflict locally at merge time.
furyofantares 18 hours ago |
The tooling can force resolution at any step desired.
theK 17 hours ago |
What's the point of options when there only is one correct answer?
furyofantares 15 hours ago |
The conflict is no longer an ephemeral part of the merge that only ever lives as markup in the source files and is stomped by the resolution that's picked, but instead a part of history.
I think it is also not true that there's only one correct answer, although I don't know how valuable this is.
For committing let's say yes, only one correct answer. Say the tool doesn't let you commit after you've merged without resolving conflicts.
But continuing to work locally you may want to put off resolving the conflict temporarily. Like person A changed the support email to help@example.com and person B changed it to support@example.com - obviously some wires got crossed and I will have to talk to A or B before committing the merge and pushing, but I can also go ahead and test the rest of the merge just fine.
And heck, maybe even committing after merging is fine but pushing requires resolving. Then I can continue working and committing locally on whatever else I was working on, and I'll only resolve it if I need to push. Which may mean I never need to resolve it, because A or B resolve it and push first.
Someone 2 days ago |
In this model, conflicts do not exist, so there are no conflict markers (the UI may show markers, but they get generated from what they call “the weave”)
Because of that, I think it is worse than “but it is not valid syntax”; it’s “but it may not be valid syntax”. A merge may create a result that compiles but that neither of the parties involved intended to write.
anentropic a day ago |
If other systems are doing it too then I guess it must be useful
But why is it useful to be able to defer conflict resolution?
I saw in a parallel comment thread people discussing merge commit vs rebase workflow - rebase gives cleaner git history but is a massive pain having to resolve conflicts on every commit since current branch diverged instead of just once on the final result with merge commit.
Is it that? Deferred conflict resolution allows you to rebase but only resolve conflicts at the end?
tcoff91 20 hours ago |
Deferred conflict resolution is amazing in jj because I may never return to some of the branches that are in conflict and therefore might never bother resolving them.
I rebase entire trees of commits onto main daily. I work on top of a dev-base commit and it has all kinds of anonymous branches off it. I rebase it and all its subbranches in 1 command and some of those sub branches might now be in a conflicted state. I don’t have to resolve them until I need to actually use those commits.
dev-base is an octopus merge of in-flight PRs of mine. When work is ready to be submitted it moves from being a descendent of dev-base to a parent of dev-base.
Rebasing all my PRs and dev-base and all its descendents is 1 command. Just make sure my @ is a descendent of dev-base and then run: jj rebase -d main
h14h a day ago |
Probably depends on what is in the merge. Lately I've been collaborating a ton on PRDs and software specs in markdown (now that agents have gotten pretty good at turning it into usable code) and using git has been pretty painful. Especially when working with a domain expert who's not as technical, git is proving to be almost more of a barrier than an aid.
For this kind of work (which I suspect will only get more common), a CRDT-based VCS makes a lot of sense.
gitmwnkdkc a day ago |
Yes and no. Most often conflicts could have been handled automatically with better tools. For example, I have a script that makes a copy of the whole folder and tries to merge each commit using all of git’s different merge strategies, and all sub-strategies, and presents which ones can merge without any conflicts. It has been mind-opening. Why git doesn’t have this built in, I don’t understand.
Git also writes (non-logs) to the .git folder for operations that you would assume should have been r/o, but that’s another problem (that affects things later on).
Levitating 14 hours ago |
> Conflicts are informative, not blocking. ... Conflicts are surfaced for review when concurrent edits happen “too near” each other, but they never block the merge itself.
So conflicts are still surfaced for review.
simonw 2 days ago |
This thing is really short. https://github.com/bramcohen/manyana/blob/main/manyana.py is 473 lines of dependency-free Python (that file only imports difflib, itertools and inspect) and of that ~240 lines are implementation and the rest are tests.
zahlman 2 days ago |
It's really impressive what can be done in a few hundred lines of well-thought-out Python without resorting to brutal hacks. People complain about left-pad incidents etc. in the JS world but I honestly feel like the Python ecosystem could do with more, smaller packages on balance. They just have to be put forward by responsible people who aren't trying to make a point or inflate artificial metrics.
josephg 2 days ago |
I bet you can make a small, beautiful implementation of this algorithm in most languages. Most algorithms - even ones that take generations of researchers to figure out - end up tiny in practice if you put the work in to understand them properly and program them in a beautiful way. Transformers are the same. Genius idea. But a very small amount of code to implement.
This is an implementation of FugueMax (Weidner and Kleppmann) done using a bunch of tricks from Yjs (Jahns). There’s generations of ideas here, by lots of incredibly smart people. And it turns out you can code the whole thing up in 250 lines of readable typescript. Again with no dependencies.
https://github.com/josephg/crdt-from-scratch/blob/master/crd...
zahlman a day ago |
I'm not familiar with CRDT but the code does look pretty nice. I actually have been thinking myself of streaming my development, but just the terminal without camera or microphone. (So I think I want to wait until I'm doing something that will look pretty in the terminal.)
sayrer a day ago |
The joke is there in the name. It is wrong (on purpose): it should be "Mañana". That term means "tomorrow" in your Spanish class, but it can mean "later/future/morning" or even "later this afternoon".
In English, you might think of "procrastination" or "we'll get to it."
In Portuguese, you would say "proxima semana", literally "next week", but it means "we'll get to it" (won't get to it).
mikey-k 2 days ago |
Interesting idea. While conflicts can be improved, I personally don't see it as a critical challenge with VCS.
What I do think is the critical challenge (particularly with Git) is scalability.
Size of repository & rate of change of repositories are starting to push the limits of git, and I think this needs to be revisited across the server, client & wire protocols.
What exactly, I don't know. :) But I do know that my current employer (a mid-size, well-known tech company) is hitting these limits today.
layer8 2 days ago |
One solution is to decompose your code into modules with stable interfaces and reference them as versioned dependencies.
procaryote a day ago |
What kind of scalability issues have you had with git?
Is it because of a monorepo?
mikey-k a day ago |
yes - monorepo. Git (and associated service providers) have a lot of work to do to scale out to large organizations working in a single code base.
"Better Merge Conflicts" is not on this list.
Although I'm sympathetic to the problem, and I've personally worked on "Merge Conflicts at Scale". Some of what's being suggested here is interesting. I question if it makes a material difference in the "age of ai", where an AI can probably go figure out enough context to "figure things out".
adastra22 a day ago |
Merge conflict avoidance is not a monorepo issue. In fact, the whole purpose of a monorepo is to avoid these sorts of issues, so it's not surprising.
Merge conflict hell shows up when, for example, you maintain a long-lived feature branch periodically rebased against an indifferent upstream that has its own development priorities.
I've maintained a project for years that was in this sort of situation. About ~100 commits on top of upstream, but invasive ones that touched nearly every file. Every six months upstream did a new tagged release. It would take me literally weeks of effort to rebase our patches on top, as nearly every commit triggered its own merge conflict hell.
You don't encounter these sorts of issues in a monorepo.
lifeformed 2 days ago |
My issue with git is handling non-text files, which is a common issue with game development. git-lfs is okay but it has some tricky quirks, and you end up with lots of bloat, and you can't merge. I don't really have an answer to how to improve it, but it would be nice if there was some innovation in that area too.
miloignis 2 days ago |
I really think something like Xet is a better idea to augment Git than LFS, though it seems to pretty much only be used by HuggingFace for ML model storage, and I think their git plugin was deprecated? Too bad if it ends up only serving the HuggingFace niche.
rectang 2 days ago |
Has there ever been a consideration for the git file format to allow storage of binary blobs uncompressed?
When I was screwing around with the Git file format, tricks I would use to save space like hard-linking or memory-mapping couldn't work, because data is always stored compressed after a header.
A general copy-on-write approach to save checkout space is presumably impossible, but I wonder what other people who have traveled down similar paths have concluded.
zahlman 2 days ago |
What strategies would you like to use to diff the binaries? Or else how are you going to avoid bloat?
Is it actually okay to try to merge changes to binaries? If two people modify, say, different regions of an image file (even in PNG or another lossless compression format), the sum of the visual changes isn't necessarily equal to the sum of the byte-level changes.
jayd16 18 hours ago |
The best solution I've seen is prevention.
What you can do in P4 is work in trunk, make sure you have latest and lock binary files you're working on. If you do that you won't have conflicts (at the file level anyway). Unlike git's design, this collaborative model is centralized and synchronous but it works.
Git is missing the built in binary support, the locking, and the efficient information sharing of file status tracking. With LFS you can cobble something together but it's not fast or easy.
I'm all for other solutions but I wish git would at least support this flow more whole heartedly until something else is invented.
jayd16 2 days ago |
Totally agree. After trying to flesh out Unreal's git plugin, it really shows how far from ideal git really is.
Partial checkouts are awkward at best, LFS locks are somehow still buggy and the CLI doesn't support batched updates. Checking the status of a remote branch vs your local (to prevent conflicts) is at best a naive polling.
Better rebase would be a nice to have but there's still so much left to improve for trunk based dev.
gregschoeninger 2 days ago |
We're working on this project to help with the non-text file and large file problem: https://github.com/Oxen-AI/Oxen
Started with the machine learning use case for datasets and model weights but seeing a lot of traction in gaming as well.
Always open for feedback and ideas to improve if you want to take it for a spin!
samuelstros 2 days ago |
Improving on "git not handling non-text files" is a semantic understanding aka parse step in between the file write.
Take a docx, write the file, parse it into entities e.g. paragraph, table, etc. and track changes on those entities instead of the binary blob. You can apply the same logic to files used in game development.
The hard part is making this fast enough. But I am working on this with lix [0].
[0] https://github.com/opral/lix
jayd16 18 hours ago |
What's the plan for large files that can't be merged? Images, executable binaries, encrypted files, that sort of thing?
samuelstros 18 hours ago |
Simple left or right merge. One overwrites the other one.
The appeal is for structured file formats like .docx, .json, etc. Images are unstructured, and a simple "do you want to keep the left or right image" is good enough.
jayd16 10 hours ago |
That doesn't really address the game dev use case then. Artists and designers want to prevent conflicts, not just throw away half the work and redo it.
samuelstros 8 hours ago |
Track the source of the asset and it works. Take UI design: don't track the SVG, track the design file itself.
gnarlouse 2 days ago |
I think something like this needs to be born out of analysis of gradations of scales of teams using version control systems.
- What kind of problems do 1 person, 10 person, 100 person, 1k (etc) teams really run into with managing merge conflicts?
- What do teams of 1, 10, 100, 1k, etc care the most about?
- How does the modern "agent explosion" potentially affect this?
For example, my experience working in the 1-100 regime tells me that, for the most part, the kind of merge conflict being presented here is resolved by assigning subtrees of code to specific teams. For the large part, merge conflicts don't happen, because teams coordinate (in sprints) to make orthogonal changes, and long-running stale branches are discouraged.
However, if we start to mix in agents, a 100 person team could quickly jump into a 1000 person team, esp if each person is using subagents making micro commits.
It's an interesting idea definitely, but without real-world data, it kind of feels like this is just delivering a solution without a clear problem to assign it to. Like, yes merge-conflicts are a bummer, but they happen infrequently enough that it doesn't break your heart.
CuriouslyC 2 days ago |
Team scale doesn't tend to impact this that much, since as teams grow they naturally specialize in parts of the codebase. Shared libs can be hotspots, I've heard horror stories at large orgs about this sort of thing, though usually those shared libs have strong gatekeeping that makes the problem more one of functionality living where it shouldn't to avoid gatekeeping than a shared lib blowing up due to bad change set merges.
tasuki 2 days ago |
> What kind of problems do 1 person, 10 person, 100 person, 1k (etc) teams really run into with managing merge conflicts?
> What do teams of 1, 10, 100, 1k, etc care the most about?
Oh god no! That would be about the worst way to do it.
Just make it conceptually sound.
gnarlouse 2 days ago |
Probably, but just introducing CRDTs also feels like the wrong way to approach the problem! :)
meindnoch 12 hours ago |
Also, we need personas! Sally the developer, Mark the UX designer, Taylor the manager. Also, we need to build a community, with the help of evangelists!
jillesvangurp 2 days ago |
> How does the modern "agent explosion" potentially affect this?
This changes everything. Agents don't really care what versioning software is used. They can probably figure out whatever you are using. But they'll likely assume it's something standard (i.e. Git) so the easiest is to not get too adventurous. Also, the reasons to use something else mostly boil down to user friendliness and new merge strategies. However, lately I just tell codex to pull and deal with merge conflicts. It's not something I have to do manually anymore. That removes a key reason for me to be experimenting with alternative version control systems. It's not that big of a problem anymore.
Git was actually designed for massive teams (the Linux kernel) but you have to be a bit disciplined using it in a way that many users in smaller teams just aren't. With agentic coding tools, you can just codify what you want to happen in guardrails and skills. Including how to deal with version control and what process to follow.
Where more advanced merge strategies could be helpful is the type of large-scale refactoring that is now much easier with agentic coding tools. But doing that in repositories with lots of developers working on other changes is not something that should happen very often, and certainly not without a lot of planning and coordination.
lcbasu a day ago |
>Agents don't really care what versioning software is used
Strongly agree that agents don't care about the VCS, as they will figure out whatever you throw at them. And you are right that merge conflicts are becoming a solved problem when you can just tell an agent to handle them.
But I think there is a much bigger problem emerging that better merge strategies (CRDT or otherwise) do not even touch: the reasoning is gone.
For example the situation taken from the blog is that one side deletes a function while another adds a logging line inside it. The CRDT will give you a better conflict display showing what each side did. Great. But it still doesn't tell you why the function was deleted. Was it deprecated? Moved? Replaced by something else? The reviewer is still reverse-engineering intent from the diff.
This gets/will get much worse with coding agents as agentic commits are orders of magnitude larger, and the commit message barely summarises what happened. An agent might explore three approaches, hit dead ends, flag something as risky, then settle on a solution. All that context vanishes after the session ends.
You are right about codifying guardrails and skills, and I think that is the more productive direction compared to replacing git. We should augment the workflow around it. I also started from a much more radical place, actually, thinking we need to ditch git entirely for agentic workflows [1]. BUT the more I built with agents, the more I realized the pragmatic first step is just preserving the reasoning trail alongside the code, right there in git[2]. No new VCS needed, and the next agent or human that touches the code has the full "WHY" available.
[1] https://github.com/lcbasu/git4aiagents/commit/3a3b197#diff-b...
[2] https://www.git4aiagents.com
jFriedensreich 2 days ago |
Starts with “based on the fundamentally sound approach of using CRDTs for version control”. How on earth is a CRDT a sound base for a version control system? This makes no sense fundamentally: you need to reach a consistent state that is what you intended, not what some CRDT decided, and jj shows you can do that without blocking on merges, using first-class conflicts that need to be resolved. AI and language-aware merge drivers are helping so much here that I really wonder if the world these “replace version control” projects were made for still exists at all.
miloignis 2 days ago |
The rest of the article shows exactly how a CRDT is a sound base for a version control system, with "conflicts" and all.
skydhash 2 days ago |
But the presentation does not show how it resolves conflicts. For the first example, Git has the 3-way merge that shows the same kind of info. And a conflict is not only there to show that two people have worked on a file. More often than not, it highlights a semantic change that happened differently in two instances, and it's a nice signal to pay attention to this area. But a lot of people take merge conflicts as some kind of nuisance that prevents them from doing their job (more often due to the opinion that their version is the only good one).
jFriedensreich a day ago |
where?
nozzlegear 2 days ago |
> ai and language aware merge drivers are helping so much here i really wonder if the world these “replace version control” projects were made for still exists at all.
I really wonder what kinds of magical AI you're using, because in my experience, Claude Code chokes and chokes hard on complex rebases/merge conflicts to the point that I couldn't trust it anymore.
jFriedensreich a day ago |
the latest codex or opus 4.6, depending on what works better. the trick is not to work on giant rebases at all. use commit stacks that constantly rebase on trunk so you will constantly have small conflicts that are simple to resolve rather than waiting for giant headaches. another one is to use jj and resolve conflicts "through the stack": you rebase your work stack, then resolve conflicts from the bottom commit up, one by one, not all at once.
jauntywundrkind 2 days ago |
In case the name doesn't jump out at you, this is Bram Cohen, inventor of BitTorrent. And of the Chia proof-of-storage (probably better descriptions available) cryptocurrency. https://en.wikipedia.org/wiki/Bram_Cohen
It's not the same as capturing it, but I would also note that there are a wide wide variety of ways to get 3-way merges / 3 way diffs from git too. One semi-recent submission (2022 discussing a 2017) discussed diff3 and has some excellent comments (https://news.ycombinator.com/item?id=31075608), including a fantastic incredibly wide ranging round up of merge tools (https://www.eseth.org/2020/mergetools.html).
However/alas git 2.35's (2022) fabulous zdiff3 doesn't seem to have any big discussions. Other links welcome but perhaps https://neg4n.dev/blog/understanding-zealous-diff3-style-git...? It works excellently for me; enthusiastically recommended!
ulrikrasmussen 2 days ago |
The thing about how merges are presented seems orthogonal to how to represent history. I also hate the default in git, but that is why I just use p4merge as a merge tool and get a proper 4-pane merge tool (left, right, common base, merged result) which shows everything needed to figure out why there is a conflict and how to resolve it. I don't understand why you need to switch out the VCS to fix that issue.
crote 2 days ago |
Seconding the use of p4merge for easy-to-use three-pane merging. Just like most other issues with Git, if your merges are painful it's probably due to terrible native UX design - not due to anything conceptually wrong with Git.
TacticalCoder 2 days ago |
Thirding it, except I do it from Emacs. Three side-by-side panes with left / common ancestor / right and then below them the merge result. By default it's not like that but then it's Emacs so anything is doable. I hacked some elisp code a great many years ago and I've been using it ever since.
No matter the tool, merges should always be presented like that. It's the only presentation that makes sense.
MarsIronPI 2 days ago |
What tool do you use? Does Magit support it natively?
skydhash 2 days ago |
I think you need to enable 3-way merge by default in git's configuration, and both smerge (minor mode for solving conflicts) and ediff (major mode that encompasses diff and patch) will pick it up. In the case of the latter you will have 4 panes: one for version A, another for version B, a third for the result C, and the last for the common ancestor of A and B.
Addendum: I've long since disabled it. A and B changes are enough for me, especially as I rebase instead of merging.
jwr 2 days ago |
Isn't that what ediff does?
arikrahman a day ago |
The extensibility provided with Emacs Lisp has been helpful for hacking together my own Git/Jujutsu plugin. I tried to model it over lazygit/lazyjj although magit has been incredible to use and hard to depart from.
roryokane 2 days ago |
Did you know that VS Code added support for the same four-pane view as p4merge years ago? I used p4merge as my merge tool for a long time, but I switched to VS Code when I discovered that, as VS Code’s syntax highlighting and text editing features are much better than p4merge’s.
I also use the merge tool of JetBrains IDEs such as IntelliJ IDEA (https://www.jetbrains.com/help/idea/resolve-conflicts.html#r...) when working in those IDEs. It uses a three-pane view, not a four-pane view, but there is a menu that allows you to easily open a comparison between any two of the four versions of the file in a new window, so I find it similarly efficient.
stbtrax a day ago |
how do you set that up? the default git tool version seems to be 3 pane
ramchip a day ago |
Looks like it's "Show Base" under the top-level "..." menu when working on a merge conflict
https://github.com/microsoft/vscode/issues/155277#issuecomme...
roryokane 2 days ago |
Even if you don’t use p4merge, you can set Git’s merge.conflictStyle config to "diff3" or "zdiff3" (https://git-scm.com/docs/git-config#Documentation/git-config...). If you do that, Git’s conflict markers show the base version as well:
<<<<<<< left
||||||| base
def calculate(x):
    a = x * 2
    b = a + 1
    return b
=======
def calculate(x):
    a = x * 2
    logger.debug(f"a={a}")
    b = a + 1
    return b
>>>>>>> right
With this configuration, a developer reading the raw conflict markers could infer the same information provided by Manyana’s conflict markers: that the right side added the logging line.
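To see the base section appear in practice, here is a minimal sketch in a throwaway repository. The file name calc.py, the two extra header lines, and the branch names are made up for illustration; only the merge.conflictStyle setting comes from the comment above. It uses "diff3" so that it also runs on Git older than 2.35.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/left   # name the first branch "left" whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"
git config merge.conflictStyle diff3    # "zdiff3" also works on Git 2.35 and later

# Base version: two header lines plus the function.
cat > calc.py <<'EOF'
import logging
logger = logging.getLogger(__name__)

def calculate(x):
    a = x * 2
    b = a + 1
    return b
EOF
git add calc.py
git commit -q -m "base"

# Right side: add a logging line inside the function.
git checkout -q -b right
cat > calc.py <<'EOF'
import logging
logger = logging.getLogger(__name__)

def calculate(x):
    a = x * 2
    logger.debug(f"a={a}")
    b = a + 1
    return b
EOF
git commit -q -am "right: add logging"

# Left side: delete the function.
git checkout -q left
cat > calc.py <<'EOF'
import logging
logger = logging.getLogger(__name__)
EOF
git commit -q -am "left: delete calculate"

# The merge stops with a conflict; the file now carries a "|||||||" base section.
merge_status=0
git merge right >/dev/null 2>&1 || merge_status=$?
cat calc.py
```

The labels after the markers will be HEAD, a commit hash, and the branch name rather than the literal left/base/right shown above.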
ktm5j 2 days ago |
I'm on my phone right now so I'm not going to dig too hard for this, but you can also configure a "merge tool" (or something like that) so you can use Meld or Kompare to make the process easier. This has helped me in a pinch to work out some confusing merge conflicts.
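A minimal sketch of that configuration, assuming Meld is installed and on the PATH. The keepBackup line is an optional extra, not something the comment asked for, and the settings are written to a throwaway repository rather than the global config.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

git config merge.tool meld             # meld is one of the tools "git mergetool" knows by name
git config mergetool.keepBackup false  # optional: do not leave *.orig files behind

configured=$(git config --get merge.tool)
echo "merge tool: $configured"
# During a conflicted merge you would then run: git mergetool
```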
newsoftheday 2 days ago |
I started using Meld years ago and continue to find people who've never heard of it. It's a pretty good tool.
otherayden a day ago |
Huge meld fan here, recommended by a professor about a year ago. Game changer
psychoslave 2 days ago |
That still has an issue with the vocabulary. Things like "theirs/ours" are still out of touch, but it's already better than a loose spatial analogy on some representation of the DAG.
Something like base, that is "common base", looks far more apt to my mind. In the same vein, endogenous/exogenous would be far more precise, or at least aligned with the concern at stake. Maybe "local/alien" might be a less pompous vocabulary to convey the same idea.
kungito 2 days ago |
After 15 years I still can't remember which is which. I get annoyed every time. Maybe I should finally invest 15 minutes to remember properly.
IgorPartola 2 days ago |
Let’s see if I get this wrong after 25 years of git:
ours means what is in my local codebase.
theirs means what is being merged into my local codebase.
I find it better to avoid merge conflicts than to try to resolve them. Strategies that keep branches short-lived and frequently merge main into them help a lot.
marcellus23 2 days ago |
That's kind of the simplest case, though, where "theirs" and "ours" makes obvious sense.
What if I'm rebasing a branch onto another? Is "ours" the branch being rebased, or the other one? Or if I'm applying a stash?
IgorPartola 2 days ago |
> What if I'm rebasing a branch onto another?
Just checkout the branch you are merging/rebasing into before doing it.
> Or if I'm applying a stash?
The stash is in that case effectively a remote branch you are merging into your local codebase. ours is your local, theirs is the stash.
sheept a day ago |
"Ours" and "theirs" make sense in most cases (since "ours" refers to the HEAD you're merging into).
Rebases are the sole exception (in typical use) because ours/theirs is reversed, since you're merging HEAD into the other branch. Personally, I prefer merge commits over rebases if possible; rebases make PRs harder for others to review by breaking the "see changes since last review" feature. Git generally works better without rebases and squash commits.
sebmellen a day ago |
Wow, interesting to see such a diametrically opposed view. We’ve banned merge commits internally and our entire workflow is rebase driven. Generally, I find that rebases are far better at keeping Git history clean and clearly allowing you to see the diff between the base you’re merging into and the changes you’ve made.
Ajedi32 19 hours ago |
Yes, I prefer that approach as well because it allows the person who authored the change to do all the work of deciding how to resolve conflicts up front (and allows reviewers to review that conflict resolution) instead of forcing whoever eventually does the merge to figure everything out after the fact. It also removes conflicts from the history so you never have to think about them later after the rebase/merge process is finished.
ulrikrasmussen 23 minutes ago |
"Clean" is not the same as "useful". You have to be really, really disciplined to not make a superficially "clean"-looking history which may appear linear but which is actually total nonsense.
For example, if one is frequently doing "fix after rebase" commits, then they are doing it wrong and are making a history which is much less useful than a seemingly more complicated merge based history. Rebased histories are only clean if they also tell a true story after the rebase, but if you push "rebase fixes" onto the end of your history, then it means that prior rebased commits no longer make any sense because they e.g. use APIs that aren't actually there. Giving up and squashing everything to one commit is almost better in this case because it at least won't throw off someone who is trying to make sense of the history in the future.
I think that rebasing has won over merges mostly because the tools for navigating git histories suck SO HARD. I have used Perforce at a previous job, and their graphical tools for navigating a merge based history are excellent and were really useful for doing code archeology.
KPGv2 a day ago |
> Git generally works better without rebases and squash commits.
If squash commits make Git harder for you, that's a tell that your branches are trying to do too many things before merging back into main.
ralferoo a day ago |
I don't know. Even when I'm working on my own private repositories across several machines, I really, really dislike regular merges. You get an ugly commit message and I can never get git log to show me the information I actually want to see.
For me, rebasing is the simplest and easiest to understand, and it allows you to squash some of your commits so that it's one commit per feature / bug-fix / logical unit of work. I'll also frequently rebase and squash commits in my work branch too, where I've temporarily committed something and then fixed a bug before it's been pushed into main, I'll just reorder and squash the relevant commits into one.
andy_ppp a day ago |
I completely agree, since doing rebase our history looks fantastic and it makes finding things, cherrypicking and generating changelogs really simple. Why not be neat, it's cost us nothing and you can make yourself a tutorial on Claude if you don't understand rebasing pretty easily.
andy_ppp a day ago |
Don't do squash commits, just rebase -i your branch before merging so you only have one commit. It's pretty trivial to do.
em-bee 2 days ago |
a better (more confusing) example:
i have a branch and i want to merge that branch into main.
is ours the branch and main theirs? or is ours main, and the branch theirs?
IgorPartola 2 days ago |
I always checkout the branch I am merging something into. I was vaguely aware I could have main checked out but merge foo into bar but have never once done that.
Sharlin 2 days ago |
git checkout mybranch
git rebase main
A conflict happens. Now "ours" is main and "theirs" is mybranch, even though from your perspective you're still on mybranch. Git isn't, however.
IgorPartola 2 days ago |
Ah that’s fair. This is why I would do a `git merge main` instead of a rebase here.
ljm 2 days ago |
I have met more than one person who would doggedly tolerate rebase, not even using rerere, instead of doing a simple ‘git merge --no-ff’ to one-shot it, not understanding that rebase touches every commit in the diff between main and not simply the latest change on HEAD.
Not a problem if you are a purist on linear history.
em-bee 20 hours ago |
> not understanding that rebase touches every commit in the diff
it sounds like that's a problem for you. why would that be? i prefer rebase and fast forward, but i am fully aware that rebase rewrites all commits.
clktmr 2 days ago |
The thing is, you'll typically switch to master to merge your own branch. This makes your own branch 'theirs', which is where the confusion comes from.
IgorPartola 2 days ago |
Not me. I typically merge main onto a feature branch where all the conflicts are resolved in a sane way. Then I checkout main and merge the feature branch into it with no conflicts.
As a bonus I can then also merge the feature branch into main as a squash commit, ditching the history of a feature branch for one large commit that implements the feature. There is no point in having half implemented and/or buggy commits from the feature branch clogging up my main history. Nobody should ever need to revert main to that state and if I really really need to look at that particular code commit I can still find it in the feature branch history.
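The two-step flow described above (merge main into the feature branch first, then squash the feature branch into main) can be sketched like this. The repository, file names, and commit messages are made up for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"

# Two work-in-progress commits on a feature branch.
git checkout -q -b feature
echo "half done" > feature.txt
git add feature.txt
git commit -q -m "wip: start feature"
echo "done" > feature.txt
git commit -q -am "wip: finish feature"

# Meanwhile main moves on.
git checkout -q main
echo "hotfix" >> app.txt
git commit -q -am "hotfix on main"

# Step 1: merge main into the feature branch; any conflicts get resolved here.
git checkout -q feature
git merge -q --no-edit main

# Step 2: squash the feature branch into main as a single commit.
git checkout -q main
git merge --squash feature >/dev/null 2>&1
git commit -q -m "Add feature"

main_commits=$(git rev-list --count main)
feature_commits=$(git rev-list --count feature)
echo "commits on main: $main_commits, on feature: $feature_commits"
```

Main ends up with one commit for the feature, while the work-in-progress commits stay reachable from the feature branch for as long as that branch is kept.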
throwaway7783 a day ago |
Yep. This is the only model that has worked well for me for more than a decade.
KPGv2 a day ago |
This is what I do, and I was taught by an experienced Git user over a decade ago. I've been doing it ever since. All my merges into main are fast forwards.
KPGv2 a day ago |
> ours means what is in my local codebase
Since it's always one person doing a merge, why isn't it "mine" instead of "ours"? There aren't five of us at my computer collaboratively merging in a PR. There is one person doing it.
"Ours" makes it sound like some branch everyone who's working on the repo already has access to, not the active branch on my machine.
itintheory a day ago |
That's between you and git.
imiric a day ago |
> Let’s see if I get this wrong after 25 years of git
You used it 5 years before Linus? Impressive!
IgorPartola a day ago |
Haha yes. You caught me :)
I was wondering when someone was going to point it out. I actually have only been using it since about 2009 after a brief flirtation with SVN and a horrible breakup with CVS.
afiori 2 days ago |
iirc ours is always the commit the merge is starting from. the issue is that with a merge your current commit is the merging commit while with a rebase it is reversed.
I suspect that this could be because the rebase command is implemented as a series of merges/cherry-picks from the target branch.
Sharlin 2 days ago |
git checkout mybranch
git rebase main
Now git takes main and starts cloning (cherry-picking, as you said) commits from mybranch on top of it. From git's viewpoint it's working on top of main, so if a conflict occurs, main is "ours" and mybranch is "theirs". But from your viewpoint you're still on mybranch, and indeed are left on mybranch when the rebase is complete. (It's a different mybranch, of course; once the rebase is completed, git moves mybranch to point to the new (detached) HEAD.) Which makes "ours" and "theirs" exactly the opposite of what the user expects.
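The swap is easy to observe in a scratch repository. This is a sketch with made-up file contents: it stops a rebase on a conflict and reads the two sides straight from the index, where stage 2 is "ours" and stage 3 is "theirs".

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b mybranch
echo "from mybranch" > file.txt
git commit -q -am "mybranch change"

git checkout -q main
echo "from main" > file.txt
git commit -q -am "main change"

git checkout -q mybranch
git rebase main >/dev/null 2>&1 || true   # stops on the conflict

ours=$(git show :2:file.txt)     # stage 2 = "ours"
theirs=$(git show :3:file.txt)   # stage 3 = "theirs"
echo "ours:   $ours"
echo "theirs: $theirs"

git rebase --abort
```

Even though the rebase was started from mybranch, "ours" holds the content from main and "theirs" holds the content from mybranch.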
XorNot 2 days ago |
Man do I hate this behavior, because it would be solved by just using the branch names rather than "ours" and "theirs".
Sharlin a day ago |
Agreed. Even when the branch is the same, it would always be distinguishable by <remote-name>/<branch-name> vs. just <branch-name>.
orthoxerox 2 days ago |
I had to make an alias for rebasing, because I kept doing the opposite:
git checkout master  # check out the branch to apply commits to
git rebase mybranch  # apply all commits from mybranch
Now I just write
rebase-current-branch
and it does what I want: fetches origin/master and rebases my working branch on top of it.
But "ours"/"theirs" still keeps tripping me up.
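The comment does not show how the alias is defined, so the function below is only a guess at its shape: fetch, then rebase the current branch onto origin/master. The local bare repository standing in for the remote and all file names are assumptions for the demo.

```shell
set -eu
# Hypothetical helper: the name mirrors the alias above, the body is a guess.
rebase_current_branch() {
    git fetch -q origin && git rebase -q origin/master
}

tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"    # local stand-in for the real remote
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$tmp/origin.git"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git push -q origin master

git checkout -q -b mybranch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# The trunk moves on and is published.
git checkout -q master
echo "trunk" > trunk.txt
git add trunk.txt
git commit -q -m "trunk work"
git push -q origin master

git checkout -q mybranch
rebase_current_branch

base=$(git merge-base HEAD origin/master)
tip=$(git rev-parse origin/master)
echo "branch now sits on top of origin/master: $tip"
```

Swapping origin/master for origin/HEAD, as suggested in the reply below, would make the helper independent of the trunk's name, provided origin/HEAD is set in the clone.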
cerved a day ago |
Tip, you may want to use origin/HEAD over origin/master
orthoxerox a day ago |
Is it the naming-independent identifier of the tip of the trunk?
skydhash a day ago |
You can use the --onto flag for git rebase
git rebase --onto origin/master
It will checkout origin/master and replay the current branch on top.
P.S. I had to check the man page as I use Magit. In the latter I tap r, then u. In magit my upstream is usually the main trunk. You can also tap e instead of u to choose the base branch.
awesome_dude 2 days ago |
This is one of my pain points, and one time I googled and got the real answer (which is why it's such a pain point).
That answer is "It depends on the context"
> The reason the "ours" and "theirs" notions get swapped around during rebase is that rebase works by doing a series of cherry-picks, into an anonymous branch (detached HEAD mode). The target branch is the anonymous branch, and the merge-from branch is your original (pre-rebase) branch: so "--ours" means the anonymous one rebase is building while "--theirs" means "our branch being rebased".[0]
[0] https://stackoverflow.com/questions/25576415/what-is-the-pre...
flutetornado 2 days ago |
I ended up creating a personal vim plugin for merges one night because of a frustrating merge experience and never being able to remember what is what. It presents just two diff panes at top to reduce the cognitive load and a navigation list in a third split below to switch between diffs or final buffer (local/remote, base/local, base/remote and final). The list has branch names next to local/remote so you always know what is what. And most of the time the local/remote diff is what I am interested in so that’s what it shows first.
mamcx a day ago |
Seriously!
Why not show the name of the branch + short ID (and when it is not a direct name, at least "this is from NAME")?
sheept a day ago |
doesn't it? Next to the conflict markers, it'll display HEAD, the ref name, or the short commit hash.
cerved a day ago |
It does
KPGv2 a day ago |
I'll be honest, as a fairly skilled and experienced programmer who isn't a git expert, I know what HEAD means, but when I'm rebasing I really have no idea. It all seems to work out in the end because my collaborative work is simple and usually 2–3 people only, so I'm never rebasing against a ton of commits I lack context for (because 90% of them are my commits since I'm usually dealing with PRs to my open source projects rather than someone else's).
HEAD is "the thing we're editing now" but that's not terribly useful when rebasing since you're repeatedly editing a fake history.
lmm a day ago |
I avoid this problem by not rebasing.
jimbobimbo a day ago |
Seriously! I have too many years of software development experience, but I use Visual Studio UX to handle pretty much all git operations. And always merge.
I have better things to do in my life than "internalizing" anything that doesn't matter in the grand scheme of things.
ozim a day ago |
I don’t like that approach, because people who work like that commit all kind of crap to repo or cry that GIT ate their homework…
Then we have line ending conflicts and file format conflicts where UTF8-BOM mixes with plain UTF8, which makes more work for everyone, like crappy PRs. All because of people for whom those are things that „don’t matter in grand scheme of things”.
eru a day ago |
I happen to know a lot about git internals, but I don't think everyone should need to.
About the line ending conflicts: set up your CI once to complain about those. And help your coworkers set up their editors right once.
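One possible shape for such a CI check, sketched against a scratch repository with two made-up files. It relies on `git ls-files --eol`, which reports the line endings of the committed copy of each file in its `i/` column. The check_status variable stands in for the exit code a real CI step would return.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config core.autocrlf false   # keep Git from normalizing the demo file on add

printf 'one\ntwo\n' > unix.txt
printf 'one\r\ntwo\r\n' > windows.txt
git add unix.txt windows.txt
git commit -q -m "mixed line endings"

# Collect tracked files whose committed content has CRLF or mixed endings.
offenders=$(git ls-files --eol | grep -E '^i/(crlf|mixed)' || true)
if [ -n "$offenders" ]; then
    echo "CRLF line endings committed:"
    echo "$offenders"
    check_status=1   # a CI job would exit non-zero here
else
    check_status=0
fi
```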
gitmwnkdkc a day ago |
If it hurts, do it more often.
ozim a day ago |
Hey not every rebase has conflicts. I definitely rebase when there are no conflicts, then merge.
When there are conflicts I merge „theirs” into my branch to resolve those so I keep mental model for this side and don’t have to switch. Then rebase then open PR.
lmm a day ago |
You could do all that. Or you could just merge every time. I know which I find easier.
eru a day ago |
I do the following to keep my sanity when doing something like rebasing a feature branch onto latest origin/master:
* First and most important: turn on rerere.
* Second: merge and resolve all conflicts, commit.
* Third: rebase.
The second step might look redundant, but thanks to rerere git remembers the merge conflict resolution. That makes step 3 have fewer conflicts; and step 2 tends to be easier on me, because we are only reconciling the final outcomes.
(Well, I'm lying: the above is what I used to do. Nowadays I let Claude handle that, and only intervene when it gets too complicated for the bot.)
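The three steps above can be sketched end to end in a scratch repository. Branch names and file contents are made up, and a local main stands in for origin/master. The rebase still stops at the conflict, but rerere has already written the recorded resolution into the working tree, so all that is left is to stage it and continue.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true          # step 1: turn on rerere

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo "feature" > file.txt
git commit -q -am "feature change"

git checkout -q main
echo "main" > file.txt
git commit -q -am "main change"

# Step 2: merge, resolve by hand, commit. rerere records the resolution.
git checkout -q feature
git merge main >/dev/null 2>&1 || true
echo "resolved" > file.txt
git add file.txt
git commit -q -m "merge main into feature"

# Step 3: rebase. The same conflict comes back and rerere replays the resolution.
git rebase main >/dev/null 2>&1 || true
replayed=$(cat file.txt)
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

merges_left=$(git rev-list --merges --count main..feature)
echo "replayed: $replayed, merge commits left on feature: $merges_left"
```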
gwerbin a day ago |
Git leaks a lot of implementation details into its UX. Rebasing is meant to be equivalent to checking out the "base" branch and cherry picking commits onto it. Therefore "ours" during a rebase is the base branch.
The meaning of "ours" and "theirs" is always the same, but the "base" of the operation is reversed compared to what you might be used to during merge.
Rebasing can be confusing and hard and messy, but once I learned that rule and took the time to internalize it, I at least never got confused on this particular detail again.
> fake history
That's the thing, it's not actually fake history. Git really is doing the things it looks like it's doing during a rebase. That's why you can do all kinds of weird tricks like stopping in the middle to reset back a commit in order to make a new intervening commit. The reason you can abort at any time with (almost) no risk is because the old history is still hanging around in the database and won't be removed until GC runs, usually long after the rebase is settled.
skydhash a day ago |
Also git stores (almost?) all its operations in the reflog. They have identifiers like commits so you can reset to them and restore the original state of the working directory (mostly after an automatic rebase gone wrong).
gwerbin a day ago |
That's the thing, they're not "like commits", they are the actual original commits. It's a history of where the HEAD ref used to be. Eventually those commits will be pruned out of the tree if/when the reflog expires because there is nothing left pointing to them. But otherwise they are normal commits.
eru a day ago |
I think what the grandfather comment meant is 'like' in the sense of 'this is an example'. Not 'like' in the sense of 'sorta / approximately'.
eru a day ago |
Yes, Git is like a Copy-on-Write filesystem.
It's interesting that once even C programmers, like Linus, become really experienced, they embrace the wisdom that functional programmers are forced to swallow anyway.
PunchyHamster a day ago |
Learning git properly is pretty much "read Git book at least 3 times".
All of it makes sense and is decently intuitive once you know how internals work.
People keep imagining git as a series of diffs while in reality it's a series of filesystem tree snapshots + a bunch of tools to manage that and reconcile changes in the face of a merge. And most of that can be replaced if the builtins are not up to the task. And the experience is slowly getting better, but it's a balance between power users and newbies, and also trying not to break stuff going forward.
Now of course that sucks if programming is not someone's day job, but there are plenty of tools that present simpler workflows built on top of that.
goku12 21 hours ago |
> HEAD is "the thing we're editing now" but that's not terribly useful when rebasing since you're repeatedly editing a fake history.
You got two things wrong here. Firstly, HEAD isn't 'the thing you're editing now'. HEAD is what you have already committed. If you want to edit the HEAD, you have to either amend the commit or reset and redo the commit. (To make the situation even more complex, the amended or overridden commit remains in the repo unchanged, but orphaned.)
The actual thing being edited is a 'patch' that will eventually be converted into a new commit (snapshot). If you're doing a rebase and want to see the next patch in the pipeline that you're editing now, try this:
git rebase --show-current-patch
Secondly, rebase is not editing a fake history. Rebase is creating a new (and real) history by repeatedly cherry picking commits from the old history based on the rebase plan. HEAD is the tip commit of the new history under construction. On completion of the rebase, the branch ref of the old history is switched to the new history, where HEAD is now at. Meanwhile, the old history remains in the repo unchanged, but again orphaned.
All the orphaned commits are still visible in the HEAD's reflog. You can use it to undo the rebase if you wish.
I agree that the entire thing is confusing as hell. But I have a bunch of aliases and scripts that show you the process graphically in realtime. You can use that awareness to make the right call every time. I'm thinking about converting it into a TUI application and publishing it.
pcthrowaway a day ago |
It does now by default, since v2.13 (released 2017). Prior to that you had to set the log.decorate config. Good times.
tome a day ago |
It doesn't matter which is which. The resolution will be the same regardless.
goku12 a day ago |
I was thinking about creating a TUI application that points out what each part in the conflict indicator corresponds to. This idea is primarily meant for rebases where the HEAD and the ID of the updated commits change constantly. Think of it as a map view of the rebase process, that improves your situational awareness by presenting all the relevant information simultaneously. That could trivially work for merges too.
wakawaka28 a day ago |
>Maybe "local/alien" might be a less pompous vocabulary to convey the same idea.
That is more alien and just as contrived. If you merge branches that you made, they're both local and "ours". You just have to remember that "ours" is the branch you are on, and "theirs" is the other one. I have no idea what happens in an octopus merge but anyway, the option exists to show commit titles along with markers to help you keep it straight.
psychoslave 19 hours ago |
Indeed, thanks.
Something that carries the meaning "the branch we want to use as a starting point" and "the other branch we want to integrate into the previous one" is what I had in mind, but it might not fit all situations in git merge/rebase.
wakawaka28 18 hours ago |
Assuming you have to start on one of the branches being merged, "current" is a good name for the one you started on. "Other" is good enough for the other one. By the way I found out that octopus merges never succeed in case of conflicts. I'm not even sure if prerecorded resolutions work in that case. You're supposed to do a series of normal merges instead if you have conflicts.
IshKebab 2 days ago |
This is better but it still doesn't really help when the conflict is 1000 lines and one side changed one character and the other deleted the whole thing. That isn't theoretical - it happens quite regularly.
What you really need is the ability to diff the base and "ours" or "theirs". I've found most different UIs can't do this. VSCode can, but it's difficult to get to.
I haven't tried p4merge though - if it can do that I'm sold!
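A rough Python sketch of the "diff each side against the base" idea, using only the standard `difflib` module. `side_changes` is a made-up helper, and the 1000-line file is invented to match the scenario in this comment.

```python
import difflib


def side_changes(base, side, label):
    """Return only the added/removed lines between the base and one side."""
    diff = list(difflib.unified_diff(base, side, "base", label, lineterm="", n=0))
    # Skip the two file-header lines, then drop the "@@" hunk headers.
    return [line for line in diff[2:] if not line.startswith("@@")]


base = [f"line {i}" for i in range(1000)]
ours = list(base)
ours[500] = "line 500!"   # one side makes a tiny edit
theirs = []               # the other side deletes the whole thing

ours_delta = side_changes(base, ours, "ours")
theirs_delta = side_changes(base, theirs, "theirs")

print(ours_delta)
print(len(theirs_delta), "lines removed by theirs")
```

Viewed this way the tiny edit is obvious, whereas a raw 1000-line conflict hunk hides it.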
pfg_ a day ago |
I tried p4merge a while ago and it didn't do it unfortunately, so I'm still stuck copying the base and ours to separate files and diffing them.
IshKebab a day ago |
So the way you can do it in VSCode is to open the conflict in their smart merge editor... Often it is actually smart enough to highlight the relevant change but if not each of the left/right editors has a button in its toolbar to diff it against the base.
Not the easiest to access but better than copying/pasting (which is what I also used to do).
fransje26 19 hours ago |
If I understood your point correctly, I believe that Meld can do that. And then you get a window like [1]. You can configure git to choose which version goes where. Something like:
[mergetool "meld"]
    cmd = meld "$LOCAL" "$MERGED" "$REMOTE" --output "$MERGED"
    #cmd = meld "$LOCAL" "$BASE" "$REMOTE" --output "$MERGED"
[1] https://linuxkamarada.com/files/2019/11/git-mergetool-meld-e...
tempodox a day ago |
I still find this shit unreadable, even after years of practice.
warmwaffles 20 hours ago |
I've trained myself to avoid this entirely by avoiding changing lines unnecessarily. With LLMs, I also force them to stay concise and ONLY change what is absolutely necessary.
cxr 2 days ago |
> I don't understand why you need to switch out the VCS to fix that issue.
For some reason, when it comes to this subject, most people don't think about the problem as much as they think they've thought about it.
I recently listened to an episode on a well-liked and respected podcast featuring a guest there to talk about version control systems—including their own new one they were there to promote—and what factors make their industry different from other subfields of software development, and why a new approach to version control was needed. They came across as thoughtful but exasperated with the status quo and brought up issues worthy of consideration while mostly sticking to high-level claims. But after something like a half hour or 45 minutes into the episode, as they were preparing to descend from the high level and get into the nitty gritty of their new VCS, they made an offhand comment contrasting its abilities with Git's, referencing Git's approach/design wrt how it "stores diffs" between revisions of a file. I was bowled over.
For someone to be in that position and not have done even a cursory amount of research before embarking on a months (years) long project to design, implement, and then go on the talk circuit to present their VCS really highlighted that the familiar strain of NIH is still alive, even in the current era where it's become a norm for people to be downright resistant to writing a couple dozen lines of code themselves if there is no existing package to import from NPM/Cargo/PyPI/whatever that purports to solve the problem.
hn_throwaway_99 a day ago |
> they made an offhand comment contrasting its abilities with Git's, referencing Git's approach/design wrt how it "stores diffs" between revisions of a file. I was bowled over.
It seems like you have taken offense to the phrase "stores diffs", but I'm not sure why. I understand how commit snapshots and packfiles work, and the way delta compression works in packfiles might lead me to calling it "storing diffs" in a colloquial setting.
ghusbands 18 hours ago |
A common misconception is that git works with diffs as a primary representation of patches, and that's the implied reading of "stores diffs". Yes, git uses diffs as an optimisation for storage but the underlying model is always that of storing whole trees (DAGs of trees, even), so someone talking about it storing diffs is missing something fundamental. Even renames are rederived regularly and not stored as such.
However, context would matter and wasn't provided - without it, we're just guessing.
cxr 17 hours ago |
> It seems like you have taken offense to the phrase "stores diffs", but I'm not sure why.
Yeah, I'm not sure why it seems that way to you, either.
> the way delta compression works in packfiles might lead me to calling it "storing diffs" in a colloquial setting
We're not discussing some fragment of some historical artifact, one part of a larger manuscript that has been lost or destroyed, with us left at best trying to guess what they meant based on the little that we do have, which amounts to nothing more than the words that you're focusing on here.
Their remarks were situated within a context, and they went on to speak for another hour and a half about the topic. The fullness of that context—which was the basis of my decision to comment—involved that person's very real and very evident overriding familiarity with non-DVCS systems that predate Git and that familiarity being treated as a substitute for being knowledgeable about how Git itself works when discussing it in a conversation about the tradeoffs that different version control systems force you to make.
galaxyLogic 2 days ago |
Can it merge ordinary directories (in addition to git branches)?
killerstorm a day ago |
Yeah, also JetBrains IDEs like IntelliJ have very nice merge UI.
Perhaps the value of doing it on SCM level is that it can remember what you did. Git has some not-so-nice edge cases.
hyttioaoa a day ago |
I often find myself using the gitlens in vscode, to do something similar. I'd compare the working tree to the common base. Then I have the left pane with what's already in the base, the right pane is editable with the result in it.
It's nice to have all the LSP features available too while editing.
eru a day ago |
I agree!
This is all good advice, but these days I just ask Claude to solve merge conflicts, and I can't remember it ever going wrong.
Almost all merge conflicts I see in practice are fairly mechanically solved.
PunchyHamster a day ago |
There isn't. Git plumbing is elastic enough that you could have way different workflows built on top of it and still have repo that is usable by other tools.
Hell, git tools themselves offer a ton of customization, you can have both display and diff command different than the builtin very easily.
Some of Git defaults and command syntax might suck but all of that can be fixed without touching repo format
commandersaki 18 hours ago |
Is there any FOSS tool that gives a 4-pane merge, or even 3-pane so long as the third pane would show the resulting change. That would be so handy.
lemonwaterlime 2 days ago |
See vim-mergetool[1]. I use it to manage merge conflicts and it's quite intuitive. I've resolved conflicts that other people didn't even want to touch.
[1]: https://github.com/samoshkin/vim-mergetool
mentalgear 2 days ago |
Looks like vscode diff view.
skybrian 2 days ago |
It sounds interesting but the main selling point doesn’t really resonate:
If you haven’t resolved conflicts then it probably doesn’t compile and of course tests won’t pass, so I don’t see any point in publishing that change? Maybe the commit is useful as a temporary state locally, but that seems of limited use?
Nowadays I’d ask a coding agent to figure out how to rebase a local branch to the latest published version before sending a pull request.
dcre 2 days ago |
This is a reasonable reaction — pretty sure I felt the same way when I heard about jujutsu's first-class conflicts[0] — but it turns out to be really useful not to be stuck inside an aberrant state while conflicts are in the process of being resolved.
[0]: https://docs.jj-vcs.dev/latest/conflicts/
dzaima a day ago |
In git if you, say, do some `git rebase -i`, edit some commit, continue the rebase, and hit a conflict, and realize you edited something wrong that caused the conflict, your only option is aborting the entire rebase and starting over and rebuilding all changes you did.
In jj, you just have a descending conflict, and if you edit the past to no longer conflict the conflict disappears; kinda as if you were always in interactive rebase but at all points have the knowledge of what future would look like if you `git rebase --continue`d.
Also really nice for reordering commits which can result in conflicts, but leaves descendants non-conflicting, allowing delaying resolving the conflicts after doing other stuff, or continuing doing some reordering instead of always starting from scratch as with `git rebase -i`.
WCSTombs 2 days ago |
For the conflicts, note that in Git you can do
git config --global merge.conflictstyle diff3
to get something like what is shown in the article.
NetOpWibby 15 hours ago |
This should be the default.
Nearly every time I see a complaint about git, someone comes through with a command like this. Is there a collection of similar tips that makes git better to use? If not, there should be.
mentalgear 2 days ago |
> [CRDT] This means merges don’t need to find a common ancestor or traverse the DAG. Two states go in, one state comes out, and it’s always correct.
Well, isn't that what the CRDT does in its own data structure ?
Also keep in mind that syntactic correctness doesn't mean functional correctness.
Retr0id 2 days ago |
Yes.
There are many ways to instantiate a CRDT, and a trivial one would be "last write wins" over the whole source tree state. LWW is obviously not what you'd want for source version control. It is "correct" per its own definition, but it is not useful.
Anyone saying "CRDTs solve this" without elaborating on the specifics of their CRDT is not saying very much at all.
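To make the last-write-wins point concrete, here is a small Python sketch of an LWW register over a whole file. The `LWW` class and the two sample edits are invented for illustration. The merge is commutative, associative and idempotent, so it is a valid CRDT, yet one author's work silently disappears.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWW:
    timestamp: int
    replica: str  # tie-breaker so the merge is deterministic
    value: str

    def merge(self, other: "LWW") -> "LWW":
        """Keep whichever write has the larger (timestamp, replica) key."""
        return max(self, other, key=lambda w: (w.timestamp, w.replica))


alice = LWW(1, "alice", "def area(r): return 3.14159 * r * r\n")
bob = LWW(2, "bob", "def area(r): return 3.14 * r ** 2\n")

merged = alice.merge(bob)
print(merged.value)  # bob's version; alice's edit is gone without any warning
```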
mweidner a day ago |
You can think of the semantics (i.e., specification) of any CRDT as a function that inputs the operation history DAG and outputs the resulting user-facing state. However, algorithms and implementations usually have a more programmatic description, like "here is a function `(internal state, new operation) -> new internal state`", both for efficiency (update speed; storing less info than the full history) and because DAGs are hard to reason about. But you do see the function-of-history approach in the paper "Pure Operation-Based Replicated Data Types" [1].
[1] https://arxiv.org/abs/1710.04469
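A small Python sketch of the two descriptions mentioned in this comment, using a multi-value register as the example data type (my choice, not something taken from the paper). `spec` reads the answer off the whole history DAG; `apply` is the incremental `(state, operation) -> state` form and keeps only the current heads. The sketch assumes operations are delivered in causal order, i.e. parents before children.

```python
# History: op id -> (value written, ids of the ops it had already seen)
history = {
    "a1": ("draft", ()),
    "b1": ("alice's edit", ("a1",)),
    "c1": ("bob's edit", ("a1",)),  # concurrent with b1
}


def spec(history):
    """Specification: the visible values are the writes nobody has overwritten."""
    overwritten = {p for _value, parents in history.values() for p in parents}
    return {value for op_id, (value, _parents) in history.items()
            if op_id not in overwritten}


def apply(state, op_id, value, parents):
    """Implementation: state is {op id: value} for the current heads only."""
    new_state = {k: v for k, v in state.items() if k not in parents}
    new_state[op_id] = value
    return new_state


def replay(history, order):
    """Feed operations to `apply` one at a time in the given causal order."""
    state = {}
    for op_id in order:
        value, parents = history[op_id]
        state = apply(state, op_id, value, parents)
    return set(state.values())


print(spec(history))
print(replay(history, ["a1", "c1", "b1"]))
```

Both routes give the same two concurrent values, but the incremental version never needs to keep `a1` around once it has been overwritten.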
alunchbox 2 days ago |
Jujutsu honestly is the future IMO. It already does what you have outlined, just solved in a different way: it lets the merge go through and flags the conflicts that still need to be resolved.
It's been amazing watching it grow over the last few years.
aduwah 2 days ago |
The only reason I have not defaulted to jj already is the inability to be messy with it. Easy to make mistakes without "git add"
dzaima 2 days ago |
But you do have the op log, giving you a full copy of the log (incl. the contents of the workspace) at every operation, so you can get out of such mistakes with some finagling.
You can choose to have a workflow where you're never directly editing any commit to "gain back autonomy" of the working copy; and if you really want to, with some scripting, you can even emulate a staging area with a specially-formatted commit below the working copy commit.
llyama 2 days ago |
You can be messy. The lack of an explicit staging area doesn't restrict that. `jj commit` gives the same mental model for "I want to commit 2 files from the 5 I've changed".
nchmy a day ago |
You're mistaken. I'm an absolute version control slob. JJ allows me to continue like that yet also collaborate with others. It tracks literally everything so I can not only split, squash, and rebase things to wherever they need to be, but can also rollback/restore/recover anything from either the repo-wide oplog or revision-specific evolog
You really ought to dive in deeper. jjui makes it all vastly simpler
simonmic 13 hours ago |
You can turn off the auto-tracking, and add your files manually.
phtrivier 2 days ago |
A suggestion: is there any info to provide in diffs that is faster to parse than "left" and "right"? Can the system have enough data to print "bob@foo.bar changed this"?
dkdbejwi383 19 hours ago |
Or even just "the branch you're on" and "the branch being merged into yours"
lasgawe 2 days ago |
This is a really interesting and well thought out idea, especially the way it turns conflicts into something informative instead of blocking. The improved conflict display alone makes it much easier to understand what actually happened. I think using CRDTs to guarantee merges always succeed while still keeping useful history feels like a strong direction for version control. Looks like a solid concept!
a-dub 2 days ago |
doesn't the side by side view in github diff solve this?
conflict free merging sounds cool, but doesn't that just mean that a human review step is replaced by "changes become intervals rather than collections of lines" and "last set of intervals always wins"? seems like it makes sense when the conflicts are resolved instantaneously during live editing but does it still make sense with one shot code merges over long intervals of time? today's systems are "get the patch right" and then "get the merge right"... can automatic intervalization be trusted?
edit: actually really interesting if you think about it. crdts have been proven with character at a time edits and use of the mouse select tool.... these are inherently intervalized (select) or easy (character at a time). how does it work for larger patches that can have loads of small edits?
socalgal2 2 days ago |
> [CRDT] This means merges don’t need to find a common ancestor or traverse the DAG. Two states go in, one state comes out, and it’s always correct.
Funny, there was just a post a couple of days ago how this is false.
https://news.ycombinator.com/item?id=47359712
monster_truck 2 days ago |
Not this again
codemog 2 days ago |
Nobody should have these types of problems in the age of AI agents. This kind of clean up and grunt work is perfect for AI agents. We don’t need new abstractions.
twsted 2 days ago |
Version control systems are more important than ever with AI.
BlueHotDog2 2 days ago |
This is cool and i keep thinking about CRDTs as a baseline for version control, but CRDTs have some major issues, mainly the fact that most of them are strict and "magic" in the way they actually converge (like the joke: CRDTs always converge, but to what?). i didn't read if he's using some special CRDT that might solve for that, but i think that for agentic work especially this is very interesting
bob1029 2 days ago |
I think there are still strong advantages to the centralized locking style of collaboration. The challenge is that it seems to work best in a setting where everyone is in the same physical location while they are working. You can break a lock in 30 seconds with your voice. Locking across time zones and date lines is a nonstarter by comparison.
fn-mote 2 days ago |
It seems like in a reasonable sized org you should not be merging so often that “centralized locking … across time zones” should be an issue.
Are people really merging that often? What is being merged? Doc fixes?
jsmith45 a day ago |
The file locking approach is the one used by centralized version control systems, and it is mostly used in the everybody-commits-directly-to-trunk style of development. In those environments merging isn't much of a thing. (Of course this style also comes with other challenges, especially around code review, as it means either people are constantly committing unreviewed code, or you develop some other system to pre-review code, which can slow down the speed of checking things in.)
This approach is actually fairly desirable for assets types that cannot be easily merged, like images, sounds, videos, etc. You seldom actually want multiple people working on any one file of those at the same time, as one or the other of their work will either be wasted or have to be re-done.
sibeliuss 2 days ago |
Why must everyone preprocess their blog posts with ChatGPT? It is such a disservice to one's ideas.
alansaber 20 hours ago |
Because writing is really hard.
newsoftheday 2 days ago |
OK, I'll stick with git.
nkmnz 2 days ago |
I don't quite understand how CRDTs should help with merges. The difficult thing about merges is not that two changes touch the same part of the code; the difficult thing is that two changes can touch different parts of the code and still break each other - right?
AceJohnny2 2 days ago |
Eh. It's a matter of visible pain vs invisible pain.
Developers are quite familiar with Merge Conflicts and the confusing UI that git (and SVN before it, in my experience) gives you about them. The "ours vs theirs" nomenclature which doesn't help, etc. This is something that VCSs can improve on, QED this post.
Vs the scenario you're describing (what I call Logical Conflicts), where two changes touch different parts of the code (so it doesn't emerge as a Merge Conflict) but still break each other. Like one change adding a function call in one file while another change alters the API in a different file.
These are painful in a different way, and not something that a simple text-based version control (which is all of the big ones) can even see.
Indeed, CRDTs do not help with Logical Conflicts.
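A tiny Python illustration of such a Logical Conflict, with invented file names and a deliberately crude whole-file three-way merge (`merge_file`). The two changes touch different files, so the textual merge is clean, but the merged result fails when executed.

```python
def merge_file(base, ours, theirs):
    """Three-way merge at whole-file granularity; None means a textual conflict."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    return None


base = {
    "geometry.py": "def area(w, h):\n    return w * h\n",
    "report.py": "",
}
# Change 1 edits only geometry.py: area() now takes a single tuple.
ours = dict(base, **{"geometry.py": "def area(size):\n    return size[0] * size[1]\n"})
# Change 2 edits only report.py: it calls area() with the old signature.
theirs = dict(base, **{"report.py": "total = area(2, 3)\n"})

merged = {path: merge_file(base[path], ours[path], theirs[path]) for path in base}


def run(files):
    """Execute all files in one namespace, a stand-in for build plus tests."""
    namespace = {}
    for path in sorted(files):
        exec(files[path], namespace)
    return namespace


print("textual conflicts:", [p for p, text in merged.items() if text is None])
try:
    run(merged)
except TypeError as err:
    print("merged cleanly, but broken:", err)
```

Each branch runs fine on its own; only the combination fails, and nothing in a line-based or CRDT-based text merge can see that.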
nkmnz 13 hours ago |
Thank you for the clarification. I agree that the current state of the art to show conflicts _in the same part of the code_ is not sufficient, so any improvement with regard to that is welcome. Still, I'm more looking for solutions with the Logical Conflicts.
gavinhoward 2 days ago |
Bram Cohen is awesome, but this feels a little bare. I've put much more thought into version control ([1]), including the use of CRDTs (search for "# History Model" and read through the "Implementing CRDTs" section).
[1]: https://gavinhoward.com/uploads/designs/yore.md
AceJohnny2 2 days ago |
That's worth making a separate post! (and I recommend rendering it to HTML)
But "bare" is part of the value of Cohen's post, I think. When you want to publicize a paradigm shift, it helps to make it in small, digestible chunks.
63stack 2 days ago |
Is this the Bram Cohen who made bittorrent? There is surprisingly little information on this page.
vessenes 2 days ago |
Yes
esafak a day ago |
Yes, just look at his Github page.
MattCruikshank 2 days ago |
For anyone who thinks diff / merge should be better - try Beyond Compare from Scooter Software.
steveharing1 2 days ago |
Git is my first priority until or unless i see anything more robust than this one.
barrkel 2 days ago |
I don't really get the upside of focus on CRDTs.
The semantic problem with conflicts exists either way. You get a consistent outcome and a slightly better description of the conflict, but in a way that possibly interleaves changes, which I don't think is an improvement at all.
I am completely rebase-pilled. I believe merge commits should be avoided at all costs, every commit should be a fast forward commit, and a unit of work that can be rolled back in isolation. And also all commits should be small. Gitflow is an anti-pattern and should be avoided. Long-running branches are for patch releases, not for feature development.
I don't think this is the future of VCS.
Jujutsu (and Gerrit) solves a real git problem - multiple revisions of a change. That's one that creates pain in git when you have a chain of commits you need to rebase based on feedback.
gzread 2 days ago |
People see that CRDTs have no conflicts and proclaim them as the solution to all problems, not seeing that some problems inherently have conflicts and either can't be represented by CRDTs at all, or that the use of CRDTs resolves conflicts in a way that's worse than if you actually thought about conflict resolution. E.g. that multiplayer text editor that interleaved characters from simultaneous edits.
IgorPartola 2 days ago |
I used to use rebase much more than merge but have grown to be more nuanced over the years:
Merge commits from main into a feature branch are totally fine and easier to do than rebasing. After your feature branch is complete you can do one final main-to-feature-branch merge and then merge the feature branch into main with a squash commit.
When updating any branch from remote, I always do a pull rebase to avoid merge commits from a simple pull. This works well 99.99% of the time since what I have changed vs what the remote has changed is obvious to me.
When I work on a project with a dev branch I treat feature branches as coming off dev instead of main. In this case I merge dev into feature branches, then merge feature branches into dev via a squash commit, and then merge main into dev and dev into main as the final step. This way I have a few merge commits on dev and main but only when there is something like an emergency fix that happens on main.
The problem with always using a rebase is that you have to reconcile conflicts at every commit along the way instead of just the final result. That can be a lot more work for commits that will never actually be used to run the code and can in fact mess up your history. Think of it like this:
1. You create branch foo off main.
2. You make an emergency commit to main called X.
3. You create commits A, B, and C on foo to do your feature work. The feature is now complete.
4. You rebase foo off main and have to resolve the conflict introduced by X happening before A. Let’s say it conflicts with all three of your commits (A, B, and C).
5. You can now merge foo into main with it being a fast forward commit.
Notice that at no point will you want to run the codebase such that it has commits XA or XAB. You only want to run it as XABC. In fact you won’t even test if your code works in the state XA or XAB so there is little point in having those checkpoints. You care about three states: main before any of this happened since it was deployed like that, main + X since it was deployed like that, and main with XABC since you added a feature. git blame is really the only time you will ever possibly look at commits A and B individually and even then the utility of it is so limited it isn’t worth it.
The reality is that if you only want fast forward commits, chances are you are doing very little to go back and extract code out of old versions of the codebase. You can tell this by asking yourself: “if I deleted all my git history from main and have just the current state + feature branches off it, will anything bad happen to my production system?” If not, you are not really doing most of what git can do (which is a good thing).
barrkel 2 days ago |
I am now wholly bought into the idea that having a feature branch with (A->B->C) commits is an anti-pattern.
Instead, if the feature doesn't work without the full chain of A+B+C, either the code introduced in A+B is orphaned except by tests and C joins it in; or (and preferably for a feature of any significance), A introduces a feature flag which disables it, and a subsequent commit D removes the feature flag, after it is turned on at a time separate to merge and deploy.
IgorPartola 2 days ago |
I treat each feature branch as my own personal playground. There should be zero reason for anyone to ever look at it. Sometimes they aren’t even pushed upstream. Otherwise, just work on main with linear history and feature flags and avoid all this complexity that way.
Just like you don’t expect someone else’s local codebase to always be in a fully working state since they are actively working on it, why do you expect their working branch to be in a working state?
DSMan195276 2 days ago |
I think you're somewhat missing the point - if the code from A and B only works if joined with C, then you should squash them all into one commit so that they can't be separated. If you do that then the problem you're describing goes away since you'll only be rebasing a single commit anyway.
Whether this is valuable is up to you, but IMO I'd say it's better practice than not. People do dumb things with the history and it's harder to do dumb things if the commits are self-contained. Additionally if a feature branch includes multiple commits + merges I'd much rather they squash that into a single commit (or a couple logical commits) instead of keeping what's likely a mess of a history anyway.
IgorPartola 2 days ago |
That is literally what I advocate you do for the main branch. A feature branch is allowed to have WIP commits that make sense for the developer working on the branch just like uncommitted code might not be self contained because it is WIP. Once the feature is complete, squash it into one commit and merge it into main. There is very little value to those WIP commits (rare case being when you implement algorithm X but then change to Y and later want to experiment with X again).
bathwaterpizza 2 days ago |
One downside of squash merging is that when you need to split your work across branches, so that they're different PRs, but one depends on the other, then you have to do a rebase after every single one which had dependencies is merged.
IgorPartola a day ago |
When that happens I essentially pick one of the branches as the trunk for that feature and squash merge into that, test it, then merge a clean history into main.
hackrmn 2 days ago |
When you say "unit of work", unit of _which_ work are you referring to? The problem with rebasing is that it takes one set of snapshots and replays them on top of another set, so you end up with two "equivalent" units of work. In fact they're _the same_ indeed -- the tree objects are shared, except that if by "work" you mean changes, Git is going to tell you two different histories, obviously.
This is in contrast with [Pijul](https://pijul.org) where changes are patches and are commutative -- you can apply an entire set and the result is supposed to be equivalent regardless of the order the patches are applied in. Now _that_ is a "unit of work" I understand can be applied and undone in "isolation".
Everything else is messy, in my eyes, but perhaps it's orderly to other people. I mean it would be nice if a software system defined with code could be expressed with a set of independent patches where each patch is "atomic" and a feature or a fix etc, to the degree it is possible. With Git, that's a near-impossibility _in the graph_ -- sure you can cherry-pick or rebase a set of commits that belong to a feature (normally on a feature branch), but _why_?
barrkel 2 days ago |
By "unit of work", I mean the atomic delta which can, on its own, become part of the deployable state of the software. The thing which has a Change-Id in Gerrit.
The delta is the important thing. Git is deficient in this respect; it doesn't model a delta. Git hashes identify the tip of a tree.
When you rebase, you ought to be rebasing the change, the unit of work, a thing with an identity separate and independent of where it is based from.
And this is something that the jujutsu / Gerrit model fixes.
josephg 2 days ago |
CRDTs should be able to give you better merge and rebase behaviour. They essentially make rebase and merge commits the same thing - just different views on a commit, and potentially different ways to present the conflict. CRDTs also behave better when commits get merged multiple times in complex graphs - you don’t run into the problem of commits conflicting with themselves.
You should also be able to roll back a single commit or chain of commits in a crdt pretty easily. It’s the same as the undo problem in collaborative editors - you just apply the inverse of the operation right after the change. And this would work with conflicts - say commits X and Y+Z conflict, and you’re in a conflicting state, you could just roll back commit Y which is the problem, while keeping X and Z. And at no point do you need to resolve the conflict first.
All this requires good tooling. But in general, CRDTs can store a superset of the data stored by git. And as a result, they can do all the same things and some new tricks.
sroussey 2 days ago |
In theory, maybe. In practice… last write wins (LWW) is a CRDT operator, so replace every mention of CRDT with LWW and the issues will be more obvious.
Really though, the problem with merges is not conflicts, it’s when the merged code is wrong but was correct on both sides before the merge. At least a conflict draws your attention.
When I had several large (smart but young) teams merging left and right this would come up and they never checked merged code.
Multiply by x100 for AI slop these days. And I see people merge away when the AI altered tests to suit the broken code.
josephg a day ago |
> In practice… last write wins (LWW) is a CRDT operator, so replace every mention of CRDT with LWW and the issues will be more obvious.
Yeah. A lot of people are also confused by the twin meanings of the word "conflict". The "C" in CRDT stands for "Conflict (free)", but that really means "failure free". Ie, given any two concurrent operations, there is a well defined "merge" of the two operations. The merge operation can't fail.
The second meaning is "conflict" as in "git commit conflict", where a merge gets marked as requiring human intervention.
Once you define the terms correctly, it's possible to write a CRDT-with-commit-conflicts. Just define a "conflict marker" which is sometimes emitted when merging. Then merging can be defined to always succeed, sometimes emitting conflict markers along the way.
> Really though, the problem with merges is not conflicts, it’s when the merged code is wrong but was correct on both sides before the merge.
CRDTs have strictly more information about what's going on than Git does. At worst, we should be able to remake git on top of CRDTs. At best, we can improve the conflict semantics.
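A minimal Python sketch of the CRDT-with-commit-conflicts idea from this comment, assuming a single register and globally unique write ids. `Register` is a made-up class, not any existing library. The merge is a plain set union and so cannot fail; the conflict only shows up when the value is read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Register:
    writes: frozenset = frozenset()      # {(write id, text)}, grow-only
    superseded: frozenset = frozenset()  # ids of writes that a later write replaced

    def write(self, write_id, text):
        """Record a new write that replaces everything this replica has seen."""
        seen = frozenset(i for i, _ in self.writes)
        return Register(self.writes | {(write_id, text)}, self.superseded | seen)

    def merge(self, other):
        """Union of two grow-only sets: commutative, associative, idempotent."""
        return Register(self.writes | other.writes,
                        self.superseded | other.superseded)

    def read(self):
        """Return the single live value, or conflict markers if several remain."""
        live = sorted(text for i, text in self.writes if i not in self.superseded)
        if len(live) <= 1:
            return live[0] if live else ""
        return "<<<<<<<\n" + "\n=======\n".join(live) + "\n>>>>>>>"


base = Register().write("w0", "x = 1")
alice = base.write("w1", "x = 2")
bob = base.write("w2", "x = 3")   # concurrent with alice's write

merged = alice.merge(bob)         # always succeeds
print(merged.read())              # both candidates, wrapped in markers

resolved = merged.write("w3", "x = 5")  # a human or tool resolves later
print(resolved.read())
```

Resolution is just another write, so a replica that merges in the old `alice` state afterwards still reads the resolved value.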
skydhash a day ago |
> CRDTs have strictly more information about whats going on than Git does. At worst, we should be able to remake git on top of CRDTs. At best, we can improve the conflict semantics.
That is a worthwhile goal, but remember that code is just a notation for some operation, it's not the operation itself (conducted by a processor). Just like a map is a description of a place, not the place itself. So semantics exist outside of it and you can't solve semantic issues with CRDTs.
As code is formal and structured, a version control conflict is a signal, not a nuisance. It may be crude, but it's like a canary in a coal mine. It lets you know that someone has modified stuff you've worked on in your patch. And then it's up to you to resolve the probable semantic conflicts.
But even if you don't have conflicts, you should check your code after a synchronization as things you rely on may have changed since your last one.
afiori a day ago |
being able to customize the chunking/diffing process with something analogous to an lsp would greatly improve this. In my experience a particularly horribly handled case is when eg two branches add two distinct methods/functions in the same file location (especially if there is some boilerplate so that the two blocks share more than a few lines).
a language aware merge could instead produce
>>>> function foo(){ ... } ===== function bar(){ ... } <<<<<<
mtndew4brkfst 21 hours ago |
If you haven't heard of it yet, Mergiraf uses tree-sitter grammars to resolve merges using syntax-aware logic and has a pretty good success rate for my work.
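For the "two branches add two distinct functions at the same spot" case, a toy illustration of syntax-aware merging using Python's standard `ast` module (needs Python 3.9+ for `ast.unparse`). This is only a sketch under assumptions: it handles top-level functions only, the helper names are made up, and it is not a description of how Mergiraf or tree-sitter work internally.

```python
import ast


def top_level_functions(source):
    """Map function name -> normalized source for each top-level def."""
    tree = ast.parse(source)
    return {
        node.name: ast.unparse(node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def merge_functions(base, ours, theirs):
    """Three-way merge keyed by function name instead of by line position."""
    b, o, t = (top_level_functions(s) for s in (base, ours, theirs))
    merged, conflicts = {}, []
    for name in dict.fromkeys([*b, *o, *t]):      # de-duplicated, stable order
        bv, ov, tv = b.get(name), o.get(name), t.get(name)
        if ov == tv or tv == bv:
            pick = ov                              # same on both sides, or only ours changed
        elif ov == bv:
            pick = tv                              # only theirs changed
        else:
            conflicts.append(name)                 # both changed the same function
            continue
        if pick is not None:
            merged[name] = pick
    return merged, conflicts


base = "def a():\n    return 0\n"
ours = base + "\ndef foo():\n    return 1\n"
theirs = base + "\ndef bar():\n    return 2\n"

merged, conflicts = merge_functions(base, ours, theirs)
```

Because the unit of comparison is a named function rather than a run of lines, the shared boilerplate between `foo` and `bar` never gets a chance to confuse the merge.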
ivlozada a day ago |
This is the key point. Once your data structure carries the full edit history instead of reconstructing it from DAG traversal, rebase and merge become different views of the same operation. Not fundamentally different operations with different failure modes.
The weave approach moves ordering into the data itself. That's the same insight that matters in any system that needs deterministic ordering across independent participants: put the truth in the structure, not in the topology of how it was assembled.
gf000 a day ago |
Afaik pijul already does that though
bawolff a day ago |
> Jujutsu (and Gerrit) solves a real git problem - multiple revisions of a change. That's one that creates pain in git when you have a chain of commits you need to rebase based on feedback.
I use gerrit extensively... while it does "solve" that problem, i think it's far from an ideal solution. It becomes a mess once you have patches depending on other patches and you have to edit a patch somewhere in the stack, and it very much feels bolted on to git. Don't get me wrong, it works, but i think fresh new ideas on how to solve this problem are still needed.
catlifeonmars 2 days ago |
Can we stop using line-oriented diffs in favor of AST-oriented diffs?
Is it just lack of tooling, or is there something fundamentally better about line-oriented diffs that I’m missing? For the purpose of this question I’m considering line-oriented as a special case of AST-oriented where the AST is a list of lines (anticipating the response of how not all changes are syntactically meaningful or correct).
Aperocky 2 days ago |
Outside of the merit of the idea itself, I thought I was going to look at a repository at least as complete as Linus's when he released git after 3 weeks, especially with the tooling we have today.
Slightly disappointed to see that it is a 470 line python file being touted as "future of version control". Plenty of things are good enough in 470 lines of python, even a merge conflict resolver on top of git - but it looks like it didn't want anything to do with git.
Prototyping is almost free these days, so not sure why we only have the barest of POC here.
ithkuil 2 days ago |
It clearly says in the article that this is just a demo
mtndew4brkfst 21 hours ago |
A demo as "the future of" something doesn't really resonate with me. It's like saying a melodic motif and a couple well-written lines of lyrics are "the future of" my new music career.
ithkuil 19 hours ago |
I guess it depends on how you parse "the future of X".
To me it sounds like "This is how I imagine the future of X", i.e. a preview of a possible future.
merlindru 2 days ago |
I recently found a project called sem[1] that does git diffs but is aware of the language itself, giving feedback like "function validateToken added", "variable xyzzy removed", ...
i think that's where version control is going. especially useful with agents and CI
[1] https://ataraxy-labs.github.io/sem/
lowbloodsugar 2 days ago |
Araxis merge. Four views. Theirs, ours, base and “what you did so far in this damned merge hell”.
echrisinger 2 days ago |
Has anyone considered a VCS that integrates more vertically with the source code through ASTs?
IE if I change something in my data model, that change & context could be surfaced with agentic tooling.
conartist6 a day ago |
That's me, with BABLR. Working on getting a beta release announced in the next few days.
aggregator-ios 2 days ago |
What CRDTs solve is conflicts at the system level. Not at the semantic level. 2 or more engineers setting a var to a different value cannot be handled by a CRDT.
Engineer A intended value = 1
Engineer B intended value = 2
CRDT picks 2
The outcome could be semantically wrong. It doesn't reflect the intent.
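A minimal sketch of that scenario, assuming a last-write-wins register (one possible CRDT, chosen here for illustration; the `LWW` type and the tie-break on replica name are assumptions, not something from the article):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWW:
    value: int
    timestamp: int   # logical or wall-clock time of the write
    replica: str     # used only to break timestamp ties deterministically


def merge(a, b):
    """Keep whichever write has the larger (timestamp, replica) pair."""
    return max(a, b, key=lambda r: (r.timestamp, r.replica))


engineer_a = LWW(value=1, timestamp=10, replica="A")
engineer_b = LWW(value=2, timestamp=10, replica="B")

merged = merge(engineer_a, engineer_b)
```

The merge is deterministic and order-independent, which is all the data structure promises. Engineer A's write is silently dropped, and nothing records that the two intents disagreed.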
I think the primary issue with git and every other version control system is the terrible names for everything. pull, push, merge, fast forward, stash, squash, rebase, theirs, ours, origin, upstream and that's just a subset. And the GUIs. They're all very confusing even to engineers who have been doing this for a decade. On top of this, conflict resolution is confusing because you don't have any prior warnings.
It would be incredibly useful if before you were about to edit a file, the version control system would warn you that someone else has made changes to it already or are actively working on it. In large teams, this sort of automation would reduce conflicts, as long as humans agree to not touch the same file. This would also reduce the amount of quality regressions that result from bad conflict resolutions.
Shameless self plug: I am trying to solve both issues with a simpler UI around git that automates some of this and it's free. https://www.satishmaha.com/BetterGit
jnsie 2 days ago |
> It would be incredibly useful if before you were about to edit a file, the version control system would warn you that someone else has made changes to it already or are actively working on it. In large teams, this sort of automation would reduce conflicts, as long as humans agree to not touch the same file. This would also reduce the amount of quality regressions that result from bad conflict resolutions.
Bringing me back to my VSS days (and I'd much rather you didn't)
aggregator-ios 2 days ago |
I knew I should have put a trigger warning, because I was thinking of this as I was typing it. Sorry!
mbfg 2 days ago |
well, the mismatch here is widened by the fact that almost everyone, it seems, uses git with a central, prominent, visible, remote repository, whereas git was developed with a truly distributed vision. Now sure, that truly distributed thing only becomes final when it reaches some 'central' repo, but it's quite a big difference from what we all do.
j1elo 2 days ago |
For that you need a very centralized VCS, not a decentralized one. Perforce allows you to lock a file so everybody else cannot make edits to it. If they implemented more fine-grained locking within files, or added warnings to other users trying to check them out for edits, they'd be just where you want a VCS to be.
How, or better yet, why would Git warn you about a potential conflict beforehand, when the use case is that everyone has a local clone of the repo and might be driving it towards different directions? You are just supposed to pull commits from someone's local branch or push towards one, hence the wording. The fact that it makes sense to cooperate and work on the same direction, to avoid friction and pain, is just a natural accident that grows from the humans using it, but is not something ingrained in the design of the tool.
We're collectively just using Git for the silliest and simplest subset of its possibilities -a VCS with a central source of truth-, while bearing the burden of complexity that comes with a tool designed for distributed workloads.
sensanaty 2 days ago |
I haven't used them, but doesn't SVN or Mercurial do something like this? It blocks people from working on a file by locking them, the problem is that in large teams there are legitimate reasons for multiple people to be working on the same files, especially something like a large i18n file or whatever.
josephg 2 days ago |
> CRDT picks 2
They don’t have to.
The crdt library knows that value is in conflict, and it decides what to do about it. Most CRDTs are built for realtime collab editing, where picking an answer is an acceptable choice. But the crdt can instead add conflict marks and make the user decide.
Conflicts are harder for a crdt library to deal with - because you need to keep merging and growing a conflict range. And do that in a way that converges no matter the order of operations you visit. But it’s a very tractable problem - someone’s just gotta figure out the semantics of conflicts in a consistent way and code it up. And put a decent UI on top.
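One way to sketch "keep the conflict instead of picking" is a multi-value register: concurrent writes are all retained, and a later write that has seen them replaces them. Everything here (`Write`, `write`, `merge`, `read`, the string ids) is hypothetical illustration code, not the API of any CRDT library, and it covers a single scalar rather than the harder growing-conflict-range problem for text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    id: str                                # unique per write (assumed scheme)
    value: object
    supersedes: frozenset = frozenset()    # ids of every write this one has seen


def write(state, write_id, value):
    """Replace everything currently visible in `state` with one new write."""
    seen = frozenset()
    for w in state:
        seen |= {w.id} | w.supersedes
    return frozenset({Write(write_id, value, seen)})


def merge(a, b):
    """Union the writes, then drop any that some other write supersedes."""
    writes = a | b
    dead = frozenset().union(*(w.supersedes for w in writes))
    return frozenset(w for w in writes if w.id not in dead)


def read(state):
    """Return the value, or a conflict marker if concurrent writes remain."""
    values = sorted((w.value for w in state), key=repr)
    if not values:
        return None
    if len(values) == 1:
        return values[0]
    return ("CONFLICT", values)


base = write(frozenset(), "w0", 0)
a = write(base, "a1", 1)          # engineer A
b = write(base, "b1", 2)          # engineer B, concurrently
both = merge(a, b)                # merge succeeds, conflict is visible
fixed = write(both, "c1", 2)      # a human decides
```

`merge` never fails and gives the same state in either order; the conflict only shows up when the value is read, and it disappears once someone writes on top of both sides.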
senfiaj a day ago |
Yeah, same thoughts. I also think semantic merge is the best. Also it would be nice if you could add a plugin for custom binary file formats, such as sqlite (which obviously can't be merged like a text file).
injidup 2 days ago |
I'm confused about what this solves. They give the example of someone editing a function and someone deleting the same function and claim that the merge never fails and then go on to demonstrate that indeed, rightly, the merge still fails. There are still merge markers in the sources. What is the improvement exactly?
galkk 2 days ago |
Yeah, the author fails to present his case even in the intro
> A CRDT merge always succeeds by definition, so there are no conflicts in the traditional sense — the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails. This project works that out.
It has a clear contradiction. A CRDT always succeeds by definition, with no conflicts in the traditional sense, so (rephrasing) conflicting changes are marked as conflicted. Emm, like in any other source control?
In fact, after rereading that intro while writing this answer, I'm starting to suspect at least a smell of AI writing.
fleebee 2 days ago |
The README of the repo offers a hint:
> The code in this project was written artisanally. This README was not.
josephg 2 days ago |
The benefit of using a crdt for this is that you can get better merge semantics. Rebase and merge become the same thing. Commits can’t somehow conflict with themselves. You can have the system handle 2 non conflicting changes on the same line of code if you want. You can keep the system in a conflict state and add more changes if you want to. Or undo just a single commit from a long time ago. And you can put non text data in a crdt and have all the same merge and branching functionality.
ballsweat 2 days ago |
Everyone should vibe code a VCS from scratch in their fave language.
It’s an awesome weekend project, you can have fun visualizing commits in different ways (I’m experimenting with shaders), and importantly:
This is the way forward. So much software is a wrapper around S3 etc. now is your chance to make your own toolset.
I imagine this appeals more to DIYer types (I use Pulsar IDE lol)
braidedpubes 2 days ago |
Do I have it right that it’s basically timestamp based, except not based on our clocks but one it manages itself?
So as long as all updates have been sent to the server from all clients, it will know what “time” each character changed and be able to merge automatically.
Is that it basically?
shitfilleddonut 2 days ago |
It seems more like the past of version control
EGreg 2 days ago |
I remember I met Bram Cohen (of BitTorrent fame!) around 15 years ago. Around that time is when I had started building web-based distributed collaborative systems, starting with Qbix.com and then spun off a company to build blockchain-based smart contracts through Intercoin.org etc.
Anyway, I wanted to suggest a radical idea based on my experience:
Merges are the wrong primitive.
What organizations (whether centralized or distributed projects) might actually need is:
1) Graph Database - of Streams and Relations
2) Governance per Stream - eg ACLs
A code base should be automatically turned into a graph database (functions calling other functions, accessing configs etc) so we know exactly what affects what.
The concept of what is “too near” each other mentioned in the article is not necessarily what leads to conflicts. Conflicts actually happen due to conflicting graph topology and propagating changes.
People should be able to clone some stream (with permission) and each stream (node in the graph) can be versioned.
Forking should happen into workspaces. Workspaces can be GOVERNED. Publishing some version of a stream just means relating it to your stream. Some people might publish one version, others another.
Rebasing is a first-class primitive, rather than a form of merging. A merge is an extremely privileged operation from a governance point of view, where some actor can just “push” (or “merge”) thousands of commits. The more commits, the more chance of conflicts.
The same problem occurs with CRDTs. I like CRDTs, but reconciling a big netsplit will result in merging strategies that create lots of unintended semantic side effects.
Instead, what if each individual stream was guarded by policies, there was a rate limit of changes, and people / AIs rejected most proposals. But occasionally they allow it with M of N sign offs.
Think of chatgpt chats that are used to modify evolving artifacts. People and bots working together. The artifacts are streams. And yes, this can even be done for codebases. It isnt about how “near” things are in a file. Rather it is about whether there is a conflict on a graph. When I modify a specific function or variable, the system knows all of its callers downstream. This is true for many other things besides coding too. We can also have AI workflows running 24/7 to try out experiments as a swarm in sandboxes, generate tests and commit the results that pass. But ultimately, each organization determines whether they want to rebase their stream relations to the next version of something or not.
That is what I’m building now with https://safebots.ai
PS: if anyone is interested in this kind of stuff, feel free to schedule a calendly meeting w me on that site. I just got started recently, but I’m dogfooding my own setup and using AI swarms which accelerates the work tremendously.
theknarf 2 days ago |
You can't use CRDTs for version control, having conflicts is the whole point of version control. Sometimes two developers will make changes that fundamentally try to change the code in two different ways; a merge conflict then leaves it up to the developer who is merging/rebasing to make a choice about the semantics of the program they want to keep. A CRDT would just produce garbage code, it's fundamentally the wrong solution. If you want better developer UX for merge conflicts then there is a bunch of tooling on top of Git, as well as other version control systems, that try to present it in a better way; but that has very little to do with the underlying data structure. The very fact that cherry-picking and reverting become difficult with this approach should show you that it's the wrong approach! Those are really easy operations to do in Git.
tbrownaw 2 days ago |
> You can't use CRDTs for version control
You misunderstand what is being proposed.
Using CRDTs to calculate the results of a merge does not require being allowed to commit the results of that calculation, and doesn't even require that you be able to physically realize the results in the files in your working copy.
.
Consider for example if you want to track and merge scalar values. Maybe file names if you track renames, maybe file properties if you're not just using a text listing (ie .gitattributes) for that, maybe file content hash to decide whether to actually bother running a line-based merge.
One approach is to use what Wikipedia says is called an OR-set[1], with the restriction that a commit can only have a single unique value; if it was previously in the set then it keeps all the same tags, if it wasn't then it gets a new tag.
That restriction is where the necessity of conflict resolution comes in. It doesn't have to be part of the underlying algorithms, just the interface with the outside world.
[1] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
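A rough sketch of the OR-set idea with the "one value per commit" restriction described above. The `ORSet` class, its method names, and the tag strings are all made up for illustration; the tags are assumed to be globally unique, and the values stand in for something like a tracked file name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ORSet:
    adds: frozenset = frozenset()      # {(value, tag)} ever added
    removed: frozenset = frozenset()   # tags that have been observed and removed

    def live(self):
        """Values that still have at least one non-removed tag."""
        return {v for v, t in self.adds if t not in self.removed}

    def merge(self, other):
        """Always succeeds: union both components."""
        return ORSet(self.adds | other.adds, self.removed | other.removed)

    def commit(self, value, tag):
        """Make `value` the single live value, as a commit is required to."""
        drop = {t for v, t in self.adds if v != value}   # observed tags of other values
        adds = self.adds
        if value not in self.live():
            adds = adds | {(value, tag)}                 # new value gets a new tag
        return ORSet(adds, self.removed | drop)


base = ORSet().commit("a.txt", "t0")
left = base.commit("b.txt", "t1")      # one side renames to b.txt
right = base.commit("c.txt", "t2")     # the other renames to c.txt
merged = left.merge(right)
```

Here `merged.live()` holds two names, which the merge itself is perfectly happy with. The conflict only exists at the interface: the result cannot be committed until someone calls `commit` with a single value.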
Retr0id a day ago |
If semantics-layer conflicts still have to be detected somehow, and resolved by hand, what value is the underlying CRDT providing?
furyofantares a day ago |
> the result is always the same no matter what order branches are merged in — including many branches mashed together by multiple people working independently.
Retr0id a day ago |
Why is that valuable?
zephen a day ago |
Yeah, symmetry is overrated.
Git's merge is already symmetrical for two branches being merged, and that, in and of itself, often leads to problems.
It's completely unclear that extending this to multiple branches would provide any goodness.
drfloyd51 a day ago |
It means anyone can fix the conflict. Including a server side AI.
Retr0id a day ago |
non sequitur
SpaceNoodled 17 hours ago |
I was wondering when someone would try to cram The Slop Machine into this.
mememememememo a day ago |
Syntax-only manual merges are just a time-consuming waste of human time.
I'm a small PR-er so 99% of the time it is syntax. If it is semantic at all often, then try trunk-based development.
Retr0id a day ago |
The CRDT isn't syntax-aware either
mememememememo a day ago |
If it is doing what I think CRDT does, and tracking where the user clicked and what they typed, it sort of carries a bit more syntax info. It has a chance to get it right. And often, since it does something, it turns a "figure this shit out" into a "review this".
vlovich123 a day ago |
Much better is AST merging (which itself is also more amenable to crdts). But doing this at the text level is going to be a failed experiment - the value isn’t there.
sigbottle 19 hours ago |
This was my immediate thought when seeing this post too lol.
mememememememo a day ago |
But merging already auto-merges what it can best effort. Conflicts are syntax conflicts not semantic ones.
Therefore you could have automerges that conflict in a way that breaks the code.
Example would be define a global constant in file X. One commit removes it. Another commit on another branch makes use of it in file Y.
OTOH where I get merge conflicts in Git it is usually purely syntax issue that could be solved by a slightly cleverer merge algo. CRDT or semantic merge.
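The constant example can be played out in a few lines. This is a toy under assumptions: `merge_files` is a made-up whole-file three-way merge rather than Git's algorithm, and the two files are concatenated into one namespace instead of being imported, purely to keep the sketch short.

```python
def merge_files(base, ours, theirs):
    """Three-way merge at whole-file granularity; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            result = o                 # identical, or only ours changed
        elif o == b:
            result = t                 # only theirs changed
        else:
            conflicts.append(path)     # both changed the same file
            result = o
        if result is not None:
            merged[path] = result
    return merged, conflicts


def runs_ok(files):
    """Load every file into one namespace and call run()."""
    namespace = {}
    try:
        exec("".join(files[p] for p in sorted(files)), namespace)
        namespace["run"]()
        return True
    except NameError:
        return False


base = {"x.py": "LIMIT = 10\n", "y.py": "def run():\n    return 1\n"}
ours = {"x.py": "", "y.py": base["y.py"]}                               # removes the constant
theirs = {"x.py": base["x.py"], "y.py": "def run():\n    return LIMIT\n"}  # starts using it

merged, conflicts = merge_files(base, ours, theirs)
```

Both branches run on their own and the merge reports no conflicts, yet the merged result fails with a `NameError`. No text-level merge, CRDT or otherwise, sees this without knowing what the code means.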
toomim a day ago |
Pijul does both. It's a VCS, that is a CRDT, that preserves conflicts until a human resolves them.
Look it up: https://pijul.org
It also makes cherrypicking and rebasing wayyyy easier. You can actually add or remove any set of patches, at any time, on any peer. It's a dramatic model shift -- and is awesome.
shubhamjain a day ago |
I would like to point out that Bram Cohen seems to be obsessed with “better merges” and had a verbal spat with Linus on Git when it was just taking off (2007).
https://news.ycombinator.com/item?id=8118817
It’s pretty weird that he has gone back to the same idea without understanding why Git’s approach is better. I would say VCS is largely a solved problem. You can simplify a few things here and there, maybe improve support for binaries and a few other things, but that can mostly be done on top of existing systems. The foundation is rock solid, so it doesn’t sound very sensible to attempt something from the ground up.
masklinn a day ago |
Both jj and pijul save (~commit) conflicts to be resolved later, rather than require immediate resolution.
And jj was built around rebase being a routine operation, often transparent (cherrypicking being a form of rebasing).
astrostl 2 days ago |
Disagree. We all are — or should be — Linux kernel developers. What's more, we should align to a specific and singular VCS worldview informed by BitKeeper, which no longer exists, whether or not we used it. Therefore Git. Thank you for your attention to this matter!
simultsop 2 days ago |
You sound more like a DOS dev instead of linux.
Ferret7446 a day ago |
I'm pretty sure jujutsu already does this, and it's interoperable with git
nailer a day ago |
You’re still treating code as text which it isn’t (in the same way you wouldn’t treat JSON as text) it’s actually more like an AST.
Jujutsu (jj) does that. And it’s git compatible.
dcre a day ago |
I love jj and think virtually every git user should switch to it, but I don't think it treats text as an AST. What do you mean?
nailer a day ago |
Ugh you're right. There was a new AST-based version control system that came out a month ago and I couldn't remember the name. I asked an LLM what the name was and repeated the answer the LLM gave me without checking (facepalm).
I may have been thinking of https://github.com/gritzko/librdx/tree/master/be
WolfeReader 18 hours ago |
Thanks for the correct link! AST-based version control sounds like a great idea.
12_throw_away a day ago |
No, it doesn't.
nailer 21 hours ago |
https://news.ycombinator.com/item?id=47488270
mweidner a day ago |
I'm surprised to see the emphasis on tracking lines of text, which ties in to the complexity of merge vs merge-the-other-way vs rebase. If we are committed to enhancing the change history, it seems wiser to go all in and store high-level, semantically-meaningful changes, like "move this code into an `if` block and add `else` block ...".
Consider the first example in the readme, "Left deletes the entire function [calculate]. Right adds a logging line in the middle". If you store the left operation as "delete function calculate<unique identifier>" and the right operation as "add line ... to function calculate", then it's obvious how to get the intended result (calculate is completely deleted), regardless of how you order these operations.
I personally think of version control's job not as collaborating on the actual files, but as collaborating on the canonical order of (high-level) operations on those files. This is what a branch is; merge/rebase/cherry-pick are ways of updating a branch's operation order, and you fix a conflict by adding new operations on top. (Though I argue rebase makes the most sense in this model: your end goal is to append to the main branch.)
Once you have high-level operations, you can start adding high-level conflict markers like "this operation changed the docs for function foo; flag a conflict on any new calls to foo". Note that you will need to remember some info about operations' original context (not just their eventual order in the main branch) to surface these conflicts.
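A small sketch of the readme example expressed as high-level operations. The operation types, the `calculate#1` style identifiers, and `apply_all` are hypothetical; the only point is that when operations name the function they target, "delete function" and "add line to function" give the same result in either order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteFunction:
    func_id: str          # stable unique identifier, not just the name


@dataclass(frozen=True)
class AddLine:
    func_id: str
    index: int
    text: str


def apply_all(functions, ops):
    """Apply operations in the given order to {func_id: [lines]}."""
    funcs = {fid: list(body) for fid, body in functions.items()}
    for op in ops:
        if isinstance(op, DeleteFunction):
            funcs.pop(op.func_id, None)
        elif op.func_id in funcs:                 # edits to a deleted function are no-ops
            funcs[op.func_id].insert(op.index, op.text)
    return funcs


start = {
    "calculate#1": ["def calculate(x):", "    return x * 2"],
    "main#2": ["def main():", "    pass"],
}
left = DeleteFunction("calculate#1")
right = AddLine("calculate#1", 1, "    log(x)")

left_then_right = apply_all(start, [left, right])
right_then_left = apply_all(start, [right, left])
```

Whether a silent no-op is the right policy is a design choice; the same hook is where a high-level conflict marker ("line added to a function that was deleted") could be emitted instead.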
philwelch a day ago |
Regardless of the merits of CRDTs, I'm just glad someone is finally trying to create a new version control system. Everyone loves to complain about Git but nobody's actually tried to move beyond it in two decades.
Nursie a day ago |
There are those of us who remember the before-times, who I think are in general just happy we have git.
Having lived through sccs, pvcs, SourceSafe, Clearcase and svn (among others), the introduction of lightweight, sane branching, merging, rebasing etc was a revelation.
Yes, there are still things that an adept could do with some of those other systems that git doesn't make easy. For example we have the holy war between those who demand a git repo has a clean history vs those who would rather a revision control system actually stores revision history and forms a record of what really happened. In Rational ClearCase you would use a different config specification depending on your task to programmatically select visibility, and hey presto, you have both views available.
(Not that I would wish ClearCase on my worst enemy these days, those config specs were a language in themselves and the amount of times people would get in trouble with them was a real drag, and that's only one of the myriad downsides.)
Then git came along and did away with so much of that complexity that I imagine there are legions of us who think it's good enough (TM) that version control is more or less a solved problem and nothing irks us enough to seek out alternatives.
philwelch 18 hours ago |
I can definitely relate to that, and I'm actually quite fond of git myself. I just never expected that people would stop trying to make a better version control system. That's not entirely fair, a lot of work has been invested in Pijul and Fossil, but I would have expected much more interest in alternatives. I remember all of the reasons Git was so new and exciting at the time, but now it feels like we're all locked into it, not out of any deliberate attempt at vendor lockin but out of complacency, and because fewer and fewer of us still remember a time that it felt possible to invent a better version control system.
Nursie 6 hours ago |
It's definitely ruined one of my favourite interview questions. I used to ask candidates what their preferred source control system was and why.
I wanted them to demonstrate that they'd thought about it and had some insight rather than give a specific answer. But now it's just "... uh, git, I've only really used git", and you can't blame anyone for that because it's so dominant now. I've updated my question to be about branching strategies but I'm just not as fond of it.
Well, you've inspired me to have a look more into the competition. I don't think git is perfect, I think it's just good enough and it'll take quite a lot to shift people over to something else now we have the whole github network effect. But hey, no harm in looking around :)
KPGv2 a day ago |
I've tested out jj a bit, and doesn't it solve the issues presented at the link already? I don't work on a team where I need VC better than git, so I just stick with it for my own private use, but I did test jj out of curiosity, and I could've sworn this is basically the same pitch as switching to jj (but for the CRDT under the hood).
danpalmer a day ago |
I'm struggling to understand the problem this solves for me. I can see in the abstract why this might be useful, but in practice I don't see the problems.
For me, jj represents a massive step forward from git in terms of usability, usefulness, and solving problems I actually have.
I think the next step forward for version control would be something that works at a lower level, such as the AST. I'd love to see an exploration of what versioning looks like when we don't have files and directories, and a piece of software is one whole tree that can be edited at any level. Things like LightTable and Dark have tried bits of this, it would be good to see a VCS demo of that sort of thing.
conartist6 a day ago |
It is coming. Had to build a whole new system of parsers.
pbw a day ago |
This sounds good, but I wonder if AI has changed the calculus on conflict resolution. It can not only chase down the conflicting changes, but also read those commit messages and PRs to divine intent. It might be that git is "good enough," given we have AI.
periodjet a day ago |
Does this mean he’s given up on Chia?
ReaLNero a day ago |
Thank you. This sounds like a great approach! I’ve also been surprised at the mechanical nature that git resolves conflicts in, and the loss of intent when auto-merging.
Androider a day ago |
I've had really good success lately with having Claude Code resolve conflicts, to the point that I don't see myself doing manual resolutions going forward.
Set merge.conflictStyle to zdiff3 in your git config and ask Claude to resolve the conflict, or even better, complete the entire rebase for you. A quick diff sanity check against the merge base of the result takes just a few seconds.
mememememememo a day ago |
Thanks literally just used this!
suralind a day ago |
Guys, there’s a tool called mergiraf[1] that does wonders. I don’t remember my last rebase
[1]: https://mergiraf.org/
senfiaj a day ago |
Yeah, I also thought that a semantic merge is the best solution. It would be nice if it could be extended with custom formats such as sqlite.
csomar a day ago |
I am working on merge conflicts tool[1], so this area is of interest to me. But I fail to see the points of the author. In the first example he gave, git will actually give you three blobs: our, their and ancestor. The ancestor should have the missing information from his example and using code diffs[2], you can see what happened at each blob. Essentially, his blob is a single view of the 3 blobs merged together. Could be useful on the terminal, but if you are using a visual tool, a 3-way diff is always better.
> merges never fail
I am not sure what never fail means here.
> Conflicts are informative, not blocking. The merge always produces a result.
What does this even mean? You merge first and review later? And then other contributors just build on top of your main branch while you decide you want to change your selection?
If you want a smarter merge conflict tool, the one I am enthusiastic about today is Mergiraf: https://codeberg.org/mergiraf/mergiraf
1: https://codeinput.com/products/merge-conflicts 2: https://codeinput.com/products/merge-conflicts/demo
donatj a day ago |
> One idea I’m particularly excited about: rebase doesn’t have to destroy history.
I guess I don't understand why not just merge at that point? The point of rebase is to destroy history...
modeless a day ago |
> the key insight is that changes should be flagged as conflicting when they touch each other
Not really. Changes should be flagged as conflicting when they conflict semantically, not when they touch the same lines. A rename of a variable shouldn't conflict with a refactor that touches the same lines, and a change that renames a function should conflict with a change that uses the function's old name in a new place. I don't think I would bother switching to a new VCS that didn't provide some kind of semantic understanding like this.
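A rough sketch of the second case (rename on one side, new use of the old name on the other), using Python's standard `ast` module. It is deliberately naive: it only looks at top-level function definitions and direct calls by name, and `semantic_conflicts` is a made-up helper, not part of any existing VCS.

```python
import ast
from collections import Counter


def defined(source):
    """Names of top-level functions defined in `source`."""
    return {n.name for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)}


def call_counts(source):
    """How many times each plain name is called in `source`."""
    return Counter(
        n.func.id
        for n in ast.walk(ast.parse(source))
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    )


def semantic_conflicts(base, ours, theirs):
    """Functions one side removed or renamed while the other added calls to them."""
    conflicts = set()
    base_calls = call_counts(base)
    for renamer, user in ((ours, theirs), (theirs, ours)):
        gone = defined(base) - defined(renamer)
        new_uses = {
            name for name, count in call_counts(user).items()
            if count > base_calls[name]
        }
        conflicts |= gone & new_uses
    return conflicts


base = "def old():\n    return 1\n\nprint(old())\n"
ours = "def new():\n    return 1\n\nprint(new())\n"   # renames old -> new
theirs = base + "x = old()\n"                          # adds a new call to old

flagged = semantic_conflicts(base, ours, theirs)
```

The two edits touch different lines, so a proximity-based rule would let them through; this check flags `old` because it compares definitions and uses rather than positions.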

chungy a day ago |

The merge conflict syntax and "doesn't destroy history" both sound exactly like what Fossil does. (Fossil is at https://fossil-scm.org/)

Just a trivial example here:

    lorem ipsum
    <<<<<<< BEGIN MERGE CONFLICT: local copy shown first <<<<<<<<<<<< (line 2)
    dolor sit amet,
    ####### SUGGESTED CONFLICT RESOLUTION follows ###################
    consectetur adipiscing elit
    ||||||| COMMON ANCESTOR content follows ||||||||||||||||||||||||| (line 2)
    ======= MERGED IN content follows =============================== (line 2)
    consectetur adipiscing elit
    >>>>>>> END MERGE CONFLICT >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

qsera a day ago |
I think this was something that was waiting for something like LLMs to happen to be solved.
Why aren't AI companies touting "zero-shot"ing huge merge conflicts being resolved by LLMs..
adastra22 a day ago |
Have you tried this? It would be a nightmare. The LLM would "solve" the merge conflict by eliminating the code.
qsera a day ago |
But that should make the build or some tests fail, no? I think we can consider it as a first approximation and proceed to a manual review. It should be a lot easier than manually resolving the difficult merge..
adastra22 20 hours ago |
That’s solved by disabling or removing those tests.
I have tried this. It doesn’t work.

toomim a day ago |

CRDTs actually have a long history in version control.

  - The original 1977 version control system, SCCS, was a CRDT: https://braid.org/meeting-60/sccs-is-a-time-collapse

  - It called its data structure a "weave"

  - Bram's old project "Codeville" used a weave for version control

  - But then git blew up in popularity.

  - The project "DARCS" tried to make a robust "theory of patches," and eventually led to the development of Pijul

  - Pijul is a VCS that is a CRDT: https://pijul.org

jes5199 a day ago |
This is just CRDT merges and better diffs?? I think the future of version control is much, much weirder than this. Like if you have CRDTs why not have ephemeral branches with real-time collaborative editing and live CI as you type
forrestthewoods a day ago |
I hate Git. I think it is mediocre at absolute best.
But nothing in this article is in my top 10. So this doesn’t really do anything for me.
All I really want is support for terabyte-scale repo history with super fast, efficient, sparse virtual clones. And ideally a global cache for binary blobs with copy-on-write semantics. Which is another way to say I want support for large binary files, and no, Git LFS is not sufficient.
latand666 a day ago |
This sounds very promising, liked the idea
Is there a CLI like the git CLI? I’ve read the readme but didn’t quite get how to use it.
latand666 a day ago |
Have you also thought about multiple staging levels/layers? Sometimes I don’t want to commit the work, just to review the AI work piece by piece, moving to the next staging level.
I guess I can do that with plain commits as well and then rewrite history when I need, but I think having multiple staging levels would be more developer friendly especially in the era of AI coding
QuiCasseRien a day ago |
I will look at it, it seems interesting. However, I hope for a better ending than Pijul's (https://pijul.org/).
I'm no longer waiting for it, even though everything about it sounds awesome: almost no merge conflicts, and patches that can be applied in any order.
So sad it's still not production ready.
PunchyHamster a day ago |
At this point if your VCS isn't a layer above git plumbing, nobody gonna waste time using it. Especially if the improvements are minor enough that it could be reasonably just a wrapper and still have 90% of the improvements.
> Two opaque blobs. You have to mentally reconstruct what actually happened.
Did you not discover what git diff does? It's clearer than the presented improvement!
There are plenty of 3-way merge tools supported by git too. Sure, it's an external tool, but it's adding one tool rather than upending the workflow.
> Conflicts are informative, not blocking. The merge always produces a result. Conflicts are surfaced for review when concurrent edits happen “too near” each other, but they never block the merge itself. And because the algorithm tracks what each side did rather than just showing the two outcomes, the conflict presentation is genuinely useful.
Git merge cache (git rerere) is good enough. Only problem is that it isn't shared but that could be possibly done within git format itself if someone really wanted to
gavinhoward 14 hours ago |
> At this point if your VCS isn't a layer above git plumbing, nobody gonna waste time using it.
Probably true, but it's a shame because there are better ways of storing and processing the data, ways that natively handle binary files, semantics, and large files without falling over.
teeray a day ago |
> Conventional rebase creates a fictional history where your commits happened on top of the latest main
This is not fiction though. If someone added a param to the functions you’re modifying on your branch, rebasing forces you to resolve that conflict and makes the dependency on that explicit and obvious.
alkonaut a day ago |
The most infuriating part about git's default behavior is that it's so ignorant about what actual reality users live in.
For example: when merging or rebasing it's really important to know what I did myself, vs what someone else did. Yet it has a really opaque left/right or mine/theirs representation which even switches meaning depending on the operation you are doing.
This isn't even a fundamental diff/patch issue it's just that git shrugs and assumes you want to perform some abstract operation on a DAG of things rather than, you know, rebase your code onto that of your colleagues.
ata-sesli a day ago |
The core separation line here seems to be Snapshot vs. Weave. Git treats history as a path between states, but Manyana treats the state as the history.
Since the weave grows with every line ever written, how do you handle "tombstone" (deleted data) bloat? In a decade-old repo with high churn, does the metadata overhead for a single file eventually make it unmanageable compared to Git’s "forgetful" snapshotting?
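To make the bloat question concrete, here is a toy weave in Python. It is only a sketch under my own assumptions (the names `WeaveLine`, `Weave`, and `churn` are made up, and this is not Manyana's actual data structure): every line ever written stays in the list, and a delete only marks the entry as a tombstone.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WeaveLine:
    text: str
    added_in: int                     # revision that introduced the line
    deleted_in: Optional[int] = None  # revision that tombstoned it; the entry itself stays


@dataclass
class Weave:
    entries: List[WeaveLine] = field(default_factory=list)

    def append(self, text: str, rev: int) -> None:
        self.entries.append(WeaveLine(text, added_in=rev))

    def delete(self, text: str, rev: int) -> None:
        """Tombstone the first live line equal to `text`."""
        for entry in self.entries:
            if entry.text == text and entry.deleted_in is None:
                entry.deleted_in = rev
                return
        raise ValueError(f"no live line {text!r}")

    def checkout(self, rev: int) -> List[str]:
        """Lines visible at revision `rev`."""
        return [e.text for e in self.entries
                if e.added_in <= rev
                and (e.deleted_in is None or e.deleted_in > rev)]

    def tombstones(self) -> int:
        return sum(1 for e in self.entries if e.deleted_in is not None)


def churn(weave: Weave, revisions: int) -> Weave:
    """Hypothetical high-churn history: each revision rewrites the one live line."""
    weave.append("value = 0", rev=0)
    for rev in range(1, revisions + 1):
        weave.delete(f"value = {rev - 1}", rev)
        weave.append(f"value = {rev}", rev)
    return weave


weave = churn(Weave(), revisions=1000)
print(len(weave.checkout(1000)), "live line,", weave.tombstones(), "tombstones")
```

In this toy model the file is one line long at every revision, yet the weave holds an entry for every line that ever existed. Whether a real implementation compacts or pages out old tombstones is exactly what I'm asking.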
gitmwnkdkc a day ago |
People are still having a problem with distributed version control, because some people want to force ”the server’s” history down the throats of all coworkers.
This can not be solved with tech, it’s a people problem.
Conflicts between branches are only a symptom of conflicts between people. Some want individual freedom to manage branches in whatever way (and these people are usually very open to other people managing branches in another way), but some people are against this freedom and think branches should be managed centrally by an authority (such people usually have a problem working on their own).
guytv a day ago |
I just let claude do all conflict resolution for me. code conflicts are a solved problem.
mpalmer 21 hours ago |
This sounds more like the present of version control in the form of Jujutsu, which already supports 3-way merges and history-preserving rebases, and does so with data structures that are trivially representable in a standard git repo.
> What it is is a proof that CRDT-based version control can handle the hard UX problems and come out with better answers than the tools we’re all using today — and a coherent design for building the real thing.
No such thing is "proven". You have not proven superiority to the state of the art with 400 lines of Python.
The LLM you used to draft this blog post is making your solution out to be far more than it is at this point.
gigatexal 21 hours ago |
how are CRDTs magic wrt merges and conflicts?
I'm happy to see new entrants into the space. I think I'm a git beginner+. I know enough to be productive and am no longer fearful of it... and I even train up my coworkers on it and help them out of binds... but I am not an expert by any means and still sometimes resort to radical things to get me through some problem I've put myself into.
metmac 19 hours ago |
https://loro.dev/
Relevant. Loro, a lovely CRDT library, explored implementing VCS semantics with CRDTs.
m12k 19 hours ago |
I used to think the future of version control was semantic: E.g. I renamed a method, while someone else concurrently added another call to that (now differently named) method. Git doesn't catch this, nor would this new system. The solution seems obvious to a human: Use the new name at the new call-site too. But it requires operating at the level of the semantic meaning of a change, and not just the dumb textual changes. I used to think this would require a new version control system that encodes the semantics of the changes in the commits, in order to have them available at merge-time. But these days, it seems much more realistic to stick to git, but loop in LLMs when merging, to re-create the semantics from the textual changes.
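A small Python sketch of the rename case, under my own assumptions: the file contents and the helper `undefined_calls` are hypothetical, and this is a lint-style check run after a textual merge, not a semantic merge. It shows that the merged text is clean line by line while still calling a name that no longer exists.

```python
import ast
import builtins

BASE = '''
def fetch(uid):
    return {"id": uid}

def show(uid):
    print(fetch(uid))
'''

# Branch A renames fetch -> fetch_user everywhere it is used.
# Branch B adds a new caller of the old name elsewhere in the file.
# A line-based merge combines both edits without any textual conflict:
MERGED = '''
def fetch_user(uid):
    return {"id": uid}

def show(uid):
    print(fetch_user(uid))

def audit(uid):
    return fetch(uid)["id"]
'''


def undefined_calls(source):
    """Names called as plain functions that are neither defined in the file nor builtins."""
    tree = ast.parse(source)
    defined = {node.name for node in ast.walk(tree)
               if isinstance(node, ast.FunctionDef)}
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return sorted(called - defined - set(dir(builtins)))


print(undefined_calls(BASE))    # []
print(undefined_calls(MERGED))  # ['fetch']
```

Detecting the stale call is the easy half. Knowing that `fetch` should become `fetch_user` needs the rename recorded somewhere, or re-inferred at merge time.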
bilekas 19 hours ago |
This is more than just version control though. The only things any VC uses that are important are the diff and the timestamps; you would be adding project context awareness, which is a whole other thing.
I'm sure there are smarter people than me who could create some hooks to automagically update those references on merge/rebase though. Not sure I would pay a whole LLM each time personally.
WolfeReader 18 hours ago |
Darcs did this decades ago with the "replace" command. It's not a legitimate semantic replacement, though - it's more just telling your VCS to do a find/replace.
gnufx 13 hours ago |
As far as I remember, that's just because only the find/replace was implemented, and it could have more sophisticated (semantic?) features.
cush 15 hours ago |
> I renamed a method, while someone else concurrently added another call to that
This is the most common use case for any compiler or linter
blueplanet200 18 hours ago |
Reminds me of this article from way back, "A look back: Bram Cohen vs Linus Torvalds"
Of significance here because the merge resolution strategy was at the heart of the disagreement between Bram and Linus.
https://web.archive.org/web/20110728005409/http://www.wincen...
jancsika 15 hours ago |
It's worth reflecting on just how many VCS data (or pain) points Linus had ingested right before writing git. Probably more than anyone in FOSS outside of maybe a few Debian devs. Add to that his experience successfully using Bitkeeper prior to that and you can easily see why git is where it is in 2026.
Given a large enough amount of data/pain, designing/optimizing to attack specific, known pain points always beats trying to solve a more general problem elegantly. I mean, kudos to whoever decided Zoom clients with shoddy connections should buffer then race back to realtime at 1.x - 2x speed (can't remember exactly how fast it goes-- perhaps it's dynamic?). One could come up with 1000 toy examples of where that breaks (music lesson, drama class, etc.), or just implement it and save a gazillion people a gazillion hours of repeating themselves in boring meetings.
Edit: clarification
tinfoilcondom 17 hours ago |
Why are some projects like “artifact” marked Dead here while others like “fossil” are promoted in the comments?
What counts as advertising vs spam? They seem like nearly identical posts, and both projects really exist, with separate authors.
Why are random posts marked Dead on this platform? Seems like outright censorship
itsnexis 17 hours ago |
The insight about rebase is underappreciated. The reason rebase is "considered harmful" in many teams isn't just that it rewrites history — it's that it makes history a lie that eventually collapses under its own weight in large teams. The CRDT weave approach is clever because it separates two things that git conflates: the logical order you want the commits to appear in (narrative history) vs. the actual causal order things happened (real history). Keeping both is strictly better. The conflict marker improvement alone would be worth the migration cost for many teams. Two opaque blobs with no attribution is genuinely one of the worst UX failures in developer tooling.
cweagans 17 hours ago |
> it makes history a lie that eventually collapses under its own weight in large teams
Can you please elaborate on this? I've seen this argument from others as well, but nobody has ever been able to articulate what that actually looks like and why rebasing branches specifically is to blame.
My perspective: whatever happens to the commit history on your non-`main` branch is your business. I don't care about the specifics until your work is merged into a shared branch that we all understand to be the canonical representation of the software we're working on.
idoubtit 15 hours ago |
I'm not the GP, but I've seen "rebase lies" in the wild.
Suppose a file contains a list of unique strings, one per line. A commit on a feature branch adds an element to the list. Later on, the branch is rebased on the main branch and pushed.
But the main branch had added the same element at another position in the list. Since there was a wide gap between the two positions, there was no conflict in Git's rebase. So the commit in the feature branch breaks the uniqueness constraint of the list.
For someone who pulled the feature branch, the commit seems stupid. But the initial commit was fine, and the final (rebased) commit is a lie: nobody created a duplicate item.
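Here is the same scenario as a toy Python model. It assumes a patch is just "insert this line after that anchor line", which is far simpler than Git's real algorithm, and the list contents and helper names are made up for illustration.

```python
BASE = ["apple", "banana", "cherry", "grape", "mango", "peach"]


def insert_after(lines, anchor, new_line):
    """Apply a one-line patch: add `new_line` right after `anchor`."""
    i = lines.index(anchor)
    return lines[:i + 1] + [new_line] + lines[i + 1:]


def duplicates(lines):
    return sorted({x for x in lines if lines.count(x) > 1})


main = insert_after(BASE, "apple", "kiwi")     # main added kiwi near the top
feature = insert_after(BASE, "mango", "kiwi")  # feature added kiwi near the bottom

# Rebase = replay the feature patch on top of main. The patch's context
# ("mango") is untouched on main, so it applies without a conflict.
rebased = insert_after(main, "mango", "kiwi")

print(duplicates(main))     # []
print(duplicates(feature))  # []
print(duplicates(rebased))  # ['kiwi']
```

Both original commits keep the list unique. Only the replayed commit, which nobody actually wrote, introduces the duplicate.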
cweagans 15 hours ago |
Thanks for that. I'm definitely familiar with that kind of situation, but what I'm not seeing is how that leads to history "collapsing under its own weight" in larger teams. That seems like a relatively straightforward rebase error that is easily corrected. (Also, if it is important for that list to only include unique items and you were able to merge it anyway, maybe that also reveals a gap in the test suite?)
cxr 15 hours ago |
> Can you please elaborate on this?
You're replying to an LLM-powered comment generator.
cush 15 hours ago |
The single trivial example is not convincing
r3c0nc1l3r 14 hours ago |
This is a cool idea!
These days, I think that all new version control solutions have to be examined in the light of how well they work with coding agents.
In that light, the CRDT merging here is interesting, as it allows history preservation in scenarios that would otherwise be destructive. This way, agents can use worktrees with much less hassle as squash, rebase, merge become more straightforward.
Koshkin 13 hours ago |
What's currently missing from the automatic conflict resolution is intelligence. The AI doing merges is the future.
mdnahas 9 hours ago |
This is a bad idea. I spent a lot of time thinking about git’s snapshot system vs. the merge-based systems that were promoted by functional programming fans. Auto-merging systems are bad for a good reason: we care about features, which are a property of snapshots, not diffs.
If you have a diff that adds a button and a diff that turns the existing buttons blue, the merge of those diffs doesn’t add a button and make all buttons blue, because it may not make the new button blue.
Features like “all buttons are blue” are properties of snapshots. Snapshot-based revision control, like git, is better for that reason.
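A minimal Python sketch of the button example, with made-up data and function names, assuming the "blue" diff only touches the buttons that existed when it was written:

```python
BASE = [{"id": "ok"}, {"id": "cancel"}]  # two buttons, default colour


def add_button(ui):
    """Diff A: add a 'help' button."""
    return ui + [{"id": "help"}]


def make_existing_blue(ui, existing_ids):
    """Diff B: recolour only the buttons that existed when the diff was written."""
    return [dict(b, color="blue") if b["id"] in existing_ids else b for b in ui]


def all_blue(ui):
    """The feature we care about, checked against a whole snapshot."""
    return all(b.get("color") == "blue" for b in ui)


base_ids = {b["id"] for b in BASE}

branch_b = make_existing_blue(BASE, base_ids)
merged = make_existing_blue(add_button(BASE), base_ids)

print(all_blue(branch_b))  # True
print(all_blue(merged))    # False
```

Both diffs apply cleanly, yet `all_blue` only holds on the branch snapshot, not on the merged one. The property lives in the snapshot; neither diff carries it.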