LLVM-MOS – Clang LLVM fork targeting the 6502

147 points by jdmoreira 6 days ago | 78 comments

mtklein 6 days ago |
This was a nice surprise when learning to code for NES, that I could write pretty much normal C and have it work on the 6502. A lot of tutorials warn you, "prepare for weird code" and this pretty much moots that.
cmrdporcupine 6 days ago |
It's been amazing to see the progress on this project over the last 5 years. As someone who poked around looking at the feasibility of this myself, and gave up thinking it'd never be practical, I'm super happy to see how far they've gotten.
Maybe someday the 65816 target will get out there, a challenge in itself.
jacquesm 5 days ago |
Instead of the 65816 we got the ARM, which I think was the better thing to happen in the longer term.
self_awareness 6 days ago |
Rust fork that works on this LLVM fork, for 6502, generating code that can be executed on a Commodore-64: https://github.com/mrk-its/rust-mos
michalpleban 6 days ago |
How does it compare to cc65 with regard to code size and speed?
asiekierka 6 days ago |
Here's a benchmark of all modern 6502 C compilers: https://thred.github.io/c-bench-64/ - do note that binary sizes also include the size of the standard libraries, which means it is not a full picture of the code generation density of the compilers themselves.
michalpleban 5 days ago |
Thank you, that's really helpful.
gregsadetsky 6 days ago |
I don't know this world well (I know what llvm is) but - does anyone know why this was made as a fork vs. contributing to llvm? I suppose it's harder to contribute code to the real llvm..?
Thanks
jjmarr 5 days ago |
LLVM has very high quality standards in my experience. Much higher than I've ever had even at work. It might be a challenge to get this upstreamed.
LLVM is also very modular which makes it easy to maintain forks for a specific backend that don't touch core functionality.
gregsadetsky 5 days ago |
Super interesting, thanks. I specifically thought that its modular aspect made it possible to just "load" architectures or parsers as ... "plugins"
But I'm sure it's more complicated than that. :-)
Thanks again
zozbot234 5 days ago |
LLVM backends are indeed modular, and the LLVM project does allow for experimental backends. Some of the custom optimization passes introduced by this MOS backend are also of broader interest for the project, especially the automated static allocation for provably non-reentrant functions, which might turn out to be highly applicable to GPU-targeting backends.
It would be interesting to also have a viable backend for the Z80 architecture, which also seems to have a highly interested community of potential maintainers.
codebje 5 days ago |
https://github.com/jacobly0/llvm-project
... but now three years out of date, because it's hard to maintain :-)
codebje 5 days ago |
My experience is that while LLVM is very modular, it also has a pretty high amount of change in the boundaries, both in where they're drawn and in the interfaces between them. Maintaining a fork of LLVM with a new back-end is very hard.
jjmarr 5 days ago |
I know my company (AMD) maintains an llvm fork for ROCm. YMMV.
codebje 5 days ago |
I should have qualified: it's hard to do for an individual or very small team as a passion side-project. It's pretty time consuming to keep up with the rate of change in LLVM.
ahartmetz 5 days ago |
Do you know why it's a fork? Also, from this https://github.com/ROCm/llvm-project/commits/amd-staging/ it looks like it might be more appropriately called a staging branch than a fork.
jjmarr 5 days ago |
Various reasons, like embargoes on information, stuff we didn't want to wait for review on before shipping, or features that don't make sense for upstream like `hipcc` which is an `nvcc` wrapper.
Our goal is to get most modifications not in the third category into upstream at some point which makes the maintenance load bearable.
Sharlin 5 days ago |
Pretty sure that the prospects of successfully pitching the LLVM upstream to include a 6502 (or any 8/16-bit arch) backend are only slightly better than a snowball’s chances in hell.
alexrp 5 days ago |
Worth noting that LLVM has AVR and MSP430 backends, so there's no particular resistance to 8-bit/16-bit targets.
Sharlin 5 days ago |
Oh, thanks for the correction. I couldn’t find a comprehensive list of backends (which is weird) and the lists I did find only included 16+ bit targets.
weinzierl 5 days ago |
These processors were very very different from what we have today.
They usually only had a single general purpose register (plus some helpers). Registers were 8-bit but addresses (pointers) were 16-bit. Memory was highly non-uniform, with (fast) SRAM, DRAM and (slow) ROM all in one single address space. Instructions often involved RAM directly and there were a plethora of complicated addressing modes.
Partly this was because there was no big gap between processing speed and memory access, but this makes it very unlikely that similar architectures will ever come back.
As interesting as experiments like LLVM-MOS are, they would not be a good fit for upstream LLVM.
zozbot234 5 days ago |
> ... there was no big gap between processing speed and memory access, but this makes it very unlikely that similar architectures will ever come back. ...
Don't think "memory access" (i.e. RAM), think "accessing generic (addressable) scratchpad storage" as a viable alternative to both low-level cache and a conventional register file. This is not too different from how GPU low-level architectures might be said to work these days.
djmips 5 days ago |
Great point. And you can even extend that to think like a 6502 or GPU programmer on an AMD, ARM or Intel CPU as well if you want the very best performance. Caches are big enough on modern CPUs that you can almost run portions of your code in the same manner. I bet TPUs at Google also qualify.
mysterymath 5 days ago |
Hey, llvm-mos maintainer here. I actually work on LLVM in my day job too, and I don't particularly want llvm-mos upstream. It stretches LLVM's assumptions a lot, which is a good thing in the name of generality, but the way it stretches those assumptions isn't particularly relevant anymore. That is, it's difficult to find modern platforms that break the same assumptions.
Also, maintaining a fork is difficult, but doable. I work on LLVM a ton, so it's pretty easy for it to fold in to my work week-to-week. And quite surprisingly, I used AI to help last time, and it actually helped quite a lot!
zozbot234 5 days ago |
Even if y'all don't particularly care about having the full backend upstream just yet, it still seems worthwhile to comprehensively document these assumptions within the project, and perhaps to upstream a few of the simpler custom passes where not too much "stretching" of assumptions is involved, if only to ease future forward-porting work.
nineteen999 5 days ago |
What's your take on sdcc 6502 support at the moment, if you have one? I'm just happy to finally have an 8-bit C compiler that supports both targets, even if the codegen for 6502 needs a lot of work right now.
I'd happily take a llvm-z80 and llvm-6502 over sdcc if both were available
Edit: oh wow, look at that https://github.com/grapereader/llvm-z80. Aw but not touched for 12 years.
bbbbbr 4 days ago |
Not the parent, but I have a take. :)
For GBDK-2020 we've been using the 6502 support in SDCC to support the NES as a target console for about 2 years alongside the existing Game Boy and SMS/Game Gear targets.
The 6502 port has been usable, but doesn't seem fully mature. There has been a lot of code churn for it during the last 12 months compared to the z80/sm83 ports as it gets improved. Recently (their recommended pre-release build 15614) this seems to have resulted in some breaking regressions that we haven't fully tracked down.
Perhaps this port is getting less testing coverage than the z80/sm83 port. Unsure. The majority of the 6502 work seems to be done by a newer member of their team, with the longer term members seeming to be somewhat hands-off. That might be an additional factor.
Edit: BTW, the 6502 port in SDCC at build 15267 (~4.5.0+) has been reasonably stable and usable, and is what we based our last GBDK-2020 release on (6 months ago).
nineteen999 4 days ago |
Ah thank you, that is all very helpful - I've been using 4.4.0 which is fine for Z80 code, but yeah had the feeling 6502 code generation could be improved.
HarHarVeryFunny 5 days ago |
According to this page, LLVM-MOS seems to be pretty soundly beaten in performance of generated code by Oscar64.
https://thred.github.io/c-bench-64/
I think the ideal compiler for 6502, and maybe any of the memory-poor 8-bit systems would be one that supported both native code generation where speed is needed as well as virtual machine code for compactness. Ideally would also support inline assembler.
The LLVM-MOS approach of reserving some of zero page as registers is a good start, but given how valuable zero page is, it would also be useful to be able to designate static/global variables as zero page or not.
zozbot234 5 days ago |
AIUI, Oscar64 does not aim to implement a standard C/C++ compiler as LLVM does, so the LLVM-MOS approach is still very much worthwhile. You can help by figuring out which relevant optimizations LLVM-MOS seems to be missing compared to SOTA (compiled or human-written) 6502 code, and filing issues.
djmips 5 days ago |
I feel like no amount of optimizations will close the gap - it's an intractable problem.
fooker 5 days ago |
It's performance of generated code, not performance of the compiler.
djmips 4 days ago |
mhm
HarHarVeryFunny 4 days ago |
I wouldn't say intractable, but it's not clear whether LLVM's optimization framework is flexible enough for it.
From mysterymath's (LLVM-MOS) description, presenting some of zero page as 16 bit registers (to make up for lack thereof, and perhaps due to LLVM not having any other support for preferred/faster memory regions), while beneficial, still had limitations since LLVM just assumes that there will be FAST register-register transfer operations available, and that is not even true for the 6502's real registers (no TXY), let alone these ZP "registers" which would require using the accumulator to copy.
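To put rough numbers on the transfer cost, here is a small Python tally using the standard NMOS 6502 sizes and timings for these instructions. It is only a back-of-envelope sketch; the sequence names are made up for illustration, and real code may interleave other work.

```python
# (bytes, cycles) for the NMOS 6502 instructions used below.
COST = {
    "TXA": (1, 2), "TAY": (1, 2),
    "LDA zp": (2, 3), "STA zp": (2, 3),
}

def total(sequence):
    """Sum code size and cycle count for a list of instruction names."""
    size = sum(COST[op][0] for op in sequence)
    cycles = sum(COST[op][1] for op in sequence)
    return size, cycles

# X -> Y has to go through the accumulator because there is no TXY.
X_TO_Y = ["TXA", "TAY"]
# Copying one 16-bit zero-page "register" to another: low byte, then high.
ZP16_COPY = ["LDA zp", "STA zp", "LDA zp", "STA zp"]
```

Both sequences also clobber the accumulator, so a register allocator that treats such copies as nearly free will underestimate their real cost.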
A code generation/optimization approach that would seem more guaranteed to do well on the 6502 might be to treat it more as tree search (with pruning) - generate multiple branching alternatives and then select the best based on whatever is being optimized for (clock cycles or code size).
Coding for the 6502 by hand was always a bit like that ... you had some ideas of alternate ways things could be coded, but there was also an iterative phase of optimization (cf search) to tweak it and save a byte here, a couple of cycles there ...
I've mentioned elsewhere I used to work for Acorn back in the day, developing for the BBC micro, with me and a buddy developing the ISO-Pascal system which was delivered in 2 16KB ROMs. Putting code in ROM gives you an absolute hard size budget, and putting a full Pascal compiler, interpreter, runtime library and programmers editor into a total 32KB was no joke! I remember at the end of the project we were still a few hundred bytes over what would fit in the ROMs, and had to fight for every byte to make it fit!
djmips 4 days ago |
It is my conjecture that due to the 8-bit index registers (contrast that with the 6800, 6809 and others), the 6502 is fundamentally a structure of arrays (SOA) system, whereas C is coupled, in its libraries and existing code base, with array of structures (AOS).
Optimizing code will never make up for a lack of good data-oriented design. This is just one of the reasons that Asm programmers routinely beat C code on the 6502. Another one is math. In the C language specification, if fixed point had been given equal footing with float that would also help.
These are such blind spots that you rarely even see custom 6502 high level languages accommodate these fundamental truths.
BTW, growing up on the 6502, I had no problems moving into the very C friendly 68000 but later going backwards to the 6809, on the surface it looked so much like a 6502 that I was initially trying to force SOA in my data design before realizing it was better suited to AOS.
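To make the SOA/AOS point concrete, here is a rough Python sketch of the address arithmetic involved. It is not 6502 code, and the record layout, field names, and base addresses are made up for illustration. With AOS, reaching a field of element `i` means computing `base + i * size + offset`, which needs 16-bit pointer math. With SOA each field is its own array of up to 256 bytes, so the element number can sit directly in an 8-bit index register.

```python
# Hypothetical record with three one-byte fields: x, y, tile.
FIELDS = ("x", "y", "tile")
RECORD_SIZE = len(FIELDS)

def aos_address(base, index, field):
    """Array of structures: fields of one element are adjacent.

    The address needs a multiply by the record size plus a field offset,
    which on a 6502 means 16-bit pointer math before the load.
    """
    return base + index * RECORD_SIZE + FIELDS.index(field)

def soa_address(field_bases, index, field):
    """Structure of arrays: one array per field.

    The element index is used unchanged as the offset, so it fits an
    8-bit index register as long as there are at most 256 elements.
    """
    if not 0 <= index <= 0xFF:
        raise ValueError("index must fit in an 8-bit register")
    return field_bases[field] + index

# Made-up base addresses for the example.
AOS_BASE = 0x0400
SOA_BASES = {"x": 0x0400, "y": 0x0500, "tile": 0x0600}
```

In the SOA case the load can be a single absolute,X access such as `LDA ys,X`, where `ys` is a hypothetical label for the `y` array.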
zozbot234 4 days ago |
There is a C standard extension for embedded/low-level programming which specifies fixed point arithmetic. (And other goodies such as hardware register access and multiple address spaces.)
HarHarVeryFunny 3 days ago |
If we're comparing performance of code generated by a C compiler vs hand optimized assembler, then for it to be an apples-to-apples comparison the same data structures (e.g. SOA or AOS) need to be used in both cases.
djmips 3 days ago |
Yes, that's true and should be how good 6502 high level code would be written.
HarHarVeryFunny 3 days ago |
Yep. C was always meant to be a "close to the metal" language providing a feature set that could be mapped pretty directly to the processors it was running on. It's a "low level, high level language" where the expectation is more what you see is what you get (WYSIWYG), even though a modern optimizer might be expected to remove invariant code out of loops, etc - localized efficiency gains, but not large scale transformation.
So, optimal C targeting the 6502 is not going to look much like C targeting a modern processor. The developer still needs to be very aware of the limitations of the processor they are targeting.
One somewhat radical thing that LLVM-MOS does is to analyze the program's call graph, and for functions that are not used recursively it will assign parameters and local variables to zero page instead, both for speed of access and to avoid need for a stack frame. Even though this violates the WYSIWYG mental model, this is a nice abstraction of what the assembly language programmer would have done themself.
djmips 3 days ago |
>One somewhat radical thing that LLVM-MOS does is to analyze the program's call graph, and for functions that are not used recursively it will assign parameters and local variables to zero page instead, both for speed of access and to avoid need for a stack frame. Even though this violates the WYSIWYG mental model, this is a nice abstraction of what the assembly language programmer would have done themself.
very nice, it sounds similar to the 'compiled stack' concept. I've seen that here in the co2 language for the 6502
"Variables declared as subroutine parameters or by using let are statically allocated using a "compiled stack", calculated by analyzing the program's entire call graph. This means scopes will not use memory locations used by any inner scopes, but are free to use them from sibling scopes. This ensures efficient variables lookups, while also not wasting RAM. However, it does mean that recursion is not supported."
https://github.com/dustmop/co2
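For anyone curious what a "compiled stack" calculation can look like, here is a minimal Python sketch of the general idea. It is an assumption-laden illustration, not the co2 or LLVM-MOS implementation: the function names, frame sizes, and the simple "start after the deepest caller" rule are all invented for the example.

```python
def compiled_stack(frame_sizes, calls):
    """Assign each function a fixed base offset for its locals.

    frame_sizes: {function name: bytes of parameters + locals}
    calls: {caller: [callees]}
    Returns {function name: base offset}. A function's frame starts after
    the frame of every function that can be active when it is called, so
    frames on one call chain never overlap, while siblings may share bytes.
    """
    callers = {name: [] for name in frame_sizes}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)

    offsets = {}
    in_progress = set()

    def place(name):
        if name in offsets:
            return offsets[name]
        if name in in_progress:
            raise ValueError(f"recursion involving {name!r}: no static layout")
        in_progress.add(name)
        # Start just past the end of the deepest caller's frame.
        base = max((place(c) + frame_sizes[c] for c in callers[name]), default=0)
        in_progress.discard(name)
        offsets[name] = base
        return base

    for name in frame_sizes:
        place(name)
    return offsets

# Hypothetical program: main calls draw and update, both of which call plot.
SIZES = {"main": 4, "draw": 6, "update": 3, "plot": 2}
CALLS = {"main": ["draw", "update"], "draw": ["plot"], "update": ["plot"]}
LAYOUT = compiled_stack(SIZES, CALLS)
```

Here `draw` and `update` are siblings, so both start at offset 4 and reuse the same bytes, while `plot` is placed after the larger of the two. A recursive call graph has no such layout, which is why the sketch raises an error in that case.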
asiekierka 4 days ago |
We already know what the main remaining issue is - LLVM-MOS's register allocator is far from optimal for the 6502 architecture. mysterymath is slowly working on what may become a more suitable allocator.
HarHarVeryFunny 4 days ago |
There is a video below of mysterymath presenting LLVM-MOS where he talks about reserving 32 bytes of zero page to present to LLVM as 16 16-bit registers, to be able to utilize its register allocator, which does seem a sane approach.
https://www.youtube.com/watch?v=ejbTKtgSZI0
However his github doesn't show any activity on it in the last 2 years.
https://github.com/mysterymath/llvm-mos
mysterymath 4 days ago |
Lol? https://github.com/mysterymath/llvm-mos/tree/regalloc
HarHarVeryFunny 4 days ago |
My bad - was just looking at the main branch.
bbbbbr 5 days ago |
With regard to code size in this comparison someone associated with llvm-mos remarked that some factors are: their libc is written in C and tries to be multi-platform friendly, stdio takes up space, the division functions are large, and their float support is not asm optimized.
HarHarVeryFunny 5 days ago |
I wasn't really thinking of the binary sizes presented in the benchmarks, but more in general. 6502 assembler is compact enough if you are manipulating bytes, but not if you are manipulating 16 bit pointers or doing things like array indexing, which is where a 16-bit virtual machine (with zero page registers?) would help. Obviously there is a trade-off between speed and memory size, but on a 6502 target both are an issue and it'd be useful to be able to choose - perhaps VM by default and native code for "fast" procedures or code sections.
A lot of the C library outside of math isn't going to be speed critical - things like IO and heap for example, and there could also be dual versions to choose from if needed. Especially for retrocomputing, IO devices themselves were so slow that software overhead is less important.
djmips 5 days ago |
More often than not the slow IO devices were coupled with optimized speed critical code due to cost savings or hardware simplification. Heap is an approach that rarely works well on a 6502 machine - there are no 16 bit stack pointers and it's just slower than doing without - However I tend to agree that a middle ground 16 bit virtual machine is a great idea. The first one I ever saw was Sweet16 by Woz.
HarHarVeryFunny 5 days ago |
I agree about heap - too much overhead to be a great approach on such a constrained target, but of course the standard library for C has to include it all the same.
Memory is better allocated in more of a customized application specific way, such as an arena allocator, or just avoid dynamic allocation altogether if possible.
I was co-author of Acorn's ISO-Pascal system for the 6502-based BBC micro (16KB or 32KB RAM) back in the day, and one part I was proud of was a pretty full featured (for the time) code editor that was included, written in 4KB of heavily optimized assembler. The memory allocation I used was just to take ownership of all free RAM, and maintain the edit buffer before the cursor at one end of memory, and the buffer content after the cursor at the other end. This meant that as you typed and entered new text, it was just appended to the "before cursor" block, with no text movement or memory allocation needed.
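The layout described above is what is usually called a gap buffer. A minimal Python sketch of the idea follows; the class and method names and the buffer size are illustrative, and this is not the original editor's code.

```python
class GapBuffer:
    """Edit buffer with text before the cursor at the low end of a fixed
    block of memory and text after the cursor at the high end."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.gap_start = 0      # one past the last byte before the cursor
        self.gap_end = size     # first byte after the cursor

    def insert(self, byte):
        if self.gap_start == self.gap_end:
            raise MemoryError("buffer full")
        self.mem[self.gap_start] = byte   # typing just appends; nothing moves
        self.gap_start += 1

    def left(self):
        if self.gap_start > 0:            # move one byte across the gap
            self.gap_start -= 1
            self.gap_end -= 1
            self.mem[self.gap_end] = self.mem[self.gap_start]

    def right(self):
        if self.gap_end < len(self.mem):
            self.mem[self.gap_start] = self.mem[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def text(self):
        return bytes(self.mem[:self.gap_start] + self.mem[self.gap_end:])
```

Typing costs one store and one increment, and moving the cursor by one character costs one byte copy, which is why the scheme suits a machine with no spare cycles or memory for an allocator.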
sehugg 5 days ago |
I've implemented Atari 2600 library support for both LLVM-MOS and CC65, but there are too many compromises to make it suitable for writing a game.
The lack of RAM is a major factor; stack usage must be kept to a minimum and you can forget any kind of heap. RAM can be extended with a special mapper, but due to the lack of a R/W pin on the cartridge, reads and writes use different address ranges, and C does not handle this without a hacky macro solution.
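A small Python model of the split read/write windows shows why plain C struggles here. The window addresses and size are illustrative values, not taken from any specific mapper's documentation, and the class and function names are made up.

```python
class SplitPortRam:
    """Model of cartridge RAM with separate write and read address windows.

    WRITE_BASE, READ_BASE and SIZE are illustrative values, not taken
    from any specific mapper's documentation.
    """
    WRITE_BASE = 0x1000
    READ_BASE = 0x1080
    SIZE = 0x80

    def __init__(self):
        self.cells = bytearray(self.SIZE)

    def store(self, address, value):
        offset = address - self.WRITE_BASE
        if not 0 <= offset < self.SIZE:
            raise ValueError("not a write-window address")
        self.cells[offset] = value & 0xFF

    def load(self, address):
        offset = address - self.READ_BASE
        if not 0 <= offset < self.SIZE:
            raise ValueError("not a read-window address")
        return self.cells[offset]

def increment(ram, index):
    """Read-modify-write must use two different addresses for one cell,
    which is what a plain C `ram[i]++` cannot express."""
    value = ram.load(SplitPortRam.READ_BASE + index)
    ram.store(SplitPortRam.WRITE_BASE + index, value + 1)
```

In C, every access to such a variable has to pick the right window by hand, which is where the macro wrapper comes in.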
Not to mention the timing constraints with 2600 display kernels and page-crossing limitations, bank switching, inefficient pointer chasing, etc. etc. My intuition is you'd need a SMT solver to write a language that compiles for this system without needing inline assembly.
ddingus 5 days ago |
A very simple BASIC compiled pretty well! It did feature inline assembly, and I agree with you on this necessary point especially concerning the 2600!
See Batari Basic
kwertyoowiyop 5 days ago |
Aztec C had both native and interpreted code generation, back in the day.
zozbot234 5 days ago |
> I think the ideal compiler for 6502, and maybe any of the memory-poor 8-bit systems would be one that supported both native code generation where speed is needed as well as virtual machine code for compactness.
Threaded code might be a worthwhile middle-of-the-way approach that spans freely across the "native" and "pure VM interpreter" extremes.
anthk 5 days ago |
If it runs fast on an Apple I, it will run fine on the rest.
iberator 5 days ago |
Slightly off-topic. If you want to learn low level assembly programming in the XXI century, 6502 is still an EXCELLENT choice!
Simple architecture and really really joyful to use even for casual programmers born a decade, or two later :)
1000100_1000101 5 days ago |
I'd argue that 68K is simpler to learn and use. You get a similar instruction set, but 32-bit registers, many of them. It's even got a relocatable stack so it can handle threading when you get to that point.
chihuahua 5 days ago |
I agree, I feel like the 68k architecture was a dream for assembly programming. Each register is large enough to store useful values, there are lots of them, there are instructions for multiply and divide. This allows you to focus on the essence of what you want to accomplish, and not have to get side-tracked into how to represent the X-coordinate of some object because it's just over 8 bits wide, or how to multiply two integers. Both of these seemingly trivial things already require thought on the 6502.
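To illustrate the multiply case: the 6502 has no multiply instruction, so a common approach is shift-and-add, building a 16-bit product from 8-bit operands one bit at a time. Here is a Python sketch of that loop, with masking standing in for register widths. It shows the general technique, not any particular published routine, and the function name is made up.

```python
def mul8(a, b):
    """Multiply two unsigned 8-bit values into a 16-bit product using only
    shifts, adds, and bit tests, the operations a 6502 actually has."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be 8-bit")
    product = 0
    addend = a                      # widens to 16 bits as it shifts left
    for _ in range(8):              # one pass per bit of b
        if b & 1:                   # low bit set: add the shifted multiplicand
            product = (product + addend) & 0xFFFF
        addend = (addend << 1) & 0xFFFF
        b >>= 1
    return product
```

On real hardware each 16-bit add and shift in that loop is itself two 8-bit operations with carry handling, which is the kind of detail a 68k `MULU` hides.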
monocasa 5 days ago |
And registers are actually pointer width, so you don't have to go through memory just to do arbitrary pointer arithmetic.
jacquesm 5 days ago |
If 8 bit: 6809. If 32 bit: 68K. Those are miles ahead of the 6502. Otoh if you want to see a fun quirky chip the 6502 is definitely great, and I'd recommend you use a (virtual) BBC Micro to start you off with.
bsder 5 days ago |
Yeah, the 6809 is just ridiculously good to learn assembly language on. Motorola cleaned up all the idiocies from the 6800 on the 6809.
The attention the 6502 gets is just because of history. The advantage the 6502 had was that it was cheap--on every other axis the 6502 sucked.
jacquesm 5 days ago |
Imagine a world where the Apple II had a 6800 later upgraded to 6809...
hashmash 5 days ago |
It wouldn't have happened because the 6809 wasn't binary compatible with the 6800.
jacquesm 5 days ago |
So?
hashmash 5 days ago |
Because none of the existing software would work. The idea of running a Rosetta-like feature on an 8-bit CPU isn't feasible. The Apple II eventually received an upgraded processor, the 65816, which was compatible with the 6502.
jacquesm 5 days ago |
In those days nobody cared about binary compatibility. If you had an assembler and the source code you were all set.
jecel 5 days ago |
Though a problem, as you point out, it still happened. The 6800-based SWTPC was followed by 6809 machines that needed to have all their software reassembled.
On the other side of the CPU wars, all those 8080 machines moving on to Z80s got to keep all their binary software, which happened again for IBM PCs and clones as those evolved.
https://en.wikipedia.org/wiki/SWTPC_6800
djmips 5 days ago |
The 6809 was SOURCE compatible with the 6800 - you can assemble 6800 code on a 6809 assembler and it will run with perhaps very minor tweaks.
bsder 5 days ago |
Then the Apple II would never have sold.
The 6800 was expensive versus the 6502--almost 10x (6502 was $25 when the 6800 was $175 which was already reduced from $360)!
jacquesm 5 days ago |
Yes, I was thinking more from a tech perspective, not from a price perspective.
djmips 5 days ago |
And yeah, there was a 6502 Apple I too!
classichasclass 5 days ago |
Sucked, compared to? If the 6502 sucked on every other metric but cost, while it would have gotten some use I don't think it would have been as heavily used as it was.
Someone 5 days ago |
The 6502 is from 1975, the 6809 from 1978.
The 6800 (from 1974) could have been competitive with the 6502, but (https://en.wikipedia.org/wiki/Motorola_6809#6800_and_6502):
“The 6800 was initially sold at $360 in single-unit quantities, but had been lowered to $295. The 6502 was introduced at $25, and Motorola immediately reduced the 6800 to $125. It remained uncompetitive and sales prospects dimmed. The introduction of the Micralign to Motorola's lines allowed further reductions and by 1981 the price of the then-current 6800P was slightly less than the equivalent 6502, at least in single-unit quantities. By that point, however, the 6502 had sold tens of millions of units and the 6800 had been largely forgotten.”
retrac 5 days ago |
LLVM includes an equivalent to binutils, with a macro assembler, linker, objdump with disassembler, library tools, handling formats like ELF and raw binary etc.
LLVM-MOS includes all of that. It is the same process as using LLVM to cross-assemble targeting an embedded ARM board. Same GNU assembler syntax just with 6502 opcodes. This incidentally makes LLVM-MOS one of the best available 6502 assembly development environments, if you like Unix-style cross-assemblers.
BoredomIsFun 5 days ago |
I'd argue Atmel AVR is a better choice - this is a very much alive platform that has not changed much in the last 30 years.
bbbbbr 5 days ago |
There is a similar project for the Game Boy (sm83 cpu) with a fork of LLVM.
https://github.com/DaveDuck321/gb-llvm
https://github.com/DaveDuck321/libgbxx
It seems to be the first reasonably successful attempt (it can actually be used) among a handful of previous abandoned LLVM Game Boy attempts.
retrac 5 days ago |
Presumably it would be straightforward to port the GB code generation to the Intel 8080 / Z80. There have been a few attempts for LLVM for those CPUs over the years. But none which panned out, I think?
zozbot234 5 days ago |
Most attempts at developing new LLVM downstream architectures simply fail at keeping up with upstream LLVM, especially across major releases. Perhaps these projects should focus a bit more on getting at least some of their simpler self-contained changes to be adopted upstream, such as custom optimization passes. Once that is done successfully, it might be easier to make an argument for also including support for a newly added ISA, especially a well-known ISA that can act as convenient reference code for the project as a whole.
codebje 5 days ago |
The CE-dev community's LLVM back-end for the (e)Z80 'panned out' in that it produced pretty decent Z80 assembly code, but like most hobby-level back-ends the effort to keep up to date with LLVM's changes overwhelmed the few contributors active and it's now three years since the last release. It still works, so long as you're OK using the older LLVM (and clang).
This is why these back-ends aren't accepted by the LLVM project: without a significant commitment to supporting them, they're a liability for LLVM.
avadodin 5 days ago |
A lot of people try to write backends for LLVM to support obscure architectures but these guys are the only ones I know that have ever been successful to any degree.
Portable assembly has a nice ring to it but reality is a harsh mistress and she only speaks C++.
Even the hobby underdog qbe seems ill-suited to 6502 repurposing.