saagarjha 2 days ago

> Normally, debugging the compiler is fairly straightforward: it is more or less a run-of-the-mill executable.

> In the bootstrap process, the entire thing becomes way more complex. You see, rustc is not invoked directly. The bootstrap script calls a wrapper around the compiler.

> That wrapped rustc is not easy to run either: it requires a whole lot of complex environment flags to be set.

> All that is to say: I don’t know how to debug the Rust compiler. I am 99.9% sure there is an easy way to do this, documented somewhere I did not think to look. After I post this, somebody will tell me "oh, you just need to do X".

> Still, at the time of writing, I did not know how to do this.

> So, can we attach gdb to the running process? Nope, it crashes way too quickly for that.

It's kind of funny how often this problem crops up and the variety of tricks I have in my back pocket to deal with it. Sometimes I patch the script to invoke gdb --args [the original command] instead, but this is only really worthwhile if it's a simple shell script and I can also track where stdin/stdout are going. Otherwise I might patch the code to sleep a bit before actually running anything, to give me a chance to attach GDB. On some platforms you can get notified of process execs and sometimes even intercept them (e.g. as an EDR solution), and sometimes I will use that to suspend the process before it gets a chance to launch. But I kind of wish there was a better way to do this in general… LLDB has a "wait for launch" flag, but it just spins in a loop waiting for new processes and can't catch anything that dies too early.

  • timhh a day ago

    I have a C library (I've also done a Python one in the past) that you load into the executable you want to debug. It activates based on an environment variable, so normally I just link it in permanently.

    When it is loaded, it automatically talks to VSCode, tells it to start a debugger and attach to the process, then waits for the debugger to attach.

    End result is you just have to run your script with an environment variable set and it will automatically attach a nice GUI debugger to the process no matter how deeply buried in scripts and Makefiles it is.

    https://github.com/Timmmm/autodebug

    I currently use this for debugging C++ libraries that are dynamically loaded into Questa (a commercial SystemVerilog simulator) that is started by a Python script running in some custom build system.

    In the past I used it to debug Python code running in an interpreter launched by a C library loaded by Questa started by a Makefile started by a different Python interpreter that was launched by another Makefile. Yeah. It wasn't the only reason by a long shot but that company did not survive...

  • o11c 2 days ago

    Other ideas:

    * Run the whole tree of processes under `gdb` with `set detach-on-fork off`.

    * LD_PRELOAD a library that inserts the sleeps for you, maybe on startup or maybe on signal/exit.

    Ideally, we'd have some kind of infrastructure to name and identify particular processes recursively.
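
    The first idea can be driven from a gdb command file, e.g. (a sketch; pick the fork mode that suits the situation):

```gdb
# Keep every forked child attached instead of detaching from it.
set detach-on-fork off
# Follow into children (use `parent` to stay on the original side).
set follow-fork-mode child
# Keep debugging through execve into the new image.
set follow-exec-mode new
# Stop whenever any process in the tree execs something new.
catch exec
run
# List the whole captured process tree.
info inferiors
```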

  • jcranmer 2 days ago

    I have a LD_PRELOAD library that hooks SIGSEGV into spawning gdb on the process using the best guess for the process's terminal (which currently isn't very smart because I haven't yet needed to debug processes that do a lot of stdio redirection).

  • izacus a day ago

    I find it outright incredible how much software is built in a way that actively prevents debugging and observability of what's going on in it (no hooks, no logging, no error messages, etc.). I have no idea how people fix bugs there outside of vibing.

    • mystified5016 a day ago

      The old fashioned way: elbow grease and lots of squinting and swearing at your computer

  • mark_undoio a day ago

    Process recording via time travel debugging seems like a good fit for this problem: you can capture 100% of the process's execution and then go back and investigate further.

    We (Undo.io) came up with a technique for following a tree of processes and initiating process recording based on a glob of program name. It's the `--record-on` flag in https://docs.undo.io/UsingTheLiveRecorderTool.html. You can grab a free trial from our website.

    For open source, with rr (https://rr-project.org/) I think you'd just `rr record` the initial process and you'll end up capturing the whole process tree - then you can look at the one you're interested in.

    As others have said you could also do some smart things with GDB's follow-fork settings but I think process recording is ideal for capturing complicated situations like this as you can go and review what happened later on.

  • touisteur a day ago

    Now wondering whether the author might be able to force a core dump. With recent snapshot abilities on modern Intel processors, one can get a Processor Trace that can be helpful even without an actual interactive debugging session (haven't done one of those in a while, as snapshots seem enough for my needs these days).

  • AndrewDucker a day ago

    This is where I really appreciate the .Net command

    System.Diagnostics.Debugger.Launch();

    Which pops up a window asking you to select what debugger you want to use, and then opens your application in it.

  • CJefferson 2 days ago

    I agree, recently I was working with a large Java program and after spending about 90 minutes (far too long, I was getting obsessed), I just gave up trying to get it into a debugger.

    This is one area where rust disappoints me, there isn’t a “cargo debug” built in (there is an external program but it doesn’t work well), and when I just manually attach gdb most of the symbols are usually missing.

    I would seriously consider a language billed as “debugger-first”, just to see what the experience was like.

    • gavinray a day ago

      For Java programs, you run the JAR with a flag telling the JVM to start a remote-debug server listening on some port (traditionally 5005); note that the agent flag has to come before -jar:

        java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005 -jar yourapp.jar
      
      If your app is a gradle app, the flag is "--debug-jvm"

      Then you connect to it with "jdb -attach localhost:5005"

      • CJefferson a day ago

        Thanks, I'll try that if I'm ever Java debugging again!

        I don't think I want to do it myself, but I'd love a "101 ways to get into a debugger" webpage.

    • dmitrygr 2 days ago

      > I would seriously consider a language billed as “debugger-first”, just to see what the experience was like.

      Use C, compile with "-O0 -g -ggdb3"

      • CJefferson a day ago

        You are probably fairly accurate there, but that would require me going back to writing more C, which I try to avoid nowadays :)

        I still sometimes get caught out getting debug information in all the right places, particularly when someone is using ninja or some such. I've even ended up wrapping gcc just to strip optimisation options and add -g, rather than figure out how to fight some very complex build-system mess.

juxhindb a day ago

> Really, think about it. How often do you see somebody inexperienced use ifs instead of a switch? This pattern is common enough for most compilers to recognize, and more importantly, optimize it.

This made me chuckle. Playing dumb so that gcc optimizes you away is both hilarious and genius

  • 1718627440 a day ago

    Actually, the compiler converts the switch to ifs, not the other way around.

dwheeler 2 days ago

It may not seem like it, but this is impressive progress. Getting a compiler to bootstrap at all is an accomplishment, especially for Rust since that depends on so many things working. Once it can reliably bootstrap, a lot of performance-improving steps can begin. Congrats!

  • ramon156 a day ago

    I'm not that deep into gcc. Would it really add a lot of performance gain?

    • mijoharas a day ago

      I don't think the goal is performance (and I don't think a well optimised gcc implementation will improve performance). I think the comment you are replying to is speaking about performance because of things like this line from the article:

      > ...with the emphasis on limp. At some points, we are using over 100 GB of RAM... not ideal.

      (so performance will be the next thing to be worked on to make this useable).

      I think the goal of using gcc for Rust is that it can provide a parallel implementation, which can help uncover bugs and bring more stability to the entire language; and, if there is a large gcc-based project already, its maintainers may be reticent to introduce LLVM as a dependency as well.

      • johnklos a day ago

        And let's not forget that bootstrapping rust using something other than rust is quite useful.

        • pornel a day ago

          You can't use this implementation to bootstrap Rust (in the sense of bootstrapping from a non-Rust language, or from a compiler other than rustc).

          This GCC support here is only a backend in the existing Rust compiler written in Rust. The existing Rust compiler is using GCC as a language-agnostic assembler and optimizer, not as a Rust compiler. The GCC part doesn't even know what Rust code looks like.

          There is a different project meant to reimplement Rust (front end) from scratch in C++ in GCC itself, but that implementation is far behind and can't compile non-toy programs yet.

      • tucnak a day ago

        > reticent

        That is a $200 word right there, and you're using it wrong.

Cogito 2 days ago

Really great read.

Someone mentioned recently that the slowness of rustc is in large part due to llvm. I know that is probably orthogonal to the work here, but I do like the idea of building the compiler with different toolchains, and that there may be follow-on effects down the line.

  • JoshTriplett 2 days ago

    Depends on the workload, but yes, codegen is a huge part of the total compilation time.

    That said, that doesn't mean LLVM is always where the fixes need to be. For instance, one reason rustc spends a lot of time in LLVM is that rustc feeds more code to LLVM than it should, and relies on the LLVM optimizer to improve it. Over time, we're getting better about how much code we throw at LLVM, and that's providing performance improvements.

    • Cogito a day ago

      I'm completely ignorant so forgive me if this is obvious: in the effort of the parent article - to compile rustc with gcc - will rustc still be feeding lots of code to LLVM, or would that code now be fed to gcc?

      • heftig a day ago

        Which codegen backend the building compiler uses is independent of which codegen backend(s) the built compiler uses.

        Similarly, you can build Clang using itself or using GCC. The resulting compiler should behave the same and produce the same machine code, even if its own machine code is somewhat different.

        The produced binaries could still have artifacts from the original compiler in them, e.g. if "compiler built-in" libraries or standard libraries were compiled with the original compiler.

        Both GCC and rustc use a multi-stage build process where the new compiler builds itself again, so the build reaches a fixed point where no artifacts from the original compiler are left.

  • torstenvl 2 days ago

    It's slow because the borrow checker is NP-complete. LLVM may or may not generate slower code than GCC would for rustc, but I doubt it's anywhere close to the primary cause of the lack of snappiness.

    • resurrectedcyb a day ago

      I do not know if the borrow checker and Rust's type system have been formalized. There are stacked borrows and tree borrows, and other languages experimenting with features similar to borrow checking. But without formal algorithms (like Algorithm W or J for Hindley-Milner) or formalizations of problems like typability, I am not sure how one can prove the complexity class of the borrow-checking problem, nor of a specific algorithm, the way https://link.springer.com/chapter/10.1007/3-540-52590-4_50 does for ML.

      I could imagine you being correct about the borrow-checking typability problem being NP-complete, or in an even worse complexity class. Typability in ML is EXPTIME-complete, a class believed to be strictly larger than NP-complete https://en.wikipedia.org/wiki/EXPTIME https://dl.acm.org/doi/10.1145/96709.96748 .

      I am also not sure how to figure out whether the complexity class of some kind of borrow checking has anything to do with the exponential compile times some practical Rust projects saw after upgrading their compiler version, for instance in https://github.com/rust-lang/rust/issues/75992 .

      It would be good if there were a formal description of at least one borrow-checking algorithm, as well as of the borrow-checking "problem", and maybe also an analysis of the problem's complexity class.

      • genrilz a day ago

        There isn't a formal definition of how the borrow-checking algorithm works, but if anyone is interested, [0] is a fairly detailed, if not mathematically rigorous, description of how the current non-lexical lifetimes algorithm works.

        The upcoming Polonius borrow checking algorithm was prototyped using Datalog, which is a logical programming language. So the source code of the prototype [1] effectively is a formal definition. However, I don't think that the version which is in the compiler now exactly matches this early prototype.

        EDIT: to be clear, there is a polonius implementation in the rust compiler, but you need to use '-Zpolonius=next' flag on a nightly rust compiler to access it.

        [0]: https://rust-lang.github.io/rfcs/2094-nll.html

        [1]: https://github.com/rust-lang/polonius/tree/master

        • resurrectedcyb a day ago

          Interesting. The change to sets of loans is interesting. Datalog, related to Prolog, is not a language family I have a lot of experience with, only a smidgen. They use some kind of solving as I recall, and are great at certain types of problems and explorative programming. Analyzing the performance of them is not always easy, but they are also often used for problems that already are exponential.

          I read something curious.

          https://users.rust-lang.org/t/polonius-is-more-ergonomic-tha...

          >I recommend watching the video @nerditation linked. I believe Amanda mentioned somewhere that Polonius is 5000x slower than the existing borrow-checker; IIRC the plan isn't to use Polonius instead of NLL, but rather use NLL and kick off Polonius for certain failure cases.

          That slowdown might be temporary, if I had to guess, as it gets optimized over time; otherwise there would be two solvers in compilers for Rust. It would be in line with some other languages if the worst-case complexity class is something exponential.

          • genrilz a day ago

            > IIRC the plan isn't to use Polonius instead of NLL, but rather use NLL and kick off Polonius for certain failure cases.

            Indeed. Based on the last comment on the tracking issue [0], it looks like they have not figured out whether they will be able to optimize Polonius enough before stabilization, or if they will try non-lexical lifetimes first.

            [0]: https://github.com/rust-lang/rust-project-goals/issues/118

    • almostgotcaught 2 days ago

      You're wrong; it's been debunked that the borrow checker is any appreciable part of the compile time. Steve Klabnik actually verified it on here somewhere.

      Edit: found it

      https://news.ycombinator.com/item?id=44391240

      • resurrectedcyb a day ago

        I don't see that debunking it. Instead, it says "usually". That means that it depends on the project.

        There is definitely Rust code that takes exponential time to compile, borrow checker or not.

        https://play.rust-lang.org/?version=stable&mode=release&edit...

        https://github.com/rust-lang/rust/issues/75992

        Some people used async in ways that surfaced these problems. Upgraded rustc, then the project took forever to compile.

        • steveklabnik a day ago

          I say “usually” because of course sometimes bugs happen and of course you can conduct degenerate stress tests. But outside of those edge cases, it’s not an issue. If it were, blog posts that talk about lowering compile times would be discussing avoiding the borrow checker to get better times, but they never do. It’s always other things.

          • resurrectedcyb a day ago

            Is there any tool for Rust that does profiling that detects what part of compilation time is caused by what? Like, a tool that reports:

            - Parsing: x ms

            - Type checking: y ms

            - LLVM IR generation: z ms

            And have there been any statistics done on that across open-source projects, like mean, median, percentiles and so on?

            I am asking because it should depend a lot on each project what is costly in compile time, making it more difficult to analyse. And I am also curious about how many projects are covered by "edge cases", if it is 1%, 0.1%, 0.01%, and so on.

            • steveklabnik a day ago

              The post my original comment is on discusses doing this at length.

              > And have there been any statistics done on that across open-source projects, like mean, median, percentiles and so on?

              I am not aware of any. But in all the posts on this topic over the years, codegen always ends up being half the time. It’s why cargo check is built the way it is, and why it’s always faster than a full build. If non-codegen factors were significant with any regularity, you’d be seeing reports of check being super slow compared to build.

              • resurrectedcyb a day ago

                I have actually seen a few posts here and there of 'cargo check' being slow. I have also heard of complaints of rust-analyzer being slow, though rust-analyzer may be doing more than just 'cargo check'.

                https://www.reddit.com/r/rust/comments/1daip72/rust_checkrun...

                May not be indicative, not sure what crate the author was using.

                • steveklabnik a day ago

                  cargo check can be slow but what I mean is relative to a full build. Large projects are going to be slow to build by virtue of being large.

                  • resurrectedcyb a day ago

                    That project had 'cargo check' take 15-20 minutes, though it might not be indicative; the submitter posted an update about how it was fixed.

                    This may be a better example.

                    https://github.com/rust-lang/rust/issues/132064

                    >cargo check with 1.82: 6m 19s

                    >cargo check with 1.81: 1m 22s

                    It may be difficult to fix.

                    https://github.com/rust-lang/rust/issues/132064#issuecomment...

                    >Triage notes (AFAIUI): #132625 is merged, but the compile time is not fully clawed back as #132625 is a compromise between (full) soundness and performance in favor of a full revert (full revert would bring back more soundness problems AFAICT)

                    Update: They fixed it, but another issue surfaced, possibly related, as I read the comments.

            • 3836293648 7 hours ago

              Yes, there's -Ztime-passes, but it's nightly-only (or available on stable with the bootstrap env var set)

              • steveklabnik 2 hours ago

                My understanding is that -Ztime-passes isn't very accurate anymore, as it's not integrated into the query system well, or something.

      • Cogito a day ago

        Thanks, I'm pretty sure I was thinking of this whole discussion when I made my original comment :)

1718627440 a day ago

Wouldn't the best way to check whether something is inlineable be to just try it? Why can't GCC be allowed to try to inline it and, if it turns out not to be possible, be instructed to just not inline it?

  • pornel a day ago

    My guess is that the public library interface of GCC doesn't support it this way.

    This back-end uses the confusingly-named libgccjit (not as a JIT), which exposes only a subset of GCC's functionality.

    If something isn't already exposed, it might take a while to get patches to GCC and libgccjit accepted and merged.

Rogach 2 days ago

Sounds like attempting to always-inline a recursive function should be an error instead. But that change is probably undesirable because it would likely break existing crates, and thus backwards compatibility as well?

  • dwattttt a day ago

    "inline(always)" I expect matches Clang's "always_inline", and Clang's documentation makes what it does clearer:

    > Inlining heuristics are disabled and inlining is always attempted regardless of optimization level.

    So it should be interpreted as "always attempt to inline", as opposed to "this must be inlined"; other attributes instead influence the "should this be inlined" heuristic.

    EDIT: as curious an attribute as it might be, I didn't mean to be talking about inclines

  • j16sdiz a day ago

    A recursive function can be inlined when unrolled; this is a valid optimization.

    Google "llvm inline recursion". It exists. It should work. Fibonacci is the standard test case.

    • 1718627440 a day ago

      But not every recursive function can be inlined (without a secondary stack).

dtgriscom 2 days ago

I love vicarious engineering.

aswanson 2 days ago

I just started playing with rust again today. Godspeed.