I maintain a high-performance JIT compiler used by large corporations around the world to run complicated business rules for logistics and optimisation. It uses LLVM as the backend for machine code generation.
I mostly disable the optimisations provided by LLVM, because they make code generation much slower with almost no performance gains. Writing high-level optimisation passes before converting the AST to LLVM is what made the generated code super fast.
LLVM has no knowledge of the semantics of the language, so it can only optimise low-level details that the higher-level optimisations would have eliminated anyway.
It seems to me that the LLVM optimisations are only of benefit to you if you generate really bad LLVM code in the first place.
This should be a fairly unsurprising result. High-level languages are fundamentally more optimisable than low-level ones because, as you say, the latter express unnecessary constraints and lack information about domain-level semantics. Low-level optimisations are also known to follow a Pareto distribution with respect to their efficacy: 20% of the optimisations are responsible for 80% of your performance (if not more). See for example MIR - https://developers.redhat.com/blog/2020/01/20/mir-a-lightwei...
That said:
> It seems to me that the LLVM optimisations are only of benefit to you if you generate really bad LLVM code in the first place.
Much llvm development is sponsored by large corporations for whom it really is worth it to squeeze that last 1%.
I guess your generated code contains a lot of black boxes that the optimizer can't see through. At my work, the LLVM optimizer does an insane amount of optimization.
Rust is very optimizable by LLVM, but it still has similar performance issues: it's costly to optimize overly verbose/inefficient LLVM IR. Rust ended up implementing its own (MIR) optimization passes that run before LLVM, to hand it more optimized IR.
JITs are generally one of the most challenging places to use LLVM, exactly because of its bad compile-time characteristics. There are some successful uses of LLVM based JIT compilers (e.g. Azul's Falcon JIT), but this is definitely a use case where you can't just use the standard optimization pipeline. You'll generally use a custom pipeline and likely only use LLVM for the second stage JIT compiler.
That said, I don't think your statement that LLVM optimizations only benefit you if you generate bad input IR is correct. It just sounds like they are not useful for your specific problem domain.
Out of the box, LLVM's optimizations won't do much if you've already put work into pre-optimizing. I think this is even mentioned in the docs: IIRC they say that the generic optimizations only really work on naive IR generation, and that more mature projects will probably not find them useful and will instead create their own transforms. If you're mainly using the JIT, then your approach seems best. Running the pre-bundled optimizations is really a brute-force approach to optimization; it works great for just getting things going, but you outgrow it pretty quickly.
Surely LLVM's inlining heuristics must be one of its strengths. I thought good inlining was almost all of optimisation these days, based on a Chandler Carruth talk on LLVM.
It's more of a 'bring your own optimizer' kind of a framework.
The idea is that, you know best what optimizations work for your domain.
But a compiler needs a large amount of engineering for things which are not optimizations.
MLIR makes it possible to get this infra (developed utilizing lessons from LLVM and other compilers) for free and share improvements among multiple compilers without pulling your hair out trying to understand misleading academic papers.
My (partial/incomplete/buggy/experimental) Ruby compiler generates awful code, and still by far the biggest performance bottleneck is the creation and garbage collection of objects; improving the low-level code generation would have only a marginal effect on that.
E.g. finally adding type tagging for integers (instead of allocating objects on the heap) sped up compiling the compiler itself by tens of times (taking it from unusably slow to comparable to MRI on that specific task), and there's nothing a low-level optimizer could do to figure out a transformation like that.
Maybe one day I'll get far enough on fixing the high level issues that it'll be worth even trying to do more complex low level optimizations, but that's a long time away.
Is any information on this public? I've believed for a while that domain specific optimisations are the right way to go but haven't found many examples of it in practice.
LLVM is largely (at least originally) tuned for clang's output, which tends towards simple IR that LLVM will clean up later, with a fair bias towards making numerical benchmarks run faster.