I maintain a high-performance JIT compiler used by large corporations around the world to run complicated business rules for logistics and optimisation. It uses LLVM as the backend for machine code generation.
I mostly disable the optimisations provided by LLVM, because they make code generation much slower with almost no performance gains. Writing high-level optimisation passes before converting the AST to LLVM is what made the generated code super fast.
LLVM has no knowledge of the semantics of the language, so it can only optimise low-level details that the higher-level optimisations would have eliminated anyway.
It seems to me that the LLVM optimisations are only of benefit to you if you generate really bad LLVM code in the first place.
This should be a fairly unsurprising result. High-level languages are fundamentally more optimisable than low-level ones because, as you say, the latter express unnecessary constraints and lack information about domain-level semantics. Low-level optimisations are also known to follow a Pareto distribution with respect to their efficacy: 20% of the optimisations are responsible for 80% of your performance (if not more). See for example MIR - https://developers.redhat.com/blog/2020/01/20/mir-a-lightwei...
That said:
> It seems to me that the LLVM optimisations are only of benefit to you if you generate really bad LLVM code in the first place.
Much llvm development is sponsored by large corporations for whom it really is worth it to squeeze that last 1%.
I guess your generated code contains a lot of black boxes that the optimizer can't see through. At my work, the LLVM optimizer does an insane amount of optimization.
Rust is very optimizable by LLVM, but it still has similar performance issues: it's costly to optimize overly verbose/inefficient LLVM IR. Rust ended up implementing its own (MIR) optimization passes that run before LLVM, to hand it more optimized IR.
JITs are generally one of the most challenging places to use LLVM, exactly because of its bad compile-time characteristics. There are some successful uses of LLVM based JIT compilers (e.g. Azul's Falcon JIT), but this is definitely a use case where you can't just use the standard optimization pipeline. You'll generally use a custom pipeline and likely only use LLVM for the second stage JIT compiler.
That said, I don't think your statement that LLVM optimizations only benefit you if you generate bad input IR is correct. It just sounds like they are not useful for your specific problem domain.
Out of the box, LLVM's optimizations won't do much if you've already put work into pre-optimizing. I think this is even mentioned in the docs: IIRC they say that the generic optimizations only really work on naive IR generation, and that more mature projects will probably not find them useful and will instead create their own transforms. If you're mainly using the JIT, then your approach seems best. Running the pre-bundled optimizations is really a brute-force approach to optimization; it works great for just getting things going, but you outgrow it pretty quickly.
Surely LLVM's inlining heuristics must be one of its strengths. I thought good inlining was almost all of optimisation these days, based on a Chandler Carruth talk on LLVM.
It's more of a 'bring your own optimizer' kind of a framework.
The idea is that, you know best what optimizations work for your domain.
But a compiler needs a large amount of engineering for things which are not optimizations.
MLIR makes it possible to get this infra (developed utilizing lessons from LLVM and other compilers) for free and share improvements among multiple compilers without pulling your hair out trying to understand misleading academic papers.
My (partial/incomplete/buggy/experimental) Ruby compiler generates awful code, and still by far the biggest performance bottleneck is the creation and garbage collection of objects; improving the low-level code generation would have only a marginal effect on that.
E.g. finally adding type tagging for integers (instead of allocating objects on the heap) sped up compiling the compiler itself by tens of times (taking it from unusably slow to comparable to MRI on that specific task), and there's nothing a low-level optimizer could do to figure out a transformation like that.
Maybe one day I'll get far enough on fixing the high level issues that it'll be worth even trying to do more complex low level optimizations, but that's a long time away.
Is any information on this public? I've believed for a while that domain specific optimisations are the right way to go but haven't found many examples of it in practice.
LLVM is largely (at least originally) tuned for clang's output, which tends towards simple IR that LLVM will clean up later, with a fair bias towards making numerical benchmarks run faster.