Grassmann.jl A\b 3x faster than Julia's StaticArrays.jl

ChrisFoster · on June 16, 2020

My main concern with this comparison is that there's no mention of numerical stability / accuracy of this method.

Calculating determinants is notoriously prone to cancellation error if done naively, so you've got to be really careful. By extension, the usual text book implementation of Cramer's rule is a rather bad way to solve matrix systems. (I don't know much about Grassman.jl so perhaps they've used some known tricks to make this work better, but you do have to be really careful with this stuff.)

Actually in early versions of StaticArrays we were pretty cavalier about numerical accuracy because it was just a library we needed for fast geometric calculations. However as people started using this library for serious numerical work we've had to pay a lot more attention to having good numerical properties.

There's still a lot to learn - and no doubt much work to do - in upgrading the implementation to make sure everything is first robust, and second as fast as possible.

DreamScatter · on June 16, 2020

This algorithm is not Cramer's rule, it is analogous to it, but based on the original exterior algebra of Hermann Grassmann.

ChrisFoster · on June 17, 2020

I agree the exterior algebra is beautifully elegant mathematics, but expressive elegance of the formalism is largely irrelevant to numerical robustness. (Cramer's rule itself is a perfect example of this: it's elementary to express, but in simple form is a terrible idea numerically.)

To be clear, I'm not claiming there's anything wrong with your algorithm but simply adding a note of caution. Any careful comparison between methods must include both efficiency and numerical accuracy.

dklend122 · on June 15, 2020

Turns out the allocations are just from the fact that the computation as written was not type stable. Making it so brings down the allocations to 2 and increases the relative speed to just under 2x.

Here's the post from chris:

"That’s not type-stable. x = [@SMatrix randn(5,5) for i in 1:10000] is a type stable way to compute an array of random SMatrices, and that nearly doubles the speed of the computation and removes the allocations down to 2, i.e. from:

2.549 ms (29496 allocations: 1.44 MiB) to

1.469 ms (2 allocations: 390.70 KiB) for me. Still a nice performance for Grassmann.jl, but seems to be <2x."

dklend122 · on June 15, 2020

I'd be interested to see a static arrays benchmark run on 1.5 as it can now stack allocate views and other types due to general compiler improvements.

stabbles · on June 15, 2020

StaticArrays.jl is just linear algebra operations for tuples, and they have always been stack allocated as they are immutable and have a size known at compile-time, so Julia 1.5 will not bring anything new for it.

The deal with 1.5 is that the GC has improved in that it does not require structs with references to dynamically allocated objects to be heap allocated themselves.

dklend122 · on June 15, 2020

The allocations for the StaticArrays solve must come from somewhere

snicker7 · on June 15, 2020

The real performance win seems to be reducing the number of allocations (2 vs. 30 thousand).

spacedome · on June 15, 2020

This has been my general experience writing numerical linear algebra methods in Julia. Optimizing a specific algorithm, eliminating unnecessary allocations is the first thing I do, and can give large performance gains, especially for iterative methods.

It can be a bit annoying, and can result in much less readable code (eg having to explicitly write things like mul!(C, A, B)). It ends up looking a lot more like the FORTRAN it is meant to replace, and if you aren't careful you lose the ability to use generic types, though multiple dispatch is a great solution. The worst case I have found is iteratively calling linear algebra solvers (Eig, SVD, LU) which do not have any option for preallocating and reusing work arrays. The only way to do this now is to call the BLAS/LAPACK routines directly with ccall, which is a huge pain. There was an attempt to fix this with PowerLAPACK.jl, but it seems abandoned, so for the moment writing optimized methods in FORTRAN/C and calling from Julia is sometimes still preferable.

Julia does quite well though, so this only seems necessary for things like core NLA libraries, for example I would not try rewriting ARPACK in Julia for a performance gain. The gain in flexibility for writing these in Julia is absolutely huge though, and I definitely recommend it. The Grassmann.jl library has many examples of what a language like Julia makes possible, or DifferentialEquations.jl.

etik · on June 15, 2020

There is a recent effort [1] to provide low-level support for faster operations by transforming user code to take advantage of a compiler's instruction set, memory packing, etc. This is being expanded upon to essentially provide a Julia-native BLAS. Some of the benchmarks are even competitive with or beat Intel MKL (calibrate that statement appropriately to your level of trust in benchmarks). I wouldn't count out a Julia ARPACK implementation just yet.

[1] LoopVectorization: https://github.com/chriselrod/LoopVectorization.jl Announcement post and discussion: https://discourse.julialang.org/t/ann-loopvectorization/3284...

stabbles · on June 15, 2020

You're probably confusing ARPACK with BLAS / LAPACK.

Pure julia ARPACK already exists, e.g. https://github.com/haampie/ArnoldiMethod.jl/.

A competive BLAS-gemm is implemented here https://github.com/YingboMa/MaBLAS.jl/blob/master/src/gemm.j... (single-threaded).

A LAPACK-like library could be https://github.com/JuliaLinearAlgebra/GenericLinearAlgebra.j...

etik · on June 15, 2020

I was mostly referring to the parent comment's suggestion that low level numerical libraries wouldn't benefit from a pure Julia implementation, specifically to the statement that it was still better to write optimized C/FORTRAN and call from Julia. Indeed, MaBLAS, which you linked, is built on top of LoopVectorization.jl.

I don't know how well ArnoldiMethod.jl compares with ARPACK, but if there is a gap my suggestion is simply that these recent developments might help bridge it :)

stabbles · on June 15, 2020

FWIW, MaBLAS currently does not depend on LoopVectorization.jl, the code to generate kernels is all handwritten.

spacedome · on June 15, 2020

This is great, a lot of the performance in BLAS comes from memory management. This does not solve my problem though. Sometimes you want to manually control memory, and Julia does not make it easy to do so. In particular when interfacing with BLAS/LAPACK though Julia, memory management is quite ugly, mainly due to the need for allocating work arrays, and a pure Julia implementation of BLAS is unlikely to fix this. I don't think writing eg ARPACK performantly in Julia is imposible, just painful to the point that writing it in FORTRAN starts to make sense (to me).

stabbles · on June 15, 2020

> I don't think writing eg ARPACK performantly in Julia is impossible

Not sure what you mean, ARPACK just wraps a bunch of calls to LAPACK and BLAS, that's all. It does not have any low-level linalg kernels of its own. Also, its main bottleneck is typically outside of the algorithm, namely matrix-vector products. ArnoldiMethod.jl is pretty much a pure Julia implementation of ARPACK without any dependency on LAPACK (only BLAS when used with Float32/Float64/ComplexF32/ComplexF64). Finally note that you can easily beat ARPACK by adding GPU support, since they don't provide it.

spacedome · on June 15, 2020

The person I was responding to seemed to have read my comment and thought I had an issue with raw performance. My point above is that writing iterative code that makes many LAPACK calls becomes difficult to write in Julia because there is no way to manage the memory in this situation other than ccall-ing everything yourself, at which point I would rather write FORTRAN. I work on eigenvalue solvers, so it is all more or less just wrapping a bunch of calls to BLAS/LAPACK. As you say, the main bottleneck is in the BLAS calls anyways, but the excess allocation in Julia can really slow you down. ARPACK is maybe a bad example because its all just mat-vec, when you need to do something like compute an SVD every iteration, thats where you run into issues.

Xcelerate · on June 15, 2020

I love Julia and have used it for years, but this is really my only complaint with my language. I used to do HPC work and ended up spending a lot of time tracking down where tiny bits of memory were being allocated in for loops. A lot of my routines ended up having “blocks” of memory to pass in as arguments, which as you mention, moves the language a bit closer to C or Fortran IMO.

snicker7 · on June 16, 2020

Minimizing allocations (and deallocations) and optimizing memory layout are just things you have to do for HPC in any language. Though I think that it is easier in Julia than in C/Fortran.

doublesCs · on June 15, 2020

That's not uncommon in HPC.

I'm surprised if this was inherent to StaticArrays.jl though. Probably some interop edge case, I'm imagining.

Shall we summon Chris Rackauckas?

ChrisRackauckas · on June 16, 2020

It worked, I answered on Discourse because someone pinged me for an answer, saying "you've been summoned" haha.

marmaduke · on June 15, 2020

So the real question is why a library with the word "static" in its title does 30 thousand allocations..

dnautics · on June 15, 2020

I believe the "static" refers to static, aka, compile-time dimension of the array (yes, julia is a compiling language), versus an array that has a run-time, mutable array dimensionality.

ChrisFoster · on June 15, 2020

"Static" is a reference to the size of the arrays being fixed and encoded in the type. There's various allocation strategies, for instance StaticArrays.SizedArray uses a normal heap-allocated Array as the storage.

That said, the backing storage for the particular static array types used in this example (SMatrix,SVector) is an immutable Tuple which should generally be stack allocated if the library and optimizer are working together as intended. So the allocations here are a bit of a surprise; they may indicate a lack of inlining in something we expected to be inlined.

tkoolen · on June 16, 2020

1) a mistake in the benchmark procedure explained in https://discourse.julialang.org/t/grassmann-jl-a-b-3x-faster..., and 2) the dynamically-sized result vector containing 10000 statically-sized 5-vectors still needs to be allocated.

canjobear · on June 16, 2020

I would like to move from Python to Julia because Julia gives easier access to better abstractions for numerical operations. What keeps me on Python is that there is no real replacement for PyTorch.

eigenspace · on June 16, 2020

Just a thought, but have you considered simply using PyTorch through PyCall.jl or something like ThArrays.jl? This may not be worth it if the majority of your code is reliant on PyTorch, but it's certainly an option that should be taken seriously imo.

thanatropism · on June 15, 2020

When I first heard of Julia in 2016 it was sandwiched between R and Python and thought it had no chance whatsoever of succeeding.

I’m increasingly impressed by Julia. It’s a crowded space and they remain relevant. Maybe even best of class for some applications.

nextos · on June 15, 2020

It has a lot of potential. You can often write code in Julia that is really performant by just following a few simple rules. Most notably, keeping type stability. Libraries are now catching up. Some are great, some are unique, others are still missing.

I have generated SIMD code that outperforms C and C++, even after some hand optimizations. As a matter of fact, some people are re-writing a BLAS drop-in replacement in pure Julia, and the performance is not going to be too far off from BLAS [1]. If you work in scientific computing, you can probably appreciate how amazing that is.

The combination of multiple dispatch, a simple type system and an LLVM backend is wonderful. In the long run, Julia will probably grab lots of Python's and R's marketshare because having a single language instead of two languages (Python + Cython/Pythran/... or R + C++) is a huge advantage.

[1] https://discourse.julialang.org/t/we-can-write-an-optimized-...

krapht · on June 15, 2020

Maybe Julia will become more popular, but I'd bet on the 800 pound gorilla Python getting faster through broader use of Dask, Cython, and Numba. I'll never bet against the network effects of language and library development.

cbkeller · on June 15, 2020

People often talk about speed as an advantage of Julia (which is true), but the real secret advantage compared to any other language I've worked with is more of a network effect: composability

I didn't appreciate the importance of this (or the degree to which this just works as a result of multiple dispatch), but there's a good talk about this from JuliaCon 2019 [1] by Stefan Karpinski

[1] https://www.youtube.com/watch?v=kc9HwsxE1OY

KenoFischer · on June 15, 2020

One aspect that's underappreciated is that composability can actually lead to speed. If it's super easy to compose your algorithmic package with speed-improvement packages (parallelism, hardware accelerators, custom compilers), you're much more likely to be actually able to use them.

contravariant · on June 15, 2020

Pretty sure you can call Julia code from python, and vice-versa.

ta17711771 · on June 16, 2020

That's why Go will supersede Python. All the new cool and secure small shit is being done in Go.

neutronicus · on June 16, 2020

Go is not gonna replace SciPy

No operator overloading, and scientists ain't gonna look twice at anything without that

dunefox · on June 16, 2020

Please no.

ta17711771 · on June 16, 2020

Elaborate?

blurbleblurble · on June 15, 2020

Grassmann.jl is the library that made me take Julia seriously. It's an awesome library, and the author really gets one of Julia's most special and well integrated features, multiple dispatch.

Because the code is written in solid, idiomatic Julia, it's interoperable with tons of other libraries. I actually had success using this library with CuArrays.jl without any modifications. So yeah, you can use it to run geometric algebra computations on a GPU for algebras of dimension < 2^5 (in my case, I tried it with CGA3D).

blurbleblurble · on June 16, 2020

<= 2^5

plafl · on June 15, 2020

The same first impression for me. It's great people with more faith than me have pushed forward and actually I'm still waiting to make the jump. It was very similar with python for scientific computing in the beginning. When I wrote my master thesis in python in 2006 python was completely dwarfed by Matlab. I was more enthusiastic on those days trying new things and also had more time. It is yet to be seen if Julia displaces python but at least it has non zero chance of doing it.

peatmoss · on June 15, 2020

I’ve wanted Scheme / Racket to make a showing in places where R and Python have dominated the data science and numerical computing spaces (for those of us not writing C/C++/Fortran).

But, I think Julia is the closest thing to an honest-to-goodness lisp that has a prayer of mainstream adoption. I quite like Julia and think the community has leveraged the meta programming strengths of the language well enough to convince me to relinquish my love of minimalist lisp syntax.

I do wonder if the biggest risk to Julia isn’t that it’s not extreme / superlative in any dimension. If you want raw performance, Rust seems to be capturing the lion’s share of excitement among “new” languages. If you want meta programming, Julia still feels like a compromise compared to Scheme/Racket/Clojure/Common Lisp. If you want libraries, Python is still king. If you want stats libraries, R is still king. Julia is impressively good at all of those things... but it’s not good enough to be the most compelling of any of them.

Still, it makes me think that it may be time to dive in head first rather than dabble my toes as I've been doing for the past few years. I guess my main barrier at the moment is compiler / deployment story—it still seems hard to build a deployable binary that fits in, say, a Lambda layer as far as I can tell.

ddragon · on June 15, 2020

Even if Julia never ends up being the very best at anything particular, Python has shown that being "generally the second best language for everything" has even more value for most people. Leveraging 80% of the flexibility and composability of Lisp, 80% of the "code that looks like pseudo-code" aspect and overall easy to use from Python, 80% of the speed of the fastest static languages and eventually the library and tooling maturity to support all use cases, it can definitely become an even better second best language for everything.

improbable22 · on June 15, 2020

> it’s not extreme / superlative in any dimension. If you want raw performance ... good at all of those things... but

I guess the selling point is that being good at many things means you don't have to keep switching between all these languages, or glueing them together. Others have mentioned the composability benefits of this. But it also has a learning advantage, I think. You can incrementally learn to speed up the crucial piece of something. You can learn just enough metaprogramming to solve some problem you have.

ssivark · on June 15, 2020

Could somebody motivate the comparison being made? It is not apriori obvious what Grassman numbers have to do with linear solvers. Thanks.

spacedome · on June 15, 2020

They are solving the linear system Ax=b, written in Julia/Matlab usually as A\b, specifically for a fixed size 5x5 matrix. The Grassmann algebras can give a way of representing the matrix and vectors, and a comparison is being made to the representation (and solver) given by StaticArrays.jl, which just uses typical matrices and arrays but optimized for small sizes. No real motivation to use Grassmann algebras if your only concern is generally solving a linear system, other than this method seeming to be faster, which mostly (imo) points to the StaticArrays.jl method needing to be optimized.

Edit: This post also does not mention numerical stability of the Grassmann.jl method, which is a real concern in practice, even for such small systems.

deltasquared · on June 15, 2020

This is a pretty neat trick, they implemented exterior algebra see https://en.wikipedia.org/wiki/Exterior_algebra

You see this kind of math in textbooks focused on computer graphics.

contravariant · on June 15, 2020

Do you have a Physics background by any chance? Anyway, I believe the module might be named after the mathematician, not the number system called after the same mathematician.

Regardless the name Grassman is tied to exterior algebras (and therefore geometric algebras), which are implemented by this library and which can be used to solve matrix equations (although I couldn't recommend using the geometric algebra approach for higher dimensions, unless you can do so symbolically).

deltasquared · on June 16, 2020

I do have a bit of physics in my background. I read about Grassmann recently in a game programming book. Foundations of game engine development Volume 1 By Eric Lengyel ends with a nice explanation of Grassmann algebra.

The road to reality by Roger Penrose also has a treatment of Grassmann algebra in physics context.

I agree the library is likely a reference to the mathematician.

contravariant · on June 16, 2020

Interesting. Yeah it's nothing too important it's just that I've only ever seen physicists use Grassman numbers, whereas mathematicians tend to work with the exterior algebra directly (which works in much the same way except the wedge operator is used explicitly, IMHO this makes things clearer).

DreamScatter · on June 16, 2020

Grassmann algebra and exterior algebra are the same thing, regardless of whether a mathematician or physicist uses them.

contravariant · on June 16, 2020

Sure, but the Grassman numbers are a bit of an odd model of Grassman algebra with its own notation. They live in some anonymous vector space that is effectively defined in terms of the Grassman numbers, whereas mathematicians usually have some vector space they're interested in and construct an exterior algebra from its elements. The differences are minor but they're two clearly distinct approaches.

Like I said it's nothing terribly important but it's still an interesting difference.

deltasquared · on June 17, 2020

I agree, the subtle difference between the physics and math mindset is interesting. I wound up on the sidelines as a computer scientist who worked in a physics lab.

6gvONxR4sf7o · on June 16, 2020

There's a neat kind of linear algebra (multilinear algebra, technically) that doesn't just deal with vectors (think vectors and planes and volumes), but it's often computationally unweildy. This is a fast library for it that competes with simple matrix code in efficiency.