It also happens to be one of the easiest and most reliable ways I know of to install CUDA on your machine. Everything is handled through the artifact system, so you don't have to mess with downloading it yourself and making sure you have the right versions and such.
(Before someone complains, you can also opt out of this and direct the library to a version you installed yourself)
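For anyone who hasn't seen it, the whole process is roughly this (a sketch; `CUDA.versioninfo()` is a real CUDA.jl function, but the exact first-use behavior depends on your CUDA.jl version):

```julia
using Pkg
Pkg.add("CUDA")      # the CUDA toolkit artifacts are fetched automatically

using CUDA
CUDA.versioninfo()   # reports which toolkit and driver versions were picked up
```

No manual download of the toolkit, no PATH fiddling; the artifact system resolves a toolkit build compatible with your installed driver.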
Could you elaborate a bit on this? I know that a gigantic bunch of libraries is involved that is usually terrible to install, but Julia does it for you in the equivalent of a Python virtual env. However, aren't there also Linux kernel components that are necessary? How are those installed?
Yes, there's a kernel component which needs to be installed, but that's usually pretty easy these days, because the situation is typically one of:
1) You're using a container-ish environment where the host kernel has the CUDA drivers installed anyway (but your base container image probably doesn't have the userspace libraries)
2) The kernel driver comes with your OS distribution, but the userspace libraries are outdated (userspace libraries here includes things like JIT compilers, which have lots of bugs and need frequent updates) or don't have some of the optional components that have restrictive redistribution clauses
3) Your sysadmin installed everything, but then helpfully moved the CUDA libraries into some obscure system-specific directory where no software can find them.
4) You need to install the kernel driver yourself, so you find it on the NVIDIA website, but don't realize there are another five separate installers you need for all the optional libraries.
5) Maybe you have the NVIDIA-provided libraries, but then you need to figure out how to get the third-party libraries that depend on them installed. Given the variety of ways to install CUDA, this is a pretty hard problem to solve for other ecosystems.
In Julia, as long as you have the kernel driver, everything else will get automatically set up and installed for you. As a result, people are usually up and running with GPUs in a few minutes in Julia.
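As a quick sanity check that the driver really is the only prerequisite, a first GPU computation looks like this (a sketch; `CUDA.functional()` and `CuArray` are real CUDA.jl API, assuming a working NVIDIA kernel driver):

```julia
using CUDA

if CUDA.functional()            # true iff the kernel driver is usable
    a = CUDA.rand(Float32, 1024)  # allocate a random array directly on the GPU
    total = sum(a)                # the reduction runs on the device
    println(total)
end
```

Everything below the kernel driver (toolkit, libraries, JIT) was fetched automatically the first time the package was used.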
CUDA.jl installs all of the CUDA drivers and associated libraries like cuDNN for you if you don't have them. Those are all vendored via the Yggdrasil system so that users don't have to deal with it.
CUDA.jl does not install the actual kernel driver, right? I do not really see how it could do that, and the sibling comment does confirm that the kernel driver is not managed by Julia.
Yes, you would still need to install the NVIDIA kernel driver (preferably the most current one). Desktop users typically have it installed already. But the main difficulty, in my opinion, is installing CUDA itself (with cuDNN, ...). Even the TensorFlow documentation [0] is outdated in this regard, as it covers only Ubuntu 18.04.
The installation process of CUDA.jl is really quite good and reliable. By default it downloads its own copies of CUDA and cuDNN, or you can use a system-wide CUDA installation by setting some environment variables [1].
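Concretely, the opt-out is a couple of environment variables set before starting Julia (a sketch; these variable names match the CUDA.jl documentation of that era, but newer releases may configure this differently):

```shell
# Use a locally installed CUDA toolkit instead of the downloaded artifacts
export JULIA_CUDA_USE_BINARYBUILDER=false

# Optionally pin which CUDA toolkit version CUDA.jl should use
export JULIA_CUDA_VERSION=11.2
```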
I last experimented with CUDA.jl a year ago, and it was very usable then. This is a good reminder to re-evaluate the Julia deep learning ecosystem. If I were working for myself I would definitely try to do more with Julia for machine learning. Realistically, Python has such an established base that it will take some time to get orgs that are already all-in on Python to come over.
I think it's not dumb to target greenfield users: just installing Python GPU wheels is often difficult enough that several companies exist (indirectly) because it's so hard to do right (e.g. selling a GPU PC with that stuff preinstalled).
I just finished setting up a new machine to run some Kaggle stuff. Both TensorFlow and PyTorch had issues with CUDA versions and dependencies that weren't immediately fixed with a clean virtualenv, while both Knet.jl and Flux.jl installed flawlessly.
For PyTorch, I had no issues with conda. But with TensorFlow from conda, the training process just hangs (consuming 100% of the CPU but no GPU resources, even though my GPUs are recognized). I had more luck installing TensorFlow with pip. Given that the TensorFlow documentation does not mention conda, I wonder how well it is supported.