This story talks about MLX and Ollama but doesn't mention LM Studio - https://lmstudio.ai/
LM Studio can run both MLX and GGUF models, but does so from an Ollama-style (though more full-featured) macOS GUI. They also have a very actively maintained model catalog at https://lmstudio.ai/models
I suspect Ollama is at least partly moving away from open source as they look to raise capital; when they released their replacement desktop app, they did so as closed source. You're absolutely right that people should be using llama.cpp - not only is it truly open source, it's also significantly faster, has better model support and many more features, is better maintained, and its development community is far more active.
The only issue I've found with llama.cpp is getting it working with my AMD GPU. Ollama almost works out of the box, both in Docker and directly on my Linux box.
ik_llama is almost always faster when tuned. Untuned, though, I've found the two to be very similar in performance, with mixed results as to which comes out ahead.
But vLLM and SGLang tend to be faster than both of those.
LM Studio? No, it's the easiest way to run an LLM locally that I've seen, to the point where I've stopped looking at alternatives.
It's cross-platform (Win/Mac/Linux), detects the most appropriate GPU in your system, and tells you whether the model you want to download will run within its RAM footprint.
It lets you set up a local server that you can access through API calls as if you were remotely connected to an online service.
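For what it's worth, that local server speaks the OpenAI API, so the usual clients work against it. A minimal sketch with the openai Python package, assuming LM Studio's server is running on its default port (1234) and a model is already loaded (the model ID below is just a placeholder):

  # Point the standard openai client at LM Studio's local server.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
      api_key="not-needed",                 # any string; the local server ignores it
  )

  resp = client.chat.completions.create(
      model="local-model",  # placeholder: use whatever model ID LM Studio shows
      messages=[{"role": "user", "content": "Say hello in one sentence."}],
  )
  print(resp.choices[0].message.content)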
The tradeoff is a somewhat higher learning curve, since you need to manually browse the model library and choose the model/quantization that best fits your workflow and hardware. OTOH, it's also open source, unlike LM Studio, which is proprietary.
Just use llama.cpp. Ollama tried to force their custom API (not the OpenAI standard), they obscure the downloaded models in a way that makes them a pain to use with other implementations, they blatantly used llama.cpp as a thin wrapper without properly communicating it, and now they have to differentiate somehow to start making money.
If you've ever used a terminal, use llama.cpp. You can also run models directly from llama.cpp, afaik.
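If you'd rather script it than use the CLI, the llama-cpp-python bindings wrap the same engine. A rough sketch, assuming you've installed the bindings and have a GGUF file on disk (the path and parameters are placeholders; tune them for your hardware):

  # Load a local GGUF and run one chat turn via llama-cpp-python.
  from llama_cpp import Llama

  llm = Llama(
      model_path="./models/your-model.gguf",  # placeholder path to your GGUF
      n_ctx=4096,        # context window size
      n_gpu_layers=-1,   # offload as many layers as the backend supports
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "What does llama.cpp do?"}],
      max_tokens=128,
  )
  print(out["choices"][0]["message"]["content"])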
Yes, I've wanted to try it for a while, but setting up an environment with an MI50 was a bit tricky, so I started with something I knew first. Now that I have Ollama running I'll give llama.cpp a shot.
Ooh, I have experience with it. If you're on Linux, just use Vulkan. If you run into any other issues, google my username + "MI50 32GB vbios reddit". It depends on which vBIOS you have, but that Reddit post has most of the info you may need. Good luck!
Most LLM sites now offer free plans, and they're usually better than what you can run locally, so I think people running local models are doing it for privacy 99% of the time.