
And you won’t get there. Those models are far too large for a 2024 GPU. Llama-3 70B is arguably close to GPT-4 but is still too large for gaming GPUs, and probably will be for many GPU generations to come.


“You won’t get there” is a pretty sweeping claim about all of the future. Two fairly reasonable predictions: 1) the compute needed to reach GPT-4 performance will decrease. 2) the compute on consumer GPUs will increase.

At some point those curves cross, and you’ll be able to run a GPT-4-quality LLM on a current consumer GPU. At some point after that, you’ll be able to run a GPT-4-quality LLM on a 2024 consumer GPU, if you can still find one.

Important to emphasize: I’m not saying “GPT-4” itself. Llama-3 was trained on 24k-GPU clusters. “Able to do the exact same processing with 1/24,000th the compute” is a different claim from “able to get equivalent performance with 1/24,000th the compute”. Even so, given a long enough time scale, the former is possible.


> 1) the compute needed to reach GPT-4 performance will decrease. 2) the compute on consumer GPUs will increase.

I’m assuming we’re just talking inference here…

Sure, consumer compute will increase, but the original comment fixed the GPU: the 4090. I can already eke out Llama-3 8B on my MacBook Air, and Apple will sell you a laptop capable of running the full-sized Llama-3.
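To make that concrete, here’s a minimal sketch of what running a small quantized model locally looks like, using llama-cpp-python; the GGUF file name and the settings are illustrative assumptions, not something from the thread:

    # Rough sketch: running a 4-bit quantized Llama-3 8B locally with
    # llama-cpp-python. Model path and settings are examples only.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # ~5 GB on disk
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to Metal/GPU if they fit
    )

    out = llm("Q: Why does an 8B model fit on a laptop? A:", max_tokens=64)
    print(out["choices"][0]["text"])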

There is a direct correlation between parameters and “knowledge” for an LM. There are some open questions about density (Llama-3 specifically challenged previous assumptions), but it seems implausible to fit a model equivalent to GPT-4 into 24 GB of VRAM. Just like compression, you can’t shrink forever.
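Some back-of-the-envelope arithmetic on the 24 GB point. GPT-4’s parameter count isn’t public, so the sketch below just uses Llama-3 70B (the model the thread already treats as the closest open comparison); it’s a weights-only estimate that ignores KV cache and activations:

    # Weights-only memory estimate for an N-billion-parameter model at a
    # given precision. Real usage is higher (KV cache, activations, overhead).
    def weight_gb(params_billions, bits_per_param):
        return params_billions * 1e9 * bits_per_param / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"Llama-3 70B @ {bits}-bit: ~{weight_gb(70, bits):.0f} GB")

    # 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB --
    # all of which blow past a 4090's 24 GB before the KV cache is counted.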

GPT-4 and GPT-2 are pretty similar architecturally (I assume). So if capability doesn’t matter, we can already run GPT-2, which means we’re basically there for 4.
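And for what it’s worth, “we can already run GPT-2” really is just a few lines these days; this uses the Hugging Face transformers text-generation pipeline, purely to illustrate that point:

    # GPT-2 (124M parameters) runs comfortably on a CPU, never mind a 4090.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("Consumer GPUs will eventually", max_new_tokens=30)
    print(result[0]["generated_text"])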



