You can run models of up to ~128GB on a top-spec MacBook Pro (M-series Max with 128GB of unified memory). So we're already at a point where you can run all but the biggest frontier models on consumer hardware.
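Rough napkin math for what actually fits, assuming weight memory is just parameter count times bytes per parameter, and ignoring KV cache and OS overhead (so these are optimistic lower bounds, not exact figures):

    # Back-of-the-envelope weight memory: params * bits_per_param / 8.
    # Ignores KV cache, activations and OS overhead, so real headroom is smaller.
    def weight_gb(params_billion: float, bits_per_param: int) -> float:
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for name, params in [("8B", 8), ("70B", 70), ("405B", 405)]:
        row = ", ".join(f"{bits}-bit: ~{weight_gb(params, bits):.0f} GB" for bits in (16, 8, 4))
        print(f"{name}: {row}")

    # A 70B model fits in 128 GB at 8-bit (~70 GB) or 4-bit (~35 GB);
    # a 405B model doesn't fit even at 4-bit (~203 GB).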
Yeah, I also think the ~$5k price is quite hefty. It's difficult for me to imagine running sizeable LLMs on commodity/consumer hardware without another breakthrough in the field. And I wouldn't expect GPU prices to fall if the technology proves its worth.
Massive increases in demand, because this stuff is genuinely useful, can push prices up even for existing chips (NVIDIA is basically printing money, since it can sell everything it makes for as much as buyers can raise from their investors). I have vague memories of something similar happening with RAM in the late 90s, though perhaps it was just Mac RAM, since the Apple market was always its own weird oddity (the Performa 5200 I bought around then was listed second-hand in one of the magazines for twice what I paid for it).
Likewise, prices can go up from global trade wars, e.g. the tariffs Trump wants for profit, or the export controls Biden wants specifically to limit access to compute because AI may be risky.
Likewise hot wars right where the chips are being made, say if North Korea starts fighting South Korea again, or if China goes for Taiwan.
I can imagine a world where "good enough" GPGPUs become embedded in common chipsets the same way "good enough" regular GPUs are embedded now, but we're definitely not there yet. That said, it was only a few years between the 3dfx Voodoo cards coming to market and Intel integrated graphics showing up.
We already have something similar in the form of HW accelerators for AI workloads in recent CPU designs, but that's not enough.
LLM inference workloads are bound by compute power, sure, but that's not insurmountable IMO. The much bigger challenge is memory. Not even the bandwidth, but the sheer amount of RAM you need just to load the LLM weights.
Specifically, even a single H100 (80GB) will hardly suffice to host a mid-sized LLM such as Llama 3.1 70B. And an H100 runs ~$50k.
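To put numbers on that: at FP16 the 70B weights alone are around 140 GB, and the KV cache comes on top. A rough sketch (the layer/KV-head/head-dim figures are the commonly published Llama 3.1 70B config, used here just for illustration):

    # Why a single 80 GB H100 is not enough for a 70B model at FP16.
    PARAMS = 70e9
    FP16_BYTES = 2
    weights_gb = PARAMS * FP16_BYTES / 1e9                     # ~140 GB of weights

    # KV cache per token, assuming the commonly published Llama 3.1 70B config:
    # 80 layers, 8 KV heads (GQA), head_dim 128, FP16 cache.
    LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
    kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * FP16_BYTES   # K and V
    kv_gb_at_128k = kv_bytes_per_token * 128_000 / 1e9         # ~42 GB at full context

    print(f"weights ~{weights_gb:.0f} GB + KV cache ~{kv_gb_at_128k:.0f} GB vs 80 GB of HBM")

Even at 8-bit you're at ~70 GB of weights before any context, which is why multi-GPU setups or heavy quantization are the norm.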
If that memory requirement is here to stay, and with the current transformer architecture it is, then the only option really left for affordable consumer HW is the smallest and least powerful LLMs. I can't imagine a built-in GPGPU with 80GB of on-package memory, IMHO.