RAM requirements stay the same. You need all 358B parameters loaded in memory, a... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		l9o 56 days ago \| parent \| context \| favorite \| on: GLM-4.7: Advancing the Coding Capability RAM requirements stay the same. You need all 358B parameters loaded in memory, as which experts activate depends on each token dynamically. The benefit is compute: only ~32B params participate per forward pass, so you get much faster tok/s than a dense 358B would give you.

atq2119 56 days ago [–]

The benefit is also RAM bandwidth. That probably adds to the confusion, but it matters a lot for decode. But yes, RAM capacity requirements stay the same.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact