> Our implementation is up to 2x faster than optimized speculative decoding baselines and up to 5x faster than autoregressive decoding with open source inference engines

What about the speedup per FLOP?
