
boltzmann-brain 2 days ago \| on: Speculative Speculative Decoding (SSD)

> Our implementation is up to 2x faster than optimized speculative decoding baselines and up to 5x faster than autoregressive decoding with open source inference engines

What about per-FLOP?
