There's a difference because audio processing is often "massively parallel", or at least operates on big buffers like 1024 samples at a time, but in video codecs an operation might only touch 4 pixels at once, and you have to stretch to find enough extra work to fill the SIMD lanes.
So you can't necessarily batch work across blocks, because video is compression, and compression means not predictable. (If it were predictable, it wasn't compressed well enough.)
That means you have to stay inside the current block. But there are some tricks. For example, in an IDCT there's a previous stage whose output memory layout you can rearrange for free, so you can shuffle the data into whatever order fits your vectors without paying for a separate shuffle pass.