In remote sensing | computational physics applications it's rare to have only a single FFT to compute (whatever algorithm is chosen).
Hence the practice of stuffing many FFTs through GPU grids in parallel and working to max out hardware usage in order to increase application throughput.
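To make the batching point concrete, here's a minimal pure-Python sketch (sizes and function names are my own, purely illustrative): one FFT routine, then a batch wrapper whose loop is exactly the work that a GPU batched plan (e.g. cuFFT's "plan many" style API) spreads across the grid so that throughput, not single-transform latency, is what gets maximized.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the half-size sub-transforms.
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def fft_batch(signals):
    """Transform many independent signals in one call.

    On a GPU, this loop is the part a batched FFT plan parallelizes:
    each signal is independent, so they all fly through the hardware
    at once instead of serializing.
    """
    return [fft(s) for s in signals]
```

A batched library call amortizes plan setup and kernel-launch overhead across all the transforms, which is where the throughput win over launching FFTs one at a time comes from.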
What I mean is: where did you take that from? I program FFTs on GPUs, and I see no reason for the claim that they "inherently can't reach 100% utilization by any metric".
I interpret that comment as meaning you're not going to be using every silicon block the GPU provides, like video codecs and rasterization hardware. If you've maxed out compute without going over the power budget, for example, you'd likely still be able to decode video if the GPU has a separate block for it.
I had a similar read .. I packed a lot of parallel FFTs and other processing into custom TI DSP cards, but the DSP family chips were RISC and carried little 'baggage' - just fat 32 bit | 64 bit floating point pipelines with instruction sets optimised for modular ring indexing of scalar | vector operations.
Even then they ran @ 80% "by design" for expected hard real time usage .. they only went to 11 - and dropped results on the floor - during smoke tests and when operators redlined the limits (and got feedback to that effect).
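The "modular ring indexing" mentioned above is worth a sketch. DSP addressing modes do this wrap in hardware; the Python below is my own illustrative stand-in (names `push`/`tap` and the ring size are assumptions, not any TI API) showing the mod-N pointer arithmetic that lets a delay line run with no bounds checks or branches.

```python
# Ring (circular) buffer with modular indexing, the software analogue of
# a DSP's modulo addressing mode: index arithmetic wraps mod N, so the
# pointer never needs an explicit bounds check.
N = 8                        # ring size; typically a power of two on DSPs
ring = [0.0] * N
head = 0                     # next write position

def push(sample):
    """Write into the ring; the pointer wraps with a mod, not a branch."""
    global head
    ring[head] = sample
    head = (head + 1) % N    # the wrap a DSP does for free in one cycle

def tap(delay):
    """Read the sample written `delay` pushes ago (a delay-line tap)."""
    return ring[(head - 1 - delay) % N]
```

Once more than N samples have been pushed, the oldest entries are silently overwritten - exactly the behavior a fixed-size hard-real-time delay line wants, since stale data ages out with zero management overhead.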