Power usage is indeed a better proxy for GPU utilization during ML training. It folds many important indirect signals that aren’t otherwise visible into a single number, and it avoids many pitfalls of the compute-utilization metric, which can read 100% even during an all-reduce deadlock, among other scenarios.
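
For anyone who wants to check this on their own cluster, here is a rough sketch that compares the two readings side by side. It assumes `nvidia-smi` is available and returns numeric values for the `utilization.gpu`, `power.draw`, and `power.limit` query fields; the helper names and the 95% / 40% thresholds are made up for illustration and would need tuning per GPU model and workload.

```python
import subprocess

# Real nvidia-smi query fields; the output is one CSV line per GPU.
QUERY = "utilization.gpu,power.draw,power.limit"


def read_nvidia_smi() -> str:
    """Return raw CSV text from nvidia-smi (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse(csv_text: str) -> list[dict]:
    """Turn 'util, draw, limit' lines into per-GPU percentages.

    Assumes every field is numeric; some GPUs report '[N/A]' for power,
    which this sketch does not handle.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        util, draw, limit = (float(x) for x in line.split(","))
        rows.append({"util_pct": util, "power_pct": 100.0 * draw / limit})
    return rows


def looks_stalled(row: dict, util_min: float = 95.0, power_max: float = 40.0) -> bool:
    """Hypothetical heuristic: 'busy' by utilization but drawing little power."""
    return row["util_pct"] >= util_min and row["power_pct"] <= power_max


if __name__ == "__main__":
    for i, row in enumerate(parse(read_nvidia_smi())):
        flag = "  <- check for a hang" if looks_stalled(row) else ""
        print(f"GPU {i}: util {row['util_pct']:.0f}%, power {row['power_pct']:.0f}%{flag}")
```

A GPU that reports near-100% utilization while sitting far below its power limit is worth a closer look, since that is the pattern a stuck collective would be expected to produce.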