This is not true. I have written many libraries to improve on the performance of numpy for image processing applications. Sort backs many operations, such as unique, that result in painfully slow code.
I too have written a lot of extension code to speed up numpy code. It's often not even especially difficult since any code at that level tends to be very algorithmic and looks essentially the same if written in numpy or C++. Having control of the memory layout and algorithms can be a huge win.
Of course I don't _usually_ do this, but that's just be most of the code I write doesn't need it. But it's not at all some crazy sort of thing to do.