It depends. Going from a SIMD float to a scalar float is fast, because both operate on the same registers. If you are pulling out lane 0 you don't even need to do anything (it's just a type cast). For other lanes you need a shuffle first.
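As a rough sketch of the float case, assuming x86 with SSE and C++ intrinsics (the helper names `float_lane0` and `float_lane2` are made up for illustration, not from any library):

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_cvtss_f32, _mm_shuffle_ps

// Hypothetical helper: lane 0 is already where a scalar float lives,
// so this is effectively a type cast and usually costs nothing.
inline float float_lane0(__m128 v) {
    return _mm_cvtss_f32(v);
}

// Hypothetical helper: for any other lane, shuffle the wanted element
// down to lane 0 first, then read it as a scalar.
inline float float_lane2(__m128 v) {
    __m128 s = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtss_f32(s);
}
```

With `v = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f)`, `float_lane0(v)` gives the first element and `float_lane2(v)` the third. Whether the lane 0 read really compiles to nothing depends on the compiler and the surrounding code.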
Going from a SIMD integer to a scalar integer, the value has to move into a separate (general-purpose) register, so there is a small penalty (about 3 cycles, IIRC).
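The integer case could look something like this, again assuming x86 with SSE2 and C++ intrinsics (`int_lane0` and `int_lane2` are hypothetical names for illustration):

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_cvtsi128_si32, _mm_shuffle_epi32
#include <cstdint>

// Hypothetical helper: reading lane 0 copies the value out of the vector
// register into a general-purpose register, which is where the extra
// latency comes from.
inline std::int32_t int_lane0(__m128i v) {
    return _mm_cvtsi128_si32(v);
}

// Hypothetical helper: other lanes need a shuffle down to lane 0 first,
// followed by the same cross-register move.
inline std::int32_t int_lane2(__m128i v) {
    __m128i s = _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtsi128_si32(s);
}
```

The exact cost of the move varies by microarchitecture, so treat the cycle count as a ballpark and measure on your target if it matters.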