Arithmetic Instructions

Single-precision floating-point operations provide the best arithmetic throughput, and their use is strongly encouraged wherever the required precision allows.
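One common way to lose single-precision arithmetic by accident is an unsuffixed floating-point literal. The sketch below is an illustration only, not code from this guide: the helper names half_single and half_promoted are hypothetical. It is written as plain C++ so that it compiles on the host, and it assumes the usual C/C++ literal typing rules, which CUDA C also follows.

```cpp
#include <type_traits>

// Hypothetical helper: halves x using a single-precision literal.
// The 'f' suffix makes 0.5f a float, so the multiply stays in float.
inline float half_single(float x) {
    return x * 0.5f;
}

// Same computation with an unsuffixed literal. 0.5 has type double, so x is
// promoted to double, the multiply is done in double precision, and the
// result is explicitly narrowed back to float.
inline float half_promoted(float x) {
    return static_cast<float>(x * 0.5);
}

// Compile-time checks of the expression types involved.
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float times a suffixed literal stays in single precision");
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float times an unsuffixed literal is promoted to double");
```

Both helpers return the same value for inputs such as 3.0f. The difference is the precision in which the multiply is carried out, and that is what matters for throughput on hardware where double-precision operations are slower.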

The throughput of individual arithmetic operations is detailed in the CUDA C Programming Guide: Section F.3 covers devices of compute capability 1.x, and Section F.4 covers devices of compute capability 2.x.