
It makes a lot more sense when programming it as a vector machine. The divergence/convergence stuff is the CUDA/SIMT model; there's no need to use that on amdgpu. Branches are cheap(ish) when they're done on the scalar unit.

Coroutines aren't currently a thing on amdgpu but I think they should be.



> The divergence/convergence stuff is the CUDA/SIMT model

Even on NVidia, you're "allowed" to diverge and converge. But it's not efficient.

Optimal NVidia coding will force more convergence than divergence. That's innate to GPU architecture. It's more efficient to run 32-at-a-time per NVidia warp than a diverged 8-at-a-time warp.

Yes, NVidia _CAN_ diverge and properly execute a subwarp of 8-at-a-time per clock tick... including with complex atomics and all that. But running a full 32-at-a-time warp is 400% the speed, because it's ALWAYS better to do more per clock tick than less per clock tick.
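The 400% figure falls out of a simple back-of-the-envelope model. This is a toy sketch (my own function names, not any real API) of the assumption above: a warp issues one instruction per clock tick no matter how many lanes are enabled, so useful work per tick scales with the active-lane count.

```python
# Toy model of SIMT warp throughput (illustrative only, not real
# hardware timing). Assumption: the warp consumes one clock tick per
# instruction regardless of how many of its 32 lanes are active.

WARP_SIZE = 32

def lane_results(active_lanes: int, ticks: int) -> int:
    """Per-lane results produced over `ticks` clock ticks.

    A diverged warp with fewer active lanes still burns the same
    number of ticks, so it simply produces fewer results.
    """
    assert 1 <= active_lanes <= WARP_SIZE
    return active_lanes * ticks

converged = lane_results(32, 100)  # full warp, all lanes enabled
diverged = lane_results(8, 100)    # 8-lane subwarp after divergence

print(converged // diverged)  # -> 4, i.e. the "400%" above
```

Under this model a fully converged warp does 4x the work of an 8-lane diverged one in the same number of ticks, which is the "400% the speed" claim stated arithmetically.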



