Your comment about compilers is most likely true. The exceptions are when he needs to vectorize, or when he has the opportunity to use the special-purpose instructions that seem to be built to accelerate particular algorithms.
On x86, pretty much all of the special-purpose and vector instructions are accessible from C or C++ through intrinsics. There is no need to drop down to assembly for that, except perhaps for a very, very specialized use case.
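To make that concrete, here is a minimal sketch of vectorizing by hand with SSE intrinsics in plain C. The function name `add_floats` and the `HAVE_SSE` macro are made up for illustration. The intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) come from `<xmmintrin.h>`. The sketch assumes a compiler that defines `__SSE__` when SSE is available (GCC and Clang do on x86-64) and falls back to a scalar loop otherwise.

```c
#include <stddef.h>

/* Assumption: the compiler advertises SSE support through one of these macros. */
#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h> /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for every i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Add four floats per iteration in 128-bit registers.
       The unaligned load/store variants avoid any alignment requirement. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the leftover elements, or everything without SSE. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Calling `add_floats(a, b, out, 6)` runs one 4-wide vector iteration and then two scalar iterations for the tail. No assembly is involved; the compiler still handles register allocation and scheduling around the intrinsics.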