
"We achieve 19.8 GB/s prefix sum throughput—1.8x faster than a naive implementation and 2.6x faster than FastPFoR"

"FastPFoR is well-established in both industry and academia. However, on our target platform (Graviton4, SIMDe-compiled) it benchmarks at only ~7.7 GB/s, beneath a naive scalar loop at ~10.8 GB/s."

I thought the first quote contained a typo, but it's correct: on this platform the naive scalar loop really is faster than the "better" method (FastPFoR). Another demonstration of why benchmarking on the actual target platform matters!
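
For anyone who wants to run this kind of sanity check themselves, here is a rough sketch of the idea: write the naive prefix sum, put it next to an alternative, and time both on the machine you actually care about. This is not the article's code and has nothing to do with FastPFoR or SIMD; the function names and the `compare` helper are made up for illustration, and Python timings say nothing about the GB/s figures quoted above.

```python
import itertools
import timeit


def prefix_sum_naive(values):
    """Inclusive prefix sum using a plain scalar loop."""
    out = []
    running = 0
    for v in values:
        running += v
        out.append(running)
    return out


def prefix_sum_accumulate(values):
    """Same result via the standard library's itertools.accumulate."""
    return list(itertools.accumulate(values))


def compare(n=100_000, repeats=5):
    """Time both variants here; returns best-of-`repeats` seconds per call."""
    data = list(range(n))
    results = {}
    for fn in (prefix_sum_naive, prefix_sum_accumulate):
        # number=1 means each timing is a single call over the whole list.
        best = min(timeit.repeat(lambda: fn(data), number=1, repeat=repeats))
        results[fn.__name__] = best
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.6f} s")
```

Which variant wins, and by how much, depends on the interpreter and hardware, which is the whole point: measure where the code will run instead of assuming.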


