Here is a repro using GCE's C4A Axion instances (c4a-highcpu-72). Seems to beat Graviton? Maybe the title of the thread can be updated to a larger number :) I used the largest instance to avoid noisy-neighbor issues.
Oh nice! Axion C4A and Graviton4 use the same core (Neoverse V2), so the performance difference is due to factors like clock speed and power management.
I used a geometric mean to calculate the top-line "86 GB/s" for NEON pack/unpack, so the equivalent figure for the C4A repro is 91 GB/s. I'm probably going to leave the title unmodified.
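For anyone who wants to compute the same kind of top-line number from their own run, here is a minimal sketch of the geometric mean. The function name `topline_gbps` and the two input values are placeholders I made up for illustration; they are not measurements from Graviton4 or the C4A repro.

```python
from statistics import geometric_mean


def topline_gbps(throughputs_gbps):
    """Return the geometric mean of per-benchmark throughputs (GB/s)."""
    return geometric_mean(list(throughputs_gbps))


# Placeholder values only, NOT real results from either machine.
example = {"pack": 64.0, "unpack": 100.0}

print(f"top-line: {topline_gbps(example.values()):.1f} GB/s")
```

For two values the geometric mean is just the square root of their product, so it is pulled toward the slower of the two results more than an arithmetic mean would be.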