Hybrid locks are also bad for overall system performance, because they maximize local application performance at the system's expense. There is a reason the default lock implementations that operating systems ship don't spin even a little bit.
That depends on your workload. If you're making a game that's expected to use nearly 100% of system resources, or a real-time service pinned to specific cores, your local application is the overall system.
And I think it's a poor choice that causes worse system performance. Android's bionic doesn't spin, nor does Windows or Fuchsia. Avoiding the syscall overhead is generally detrimental to overall system performance, especially when the CPU load is high.
This is nonsense. If the lock is free, you don't spin to begin with, and if the lock is held but released shortly after, the spinning avoids a context switch. If the maximum number of retries has been reached, the thread goes to sleep as it would have anyway and the scheduler moves on to the next thread, which was delayed only by those few spin attempts. This means that in the worst case the next spin will only happen once all the other queued-up threads have had their turn, and that's assuming you immediately run into another held lock.
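To make that control flow concrete, here is a minimal sketch of the spin-then-block pattern in Python. `HybridLock`, `MAX_SPINS`, and `run_demo` are hypothetical names made up for illustration, the retry budget is an arbitrary assumption, and this is not how any particular OS or libc implements its mutex.

```python
import threading

MAX_SPINS = 100  # hypothetical retry budget; real implementations tune this


class HybridLock:
    """Spin-then-block lock: a sketch of the control flow, not a tuned implementation."""

    def __init__(self, max_spins=MAX_SPINS):
        self._lock = threading.Lock()
        self._max_spins = max_spins
        self.spin_acquires = 0   # taken on the first try or during the spin phase
        self.block_acquires = 0  # taken only after giving up and blocking

    def acquire(self):
        # First attempt covers the uncontended case: no spinning happens at all.
        # The remaining attempts are the bounded spin, betting on a quick release.
        for _ in range(self._max_spins + 1):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1  # safe: we hold the lock at this point
                return
        # Retry budget exhausted: block, so the scheduler can run another thread.
        self._lock.acquire()
        self.block_acquires += 1

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


def run_demo(num_threads=4, iterations=10_000):
    """Increment a shared counter under the lock from several threads."""
    lock = HybridLock()
    total = 0

    def worker():
        nonlocal total
        for _ in range(iterations):
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, lock
```

Calling `run_demo()` and comparing `lock.spin_acquires` with `lock.block_acquires` gives a rough feel for how often the bounded spin avoids blocking. In CPython the numbers say little about real hardware behavior, so treat this as an illustration of the logic only.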