https://youtu.be/_qaKkHuHYE0 (CppCon) A senior software engineer at Google tried to optimize tcmalloc by replacing a mutex with lockless MPMC queue. After many bugs and tears, the result is not statistically significant in production systems.
A fully lock free allocator would be a huge result by itself: you would be able to allocate from a signal handler, or having truly lock free algorithms that do not need custom allocators, or avoid deadlock prone code in tricky parts like dlclose...
But I guess this was only one part and malloc wouldn't be fully lockfree.
In any case the lockfree mallocs designs I have seen use NxN queues to shuffle buffers around, but I guess it would be unsuitable for a generic malloc.