Lazy munmap() is definitely possible; it just needs more bookkeeping between the kernel and userspace. The weirdness you reference would only be visible in a use-after-free scenario, which is a bug, so the behavior is undefined anyway.
But reallocation is simpler with a synchronous munmap(): the memory allocator knows (through shared memory communication channels) that the virtual pages can be safely reallocated once the call returns. With a lazy approach, some other mechanism needs to inform the allocator when all cores have been flushed (and thus it’s safe to make a new mapping), or else mmap() has to do the shootdown itself. I think Solaris might have done something similar.
It’s safe to take a TLB miss on a remapping, but it’s not safe to reallocate memory and then inadvertently use an old, cached mapping. The synchronous design assumes that allocations are on the critical path and should be fast, while deallocations are not and can be slow. I think the original logic also assumed that virtual address space was scarce. Virtual address space is now plentiful, so these days I think a lazy unmap is probably worth it as a way of encouraging more efficient reuse of physical memory. Note that, for security, freeing a physical page might still force a synchronous shootdown if another process needs that page before the lazy flush has happened.
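To make the "some other mechanism" for the lazy approach a bit more concrete, here is a rough user-space simulation in C of one way the bookkeeping could work: unmapped ranges are parked with an epoch stamp, and the allocator only reuses a range once every core has flushed its TLB after that stamp. This is a sketch under my own assumptions, not how any particular kernel does it. All names (`lazy_unmap`, `core_flushed`, `take_reusable`), the fixed `NCORES`, and the epoch scheme itself are hypothetical, and the code is deliberately single-threaded; a real version would need atomics or locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCORES 4        /* hypothetical number of cores */
#define MAX_PENDING 16  /* hypothetical cap on ranges awaiting a flush */

/* Epoch at which each simulated core last flushed its TLB (0 = never). */
static uint64_t core_flush_epoch[NCORES];
static uint64_t global_epoch = 1;

struct pending_range {
    uintptr_t start;
    size_t len;
    uint64_t unmap_epoch;  /* epoch at which the range was lazily unmapped */
    bool in_use;
};

static struct pending_range pending[MAX_PENDING];

/* Lazy unmap: park the range and return at once, with no shootdown.
 * Returns false when the table is full; the caller would then have to
 * fall back to a synchronous shootdown. */
bool lazy_unmap(uintptr_t start, size_t len) {
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i] = (struct pending_range){start, len, global_epoch++, true};
            return true;
        }
    }
    return false;
}

/* Record that a core flushed its TLB (for example on a context switch). */
void core_flushed(int core) {
    core_flush_epoch[core] = global_epoch;
}

/* A range is safe to remap only once every core has flushed after it
 * was unmapped, so no core can still hold a stale translation. */
bool range_reusable(const struct pending_range *r) {
    for (int c = 0; c < NCORES; c++) {
        if (core_flush_epoch[c] <= r->unmap_epoch)
            return false;
    }
    return true;
}

/* Allocator side: hand back one parked range that is now safe to reuse. */
bool take_reusable(uintptr_t *start, size_t *len) {
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && range_reusable(&pending[i])) {
            *start = pending[i].start;
            *len = pending[i].len;
            pending[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

The point of the sketch is the trade-off: `lazy_unmap` is cheap, but the allocator can come up empty in `take_reusable` until the slowest core has flushed, which is only tolerable because virtual address space is plentiful.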
Doesn't such a program today already observe all the weirdness of the page "disappearing" from different threads at different points in time?
What difference would there be with a lazy munmap(), except that the window during which these inconsistencies are observable would be longer?