It really depends on the specifics. Under heavy contention, performance can drop off a cliff, and even atomic instructions can become a bottleneck (https://stackoverflow.com/q/2538070).
I think what you're observing is true: in many cases you can just use a lock and not worry about it (or your variation). But there are certain applications/situations where you can do better. Two things can make the queue more efficient and raise its throughput: having the consumer pull everything out of the queue with one locking operation, and being careful about how the producer(s) signal the consumer, assuming there's a signal (e.g. you shouldn't signal for every item you put in the queue, only when it goes from empty to non-empty).
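To make the drain-everything and signal-only-when-non-empty ideas concrete, here's a rough sketch in Python. The `BatchQueue` class and its `notify_count` counter are names I made up for illustration, and it assumes an unbounded queue with one consumer that always takes the whole batch. Treat it as a sketch of the idea, not a tuned implementation.

```python
import threading


class BatchQueue:
    """Hypothetical unbounded queue: many producers, consumer drains in batches."""

    def __init__(self):
        self._cond = threading.Condition()  # condition variable with its own lock
        self._items = []
        self.notify_count = 0  # illustration only: how many signals were sent

    def put(self, item):
        with self._cond:
            was_empty = not self._items
            self._items.append(item)
            if was_empty:
                # Signal only on the empty -> non-empty transition,
                # not once per item.
                self.notify_count += 1
                self._cond.notify()

    def drain(self):
        """Block until at least one item exists, then take all of them."""
        with self._cond:
            while not self._items:
                self._cond.wait()
            # Swap the whole list out under a single lock acquisition.
            batch, self._items = self._items, []
        return batch
```

The consumer re-checks the queue under the lock before waiting, so a signal sent while it is busy processing a batch isn't lost: the next `drain()` just sees the items and returns without waiting. Putting three items into an empty queue sends one signal rather than three, and a single `drain()` returns all three.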