Yes, it was using mnesia as the storage layer, and with just a few dozen queues holding a few hundred messages each, it caused timeouts in some clients (Celery/Kombu, for example).
I decided to add expiry policies to each queue so that the system cleans itself of stale messages, and that fixed all the message-dropping issues.
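For reference, expiry policies like this can be set with `rabbitmqctl` using the `message-ttl` and `expires` policy keys. This is a sketch; the policy name, queue pattern, and the TTL values are illustrative, not the ones I actually used:

```shell
# Illustrative policy: expire messages older than 60 s and delete
# queues that sit unused for 30 min. Applies to all queues (".*").
rabbitmqctl set_policy expiry ".*" \
  '{"message-ttl":60000,"expires":1800000}' \
  --apply-to queues
```

The nice thing about doing it as a policy (rather than per-queue `x-message-ttl` arguments) is that it applies to existing queues without redeclaring them.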
The 4.0 changelog states that they are switching to a new k/v store (from experimental to default).
Yep, similar symptoms. (OpenStack's services are also written in Python, or at least were back then, so probably similar to Celery.) We had regular problems with RMQ restarting. (Unfortunately I can't recall whether it was an OOM or just some BEAM timeout.)
A few hundred messages in a few dozen queues seem ... inconsequential. I mean whatever on-disk / in-memory data structure mnesia has should be able to handle ~100K stale messages ... but, well, of course there's a reason they switched to a new storage component :)