
Because B-tree indexes are ordered, rows likely to be adjacent on disk (written in time order) are not at all adjacent in the index and vice-versa. There is no "hot page" in the cache representing recent records; the index node you need for any given uuid is random and makes your internal index pages effectively uncacheable. The result is increased IO and cache thrashing.
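
A small illustration of the "random index node" point (my own sketch, not from the parent): with time-ordered keys every new entry lands at the right edge of the index, while a random UUID lands at an effectively random position in the sorted keyspace.

    # Sketch: where does each new key land in a sorted (B-tree-like) index?
    # Sequential keys always land at the right edge; random UUIDs land anywhere.
    import bisect
    import uuid

    def landing_positions(keys):
        index = []
        positions = []
        for k in keys:
            pos = bisect.bisect_left(index, k)   # slot the new key would occupy
            bisect.insort(index, k)
            positions.append(pos)
        return positions

    seq = [f"{i:08d}" for i in range(10)]
    rnd = [str(uuid.uuid4()) for _ in range(10)]

    print("sequential:", landing_positions(seq))   # [0, 1, 2, ..., 9] -- always the right edge
    print("random uuid:", landing_positions(rnd))  # positions scattered across the keyspace

The right-edge case means the same few "hot" pages stay in cache; the scattered case means almost every insert needs a different page.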


> Because B-tree indexes are ordered, rows likely to be adjacent on disk (written in time order) are not at all adjacent in the index and vice-versa.

But this still doesn't matter, right? If you want time ordering you'd probably have some field like `created_at` anyway.


Rows are generally written to disk in the order they're inserted, whether you have a timestamp or not. It's just how database IO works.

If your index happens to be in a different order (the position in the index is uncorrelated to the position of the row), you're gonna be thrashing pages. This isn't unique to UUID keys - text keys or anything other than current timestamp or serial id are going to have the same problem.

It's really only an issue once your index size exceeds available shared RAM. If you can fit your table's index completely in memory, you probably won't notice. But once you exceed that threshold, performance starts falling off a cliff.
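
As a rough back-of-the-envelope (my numbers, and the ~40 bytes per leaf entry is an assumption, not an exact figure for any particular engine), you can estimate where that threshold sits:

    # Rough sketch: estimate when a UUID index outgrows the available cache.
    BYTES_PER_ENTRY = 40     # assumed: 16-byte UUID key + per-entry index overhead
    SHARED_CACHE_GB = 8      # assumed cache budget (e.g. shared_buffers / OS page cache)

    def index_size_gb(rows):
        return rows * BYTES_PER_ENTRY / 1024**3

    for rows in (10_000_000, 100_000_000, 1_000_000_000):
        size = index_size_gb(rows)
        fits = "fits" if size <= SHARED_CACHE_GB else "does NOT fit"
        print(f"{rows:>13,d} rows -> ~{size:5.1f} GiB index ({fits} in {SHARED_CACHE_GB} GiB cache)")

Around 100M rows the index alone is a few GiB, which is where random-order inserts start missing cache regularly.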


Yes and no. If your index fits in the L2 cache, then you've got something. Otherwise, you have to keep reloading cache lines from RAM and performance starts falling off a cliff.


That is a great point! This would be fun to benchmark
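
If anyone wants a starting point, here's a crude sketch using SQLite as a stand-in for "a B-tree index"; absolute numbers will differ by engine, and the gap is far bigger on an on-disk database whose index exceeds the page cache.

    # Crude benchmark sketch: insert cost with time-ordered vs random keys.
    import sqlite3
    import time
    import uuid

    def bench(label, keys):
        conn = sqlite3.connect(":memory:")   # point this at a file to see disk IO effects
        conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY, payload TEXT)")
        start = time.perf_counter()
        conn.executemany("INSERT INTO t VALUES (?, 'x')", ((k,) for k in keys))
        conn.commit()
        print(f"{label}: {time.perf_counter() - start:.2f}s")
        conn.close()

    N = 1_000_000
    bench("sequential keys ", (f"{i:020d}" for i in range(N)))
    bench("random UUID keys", (str(uuid.uuid4()) for _ in range(N)))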


Consider inserting 1000 rows.

If those rows have keys that sort near each other, you're changing a few pages.

If those rows have keys scattered all over the keyspace, you're changing roughly 1000 × k pages, where k is the number of index levels (or indexes) each insert has to touch.
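
A toy model of that claim (my own numbers, assuming an existing index of 10M entries and ~200 entries per leaf page):

    # Toy model: how many distinct leaf pages do 1000 inserts touch?
    import random

    ENTRIES_PER_PAGE = 200            # assumed B-tree leaf fan-out
    EXISTING_ENTRIES = 10_000_000

    def pages_touched(positions):
        return len({pos // ENTRIES_PER_PAGE for pos in positions})

    # Clustered keys: 1000 new entries land next to each other (e.g. the right edge).
    clustered = range(EXISTING_ENTRIES, EXISTING_ENTRIES + 1000)
    # Random keys: 1000 new entries land at arbitrary positions in the keyspace.
    scattered = [random.randrange(EXISTING_ENTRIES) for _ in range(1000)]

    print("clustered keys touch", pages_touched(clustered), "leaf pages")   # ~5
    print("random keys touch   ", pages_touched(scattered), "leaf pages")   # ~1000

The clustered batch dirties a handful of pages; the scattered batch dirties nearly one page per row, all of which have to be read, modified, and written back.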



