Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> There are still situations where you need a tree but if you do, try very hard to use a representation like a binary array heap where the child links are implicit. Or if you must use "pointers" consider allocating all the values in a contiguous block and storing the link structure separately as 32-bit or smaller integer indexes into the data array. This at least improves the memory overhead and locality.

There is an ironic side to that advice. If the node in the tree is larger than 8 bytes (ie 2 don't fit into a cache line), then pointers vs vector doesn't get you a win as either way you hit a different cache line when you access the next element. Stroustrup's example is a nice "gotcha", but the reality most of us don't store integers in a vector or tree, we store bigger things. That boils down to two cases: this thing is so big storing a couple of pointers along side them doesn't matter, or you store a pointer to the bigger thing in your tree which means you are using a pointer heavy data structure anyway.

OK, OK, I've cheated. I get what you are saying is using a linked list to store those pointers to the big object is always worse than storing them in an array. But there is a third option which is better than both: embed the pointers into the big object. If that pointer ends in the same cache line as whatever you are using to distinguish the objects the pointer heavy method will be faster than the non-pointer method, and that is an irony.

This pattern has been followed by hoary old C programmers like myself for decades. It's all over the Linux kernel for instance. Granted it ain't so easy to do in languages other than C. (Even rust lacks the necessary "I have a pointer to linked list node that I know is embedded in a object of type T, give me the pointer to the T object" operation because it's so hard to make type safe.) But just because most languages don't support it doesn't mean pointer heavy data structures are bad.



> but the reality most of us don't store integers in a vector or tree, we store bigger things.

I work in numerical computing where storing 64/32-bit floats is by far the most common use case. In this world, storing a lot of small, immutable values is the norm and using pointer-heavy data structures would be a disaster.

That's a good trick that I'll have to think about good ways to expose safely in a higher level language.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: