This is a very nice article, but I'm confused by the prefix trie part and would be happy if someone could explain. The text says that the structure has only eight pointers per node and fills exactly one cache line.
But looking at the code, the struct is larger than 64 bytes, so it won't fit into one cache line. Even if we look only at the first 64 bytes of the struct (the array), the nodes are allocated with plain calloc, which on a typical Linux system (though this is not guaranteed) gives you memory aligned to 16 bytes, so the array can still straddle two cache lines.
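To make the alignment point concrete, here is a minimal sketch. It is not the article's code: the `trie_node` layout, the `value` field, and the helper names are hypothetical, and it assumes 8-byte pointers and 64-byte cache lines. It checks whether a node's child array sits inside a single cache line and shows one possible way to allocate nodes on a cache-line boundary with C11 `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines */

static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "alignment must be a power of two");

/* Hypothetical node layout, not the struct from the article. */
struct trie_node {
    struct trie_node *children[8]; /* 64 bytes with 8-byte pointers */
    void *value;                   /* extra field pushes the struct past 64 bytes */
};

/* Returns 1 if the child array lies within a single cache line, else 0. */
static int children_in_one_line(const struct trie_node *n)
{
    uintptr_t first = (uintptr_t)(const void *)n->children;
    uintptr_t last = first + sizeof n->children - 1;
    return first / CACHE_LINE == last / CACHE_LINE;
}

/* Rounds n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Allocates a zeroed node that starts on a cache-line boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * hence the rounding. Release the node with free(). */
static struct trie_node *node_alloc_aligned(void)
{
    size_t size = round_up(sizeof(struct trie_node), CACHE_LINE);
    struct trie_node *n = aligned_alloc(CACHE_LINE, size);
    if (n != NULL)
        memset(n, 0, size);
    return n;
}
```

Under these assumptions, `children_in_one_line` returns 0 for any node whose address is not a multiple of 64, which is what a 16-byte-aligned calloc result can look like. A node from `node_alloc_aligned` always passes the check, at the cost of rounding each allocation up to a multiple of 64 bytes.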
Author here. Yes, you're right: the whole structure is larger than 64 bytes, but each node has only 8 child pointers (in contrast with the previous version, which had 256). Maybe I need to adjust the wording to make this clear.