Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think the suggestion was to multiply the powers of primes. e.g.

fn 2 @ v1, fn 3 @ v1 = 2^1 * 3^1 = 6

fn 2 @ v2, fn 3 @ v1, fn 5 @ v3 = 2^2 * 3^1 * 5^3 = 2 * 2 * 3 * 5 * 5 * 5 = 1500



I guess that might be useful if the vast majority of things are on version 0. But it quickly get's out of hand.

EX: Multiplying by 2 adds a full bit to your number every time you increment it. And it just get's worse from there.

If you really want to store mostly zeros your probably better off just using a count of the functions > 0 and then either use a bit for each function to show it's > 0 or an index if things are sparse.


It's certainly true that it has a nasty growth curve (exponential in version number; probably worse in number of active terms).

I just think it's fun for low value things, and interesting if you're in the situation you have a huge feature set relative to the number of active features, since it incurs no penalty for fields that are 0. Any time you have about ~1% features being present, it can make a difference. Example: if you have about 5 of 500 fields with 16 possible values represented, storing all the 0s is expensive. Using a naive 4-bits-per-field, you end up with 2000 bits, whereas, using my method, you only use ~620. Even using a 500bit number to flag which fields are present and then listing just those fields in order only saves you about ~100 bits over my method.

Plus, I manage to spark off a discussion about math and the problem, rather than just whining about it being hard, which I consider to be a good thing.


What's wrong with indexes? 1024 = 10 bits * 5 = 50bits then 4 bits * 5 = 70 bits total.

Or just a bit flag for active (500bits) then 4 bits * 5 for your count = 520 bits.

And if you really want to compress things start with a count so the 500 bit flag is only used if your count is over 50.

PS: in your case you also need an index of number if bits / bytes used.

Edit: If there is a reasonable chance your counts are greater than 1 just appending one bit per count vs using higher exponents saves space aka for 3 numbers append 100101 = first 1 then 3 then 2.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: