
It's a little bit silly, but zeroing a page is an extremely cheap operation (far cheaper than just about anything you are reasonably going to do with the page once it's zeroed -- on the order of a hundred cycles is pretty typical these days). That said, yes, it is a cost, and it's not crazy to want to address it with HW.

FWIW, both powerpc and arm64 have a "just zero the damn cacheline" instruction. That's not quite what you want, but it is quite useful.



A hundred cycles to zero a page? If 4 KB can be written in a hundred cycles then, assuming a 3 GHz processor, that's 30 million pages per second or ~120 GB/s. That's pretty fast memory. On x86/x64 processors, which lack a zero-the-cacheline instruction, the memory will also end up being read, so you need 240 GB/s to clear pages that quickly. This ignores the cost of TLB misses, which require further memory accesses.
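For anyone who wants to check that arithmetic, here is a minimal sketch of the back-of-the-envelope calculation. The 3 GHz clock and the 100-cycle figure are the assumptions stated in the thread; the variable names are just illustrative, and the factor of two models the claim that each line is read before it is written.

```python
# Back-of-the-envelope check of the bandwidth implied by
# "a hundred cycles to zero a 4 KB page" on an assumed 3 GHz core.
PAGE_BYTES = 4096            # 4 KB page
CYCLES_PER_PAGE = 100        # the claimed cost of zeroing one page
CLOCK_HZ = 3_000_000_000     # assumed 3 GHz clock

pages_per_second = CLOCK_HZ / CYCLES_PER_PAGE
write_bandwidth = pages_per_second * PAGE_BYTES   # bytes/s if every zero reached DRAM
read_plus_write = 2 * write_bandwidth             # if each line is also read first

print(f"{pages_per_second / 1e6:.0f} million pages/s")
print(f"{write_bandwidth / 1e9:.1f} GB/s written")
print(f"{read_plus_write / 1e9:.1f} GB/s read + written")
```

With 4 KB taken as 4096 bytes the figures come out slightly above the rounded ~120 and ~240 GB/s quoted above. Note that this only bounds the traffic if every zeroed line actually goes to main memory.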


The zeros don't need to get pushed to memory immediately. They go to cache, where they will typically be overwritten with your real data long before they are pushed out to memory. That push of your real data would have needed to happen anyway, so there is (usually) minimal extra cost associated with the zeroing.

There are, of course, pathological cases where you touch one byte on a new page, and then don't write anything else before the whole thing gets pushed out, but they are relatively rare in performance-critical contexts.


> but zeroing a page is an extremely cheap operation

Uh? Anything which touches main memory is NOT 'an extremely cheap operation': that's why we have registers, L1, L2 (even L3) caches!


The whole point of cache is that you don't need to go to main memory every time you do an operation. If you don't write anything else to the page, those zeros will be written out to main memory eventually, but if you weren't going to write anything, there was no need to allocate the page to start with. So the most common scenario is the zeros get written to L1, which is fast, and then your real data overwrites it in L1 before it's pushed out to lower levels of the cache or memory. The initial write may require a read-for-ownership from memory, depending on the exact cache semantics, but if that's required, it would be required for you to write your data anyway.
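To make that argument concrete, here is a toy model, not a description of any real CPU: a hypothetical LRU write-back cache that only counts dirty lines written out to memory. It ignores reads, read-for-ownership, and multiple cache levels; `WriteBackCache` and `memory_writes_for_page` are names invented for this sketch, and the 64-byte line size is an assumption.

```python
from collections import OrderedDict

LINE_BYTES = 64                              # assumed cache line size
PAGE_BYTES = 4096
LINES_PER_PAGE = PAGE_BYTES // LINE_BYTES    # 64 lines per page


class WriteBackCache:
    """Toy LRU write-back cache that only counts write-backs to memory."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()   # line index -> dirty flag, oldest first
        self.memory_writes = 0

    def write(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)     # hit: refresh LRU position
        elif len(self.lines) >= self.capacity:
            self._evict()                    # miss on a full cache
        self.lines[line] = True              # line is now dirty

    def _evict(self):
        _, dirty = self.lines.popitem(last=False)   # drop least recently used
        if dirty:
            self.memory_writes += 1          # dirty line goes to memory

    def flush(self):
        while self.lines:
            self._evict()


def memory_writes_for_page(zero_first, capacity_lines=512):
    """Count memory write-backs for filling one page, with or without zeroing."""
    cache = WriteBackCache(capacity_lines)
    if zero_first:
        for line in range(LINES_PER_PAGE):
            cache.write(line)                # the kernel zeroes the page
    for line in range(LINES_PER_PAGE):
        cache.write(line)                    # the program stores its real data
    cache.flush()
    return cache.memory_writes


print(memory_writes_for_page(zero_first=False))
print(memory_writes_for_page(zero_first=True))
print(memory_writes_for_page(zero_first=True, capacity_lines=32))
```

In this model, when the page fits in the cache, zeroing first adds no write-backs, because the zeros are overwritten while still cached. Shrinking the cache below one page (the third call) forces the zeros out before the real data arrives, which is the pathological case where zeroing does cost extra memory traffic.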



