This is a good thing to know about, but bear in mind that these days it's often slower than just using whatever built-in square root you have access to. Be especially wary in languages where you need to do something more than a cast to get the raw bits out of a floating-point number; oftentimes that alone is enough to kill any performance gains you might otherwise see. Profile your code before and after, and make sure you've got an easy way to switch back if you realize you've made things slower (or worse, too inaccurate for your use case).
If this were a universally effective optimization with no potential downside, it would already be built into your standard math library...
This isn't a universally effective optimization. It's an exquisite optimization for when a little imprecision is tolerable. Micro-optimizations like this are rarely worth the increase in cognitive complexity of the code, but Quake had to do a lot of math in very little time on 1996 hardware, so I'll trust Carmack (whose programming skills are legendary if you aren't familiar with him).
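For anyone who hasn't seen it, the routine being discussed looks roughly like this (lightly cleaned up from the form it takes in the Quake III source; `memcpy` replaces the original pointer cast to avoid strict-aliasing undefined behavior):

```c
#include <stdint.h>
#include <string.h>

/* Fast approximate reciprocal square root, in the style of the famous
   Quake III routine: magic constant 0x5f3759df for the initial guess,
   then one Newton-Raphson step. Relative error is roughly 0.2%. */
float q_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);      /* reinterpret the float's bits as an integer */
    i = 0x5f3759df - (i >> 1);     /* magic initial guess */
    memcpy(&x, &i, sizeof x);
    x = x * (1.5f - half * x * x); /* one Newton-Raphson refinement */
    return x;
}
```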
It's an approximation, and thus does not comply with the IEEE floating-point standard, which requires numerical results of operations to be correctly rounded to the precision of the floating-point format. There are lots of optimizations possible with floating-point math if speed or energy efficiency is prioritized over accuracy or standards compliance.
All of the IEEE floating point standard is an approximation :).
It's a cool hack, but very mid-90s-centric and not all that applicable today. On modern pipelined processors with segregated register files, the cost of moving a piece of data from floating-point registers to integer registers and back again -- along with the fact that pretty much all modern instruction sets (SSE, AltiVec, NEON) come with instructions that give you a means to calculate a reciprocal square root (either directly or with an estimate + refine) -- means that such a trick is no longer practical.
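As a concrete example of the estimate + refine pattern, here's a minimal sketch assuming an x86 target with SSE (the `rsqrtss` hardware estimate is only good to about 12 bits, so one Newton-Raphson step refines it):

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

/* Approximate 1/sqrt(x) via the rsqrtss instruction, then refine the
   ~12-bit hardware estimate with one Newton-Raphson step. */
float rsqrt_sse(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
}
```

No bit tricks, no cross-register-file traffic: the value never leaves the floating-point/SIMD side.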
I think we should rethink the floating-point implementations on CPUs. I'd guess hacks like this implemented directly in hardware could double the battery life on mobile devices. IEEE-level accuracy is rarely needed, in either number precision or computation accuracy. I'd also guess a fast IEEE-compliant square root isn't space- or energy-efficient in silicon.
I disagree -- the amount of silicon dedicated to this type of operation is minuscule compared to the caches on modern processors. Removing hardware floating-point reciprocal square root isn't going to have any noticeable effect on power consumption.
This sort of floating-point optimization, while admittedly quite imprecise, actually does still see plenty of application in high-performance, framerate-critical gaming -- especially games that require a robust physics package (my previous employer was a now-defunct maker of hardware-accelerated physics products). For these gaming applications, or specifically for the graphics drivers they depend on, you're coding multi-threaded routines in proprietary instruction sets that run on a GPU, which is becoming more and more like a parallel vector processor.
Huh. The only really interesting part of the algorithm is the initial guess with the magic number, and he doesn't go into that part at all. Read Chris Lomont's paper if you care enough to want to understand it. The only downside to Lomont's paper is that it's a post-mortem analysis rather than a pedagogical derivation. I worked out a step-by-step development of the algorithm last year, but I've never gotten around to writing it up in tutorial form.
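For the curious, the key observation behind the magic number is that the bit pattern of a positive IEEE-754 float, read as an integer, is approximately a scaled and shifted log2 of the value -- so subtracting half of it from a well-chosen constant computes log2 of x^(-1/2) entirely in integer arithmetic. A small sketch of that observation (function name is mine):

```c
#include <stdint.h>
#include <string.h>

/* For a positive IEEE-754 single x with bit pattern i,
   i ≈ 2^23 * (log2(x) + 127). The 0x5f3759df trick exploits this:
   setting i_y = magic - (i_x >> 1) makes log2(y) ≈ -log2(x)/2,
   i.e. y ≈ 1/sqrt(x), before any Newton-Raphson refinement. */
double approx_log2(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    return i / 8388608.0 - 127.0; /* 8388608 = 2^23 */
}
```

The approximation is exact at powers of two and off by at most about 0.086 in between, which is what makes the one-step initial guess good enough for Newton's method to clean up.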
Image text: Some engineer out there has solved P=NP and it's locked up in an electric eggbeater calibration routine. For every 0x5f375a86 we learn about, there are thousands we never see.
Every time I read the headline for a fast algorithm for inverse square root I think "Well, yeah, you just multiply the input by itself. Can't get much faster to undo a square root than that." Silly me.
My native language isn't English, and I've made this very same mistake like 50 times already. Sometimes I still forget what "inverse" means in math. I would instantly understand "reciprocal" though, which is the name I learnt for this operation.
>Sometimes I still forget what inverse means in math.
I think your interpretation is actually the more correct one, here.
Squaring and square-rooting are inverses of each other.
They are really talking about calculating the reciprocal of the square root of X. They are calling it the 'inverse square root', but that's just them abusing terminology, which people do all the time, rightly or wrongly.
But you weren't wrong to be initially confused by this abuse of terminology, and it doesn't have anything to do with your native language not being English.
I thought the same thing. I guess the confusion comes from thinking of the inverse of the function square root instead of the multiplicative inverse of the value of the square root function. Technically I suppose the terminology is a bit hand-wavy, since you could have also been referring to the additive inverse, but I think it's fairly common to talk about "inverses" of members of a field and assume that people know you mean multiplicative inverses. But that's exactly what isn't a safe assumption here, since you could be thinking about a function (which seems like the more natural interpretation), as opposed to a number.
Came across something similar a couple of days ago while browsing the source code of Marathon (one of the early games of Jason Jones and Bungie, later of Halo fame). At the time, Bungie were the Mac equivalent of id software, and Jason Jones the equivalent of John Carmack.
Which came first, Quake 3 or the papers? That determines how interesting this is. I've seen a lot of folks working through Quake's code lately. It's hard to tell whether they are originators or appliers, but either way it's much more interesting than reading bland examples from a book.