Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
How to Fix Gimbal Lock in N-Dimensions (medium.com/swlh)
66 points by panic on Aug 29, 2020 | hide | past | favorite | 20 comments


A quick summary of the arguments:

Gimbal lock arises in graphics systems when you store the rotation state as 3 Euler angles:

   Internal State: AngleX, AngleY, AngleZ
   Update algorithm: AngleX + dx, AngleY + dy, AngleZ + dz
   Rotation = Rotationmatrix(AngleX) * Rotationmatrix(AngleY) * Rotationmatrix(AngleZ)
With this setup it is easy to come to a situation where AngleZ performs a rotation so that the X/Y rotation vectors coincide, and you lose one degree of freedom in the resulting rotation matrix.

How to fix this:

  Internal state: RotationMatrixOld
  Update: RotationMatrixOld * RotationMatrix(dx, dy, dz)
With this setup there is no chance of Gimbal lock, the dx/dy/dz matrix always retains its degrees of freedom.

Other notes:

* The above description applies irrespective of whether you use Quaternions or rotation matrices

* In higher dimension it's better to describe rotations by their rotation plane instead of the normal vector: in higher dimension there is no "normal vector". Rotation always works in ONE plane / two axis, so there are (N-2) "normal directions" in N dimensions.


So the thing that made Gimbal Lock click is a combination of two things: rotations are ordered, and euler angles (and fixed angles, its counterpart) are a combination of three different rotation motions.

Effectively, since you have XYZ, one of them is nested within the others. If you instead have a nested set of gimbals, like XYZX2Y2*Z2, then you avoid gimbal lock since you can always "escape" to the other gimbal.

Also, orientations are not rotations. Gimbal lock doesn't prevent you from finding a set of values that gives you an orientation, it prevents you from finding a good path from one orientation to another.

That you can't cover all rotations with three nested gimbals is a demonstration of the fact that rotations are a not obvious space. See also the Dirac belt trick.


> Gimbal lock is a common issue that arises in 3D rotation systems. Conventional wisdom says that you should represent your rotations as quaternions to avoid this problem

Does conventional wisdom really say specifically that? Because the first I heard back in the day was that constructing 3D matrix from axis+angle solves it.

I think the conventional wisdom is don't use euler angles, not that you must use quaternions


Quaternions have a couple of advantages over matricies for pure rotations:

1. They're smaller - 4 floats - whereas you've got a minimum of 9 floats for a 3x3 rotation matrix, or up to 16 floats (4x the size!) in some gamedev codebases with a single, general purpouse, 4x4 SSE optimized matrix.

2. They won't accumulate a skew or scale from floating point errors. At worst, you trivially renormalize a quaternion if it drifts too far out of scale, with only a few edge cases to worry about (0-length vectors, or nan/inf/denormal components.) While a matrix can be unskewed and unscaled with a few cross products and normalizations, there's more math and more edge cases to worry about.

You might transform those quaternions into temporary matricies to compose them with scale/translation/projection transformations for rendering, but you might not want to store your rotations as matricies.


That's not what I mean, quaternions may have their specific advantages. But saying they avoid gimbal lock as if that's their exclusive feature (even though other simple methods, almost anything that isn't euler angles, can also avoid it) while leaving out the other advantages of quaternions is unfair to other methods and at the same time discredits the actual unique quaternion advantages.


You're reading bolder claims into the quick contextualization in the preamble than I am.

If I saw an earlier edit of yours correctly, you were using "must" when the article uses the weaker "should". And the article itself later admits "nothing about this technique is unique to quaternions" and compares vs matricies, without framing it as a contradiction of suprise in light of the aforementioned "conventional wisdom".

I read the sentence in question as merely claiming:

1. Conventional wisdom says that you should represent your rotations as quaternions

I would lightly agree with this - granted, for more reasons than just solving gimbal lock (hence why I brought some of them up) - but they're the right default, admittedly sometimes with exceptions.

2. quaternions avoid this [the gimbal lock] problem

And I would lightly agree with this too - with the article pointing out the important caveat that you can't just construct 3 quaternions from your euler angles and assume that will fix your euler angle caused gimbal lock problems. You gotta use 'em right.

-----

We agree that none of the following are true, nor "conventional wisdom":

- quaternions must be used

- quaternions are the exclusive solution to gimbal lock

- the only distinguishing feature of quaternions is their ability to solve gimbal lock

But I don't think the article was claiming any of those things. Perhaps an exhaustive sidebar into the various differences and pros/cons of various rotation methods could have been useful? My two reasons weren't exhaustive either!

... on the other hand, it also might be reasonable to practice restraint - keeping the article focused on a single topic, generalization into N dimensions of solutions to gimbal lock specifically?

Non-exhaustive sidebar?

¯\_(ツ)_/¯


The article author's suggestion to use 4x4 matrices suffers from all these same problems. Using a pair of quaternions would have all the same advantages.


With quarternions, gimbal lock is impossible. Also the kinds of simple sums and matrix arithmetic that we use for quarternions are really cheap operations on a computer. They're so cheap, I wouldnt be surprised if axis-angle was just a syntactic sugar implemented on top of quarternions.


What's wrong with using a pair of quaternions, as described on Wikipedia? https://en.wikipedia.org/wiki/Rotations_in_4-dimensional_Euc...

AFAICT, this shouldn't gimbal lock.


The answer is in the article (whether it’s right or wrong I’m not opining):

> The answer is it’s not about what you use to represent your rotations (quaternions, matrices, or rotors), it’s about how you apply those rotations.


OK, so my question is what's wrong with breaking a rotation into left and right isoclinic parts? These parts commute, so this shouldn't gimbal lock.

Also, the article didn't mention this exact method. It's unclear what he's referring to.


It can be applied in a way that will gimbal lock, just like any other method of representing and applying rotations.

In this case if I understand isoclinic rotations right, what you would do every time you want to rotate your vector is construct this pair and multiply by the left and by the right to your vector.

Now what do you do when you're implementing the keyboard event to rotate by all 6 planes in 4D? Do you store 6 pairs of Quaternions, each pair changes when you press the corresponding key, and those are applied to your object every frame in a specific sequence?

If so, this will gimbal lock.

You can call this an "incorrect implemention", but Gimbal lock isn't a bug so much as a real phenomen that can be modelled with any rotation of representing rotations.


An orientation in 4D can be represented by a pair of quaternions. If you want to rotate (change your orientation) you would multiply one pair of quaternions (representing your current orientation) by another pair of quaternions (representing the rotation) componentwise, yielding a new orientation.

This will not gimbal lock.

Regarding using 4x4 matrices: This will result in unintended shears and stretches due to rounding errors. Renormalising the matrices will be complicated and expensive. Also, interpolation will be more complicated and expensive.


Everything you said is correct. We've described two ways to apply these rotations, one of which will gimbal lock.

And Quaternions of course have several advantages and you should use them for those reasons.

The goal of the article was simply to explain that avoiding gimbal lock isn't the main reason you should consider Quaternions, which most people commenting here seem to have already been aware of.


Alternatively, just use geometric algebra instead of quaternions.


Quaternions are just the special case of a rotor in 3 dimensions, so quaternions are the geometric algebra solution.


Thankfully it's buried at the end of the article:

> Beyond 4D, you could continue using NxN rotation matrices, but since GPUs don’t support that, I’d use rotors from Geometric Algebra to simplify implementation, which is how you generalize quaternions to higher dimensions.

All hail Clifford! https://arxiv.org/abs/1401.2371 for elegant and computationally efficient methods


This is maybe only somewhat related, but there's a problem I have been trying to solve for about a year now involving lost degrees of freedom that is similar to gimbal lock. Maybe the smart people reading this comment section would have some ideas.

It is frequently more efficient to perform energy minimizations of molecules in a basis of "internal coordinates", in which parameters such as the distance between bonded atoms, the angle formed by three sequentially-bonded atoms, and the dihedral angle[0] formed by four sequentially bonded atoms are used as a coordinate system for minimization. This coordinate system is a lot more complex than the more natural Cartesian coordinate system, but it's more efficient for large molecules because you can e.g. twist a molecule along some central bond without screwing up the bond distances on the far ends of the molecule. Implementing optimization using internal coordinates is not entirely trivial, as the manifold of physically-meaningful internal coordinates is not flat, and internal coordinates tend to be over-specified, but it is often worth implementing despite the complexity.

Anyway, one major problem with this approach is that if three sequentially-bonded atoms ever become collinear (which admittedly is not super common, but it's common enough that we have to consider it), the Jacobian of that angle relative to the Cartesian coordinate system becomes ill-defined (because it is physically impossible to increase the angle, and the angle will decrease if any of the atoms moves in any direction orthogonal to the line that the 3 atoms lie on). Worse, any dihedral angle involving those 3 atoms becomes ill-defined, and if a structure passes through a configuration where the 3 atoms are collinear, the dihedral angle will discontinuously change by 180 degrees.

This causes all sorts of problems during energy minimization. The standard way to "solve" this problem is to define "dummy atoms", fictitious sites that are only used to construct well-defined bending and dihedral angles. But to prevent the dummy atom from "drifting away" since no real forces are acting on it (and to counteract the creation of 3 physically meaningless degrees of freedom), it becomes necessary to add constraints. So now we've turned an unconstrained minimization in a non-euclidean coordinate system to a constrained minimization in a non-euclidean coordinate system. And it's very ambiguous where the best place to put this "dummy atom" is, or how to define its constraints. And what happens if 4 atoms become collinear? Moreover, this is usually done manually, using "chemical intuition", and I am interested in writing code that works automatically for high-throughput applications where it would be infeasible to manually define dummy atoms.

My hope was that there would be some way to augment the traditional definition of the dihedral angle to avoid this singularity/discontinuity, but my attempts at implementing this failed to improve upon anything. My current approach tries to automatically add these dummy atoms, but I find that this has a tendency to make optimization performance drastically worse (presumably due to the sudden addition of constraints).

This problem has been on the backburner for some time now, but I would like to engineer some sort of solution that is more consistent and performant than my current approach, while still being entirely automated, without needing any "chemical intuition".

[0] https://en.wikipedia.org/wiki/Dihedral_angle#In_stereochemis...


> if a structure passes through a configuration where the 3 atoms are collinear, the dihedral angle will discontinuously change by 180 degrees.

Here's a stab in the dark: Instead of using the dihedral angle $\phi$, why not use:

  - $\cos(2 \phi)$,

  - $\sin(2 \phi)$,

  - $\cos^2(\phi)$

  - $\sin^2(\phi)$

  - the pair $(\cos(2\phi), \sin(2\phi))$,

  - the homogeneous coordinate $[R\cos(2\phi) : R\sin(2\phi)]$ where $R \neq 0$.
A change in $\phi$ by 180 degrees won't have any effect then.


>"Generalizing to N-dimensions

To generalize this solution to higher dimensions we need two things:

A way to represent rotations. So far in 3D we’ve used 3x3 matrices and quaternions.

A way to describe a rotation around an arbitrary axis.

Side note: it’s more generalizable to talk about “rotating within a plane” instead of “rotating around an axis”. In 2D there is only one plane of rotation, the XY plane. In 3D there are 3 planes of rotation. In 4D there are 6 planes. In N-D there are (N choose 2) planes.

I ended up choosing 4x4 matrices to represent my rotations, because 4x4 matrix operations are supported on the GPU and so I could rotate the geometry in my vertex shader, in the same way I would in 3D."

PDS: I was reading this, and something clicked -- if we think about it, a Matrix (in mathematics) is a 2D structure, that is, it has rows and columns, or a set of x/y (horizontal/vertical) axes, if you prefer.

Yet, we're using that 2D surface -- to represent higher-dimensional surfaces, 3D, 4D, [n]D etc.

So this is sort of like how we can go from 2D to [n]D.

But wait, it gets better!

We can also go from 2D (on the low end) down to 1D!

How would this be accomplished?

Well, while mathematicians (and most others!) would prefer the 2D representation of a matrix, that 2D matrix could also be represented in 1D (straight line), as a series of numbers, that is, you write down the numbers in the first row or column, then you write down the numbers in the second row or column, and you finish this process, much like the raster beam of old CRT TV's traced out an image on the screen!

So that gets us down to 1D, for the data of an [n]D surface.

But wait, things get even better!

That's because on the low-end, we could (although mathematicians and other normal people would despise the transformation!), we can also transform this 1D line/linear representation to 0D!

Now, just what in the heck is 0D, you might ask?

0D is TIME.

In other words (and again, no mathematician or normal person for that matter!) would want to do this, but you could take that line, that linear sequence of numbers (or whatever mathematical expressions comprises that sequence), and display them sequentially, one at a time, with a given time interval for each.

Sort of like a powerpoint presentation, where each number becomes one frame.

Or like a cartoon flip-book, where each frame is displayed for a certain small unit of time!

Of course, those things are still sort of 2D -- 0D would be more like an electrical pulse travelling down a wire, as a signal, during a unit of time, much like the information for a pixel, a given part of the screen would, in an old CRT tube where the electron beam reaches that part of the screen at a very specific point in TIME. Well, that signal, travelling through that one dimensional path, but only existing during a very brief unit of TIME -- that would be the 0D representation.

So knowing all of this then, we could then go from 0D -- to [n]D for any set of data.

Although most of us would be more comfortable with the staple 2D matrix notation that we all know and love from mathematics...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: