GCC inline assembly looks incredibly cursed. Back in the day, the Borland tool suite (Turbo/Borland C++, Turbo Pascal) had inline assembly that looked more like the D compiler example above.
GCC does, however, know what to do with a .s file, so you can write your assembly routines outside your C(++) source and just compile them in like a C module, which is what I did last time I was hardcore slinging x86 opcodes.
It's the attempt to tell the host language what you're doing with the arguments that makes a real mess. Module-scope asm is roughly the same as putting the code in a separate file, i.e. less horrible. C++ has better string literal escaping options (raw string literals), which would help.
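For anyone who hasn't seen the module-scope form, here is a minimal sketch, assuming x86-64 Linux and the System V calling convention. The function name `add_two` is made up for illustration, and the plain-C fallback is only there so it builds on other targets.

```c
#include <stdint.h>

/* Hypothetical routine, declared to C like any other external function. */
int64_t add_two(int64_t x);

#if defined(__x86_64__) && defined(__linux__)
/* Module-scope ("basic") asm: the string is handed to the assembler as-is,
   much like a separate .s file. There are no operands and no constraints,
   so registers are written with a single '%'. */
__asm__(
    ".pushsection .text\n"
    ".globl add_two\n"
    ".type add_two, @function\n"
    "add_two:\n"
    "    leaq 2(%rdi), %rax\n"   /* SysV ABI: first argument in rdi, result in rax */
    "    ret\n"
    ".size add_two, .-add_two\n"
    ".popsection\n"
);
#else
/* Fallback for other targets so the sketch still compiles. */
int64_t add_two(int64_t x) { return x + 2; }
#endif
```

The trade-off is that you now own the calling convention yourself, but there is nothing to get wrong in a constraint list.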
It's very easy to get the constraints wrong and have the aggregate still "work", until unrelated changes months later perturb the register allocation slightly such that it no longer runs as hoped.
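A small sketch of the kind of constraint that is easy to miss, assuming x86 and AT&T syntax. The function `add_then_double` is invented for illustration. The output is written before the last input is read, so it needs the early-clobber `&`; without it the compiler is allowed to put `out` and `b` in the same register, and whether it actually does depends on the surrounding code.

```c
#include <stdint.h>

/* Hypothetical helper: returns (a + b) * 2. */
static inline uint32_t add_then_double(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t out;
    __asm__("movl %1, %0\n\t"   /* out = a  (out is written here...)      */
            "addl %2, %0\n\t"   /* out += b (...but b is only read here)  */
            "addl %0, %0"       /* out += out                             */
            : "=&r"(out)        /* '&' = early clobber: must not share a register with an input */
            : "r"(a), "r"(b)
            : "cc");            /* addl modifies the flags */
    return out;
#else
    /* Fallback for non-x86 targets so the sketch still compiles. */
    return (a + b) * 2u;
#endif
}
```

Drop the `&` and this will often still pass its tests, which is exactly the problem.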
It's documented and usable but working with it is never a very good time.
D's inline asm syntax was designed to match the Intel asm documentation.
A hidden feature of the D inline assembler is the compiler knows which registers are read/written, so there's no need for the programmer to explicitly specify it. The compiler will also figure out the right addressing mode for local variables, so the programmer is relieved of that, too.
Deriving the read/write behaviour from the instruction definition is so far superior to the gcc approach that I wonder how we ended up here. That is a very good call by the D toolchain.
On the other hand, GCC-style inline assembly meshes very well with the register allocator and instruction scheduler in modern compilers. So it's perfect for teaching the compiler about the few special-purpose instructions it doesn't know about without compromising the performance of the rest of the code.
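A rough sketch of what that looks like in practice, assuming x86 and AT&T syntax. `bsr` is only a stand-in here (compilers already have builtins for it), and the wrapper name `highest_set_bit` is made up. The point is that the constraints let the compiler choose the registers, or a memory operand, and inline the single instruction into the surrounding code.

```c
#include <stdint.h>

/* Hypothetical wrapper: index of the highest set bit. x must be nonzero. */
static inline uint32_t highest_set_bit(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t index;
    __asm__("bsrl %1, %0"
            : "=r"(index)   /* any general-purpose register for the result   */
            : "rm"(x)       /* input may be a register or a memory operand   */
            : "cc");        /* bsr modifies the flags                        */
    return index;
#else
    /* Fallback for non-x86 targets, using a GCC/Clang builtin. */
    return 31u - (uint32_t)__builtin_clz(x);
#endif
}
```

There is no `volatile` and no `"memory"` clobber, so the compiler is free to schedule, hoist, or drop the instruction like any other pure computation.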
The one in the parent comment does not; it's basically a black box and the compiler has to save and restore the entire state of the function around it. Total performance killer unless you write large chunks of code in inline assembly (and then you're probably better off just using an assembly file).
You could also assemble the assembly using an assembler into a separate object file and export a symbol from there, and let the linker do the job, right?