AT&T Syntax versus Intel Syntax (2001) (mcgill.ca)
133 points by susam on Nov 13, 2022 | hide | past | favorite | 105 comments


Here's the why for the curious. It's just a historical quirk.

It's "AT&T syntax" because it dates to the 1978 AT&T Labs effort to port UNIX to the 8086. [1] While the 8086 did not have virtual memory or hardware protection, its memory segmentation model was still adequate to support UNIX. It was the first microprocessor practically capable of running UNIX, and this was realized before the chip was even released. The porting effort started immediately. (Though most of the energy would soon switch to the 68000 when that was released a year later.)

The AT&T folks did not wait for Intel's assembler (written in Fortran, to run on mainframes or on Intel's development systems). Nor did they closely model their assembler after it. They just took the assembler they already had for the PDP-11 and adapted it with minimal changes for the 8086. Quick and dirty. Which was okay. You're not supposed to write assembly on UNIX systems, anyway. Only the poor people who had to write kernel drivers and compilers would ever have to deal with it.

[1] https://www.bell-labs.com/usr/dmr/www/otherports/newp.pdf (see section III)


It's funny how many things come down to 'UNIX hackers did it this way when they had to work with a PC'


I think there's a bit more to the story. It was before my time, but as I understand it the most widely used Unix for the 8086 was XENIX (initially a Microsoft product, later sold to SCO), which used Intel-syntax MASM as its assembler.

XENIX for 386 was based on AT&T System V/386, which introduced the AT&T syntax to 32-bit x86. I've found some references to 32-bit XENIX still using an assembler called "masm", but I don't know whether it was still based on Microsoft's MASM or just called that for compatibility, or whether it used AT&T or Intel syntax. Also, by that point compilers and assemblers weren't included in the base OS anymore; they were sold separately as a "development kit".

The Minix compiler and assembler also used Intel syntax.


Replying to my own comment since I was curious - it was certainly Microsoft MASM, and cc was the Microsoft C Compiler as well. [1] So probably Intel syntax.

There was also something called "SCO UNIX" which was apparently more of a straight adaptation of AT&T System V and might have used a pcc compiler and AT&T syntax. It's hard to figure out what happened when because there are a zillion different versions of Xenix/SCO Unix/OpenServer out there.

[1] https://www.scosales.com/ta/kb/100187.html


I cannot say much about which assembler Xenix used, but in 1992 it was clearly still K&R C. The version I had access to either lacked an X installation or ran on pure text-based terminals.

It was so expensive in Escudos that, for teaching a high school class about UNIX, the teacher would bring a tower (286 or early 386 model) to be timeshared with the whole class, meaning we took 15-minute slots sitting at it, having prepared the exercises as much as we could in Turbo C 2.0 on MS-DOS.

It definitely wasn't SCO UNIX though.


Having recently finished The Cuckoo's Egg by Cliff Stoll, the book was the first thing I thought of when I saw this article. He describes how, at the time, the use of one OS and/or syntax over another could even suggest one's geographical region.


IMHO the most confusing part is that AT&T/GAS syntax inverts the comparisons, which are otherwise natural in Intel syntax:

    cmp eax, ebx   ; eax ? ebx
    jg foo         ; jump if eax > ebx
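For comparison, a sketch of the same check in AT&T syntax, where the operand order is reversed:

    cmp %ebx, %eax   # eax ? ebx (operands swapped relative to Intel)
    jg foo           # still: jump if eax > ebx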
Related: http://x86asm.net/articles/what-i-dislike-about-gas/


Also, the % before register names is completely unnecessary; it's just another extra character to type.


It disambiguates labels from registers, assuming of course you allow labels to have register names. eg this is valid:

        mov rax,%rax                                                            
  rax:  .ascii "hello\0"
Stupid perhaps, but valid.


Probably an artifact of the assembler originally being used to process compiler output, where variable names would appear as operands.

One of the x86 assemblers I remember trying out in the late 80s had a slight variant of Intel syntax in that everything that wasn't a reserved word/mnemonic was automatically assumed to be a label.


I wonder if it would have been more ergonomic to have the labels be % prefixed instead of registers.


The above-linked article specifically points out that registers are referenced much more often than labels.


Yep, the "nice" thing about AT&T is that, from a parsing perspective, you don't really need to know much about the target architecture to be able to determine what's an instruction vs. a register vs. a label, etc.


In grade school, they showed us a movie where researchers attached special glasses to a volunteer, and the volunteer had to wear them 24/7 or a mask for sleeping. The glasses turned everything upside down.

After a few days, suddenly everything was right side up for the volunteer. Until he took the glasses off ...

The movie ended with a warning to never try this yourself.

Anyhow, I'm reminded of that movie every time I have to switch between AT&T vs Intel syntax. My brain just hurts, and the asm code I write is all wrong.

So, the dmd D compiler's inline assembler syntax is Intel regardless of the OS, and the builtin disassembler is Intel syntax.


> glasses turned everything upside down

https://www.madsci.org/posts/archives/mar97/858984531.Ns.r.h...

"""

[...] first investigated by George Stratton in the 1890s

[...] he ran another experiment where he wore it for eight days in a row. On the fourth day, things seemed to be upright rather than inverted. On the fifth day, he was able to walk around his house fairly normally but he found that if he looked at objects very carefully, they again seemed to be inverted. On the whole, Stratton reported that his environment never really felt normal especially his body parts, although it was difficult to describe exactly how he felt. He also found that after removing the reversing lenses, it took several hours for his vision to return to normal.

[...]

"""


Thanks for the references. I saw the black&white film around 1967.


You're welcome. I had heard the story before, but couldn't remember the attributed date, got curious and looked it up.


I love that dmd follows the home computer school of having inline Assembly done properly, instead of mini-languages stuffed inside strings.


Interesting bit of trivia, Prof. Ratzer, the author of this piece, was the first graduate student in computing at McGill University [1] and one of the founding members of the School of Computer Science [2], which just recently celebrated its 50th anniversary. [3]

[1]: https://en.wikipedia.org/wiki/McGill_University_School_of_Co...

[2]: https://www.cs.mcgill.ca/~ratzer/backup/welcome.html

[3]: https://mcgill.imodules.com/controls/email_marketing/view_in...


I'm in the Intel syntax camp, pretty thoroughly. I use other assemblers, too, and the Intel syntax is a lot more similar to ARM, RISC-V, and assemblies used by DSPs (which are surprisingly C-like). The order of operations, the order of comparisons, the addressing syntax, and the lack of spurious wingdings characters all make it easier to read and write.

The one thing AT&T syntax has going for it is the "strong typing" of operand widths: `addl` is slightly more readable than `add` + operand-inferred width.
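For instance, a sketch of how the suffix carries the width even when no register operand is present (`counter` is a hypothetical memory label):

    addl $4, %eax         # 'l' suffix: 32-bit add
    incw counter(%rip)    # suffix required here: no register to infer the width from
                          # Intel equivalent: inc word ptr [counter]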

If you want the C preprocessor, you can have it with Intel syntax too (gas can use either), but I think the preprocessing syntax designed for assembly is cleaner with assembly code, and some of those preprocessors are surprisingly powerful.


Many moons ago, in a dumb attempt to depend only on GAS before it had good support for Intel syntax, I ported some code from Intel syntax into AT&T. Quite a dumb idea; if I were doing it today I would have listed nasm or yasm as a requirement and been done with it.


I'm pretty sure all this predates the existence of gas


Only slightly. System V/386 was 1987, first release of GAS was 1989 (and GCC got 386 support in 1988)


Intel format was pre-Unix (probably 8080) ... and AT&T of course was from the late 70s, originally from PDPs


I am pretty sure of its status about 25 years ago.


I never liked the syntax used by gas. It feels like something intended to be part of a C compiler, not like something you'd use for the joy of assembly language.


I initially learned the Intel syntax, and preferred it for a while. But the more I work with non-x86 CPUs, the more I prefer AT&T just because it feels less different from everything else.


Much agreed. Intel syntax just seems somewhat alien.


That's interesting. My entire world revolves around x86 and ARM, so "Intel" syntax (which to me mostly means op dest, src) is what seems normal to me.


The order of dst and src I have no strong feelings about; it's all the rest that I find weird about Intel syntax.


dest = src

That's how I think of it.

ALSO: I really do not like the MOV instruction; I much prefer LD. The Z-80 instruction names got everything mostly right.


I started with Z80, and later moved to X86. To me intel order is the correct order, and everything else is weird and wrong!


Intel syntax is more similar to ARM, MIPS, and even RISC-V's official syntax than AT&T.


But all the Intel instruction documentation unsurprisingly uses the Intel syntax!


A point in favor of the Intel syntax is that it is, well, Intel's. It's much easier to use listings published by Intel, with only a few changes to the macros and label usage needed.


Hmm. "AT&T"? I thought it at least came from DEC via the PDP-11/PDP-7 assemblers.

And I'm guessing Intel didn't invent doing it backwards in their own either.


Wikipedia confirms it.

https://en.wikipedia.org/wiki/X86_assembly_language#Syntax

"The AT&T syntax is nearly universal to all other architectures with the same mov order; it was originally a syntax for PDP-11 assembly. The Intel syntax is specific to the x86 architecture, and is the one used in the x86 platform's documentation."

https://en.wikipedia.org/wiki/As_(Unix)

"As of November 1971, an assembler invoked as as was available for Unix. Implemented by Bell Labs staff, it was based upon the Digital Equipment Corporation's PAL-11R assembler."


It's the AT&T syntax because AT&T is the one who unleashed it on the world against the wishes of everyone. See the sibling comment:

> The AT&T folks did not even wait for Intel's assembler [...] Nor did they closely model their assembler after it. They just took the assembler they already had for the PDP-11 and adapted it with minimal changes for the 8086.


Wow, is your bias showing. "Against the wishes of everyone"? How about the wishes of those who wanted an x86 assembler before Intel got around to releasing theirs? And who wanted to be able to not have to switch their brains into a backward syntax when working on an x86?


Probably "against the wishes of assembly language programmers who were familiar with microprocessors like the 8080 and Z-80 - and those who would have preferred to standardize on the official intel syntax rather than deal with two different and incompatible assembly language syntaxes."


So, reading the comments here I got the impression that the 8086/8088 order was backwards from everything else before it. Did the 8080 and Z-80 have the same order as the 8086?

But even then, you have worlds converging. The 8086 was (barely) capable of running Unix, whereas the Z-80 definitely was not. Was the 8086 a part of the PDP-11 minicomputer world, or was it part of the 8080/Z-80 world? Well, it was both. Complaining that one world produced an assembler before the other did is... a bit exclusivist. Also whiny.


> The 8086 was (barely) capable of running Unix, whereas the Z-80 definitely was not.

https://github.com/chettrick/uzics

You're wrong ;-)

Neither the 8086 nor the Z80 have virtual memory. The x86 has a larger instruction set and address space (1MB) but as it turns out, early Unix kernels didn't need that much RAM either.


AT&T as in Bell Labs in the work that would become Unix.


A similar thing also happens on 68k: Motorola syntax vs. "MIT" syntax, which is probably only used by the GNU toolchain.


Practically, 68k is far more usable in AT&T syntax than x86. When I used to do PalmPilot development, you could basically write standard 68k asm with just some extra %s sprinkled before registers and as would be fine with it. The x86 AT&T syntax is far more alien compared to the syntax in the official manuals, with arguments backward and nonstandard instruction names like addl and movabsq.


I never understood the need for names like movabsq. The top comment on this HN discussion is about how this is all a historical quirk: people wanting to get a job done, quick and dirty. But movabsq is a mnemonic introduced only with x86-64 (then known as AMD64). With the 64-bit transition, why didn't they take the opportunity to clean up and reorganize this mess?
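For reference, a sketch of the distinction GAS draws in AT&T syntax (Intel syntax spells both as plain `mov`):

    movq    $0x1234, %rax              # immediate fits in 32 bits, sign-extended
    movabsq $0x123456789abcdef0, %rax  # forces the full 64-bit immediate encoding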


On Linux, is there a way to convert an assembly language file from one syntax to the other?

I know that there are ways to ask GCC to emit one syntax or the other, as well as ways to assemble code in either syntax. However I don't know any program that just translates one to the other.


Assemble and disassemble? While that will of course lose your labels etc. :-D


I like AT&T more because it has none of that "dword ptr" nonsense. AT&T is more logical when it comes to differentiating between an immediate value and the address of a value. Everything else can be done both ways and does not matter much, but using a dollar sign (or not) to distinguish immediates from addresses is a really nice touch.


AT&T vs. Intel syntax has caused many arguments in some of my circles. AT&T is an atrocity and if you disagree... you're wrong. :)


  op src dest 
is the logical order, the rest of the syntax I don’t care about much.


I used to think this was more intuitive, but after using both for a while I came to the conclusion that putting the destination first is much more practical, because my eyes can scan the left column to quickly find where a register was last written to. If the destination is last, it doesn’t appear in a consistent location horizontally.


Be that as it may, the AT&T indirect memory reference `section:disp(base, index,scale)` is an abomination unto God.

At least the Intel one makes actual mathematical sense: `section:[base + index*scale + disp]`
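The same hypothetical load, side by side:

    # Intel:  mov eax, fs:[ebx + ecx*4 + 8]
    movl %fs:8(%ebx,%ecx,4), %eax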


That “logical” order is only because you’re trying to read it like a sentence (“mov/add ebx into eax”) when you should be reading it like a formula or what it actually is - code. And that’s fine, but considering Intel created the chip, it makes sense that they should decide how the assembly syntax should be, not AT&T.

The only reason “AT&T syntax” exists for x86 is because people working at AT&T refused to use Intel as the authoritative reference on the syntax, and, instead, decided to follow the convention of the PDP, Motorola, etc. family and friends. Hence why `as` (and subsequently `gas`) have that as the default.


In mathematics, variable = value; the variable receives the value; ergo op dest src.

That is logic.


...math has no such order. Half the point of algebra is that the two sides of the equation are semantically equivalent and can be swapped at will.


And it has no state. Half the point of computation is maintaining state for the purpose of efficiency. So, computation gets an "assignment" operator where pure math lacks one, pure math being relegated to subscripts of time and indicator functions instead.


Good luck convincing anyone that value = variable makes sense.


My math teachers had no problem with '42 = x' vs 'x = 42' as long as the steps to get there made sense. In fact they'd probably comment that there was no need to go with 'x = 42' if I obviously took a circuitous route simply to end up with x on the left side of the equation, as that would have demonstrated a lack of internalizing some of the base ideas of algebra and its approach of equation symmetry.


You should distinguish the meaning of = in mathematical and meta-mathematical statements. “42y = yx, therefore 42 = x” is fine; but “let 42 = x” is not.


Some older C coding styles recommend that order because before compilers added warnings, “if (17 = variable)” resulted in a diagnostic, while “if (variable = 17)” would not. Nowadays, I think most programmers prefer putting the fastest-changing expression first.


Yeah, that is due to clever C semantics that should never have allowed anything other than boolean expressions in conditionals, like proper grown-up languages.


Indeed. Besides, “let there be light” is clearly destination-first, so this matter was already resolved by Genesis 1:3.


Which direction was the Hebrew text written in?


That doesn’t follow. variable = value, ergo dest op src.


That is just an arbitrary notation, nothing to do with logic. In English at least, "add 4 to x" is more natural than "add to x 4".

Ergo op src dest


But English "add 4 to x" tells you nothing about where to store the result. : - )

"Add 4 to x and store result in x" vs "Let x be x + 4"?


Without any other context, you generally assume that the target is an accumulator.

So "add 4 coins to that bucket", generally means at the end of that operation the bucket contains at least four coins plus any coins that were already in the bucket.


Huh, I think my brain is not wired to think about `x` as a storage, it is closer to "add 3 to x for x = 2", which then gets reduced to "add 3 to 2" and then the destination is missing...


In mathematics variable = value ≡ value = variable


That's just a convention. The way we would order the sentence in English is also just a convention.

You know what the logical convention to use would have been? The order that every assembler on the planet already used.


Postfix, infix, prefix, crucifix.


That is extremely confusing for comparisons, which are effectively a subtraction.


I’m talking about moves and copies.


It's even more confusing if moves have a different operation order from arithmetic ops and comparisons, no?


No, each should have what makes sense, whether analogous to math or a shell command.


This implies that dst<-src "doesn't make sense"; but then why does practically every programming language use this ordering?


A minor historical mistake/misdesign around the equals sign: it looks like algebra but is fundamentally different.


TFA says "The `source, dest' convention is maintained for compatibility with previous Unix assemblers."


The sketchy part is when operations read and write the same argument. Op first is nice for that, as it becomes op arg0 arg1.

I quite like the SSA style which tends to be dst0 dst1 opcode src0 src1 but that doesn't model assembly brilliantly. Perhaps that order with read-write arguments required to appear on both sides of opcode with the same symbol has some merit.


Which is why all memcpy-like functions in the C/C++ standard library take arguments in dst, src order.


They are using a classic dollar sign for assignment, so clearly at&t is better.


> They are using a classic dollar sign for assignment

They're using a dollar sign for immediates.

As if you can't notice that it's a number.

    addl $4, %eax
that's 3 different completely unnecessary symbols for things which are not ambiguous in the first place:

- the operation width is provided by the register

- a number is an immediate

- a register is named

Hence the much less noisy Intel syntax:

    add eax, 4


addl is not necessary if it is not ambiguous

    add $4, %eax
- is fine

Compare with

    add 4, %eax
The "less noisy" Intel syntax becomes:

    add eax, DWORD PTR [4]


Most assemblers for Intel syntax will let you write:

    add eax, [4]
if you desire. Indeed, many disassemblers will follow suit in unambiguous cases. IDA, for example, does this.


The only time “DWORD PTR” is required is when (1) you're working with old assemblers, or (2) you're using a memory operand with an immediate:

    add eax, [4]       ; inferred
    add [eax], 4       ; ambiguous
    add DWORD [eax], 4 ; explicit
A disassembler may output it when not necessary, however.


The comma is unnecessary too, so that’s 4 unneeded symbols.


Contemplate the AT&T vs Intel syntax for x86 addressing modes and say that again.



They are an atrocity in AT&T syntax. They are perfectly readable in Intel format. For example:

https://blog.yossarian.net/2020/06/13/How-x86_64-addresses-m...


That is an excellent write up, thanks.


AT&T syntax is the most elite syntax. I've used it to write some famous hacks, like Actually Portable Executable, which is a 16-bit BIOS bootloader shell script ELF / Mach-O / PE executable. People dislike it because writing assembly Bell Labs style requires great responsibility. What makes AT&T syntax so powerful is its tight cooperation with the linker. I don't think it would have been possible for me to invent APE had I been using tools like NASM and YASM and FASM because PC assemblers were traditionally written as isolated tools that weren't able to take a holistic view with linker scripts and the C preprocessor. https://raw.githubusercontent.com/jart/cosmopolitan/master/a...


I’m sorry, no. This is incredibly misleading. Even if the linker step and assembly are completely separated, everything you’ve built in Cosmopolitan and APE is 100% buildable in other tools. It might take more effort in some cases, but there’s more to a native stack than choice of tooling and syntax; if you genuinely know your stack and architecture, anything is possible.

Your accomplishments have zero to do with the ‘elite’ tooling you use (why are you gatekeeping and creating class distinctions out of assemblers?), and more that you’ve taken the time to really think about how memory is laid out and how the architecture works - which most of us who started out writing operating systems instead of Rails understand perfectly fine. Nothing about the relationship between gas and ld achieves uniqueness not seen in other native stacks. That’s just made up.

There are multiple operating systems built with hand written NASM. Arguing about assemblers like they matter for more than five seconds is tiring 1990s IRC stuff. They turn syntax into a byte layout. It’s like realizing oh, this assembler sucks at ELF, why don’t I just hand lay one out? and boom, you’re on the way to APE. And really, if you don’t like your assembler, an “elite” programmer would write their own, but that fell out of fashion before most of us were alive.

Given the knowledge it takes to write something like APE I know you know what I’m saying, too, so why are you misleading like this? It comes across like an overstretched attempt to promote your work for a reader who knows the stuff you’re talking about.


Isn't this a function of the tools you were using, not the syntax? Couldn't any of these tools support any syntax and do the same thing?


The line between syntax and functionality is pretty thin for an assembler.

I've definitely had code that some assemblers accepted and others didn't on the same arch, so even if they could be equally expressive in practice they aren't. Fairly sure that's also true of inline assembly on clang x64, had to change between intel and at&t for something a while ago.


Not really. Intel vs AT&T syntax is about how to generate machine code from mnemonics.

Cooperation with the linker comes with other directives and options. gas has tight cooperation with GNU ld via directives (which mostly are just implemented with ELF constructs AFAIK), not instruction syntax. It can actually be set to accept either AT&T or Intel syntax, without losing any support for other features.


The case that comes to mind was emitting a (correct) relocation for one mode and raising an error for the other. That was more likely to be llvm-mc than gas though, so results from one assembler may not apply to all assemblers.


Emitting a relocation with .reloc directive? Definitely sounds like an llvm-as bug.


No, it was an arithmetic expression related to labels. And yes I'd agree that it working in... I want to say intel and not at&t (might be the other way around) is a bug/missing feature. Didn't bother fixing or reporting it at the time though.


The amount of disagreement this individual testimony is generating is fascinating and frustrating to me.

I don't think it ever really makes sense to disagree with a comment like this, unless one is claiming that the content of the comment is intentionally untrue. Anyways, appreciate your perspective.


It's rare that I post unpopular opinions, but when I see an HN thread full of people talking about how terrible the tools are which helped me be successful, I just can't help but speak up. It's an important topic for me because I originally believed the consensus. Like all of you, I read the countless Internet comments which say things like GNU AS is meant only as a backend for GCC, that you must never write assembly, and if you do then AT&T syntax is the devil. As a result, I wasted half a year of my life trying to build APE using programs like nasm and fasm. It wasn't until I decided to use only GNU tools that I began making progress.


That’s a much different lens on the point you were making before. One could argue it’s an entirely different comment and, more importantly, that it yields an entire spectrum of conclusions that weren’t your original point. Before, you were saying “people don’t like AT&T because,” and a bunch of other pronouncements that made my triple-decade native eyebrow raise because they didn’t match my experiences, even squinting really hard. You’re also introducing ideas that are just completely wrong, like syntax having anything to do with the ld integration in gas (it doesn’t; read nearby).

Now you’re saying you were swayed by the idea that one shouldn’t use gas directly, which is an entirely different premise — and much more understandable.

You found a stack that worked for you to build APE. Those are your experiences. Pivoting those experiences to absolute observations about what is good or bad or elite is the error, here. And it’s ironic: you’re lamenting believing “the consensus” in your own journey while simultaneously influencing the consensus with conclusions that your available data didn’t earn. Passively responding to negative feedback toward positive replies suggests bad faith, too, so I’d ask if your contribution to the consensus is harmful, like those were that steered you another way.

I also bristle at a niche field having terms like “elite” used casually to influence people entering it, just like you were once upon a time, and I note you ignored that feedback in responding here. That’s also ironic because APE is challenging the idea of how native software is packaged and distributed, which itself requires outside the box thinking, and you’re using language and absolutist pronouncements to create a different box that has a right and wrong.

Your commentary here suggests to me that the loads of people waiting on you to change the world with APE might be waiting a while. That’s unfortunate and directly motivational.


I am curious to hear some specifics of why APE did not work with nasm or fasm. Your earlier comment mentioned "tight cooperation with the linker" -- I don't know what this means. My mental model is that all assemblers ultimately produce an object file. It's not clear to me what would make one object file integrate better with a linker than another, unless there are certain .o outputs that nasm or fasm were not capable of generating.


I don't think much hate is directed at the tools per se, but rather at the syntax. It's like how people complain about all the idiosyncrasies of English, without denying it's an important, useful language.

As Stroustrup put it: “There are only two kinds of languages: the ones people complain about and the ones nobody uses.”


Did any of those advantages that you relied on depend on AT&T syntax, though?



