"Never abbreviate a variable" is a very strong statement, surely inspiring religious wars. And maybe there are edge cases: i as a loop counter, id for an integer primary key, whatever. But this example is something else entirely; "ttpfe" is honestly the worst variable name I have every seen.
Historically, i wasn't even an abbreviation: it was the first of the letters (I through N) that FORTRAN compilers implicitly typed as INTEGER when assigning types to variables based on their names. The choice was probably further influenced by longstanding mathematical tradition, which uses i and j as indices.
(You could declare types and the compiler would respect it, leading to the old truism "GOD is REAL, unless declared INTEGER".)
(If you think that's the weirdest thing old FORTRAN did, look up the arithmetic IF statement sometime. Then, look up assigned GOTO.)
When hiring we ask for sample code from developers, written specifically for the job application. We place a really high premium on readability. Occasionally we'll get code from devs that is very concise: not just abbreviated variable names, but long, complex statements using ternaries and the like.
I'll usually ask them to resubmit code and really focus on readability and it turns out fine. But I think there might be a misconception among devs where the thought is that really compact code shows talent.
Our ideal coder is someone who cares deeply about performance and where the code is ridiculously easy to read and trace through. I mention performance, because in some situations making things more verbose can affect that.
Whenever we interview a candidate, the one thing we place a high premium on is structure; the rest comes next.
Bad variable names, long complex code... these can easily be adjusted. If the candidate doesn't have a good understanding of how to structure code (what goes where, for what purpose, separation of concerns...), it'll take much more effort to teach that candidate than to teach the bad variable namer.
Properly structured code naturally leads to better-performing software, and if/when a bug arises it'll be easier to spot and test.
All that being said, if a coder names variables 'a', 'aa', 'aaa', 'aaaa', that's a serious red flag.
I was joking. The example sort of looks like something that would fall out of minify -- although 'a' - 'z' would have to be used/visible in a scope before 'aa' would be generated as a name.
And that was the joke: that the original code "out-minified" minify - that running the original gibberish through minify would NOT make it any smaller :-)
The older I get, the less I value "cleverness" in programmers. I used to delight in writing code golf one-liners. But now I realize it detracts from solving the hard problems, and it makes simple problems harder.
It depends on whether The Compiler is Smart Enough™ — verbose code, for example, declares variables for values before using them in expressions.
So instead of
if (long.and(Complex(Expression)) && (other || long && expressions))
You use
mandatoryConditional = long.and(Complex(Expression))
optionalConditional = other
otherOptional = long && expressions
if (mandatoryConditional && (optionalConditional || otherOptional))
Now you have more names hanging around, and longer code, but it's better to read, especially if these conditional variables are named properly. Some people fear that this could negatively impact execution time, but I don't think it will have any noticeable impact in modern languages.
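To make the pattern concrete, here's a small sketch in Rust (the function and condition names are invented for the example); an optimizing compiler will typically fold the named temporaries away:

    fn should_process(queue_len: usize, is_admin: bool, retries: u32) -> bool {
        // Named intermediate conditions instead of one dense expression.
        let queue_has_work = queue_len > 0;
        let caller_is_privileged = is_admin;
        let still_retrying = retries < 3;
        queue_has_work && (caller_is_privileged || still_retrying)
    }

One caveat: evaluating the parts up front gives up short-circuiting, which only matters if the subexpressions are expensive or have side effects.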
Another problem can occur when people verbosely allocate things, pre-computing and re-computing aggregate data structures, where a one-liner would've been more efficient because it goes over the same data only once.
In shell scripts, for example, accessing a variable with a longer name takes longer than accessing one with a shorter name. In my testing, a 100-character name took around 20% longer to access than a 1-character name.
In the 90's I worked on a FoxPro application, also for DoD. My boss, a retired Navy Captain, used very terse variable names, partly because he was a hunt-and-peck typist, and partly because earlier languages allowed only two-character names.
But FoxPro allowed variable names of any length. However, it only recognized the first ten characters. To my boss's chagrin, I often used names longer than ten characters, risking collisions with other long names. That never happened.
But ten years after leaving, I was hired to port the product to Visual FoxPro, which does recognize the whole name. Some of the early commits were "Fixed inconsistently-used long variable name..."
Of course, in those days we weren't using linters, or tests, or source control, or reproducible builds... and yet still had a business. No wonder Alan Kay calls computing "not quite a field."
I remember in college in the mid 80s trying to get something working in BASIC on one of the Apple IIs in the library, only to discover that while the disk-based interpreter I'd used before supported 6-character names, the ROM-based interpreter only recognized the first 2 characters: it would accept longer names but not differentiate them.
Hilarity ensues...
And, yeah, I also did a lot of XBase in the late 80s, though usually "Clipper", rather than "Fox".
Batch files made a decent build process, and you could be disciplined with regular arc/zip files for source - not too unlike "make clean" and svn. We certainly had a lot of cruddy manual processes otherwise back then, though.
I was recently implementing a geometry algorithm which I looked up on Quora. It was described using typical vector notation, with r, s and t, u. Since I referenced the algorithm in the comments, I decided to use these same variable names in my code.
I think this is the right choice, but my code reviewer didn't. But he didn't click on the quora link.
Why is it okay for mathematicians to abbreviate things but not programmers? Is it because they deal in more abstract entities where the name is irrelevant?
In math, notations are designed to make statements about the problem domain concise. Once you pass a certain degree of concision, longer names impede readability rather than enhancing it. That is because the ability to take in an entire complex expression or subexpression at a glance tells you things—and lets you see patterns—that wouldn't be as apparent if longer names were used. Programmers in the APL tradition understand this, but most programmers do not. (Many refuse to believe it's possible when they hear about it!)
In software, programmers have grown accustomed to a notion of readability that derives from large, complicated codebases where unless you have constantly repeated reminders of what is going on at the lowest levels (i.e. long descriptive names) there is no hope of understanding the program. In such a system, long descriptive names are the breadcrumbs without which you would be lost in the forest. But that is not true of all software; rather, it's an artifact of the irregularity and complexity of most large systems. It's far less true of concise programs that are regular and well-defined in their macro structure.
In the latter kind of system, there's a different tradeoff: macro-readability (the ability to take in complex expressions or subprograms at a glance) becomes possible, and it turns out to be more valuable than micro-readability (spelling out everything at the lowest levels with long names).
It also turns out that consistent naming conventions give you back most of what you lose by trading away micro-readability, and consistent naming conventions are possible in small, dense codebases. That of course is also how math is written: without consistent naming conventions and symmetries carefully chosen and enforced, mathematical writing would be less intelligible.
Edit: The fact that readability without descriptive names is widely thought to be impossible is probably because of how little progress we've made so far in developing good notations, and tools for developing good notations, in software. This may not be so hard to understand: it took many centuries to develop the standard mathematical notations and good ways of inventing new ones to suit new problems. Mathematics is the most advanced culture we have in this respect, and in computing we're arguably still just beginning to retrace those steps. If we wrote math the way we write software, mathematics as we know it wouldn't be possible.
Edit 2: The best thing on this is Whitehead's astonishingly sophisticated 1911 piece on the importance of good notation: http://introtologic.info/AboutLogicsite/whitehead%20Good%20N.... If you read it and translate what he's saying to programming, you can glimpse a form of software that would make what people today call "readable code" seem as primitive as mathematics before the advent of decimal numbers seems to us. The descriptive names that people today consider necessary for good code are examples of what Whitehead calls "operations of thought"—laborious mental operations that consume too much of our limited brainpower—which he contrasts to good notations that "relieve the brain of unnecessary work".
Applying Whitehead's argument to software suggests that we'll need to let go of descriptive names at the lowest levels in order to write more powerful programs than we can write today. But that doesn't mean writing software like we do now, only without descriptive names; it means developing better notations that let us do without them. Such a breakthrough will probably come from some weird margin, not from mainstream work in software, for the same reason that commerce done in Roman numerals didn't produce decimal numbers.
You're buying into a false dichotomy. Description should always exist. If you think there are too many characters then by all means apply a transformation on your personal copy to whatever symbols you prefer, but don't deprive everyone else of valuable context.
> If you read it and translate what he's saying to programming, you can glimpse a form of software that would make what people today call "readable code" seem as primitive as mathematics before the advent of decimal numbers seems to us.
This is an extraordinary (and enticing and often advocated) claim that has, so far, failed to produce the extraordinary evidence. It says something that a person as concerned with notation as Knuth used mathematical notation for the analysis of algorithms and a primitive imperative machine language to describe behaviour.
I see no connection here to what I wrote, which has nothing to do with functional vs. imperative programming. I'm talking about names and readability in code.
Imperativeness is a separate matter. One can easily have it without longDescriptiveNames, and although I don't have Knuth handy, I imagine he did.
At first read the idea you propose is very attractive, but I think you do need to address why APL didn't take off. Perhaps they chose a poor vocabulary; are there better ways to represent algorithms?
I'm sorry I didn't reply to this during the conversation, but am traveling this week. IMO the short answer is that questions like "why didn't APL take off" presuppose an orderliness to history that doesn't really exist. Plenty of historical factors (e.g. market dynamics) can intervene to prevent an idea from taking off. Presumably if an idea is really superior it will be rediscovered many times in multiple forms, and one of them will eventually spark.
> "Is it because they deal in more abstract entities where the name is irrelevant?"
Partially, but also because math equations don't really have strong maintainability needs. Another mathematician isn't going to walk in 6 months later, get confused, and blow up the universe.
Though I'd argue that in fact math equations should be better named. They're rather abstractly named as a matter of convention, but they would be more easily understandable in many cases if variables were more carefully named.
Another risk of "porting" geometry algorithms, especially complex ones, directly from their mathematical expressions, is that you don't gain insight into why they work. This makes debugging later difficult, since nobody who wrote the code actually understood what the algorithm was doing. Forcing yourself to rename variables into something sensible will also force you to understand the mechanics of what's happening.
>they would be more easily understandable in many cases if variables were more carefully named
A mathematician already knows the notation, so a long mathematical expression with descriptive names will not be easier to understand (if you know the notation). I would argue that it can even make things worse and add cognitive load.
I find that as a piece of code skews more math- and geometry-centric, the variable names skew more towards math-like brevity as well. To some degree this is due to historical technical limitations (LAPACK and Fortran being one extreme -- dgbsv, anyone?), but I see a ton of contemporary production code with one-letter variable names. R is a rotation, n is a normal, p is momentum, whatever. I think a lot of the historical notation conventions carry over to code, and most peeps working in the domain day to day are down with whatever baggage is brought along. This is fine until you try to read francophone code... try reading a French thesis and you'll find that they spurn all the conventions the rest of the world has agreed upon :).
> Why is it okay for mathematicians to abbreviate things but not programmers? Is it because they deal in more abstract entities where the name is irrelevant?
For example, xyz^2 in a piece of written math means something different than it would in a program (in the math case we are obviously not taking the variable "xyz" and squaring it). I guess what I am trying to say with this example is that perhaps variable names in math are one symbol because concatenation, in many cases, already means "multiply".
> Is it because they deal in more abstract entities where the name is irrelevant?
That sounds right. Also, there are standards for variable names that everyone is familiar with in math, so those letters actually have meaning. As most of my code is not for implementing pure math algorithms, I have almost never written code where a variable couldn't have been named with a word describing it or its use. It makes understanding and maintaining the code a lot easier when it is self-documenting.
I theorize it's because math as we know it was written down on paper and blackboards. Those mediums don't have auto-complete, like any good code editor does.
I wish math notation would be replaced by s-expressions - easy to type and only like 3 concepts to learn for fluency.
I write a lot of code like that. Usually I start off with a huge comment block, detailing what the function does, outlining the algorithm used and explaining the notation. Only then do I include both the link to the paper and the full citation.
It's about as much work as writing the code in the first place, but it has saved me many times when I or someone else had to go back and work on it.
I suppose it's because a proof is contained within itself while a computer program may interact (in this case with another component within the same application) in non-obvious ways. Clarity in variable names helps to prevent this behavior, although it doesn't completely eliminate the risk.
Probably has to do with decades (centuries?) of knowledge that is passed on in a somewhat consistent and rigorous way. Programming is a relatively new field, with tons of self-taught practitioners.
Math is verb-oriented, so you shorten the variable names to make the structure clear. Most programming is noun oriented, so you clarify variable names at the expense of taking attention away from structure.
Math does name its nouns, but those tend to be structures rather than variables (groups, rings, fields etc.)
If the variables had been named timeOfTenPercentHeightOnTheFallingEdge and timeOfTenPercentHeightOnTheRisingEdge it probably would have still been hard to notice that they had been swapped in that one line.
I think the catch is that the 'f' and 'r' keys are right next to each other, so if you accidentally type one instead of the other then you'd get the other variable by mistake.
That said, you raise a fair point: we don't know how this bug got there. If it was a simple typo like above, then the verbose names would have prevented it. If it was a logic error by the programmer (for whatever reason), then you're right that the name wouldn't matter, because they typed the one they intended; it just wasn't the right one to use.
Fat-finger is only one kind of error -- and one far less common than the good-ole brain-spasm which can easily replace "Falling" with "Rising".
The ability to read the code and see the error is more important. And on that score, even though ttpfe is a terrible name, "timeOfTenPercentHeightOnTheFallingEdge" is even worse.
Yes, but the likelihood of mistyping them would be far less. 'f' and 'r' are right next to each other. 'Falling' and 'Rising' are a bit harder to unnoticeably fat-finger.
I definitely prefer more verbose names to abbreviated ones, but I'm not sure that never abbreviating a variable name is the right way to go either. Surely there's a middle ground between `Ttpfe` and `timeOfTenPercentHeightOnTheFallingEdge`?
I like using static types to avoid these sorts of problems. Modern languages like Swift, Rust and Haskell let you make zero-overhead type wrappers around other types.
So here they could have defined `newtype RisingEdge(Float)` and `newtype FallingEdge(Float)`, and then used those types in the function parameters as appropriate.
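A minimal sketch of that idea in Rust (the type and function names here are illustrative, not from the original code):

    #[derive(Clone, Copy)]
    struct RisingEdgeTime(f64);

    #[derive(Clone, Copy)]
    struct FallingEdgeTime(f64);

    // Swapping the two arguments is now a compile error instead of a silent bug.
    fn fall_after_rise(rise: RisingEdgeTime, fall: FallingEdgeTime) -> f64 {
        fall.0 - rise.0
    }

The wrappers add no runtime cost, and every call site documents which time is which.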
I miss Pascal "type" block definitions (or C typedefs, at least) when I have to work with too many Java generics.
I know, [Turbo] Pascal (etc.) never had generics. But if it had, you would have been able to give a name to a generic type expression, rather than repeating an inline monstrosity of an expression everywhere.
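Rust's type aliases scratch the same itch, for what it's worth; a small sketch with a made-up type:

    use std::collections::HashMap;

    // Name the generic expression once instead of repeating it inline.
    type ScoresByUser = HashMap<String, Vec<(u64, f32)>>;

    fn best_score(scores: &ScoresByUser, user: &str) -> Option<f32> {
        scores.get(user)?.iter().map(|&(_, s)| s).reduce(f32::max)
    }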
This is incredibly powerful when done correctly. I've come across a few similar features in F# that I wish were available in other languages, such as units of measure (F#) and active patterns.
Use context. If you can't, refactor so you can. If you can't, well, you're SOL and stuck with timeOfTenPercentHeightOnTheFallingEdge.
Edit: e.g. if all of your variables are "timeOfTenPercentHeight...", cut that part out of the name. Full names can be just as bad as abbreviations - e.g. if they're all "timeOfTenPercentHeight..." then they all start to blend together.
Exactly. Make a "times" record/object/tuple (whatever the local custom you have for an aggregate type is), perhaps with a "tenth" nested such item (10% = 1/10), with "fall" and "rise" members.
times.tenth.fall
times.tenth.rise
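A quick sketch of that shape in Rust (the type and field names are just illustrative):

    struct EdgeTimes {
        fall: f64, // time at this fraction of height on the falling edge
        rise: f64, // time at this fraction of height on the rising edge
    }

    struct Times {
        tenth: EdgeTimes, // times taken at 10% of pulse height
    }

    fn width_at_tenth(t: &Times) -> f64 {
        t.tenth.fall - t.tenth.rise
    }

Now the access reads almost like the prose description, and there's no five-letter acronym left to mistype.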
Hopefully, they aren't using some ancient version of FORTRAN.
My first paid work in the 80s was in a language with 8 or 10 character long names, depending on use, with only scalar values outside of ISAM tables. One compiler variant extended that with arrays (but not nested arrays). You had no choice but to make some weird names.
That said, when I started doing ANSI C with 32 char names, I felt like I had failed when I needed a variable that had to be that long to explain what it did. Either it had too broad a scope, or should have been bundled in something else (record/struct).
Yes exactly. If you factor out a class that deals exclusively with timeOfTenPercentHeight, then the variable names can be OnTheFallingEdge and OnTheOtherEdge or whatever...
One thing I often see elided in these naming wars is scope: am I the only person who gives longer names to globally visible things than to locals?
For example:
#include <stddef.h>

size_t find_foo(const char *source_text)
{
    const char *src = source_text;
    for (; *src != 'f'; src++) {
        /* ... do stuff */
    }
    /* ... do more stuff */
    return src - source_text;
}
Now the meaning of "source_text" would not be evident, except for the name. But just glancing at the usage shows that "src" is clearly a working cursor into the source text.
But if I called it "working_cursor" would that really explain anything to the reader? If anything, giving a detailed name risks misleading readers in much the same way as stale comments can mislead.
The problem was not that the variable was abbreviated. The problem was that the abbreviated variable was so similar to another abbreviated variable that was used for a similar purpose.
I would add - there's a bit of a difference between an abbreviation that just keeps something from being ungodly verbose (20 letters instead of 50), and an abbreviation that shortens something so much that the original meaning is completely lost ("Ttpfe"). This is especially true for things in context - "time of ten percent height on the falling edge" is needlessly verbose, but in context "ten_percent_falling_edge" would probably be perfectly fine. And indeed, that's really just an expanded version of "Ttpfe", which is what was used anyway.
If you shorten everything to five-letter names, it's not really that surprising that it becomes an issue. And I mean, what's the real point of abbreviations when they are so short that they make the code harder to read?
Edit: It's worth pointing out, though: if this is really old C code it may be justifiable. Back in the olden days of C (older than C89, at least), only the first 8 characters of a symbol actually mattered. "blahblahone" and "blahblahtwo" would resolve to the same symbol. So shortening in this way could be necessary.
The line was adding about 5 or 6 terms, and I think it was actually an 8-character acronym that he had. It might have been like "time from ninety percent to ten percent falling edge" or "falling edge fall time ten percent" or something. Either way, it was awesome to find it.
My point was that the abbreviation is not a problem per se. If there was not another variable with a similar name, it would not have mattered at all.
This is an issue of names being similar. No matter how long a variable name is, if another variable name in the same namespace is superficially similar, this kind of thing can happen.
On the other hand, if there are no other variables with similar names, then you can abbreviate all you want and your chance of collisions will not increase.
Yeah, the problem isn't that the airplane is on fire, the problem is that it crashed into another burning airplane as both tumbled uncontrolled through the sky, hurtling toward the earth.
I say this tongue in cheek: with current IDEs having such great autocompletion, has anyone experimented with coding far outside of the ASCII character set? Programming language restrictions aside, I recognize the obvious troubles this would cause. And yet I do a lot of work on simulations where math formulas are converted to code, which means lots of compounded Greek names like "omegaSquared" or "epsilonMinus". Naming decisions become more challenging as subscripts and superscripts are added, let alone matrix indices. At some point perhaps the symbolic name should be replaced with the descriptive name, such as "first_eccentricity_flat_to_fourth". But it sure would be nice to have access to something with such brevity.
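For what it's worth, some languages already allow a fair amount of this. A sketch in Rust, assuming a toolchain from 1.53 or later (when non-ASCII identifiers were stabilized); the formula and names are made up for illustration:

    fn example(k: f64, m: f64) -> f64 {
        let ω = (k / m).sqrt(); // an angular frequency, say
        let ε = 1.0e-9;         // a small tolerance
        if ω.abs() < ε { 0.0 } else { 1.0 / ω }
    }

Superscripts and subscripts still aren't valid identifier characters there, though, so the "omegaSquared" problem doesn't entirely go away.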
I self-tested variable name lengths on my own code.
Three letters is enough to avoid most collisions. Words do not make sense yet.
At four letters most words become decipherable given an appropriate encoding.
At five letters a two word phrase may make sense.
I make a rough decision based on variable scope - shorter lifetime means shorter variable name, but I rarely go with just one letter as it reduces uniqueness.
If I need to use a really long phrase frequently I take a mathematician's approach and alias it to an abstract and highly unique symbol. The phrase may still exist in the addressed data structure, I just avoid it within the algorithm. Mathy code also has a tendency to encourage numbered variables, e.g. "x0, x1, x2".
I had approximately a month between when I wrote it and reread it. While it was serviceable in short-term memory, comprehension suffered noticeably in the long term with names of four letters or fewer.
At the time I did this I was pursuing Arthur Whitney's extremely terse style to see what advantages and disadvantages it brings. My main modification was to add a full description comment inlined into the declaration of every variable, a practice which I have mostly kept up with even after the experiment ended, e.g.:
var tz : String; /* time zone */
This meant that I was not testing on whether I could reconstruct the entire meaning in my head, just whether it made code flow better to use very short names. My discovery was that it does help, up to a point, because you can always create original short names that differentiate better than long abbreviations:
result = ttpfe + ttpre;
vs.
result = q + k;
Any time you copy-paste-modify, you create a risk of error. Including the whole phrase slows down your reading and brings diminishing returns on comprehension as it puts friction on actually working with the variable. With the highly differentiated "q" and "k", error has a lower likelihood of slipping in than the abbreviation because you've minimized the amount of noise in the data - it "chunks" better and you can read the whole algorithm more fluidly. The only problem with using such short names is the issue of reconstruction, which is why I opt for aliasing a long name if I need to work with it frequently, even if it means extra boilerplate to make a local variable. Abbreviation is the worst of both worlds and so I try not to do so much of it.
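To make the aliasing concrete, a sketch in Rust (the struct and field names are invented for the example):

    struct Pulse {
        time_ten_pct_falling_edge: f64,
        time_ten_pct_rising_edge: f64,
    }

    fn width(p: &Pulse) -> f64 {
        let q = p.time_ten_pct_falling_edge; // q: falling-edge time (local alias)
        let k = p.time_ten_pct_rising_edge;  // k: rising-edge time (local alias)
        q - k
    }

The long names stay on the data structure, where you look them up once; the algorithm itself works with the short, highly differentiated symbols.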
Given the original 8-character constraint of OP I might have tried something along the lines of:
"edge_t0" "fall0" "ef"
Because there is no way to convey the entire phrase, I see it as preferable to focus on a key aspect, expand that word, and push everything else into more symbolic content, especially numbers. Programmers are already trained to look carefully for uses of 0 vs. 1 in our code. That doesn't mean it's my go-to for an abstract symbol, but it works better than a bad abbreviation.
I would never publish anything that shouldn't be published. Pretty much everything here could be found in a few hours on wikipedia, except for the story aspect of it.
Heh. When I first learned to code as a kid, a variable name could be at most two characters (the first of which had to be a letter A-Z and the other could be a letter or a number).
I would suggest that the need for Englishy variable names is due to a weakness in programming languages and possibly the programming model itself. Why should a set of legitimate values for a computation benefit from how you refer to that set? Can that variable take on undesired values? Do you rely on that name and its comprehensibility to distinguish good from bad values? I sometimes find it hard to believe we still program this way.
We don't have to still program this way - you can write code with very strict types, with machine-checked proofs that it works correctly, etc, etc. We don't do this very often because it turns out this level of rigor is incredibly time-intensive.
While many of the Bell Labs guys didn't much like Pascal, Turbo Pascal managed to address almost all of the complaints I've ever seen, while preserving the good parts of Pascal (or Modula???).
Java must look irredeemable to the Bell Labs folks, though. Perhaps it is: UglyNames; limited structured constant literals; still too clunky lambdas for callbacks.
This is common in the sciences as well, since senior professors also had the 8-character limit (from fortran [0]). And functions are also named this way as well (see lapack/blas).
Some people also hate typing out slightly longer variable names and whatnot. I try to emphasize that a section of code will tend to be read more times than it is written, and therefore readability is more important. It's a frustrating battle sometimes, though.
[0] Exacerbating the problem of using the wrong variable is the fact that much existing code uses implicit types...
TL;DR: Abbreviated variables are not always intuitive to others. I tend to agree. If you're going to use a pattern or abbreviation, then be as non-creative as possible.
I think the usual rule is that the size of a variable's name should be a function of the size of the scope in which it's visible: If it's a global, it should have a long, descriptive name, perhaps with_underscores or CamelCase or similar. If it's a class member, abbreviate it some. Function locals get even shorter names, and loop indices or temporary variables only used in one part of the function can be single-character.
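Something like this, sketched in Rust (all names invented for illustration):

    static MAX_RETRY_ATTEMPTS: u32 = 5;   // global: long and descriptive

    struct Connection {
        retries: u32,                     // member: shorter
    }

    fn retry(conn: &Connection) {
        let left = MAX_RETRY_ATTEMPTS.saturating_sub(conn.retries); // local: shorter still
        for i in 0..left {                // loop index: single letter is fine
            println!("attempt {i}");
        }
    }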