
"What if it changes?" is a reasonable question to ask. But every time you do, you are walking a tightrope. My rule of thumb is that we look at what is in use TODAY, and then write a decent abstraction around that. If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction that suits us TODAY, not the future. Bonus points if the abstraction allows us to extend easily in the future, but nothing should be justified with a "what if".

The reason a lot of Java or C# code is written with all these abstractions is that it aids unit testing. But I've come to love just doing integration testing. I still use unit testing to test complex logic, but things like "does this struct mapper work correctly" are ignored; we'll find out from our integration tests. If our integration tests pass, we've fulfilled our part of the contract, and that's all we care about. Focus on writing them and making them fast and easy to run. It's virtually no different from unit tests, just 10x easier to maintain.



> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction...

That is a good rule of thumb, and I often follow it too. But it does take some discernment to recognize cases where something would benefit from an abstraction or some common code, even if it is only used twice.

I used to work for a company that imported airspace data from the FAA (the US Federal Aviation Administration) and other sources. The FAA has two main kinds of airspace: Class Airspace and Special Use Airspace.

The data files that describe these are rather complex, but about 90% of the format is common between the two. In particular, the geographical data is the same, and that's what takes the most code to process.

I noticed that each of these importers was about 3000 lines of C++ code and close to 1000 lines of protobuf (protocol buffer) definitions. As you may guess, about 90% of the code and protobufs were the same between the two.

It seemed clear that one was written first, and then copied and pasted and edited here and there to make the second. So when a bug had to be fixed, it had to be fixed both places.

There wasn't any good path toward refactoring this code to reduce the duplication. Most of the C++ code referenced the protobufs directly, and even if most of the data in one had the same names as in the other, you couldn't just interchange or combine them.

When I asked the author about this code duplication, they cited the same principle of "copy for two, refactor for three" that you and I approve of.

But this was a case where it was spectacularly misapplied.


I think your example illustrates why it's so important to choose the right way to generalize/share code depending on the circumstances. I've found that when there's a 90% overlap between 2-3 use cases, many people tend to go with "one common code path for all that's shared and then inject the 10% difference in via components/callbacks/config vars". This works reasonably well when the flow of execution is the same and what changes is just the specifics of some of those steps. But if the differences are also in which steps even happen, then in my experience this approach couples the whole thing too tightly and makes it harder to reason about what actually happens in a given configuration.

What I like to do instead is break the shared code paths into a palette of smaller subfunctions/subcomponents and then have each use case have its own high level code path that picks and chooses from these subfunctions: One does ABCDE, another does ACDEX. It makes it supremely easy to reason about what each of them actually do, because they read almost like a recipe. It becomes a sequence of high level steps, some of which are used by several use cases, while others are unique. I've found this way of generalizing is almost "cost free" because it doesn't really couple things at a high level, and it's the kind of readability refactor that you'd often want to do anyway even if the code wasn't being shared.
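A minimal sketch of this in Python (the step names and the airspace framing are made up for illustration): each use case reads like a recipe of shared steps, and leaving a step out is just omitting a call.

```python
# Palette of small shared steps. Each recipe below picks the ones it needs.

def load(raw):          # step A: shared by both recipes
    return raw.strip().split(",")

def validate(fields):   # step B: used only by the first recipe
    return [f for f in fields if f]

def normalize(fields):  # step C: shared
    return [f.lower() for f in fields]

def summarize(fields):  # step D: shared
    return {"count": len(fields), "fields": fields}

def import_class_airspace(raw):
    # Recipe 1: A, B, C, D
    return summarize(normalize(validate(load(raw))))

def import_special_use_airspace(raw):
    # Recipe 2: A, C, D -- skips validation, reuses everything else
    return summarize(normalize(load(raw)))
```

Each top-level function states its whole flow in one line, so you never have to trace a shared code path to learn which steps a given use case actually runs.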


> ...break the shared code paths into a palette of smaller subfunctions/subcomponents and then have each use case have its own high level code path that picks and chooses from these subfunctions: One does ABCDE, another does ACDEX. It makes it supremely easy to reason about what each of them actually do, because they read almost like a recipe. It becomes a sequence of high level steps, some of which are used by several use cases, while others are unique.

Isn't this just the Command pattern? - https://en.wikipedia.org/wiki/Command_pattern


I love this. Refactor the first time. Remix the rest of the times.


Do you know if there’s a name for this pattern? I admire it all the time in Peter Norvig’s code. It leads to very approachable code.


I don't know if there is an official name, but in my head I call it "helpers/components/mixins are better than frameworks." Or, "if one happens to want to write a framework, one ought to try hard to refactor it 'inside-out' to a set of composable components."

The most important (though not only) issue with frameworks is that you typically can't compose/mix more than one together - every framework is "exclusive" and takes control of the code flow. Whereas "components" can usually be easily mixed with each other, and leave control of the code flow to the programmer.


I generally think of this as the same principle of "prefer composition over inheritance". Leave the top level free to compose the behaviour it requires rather than inheriting the framework's behaviour, for exactly the reasons you describe.


This is frameworks vs libraries. In the first case the framework is calling the code with config and hooks to change behaviour. In the second case there are common library functions called from completely separate “application” code.


I don't know an official name for it. It seems like it's almost too basic - "subdivide into helper functions" - to make it into the Gang of Four or other design pattern collections. But in my head I'm calling it the "Recipe Pattern"


It sounds like a version of the strategy pattern to me.

https://en.wikipedia.org/wiki/Strategy_pattern


> and it's the kind of readability refactor that you'd often want to do anyway even if the code wasn't being shared.

Couldn't disagree more, tbh. Some of the worst code I've ever had to work with has been over-abstracted "recipe" code where I'm trying to discern complex processes based off two-word descriptions of them in function names.

Doing this too much is a great way to turn a readable 100 line algorithm into a 250 line clusterfuck spread across 16 files.


> Doing this too much

ok, so you're talking about overdoing it. It's still a good approach when done right.


Not really, unless "done right" is for like a 2000 line function or something.

If code is running once in order, there's no reason to break it up into functions and take it out of execution order. That's just stupid.


Martin Fowler, in his book "Refactoring", outlines circumstances where you can leave bad code alone.

Basically if it works and you don't have to touch it to change it, leave it alone.


I think you've completely missed the point.


Oh god that reminds me. Our company did this but for a whole project.

It was back when a bunch of social networks released app platforms after Facebook's success. When hi5 released their platform, rather than refactoring our codebase to work on multiple social networks... someone ended up just copying the whole fucking thing and doing a global rename of Facebook to Hi5.

For the 3rd social network, I refactored our Facebook codebase to work with as many as we wanted. But we never reined in Hi5, because it had diverged dramatically since the copy. So we basically had two completely separate codebases: one that handled hi5, and one that had been refactored to be able to handle everything else (facebook, bebo, myspace, etc)


No bets on which one is buggier. Or which one's bugs (and also their fixes) break more networks.


Hi5 was less buggy because new features were just never ported to it - it was deemed not worth the effort.


I also got this heuristic from Martin Crawford. However, I believe it applies to snippets (<100 lines of code at the very most) only, for the reason you gave. But even then, it sometimes happens that you find a bug in a 4-line snippet that you know was duplicated once, and have to hope you can find the copy through grep or commit history. So while being careful not to over-engineer and applying KISS/YAGNI ('you ain't gonna need it'), even one-time duplication can be a pain.


I cannot edit my comment anymore, but I realized Crawford is the Martin of 'Forest Garden' fame. I was obviously meaning Martin Fowler, from the 'Refactoring' book.

Maybe we'll have 'Forest Software' in some time. 'A code forest is an ecosystem where the population of bugs, slugs, trees and weed balance themselves, requiring very little input from the engineer'.


> There wasn't any good path toward refactoring this code to reduce the duplication. Most of the C++ code referenced the protobufs directly, and even if most of the data in one had the same names as in the other, you couldn't just interchange or combine them.

That makes it sound like the problem is more of a spaghetti mess than duplication.

But I think the advice to copy something when you need two versions is supposed to be applied to specific functions or blocks or types. Not entire files. Then it wouldn't have duplicated the geographical code.

It's also important to have a good answer to how you'll identify duplicated bugs. I'm not sure how to best handle that.


If I had to guess: they probably referenced the protobufs directly because there are always 2 and "You have to tell it which one!".


What if the FAA updates the coordinate format for one and not the other? Then your abstraction is moot.


Of course not, abstraction works even better there! Every point that differs will have either a conditional, or an abstract part to be implemented by child classes. So the abstraction lets you know at a glance what are the key points to look for.
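One way to sketch that point (hypothetical record layouts, not the actual FAA formats): the shared flow lives in a base class, and every place the two formats may diverge is an explicit overridable method, so a format change in one importer touches exactly one subclass.

```python
from abc import ABC, abstractmethod

class AirspaceImporter(ABC):
    """Shared flow; divergence points are abstract methods."""

    def run(self, record: str) -> dict:
        fields = record.split("|")
        return {
            "name": fields[0],
            # the one point where the two formats differ
            "coords": self.parse_coordinates(fields[1]),
        }

    @abstractmethod
    def parse_coordinates(self, raw: str) -> tuple:
        ...

class ClassAirspaceImporter(AirspaceImporter):
    def parse_coordinates(self, raw):
        lat, lon = raw.split(",")
        return (float(lat), float(lon))

class SpecialUseImporter(AirspaceImporter):
    def parse_coordinates(self, raw):
        # suppose this format lists longitude first
        lon, lat = raw.split(",")
        return (float(lat), float(lon))
```

If one format changes, you edit one `parse_coordinates`; the shared `run` and the other importer are untouched.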


> If something is used once, ignore any abstractions.

This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once. CS 101 style. :D

The primary reason for building abstractions is not removing redundancy (DRY) nor allowing big changes, but making things simpler to reason about.

It is way simpler to analyze a program that separates input parsing from processing from output formatting. Such separation is valuable even if you don't plan to ever change the data formats. Flexibility is just added bonus.

If the implementation complexity (the "how") is a lot higher than the interface (the "what") then hiding such complexity behind an abstraction is likely a good idea, regardless of the number of uses or different implementations.


Nah, I’ll take your 50 line main() every day over those 10 files with 10 lines of boilerplate and one line of working code each. But at the end of the day you just need to roll with the style of the org you’re working with.

I drop in to Java shops from time to time, and am more than happy to port my simple class structures that make sense and do things into the 18 level hierarchies described in the article. I just assume there is somebody there that is really invested in all those interfaces, adapters, and impls and I’m not here to start silly fights with them. The code will still work no matter how many pieces it’s cut into and how many unnecessary redirections you add so no worries.

But for my own stuff I like to keep things compact and readable.


Where did I write it would be 50 lines of code only? And where do you get the 10:1 boilerplate to real code ratio? Maybe just use a more expressive language if you can't build proper abstractions and need a lot of boilerplate?

And why go so extreme? A main() calling into 3 functions like load, process and save will still be plenty better than a single blob of IO mixed with computation, and would still contain no boilerplate.
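As a sketch of that shape (names are illustrative): IO stays at the edges, and the computation in the middle is a pure function you can read and test on its own.

```python
def load(path):
    # input parsing lives only here
    with open(path) as f:
        return [line.rstrip("\n") for line in f]

def process(lines):
    # pure computation: no file handles in sight
    return [line.upper() for line in lines if line]

def save(path, lines):
    # output formatting lives only here
    with open(path, "w") as f:
        f.write("\n".join(lines))

def main(in_path, out_path):
    save(out_path, process(load(in_path)))
```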

> I drop in to Java shops from time to time, and am more than happy to port my simple class structures that make sense and do things into the 18 level hierarchies described in the article.

I certainly agree with that, but that has nothing to do with abstraction. Abstraction and indirection are different things. Those terrible FizzBuzz Enterprise like hierarchies are typically a mixture of insufficient abstraction and far too much of indirection. Abstraction reduces complexity, while indirection increases it. AbstractFactoryOfFactoryOfProblems is indirection, not abstraction, contrary to what the name suggests.


And why go so extreme?

I wouldn’t. I’d break it up the same as you, with those 3 functions. After we’d shown we were going to be doing lots of similar things. But given the choice between too complicated and too simple, that’s the direction I’d lean.

Apologies if that wasn’t clear in context.


> And why go so extreme? A main() calling into 3 functions like load, process and save will still be plenty better than a single blob of IO mixed with computation, and would still contain no boilerplate.

Sure. But a main() ordered into loading, processing, and saving would be similarly better, despite not using the abstraction of functions.


Code style / formatting is a secondary thing. If someone made the effort of splitting it into well organized 3 pieces, and denoted those pieces somehow (by comments?), that also counts as an abstraction to me, even though it is not my preferred code style.


If you consider organization in general to be abstraction, then I think that might cause some overselling of abstraction and miscommunication with others.

Unless I'm the one using words weirdly here.


It is not just reordering the lines of code.

In order to organize code that way, you need to establish e.g. some data structures to represent input and output that are generic enough that they don't depend on the actual input/output formatting. There you have the abstraction.

The key thing is to be able to understand the processing code without the need to constantly think the data came from CSV delimited by semicolons. ;)
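A minimal illustration of that boundary (a made-up record type and format): parsing is the only code that knows about semicolons, and the processing code works on plain records that would be unchanged if the input became JSON.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # generic representation: nothing about it says "CSV"
    name: str
    amount: float

def parse_line(line: str) -> Record:
    # format-specific knowledge lives only here
    name, amount = line.split(";")
    return Record(name=name, amount=float(amount))

def total(records: list) -> float:
    # format-agnostic computation
    return sum(r.amount for r in records)
```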


>> If something is used once, ignore any abstractions.

>

> This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once.

I broadly agree with you, but devil's advocate time: not all abstractions are at the same level.

Writing a static function `slurp()` that takes in a filename and returns the file contents isn't an abstraction in the same sense as having a `FILE *` type that the caller cannot look into which functions like `fprintf()` and `fscanf()` use to operate on files.

I think opaque datatypes (like `FILE`) are "more abstract" than static functions defined in the same file you are currently reading.

IOW, "Abstraction" is not a binary condition, it is a spectrum from full transparency to full opacity.

Static functions in C would be full transparency (no abstraction at all).

Opaque datatypes in C would be full opacity (no visibility into the datatype's fields unless you have the sources, which you may not have).

C++ classes would be something in-between (the private fields are visible to the human reading the header).


I agree, and that's why I said that good abstractions are those which have a good implementation complexity vs interface complexity ratio. The file abstraction is a perfect example of this - a simple concept you can explain in 5 minutes to a kid, but implementations are often several thousand lines of code long.

Also, the simpler the interface, usually the more contexts it can be used in. So those abstractions with nice interfaces naturally tend to be more reusable. But I argue this is the consequence, not the primary reason. You probably won't end up with good abstractions by mercilessly applying DRY.


> This is terrible advice. According to this, a classic program that loads data from a file, processes it, then writes the results to another file should be a single giant main() that mixes input parsing, computation and output formatting. Assuming file formats don't change, all of those would be used only once. CS 101 style. :D

Yes, if the program will be written and tested exactly once, with no change requests to come later, it's perfectly fine to write it as one big main().

It all depends on what the stakeholders need, clear communication with them is the real trick.


Well, what if the program suddenly crashes and gives you a stacktrace pointing to main()? Assuming you were not the original author of the code, you'd have to read most of the code to understand it.

If the main was split into well defined, separate pieces, at least you could quickly rule out quite a lot of complexity. If it crashed in parsing, you wouldn't need to understand the processing logic, etc.

Sure, it is easy to read one blob of code if it is only 100 lines. But it is a different story if it is 10000 lines and now you have to figure out which of the 100 variables are responsible for keeping the state of the input parser and which are responsible for "business logic".


But writing it would be harder, no?

I mean, if it's only like 50 short lines, that would be okay-ish, but in this case why do it in C and not use perl or awk? (I suppose you want fast text processing, so I won't suggest python). If the processing is hard, then you will need debugging (which is better in segregated functions) and to prototype a bit (unless I'm the only one who does that?).


I think the specific example mentioned might be subjective, but I agree with your point.

In my mind, the common emphasis on the DRY/WET thing with abstractions leads many people to miss the point of abstractions. They’re not about eliminating repetition or removing work, they’re about making the work a better fit for the problem. Code elimination is a common byproduct of abstractions, but occasionally the opposite may happen too.

I see an abstraction as comprising a model and a transformation. The villain isn’t premature abstractions, it’s abstractions where the model is no better (or worse!) for the problem than what’s being abstracted over.


I could not agree more with this.

I would add, though, that in my experience you can often identify parts of a design that are more likely to change than others (for example, due to “known unknowns”).

I’ve used microservices to solve this problem in the past. Write a service that does what you know today, and rewrite it tomorrow when you know more. The first step helps you identify the interfaces, the second step lets you improve the logic.

In my experience this approach gives you a good trade off between minimal abstraction and maximum flexibility.

(Of course lots of people pooh-pooh microservices as adding a bunch of complexity, but that hasn’t been my experience at all - quite the opposite in fact)


Microservices is just OOP/dependency-injection, but with RPCs instead of function calls.

The same criticisms for microservices (claims that it adds complexity, or too many pieces) are also seen for OOP.

Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.


I don't think the paragraph metaphor works well since written works are often read front to back, and the organizational hierarchy isn't so important on such a linear medium. There are books that buck the trends and IMO you don't really notice the weirdness once you get going. E.g. books with long sentences that take up the whole paragraph, or paragraphs that take up the whole page, or both at the same time. Some books don't have paragraphs at all, and some books don't have chapters.

Splitting material into individual books makes a little more sense as a metaphor, especially if it's not a linear series of books. You can't just split a mega-book into chunks. Each book needs to be somewhat freestanding. Between books, there is an additional purchasing decision introduced. The end of one book must convince you to go buy the next book, which must have an interesting cover and introduction so that you actually buy it. It might need to recap material in a previous book or duplicate material that occurs elsewhere non-linearly.

A new book has an expected cost and length. We expect to pay 5-20 dollars for a few hundred pages of paperback to read for many hours. We wouldn't want to pay cents for a few pages at a time every 5 minutes. (or if we did, it would require significantly different distribution like ereaders with micropayments or advertising). Some books are produced as serials and come with tradeoffs like a proliferation of chapters and a story that keeps on going.

Anyway, it's a very long way to say that some splitting is merely style, some splitting has deeper implications, the splits can be too big or too small, and some things might not need splits at all.


I'd like to argue against [quote].

[author] uses the [simile] to argue the [argument].

The obvious flaw in the [argument] is of course [counterargument].

[quote]: Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

[author]: Mr_P

[simile]: microservices or smaller classes are like paragraphs in an essay.

[argument]: since no one complains about breaking up an essay into paragraphs, no one should complain about breaking up a system into paragraphs.

[counterargument]: breaking up a system in smaller microservices or classes is not at all like breaking up an essay into paragraphs, which I think this comment has demonstrated.


> Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

There are orders of magnitude different amounts of work in each of these cases. (I’m not saying it’s a lot of work but it’s still significantly more in some of those cases relative to the others.)


Perhaps "break up your book into chapters" is a better metaphor for microservices. Breaking a chapter into paragraphs makes me think more of OO design or functional decomposition.


It’s breaking up into whole books. Each is stored, distributed, addressed and built separately. You have to become an expert at making the implied overhead efficient, because it will dominate everything you do.


> Curiously, while folks sometimes complain about breaking up a system into smaller microservices or smaller classes, nobody ever complains about being asked to break up an essay into paragraphs.

They would if each paragraph of that essay lived at a different domain/url.


Even if each paragraph was its own file. It's just a bad metaphor.


A microservice contains many classes. Those classes are organized into packages and so many of them are necessarily “public.” The microservice boundary is a new kind of grouping, where even this collection of packages and public classes presents only one small interface to the rest of the architecture. AFAIK this is not a common or natural pattern in OOP and normal visibility rules schemes don’t support or encourage it.


My favorite books are the ones where you read a paragraph and realize, after the fact, that it's just 1 sentence.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction

I refactor for the second time. I don't like chasing bugs in multiple places.

My rule of thumb is that there are only three quantities in the software development industry: 0, 1 and infinity. If I have more than 1 of something, I support (a reasonable approximation of) infinite quantities of that something.


Agreed, except avoid the term "abstraction". When one starts to talk about abstractions, one stops thinking.

The right word is "generalization", and that's what you are actually doing: you start with a down-to-earth, "solve the problem you've got!" approach, and then when something similar comes up you generalize your first solution.

Perhaps part of the problem is that in OO, inheritance usually promotes the opposite: you have a base class and then you specialize it. So the base class has to be "abstract" from day one, especially if you are a true follower of the Open-Closed Principle. I don't know about others, but for me abstractions are not divine revelations. I can only build an abstraction from a collection of cases that exhibit similarities. Abstracting from one real case and imaginary cases is more like "fabulation" than "abstraction".

The opposite cult is "plan to throw one away", except more than just one. Not very eco-friendly, some might say; it does not look good at all when you are used to spending days writing abstractions, writing implementations, debugging them, and testing them. That's a hassle, but at least once you are done, you can comfort yourself with the idea that you can just extend it... Hopefully. Provided the new feature (that your salesman just sold without asking if you could do it, pretending they thought your product did that already) is "compatible" with your design.

The one thing people may not know is how much faster, smaller and better the simpler design is. Simple is not that easy in unexpected ways. In my experience, "future proofing" and other habitual ways of doing things can be deeply embedded in your brain. You have to hunt them down. Simplifying feels to me like playing Tetris: a new simplification idea falls down, which removes two lines, and then you can remove one more line with the next simplification, etc.


Java in particular is missing certain language features necessary for easily changing code functionality. This leads to abstractions getting written in to the code so that they can be added if needed later.

A specific example is getters and setters for class variables. If another class directly accesses a variable, you have to change both classes to replace direct access with methods that do additional work. In other languages (Python specifically), you can change the callee so that direct access gets delegated to specific functions, and the caller doesn't have to care about that refactor.
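For example, in Python a plain attribute can later become a property without touching any caller; the `Account` classes below are made-up illustrations, not from any real codebase discussed here.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # callers write acct.balance directly

class AuditedAccount:
    # Same attribute name, but access is now delegated through a property.
    # Callers still write acct.balance and need no refactor.
    def __init__(self, balance):
        self._balance = balance
        self.reads = 0

    @property
    def balance(self):
        self.reads += 1  # e.g. an auditing hook added later
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value
```

This is why idiomatic Python starts with public attributes and adds properties only when there is real work to do, whereas Java callers are locked into whichever access style the class shipped with.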


Getter and setters are unnecessary. The thing that most people are trying to avoid by using these is mutating state. However a getter or setter does nothing to prevent this. A simple `const` keyword goes so much farther than adding useless indirection everywhere.

Edit: I suppose it may be argued that you need to set some other state when you set a member variable. If that's the case, then it's no longer a getter or a setter and the function should be treated differently.


Getters and setters are much more useful when accessing or setting the element should require some other function calls. Caching, memoization, and event-logging are examples where you might want this to happen.

You can say that's not a getter/setter, but then your definition is just different than the people you're responding to.


Caching, memoization, and event-logging can be handled by wrapper objects that implement the interface, so the base object doesn't need to contain all these layers of outside concerns. Let each class focus on its single area of use.

  interface Store { Query() }

  // these all have the Query() method
  type/class MySQL implements Store
  type/class Cache implements Store
  type/class Logger implements Store

  var db Store
  db = new Logger(new Cache(new MySQL()))


However, getters/setters are often the worst place to implement cross-cutting concerns like caching, memoization and logging.

Of course, in more limited languages/environments they're probably the only tool you have, so there's that.


Getters and setters are not just for keeping state immutable. They allow an API to control _how_ state changes. The most obvious example is maintaining thread-safety in multi-threaded environments.

I get they can be cumbersome, but using them really matters, especially as a project grows... an API that has a single simple client today may have many different (and concurrent!) ones tomorrow. The pain of using getters and setters now saves refactoring later.
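A small sketch of the thread-safety case (a hypothetical counter): the setter-style method controls how state changes, which a bare public field could not.

```python
import threading

class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, by=1):
        # the "setter" enforces that updates happen atomically
        with self._lock:
            self._value += by

    def get(self):
        with self._lock:
            return self._value
```

With a public field, every caller would have to remember to take the lock itself; hiding the field behind methods makes the invariant impossible to bypass.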


The number of getters and setters I've written that never got changed into anything more than read/change variable has to be _hundreds_ of times more than the ones that ever did anything else.

At what point is it cheaper to just refactor into getters/setters later when needed? That point _has_ to be miles behind me.


True.

Another problem (from a class/library-consumer point of view) is having getters/setters suddenly become more expensive to call, blocking, or even have side effects after an update.

Often that only affects the runtime behavior of the code.

Changing the interface, however, will give me a hint that something else has changed.


OOP languages shouldn't need getters and setters because there shouldn't be even a concept of variable access and mutation, just all method calls - that's what OOP is all about, after all, not just putting variables into bags and staying in a procedural mindset.


Smalltalk-style OO anyway: All You Can Do Is Send A Message.

That isn't the only type of OO. Look at CLOS in Common Lisp for a counterexample: https://wiki.c2.com/?HowObjectOrientedIsClos


That will just make everything more convoluted and less flexible. When you send a message over websockets you want a Datatype for each message type. It's not going to have any complicated method calls. You just insert the data or retrieve it on the other side. Since the framework expects you to define setters and getters you do it reluctantly.


I think the concern is: It's currently a getter/setter but might change later.

Maybe for debugging you want to log a callstack every time the field gets accessed, for example.

Or when you set the field, you should invalidate some cached value that uses it.


That's a design choice though -- if you're structuring your code to avoid mutable state, you're not going to have setters. And if you're structuring your code such that you're telling objects what to do, rather than pulling data out of them and acting on them remotely, then you're not necessarily going to have getters either.


To be fair, the Open-Closed principle is basically an article of faith in Java (along with the rest of SOLID).


The getter/setter nonsense is 99% compliance for specific frameworks like Hibernate or, shudder, JSF, but it caught on, and now nobody wants to be seen without ugly getters and setters - which would be perfectly fine if the language natively supported them.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better.

That is just as bad a general rule as "What if it ever changes, we need to abstract over it!". As always: it depends. If the abstraction to build is very simple, like making a magic number a named variable that is threaded through some function calls, at the same time making things more readable, then I will rather do that than copy it and introduce the chance of future bugs from updating only one place. If the abstraction requires me to introduce 2 new design patterns to the code, which are only used in this one case ... well, yes, I would rather make a new function or object or class or what have you. Or I would think about my overall design and try to find a better one.

Generally, if one finds oneself in a situation where one seems to be nudged towards duplicating anything, one should think about the general approach to the problem and whether the approach and the design of the solution are good. One should ask oneself: Why is it that I cannot reuse part of my program? Why do I have to make a copy to implement this feature? What is the changing aspect inside the copy? These questions will often lead to a better design, which might avoid further abstraction for the feature in question and might reflect reality better, or even more simply.

This is similar in a way to starting to program inside configuration files (only possible in some formats). Generally that should not be done; instead, a declarative description of the configuration should be found, on top of which a program can make appropriate decisions.


I agree that counting the number of times you repeat yourself is not the right metric to determine whether or not to introduce an abstraction. Abstraction is not compression. But I don't think it depends on how simple any abstraction would be either. Simplicity does play a role for pragmatic reasons of course but it's not the key question in this case.

The key question is whether there is a functional dependency or just a similarity between some lines of code. If there is a functional dependency, it should be modeled as such the first time it is repeated. If there is only coincidental similarity then introducing a dependency is simply incorrect, regardless of how often any code happens to get repeated.
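A minimal Java illustration of coincidental similarity (invented names, not from the thread): two constants that happen to be equal today. Merging them would introduce a dependency where none exists, because the two rates can change independently.

```java
// Hypothetical sketch: VAT_RATE and SPRING_DISCOUNT are both 0.20 today,
// but one is set by tax law and the other by marketing. Deduplicating them
// into one constant would encode a functional dependency that isn't real.
class Pricing {
    private static final double VAT_RATE = 0.20;        // set by tax law
    private static final double SPRING_DISCOUNT = 0.20; // set by marketing

    static double withVat(double net) {
        return net * (1 + VAT_RATE);
    }

    static double discounted(double price) {
        return price * (1 - SPRING_DISCOUNT);
    }
}
```

If the two values were instead derived from one underlying rule (a true functional dependency), sharing a single definition would be the correct move the first time it repeats.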


I agree! Maybe one could say: Not all repetitions are of equal nature in terms of what causes them, and to understand the cause is important.


> If something is used once, ignore any abstractions. If it's used twice, just copy it, it's better. If it's used 3 or more times, look at writing an abstraction...

As others have said, this is a good rule of thumb in many cases because finding good abstractions is hard and so we often achieve code re-use through bad abstractions.

But really good abstractions add clarity to the code.

And thus, a good abstraction may be worth using even when there are only two instances of something, or just one.

If an abstraction causes a loss of clarity, developers should try to think if they can structure it better.

EDIT: The comment below gives a good example of how a good abstraction adds clarity, while a bad abstraction takes it away: https://news.ycombinator.com/item?id=31476408


When I'm asked "what if it changes?", I usually answer with something like "we'll solve it when, and if, it happens". I'm a fan of solving the task at hand, not more, not less. If I know for sure that we're going to add feature X in a future version, sure, I'll prepare my code for its addition in advance. But if I don't know for certain whether something will happen, I act as if it won't. It's fine to refactor your code as the problem it solves evolves. You can't predict the future, and if you try, you'll have to be able to deal with mispredictions too.


It takes me twice as long to get my integration tests to work if I don't have unit tests making sure the parts work along the way.

If you write an integration test, and it fails, what's broken?


> It takes me twice as long to get my integration tests to work if I don't have unit tests making sure the parts work along the way.

That's a valid concern, but if your unit tests are only for making sure the part you just wrote works as expected, then just keep a test case up for that specific part, and change it when you move on to the next part.

The value of unit tests is supposed to be regression testing: catching cases where you change something that breaks a different unit in a different part of the stack.

> If you write an integration test, and it fails, what's broken?

Well, I debug it the same way I debug any bug. After all, most bug reports come from a full execution in the field; I am probably already set up to debug the full application anyway[1].

[1] Once a bug reproduction is set up in a fairly automated way.


you know... you debug and find out


The thing about unit tests is that the better they are, the less you have to debug.

A very good one, with effective use of matchers, you don't even have to read the test to know what you did wrong. You just know from the error message what you broke.
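Real suites would use a matcher library like Hamcrest or AssertJ for this; as a hedged, hand-rolled sketch in plain Java (invented names), the point is that the failure message itself names what broke:

```java
// Hypothetical sketch of what a good matcher buys you: on failure, the
// error message states expected vs. actual, so you rarely need to read
// the test body to know what you broke.
import java.util.List;

class MatcherDemo {
    static void assertContainsExactly(List<String> actual, List<String> expected) {
        if (!actual.equals(expected)) {
            throw new AssertionError(
                "expected exactly " + expected + " but was " + actual);
        }
    }
}
```

Compare that to a bare boolean assertion, which fails with no hint of which element was missing.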


> The thing about unit tests is that the better they are, the less you have to debug.


> A very good one, with effective use of matchers, you don't even have to read the test to know what you did wrong. You just know from the error message what you broke.

Agreed, and agreed. The counterpoint is that unit tests take time to write and time to maintain - you have to balance that time against the time you would spend debugging an integration test.


Integration tests take far more time to maintain. Major functionality changes can affect all of your tests. With unit tests they may invalidate a few, but that’s okay because they were cheap to begin with.

If your unit tests are hard, you need to refactor your code.


> Integration tests take far more time to maintain.

So? You're going to have them anyway or else you can't deploy.

> Major functionality changes can affect all of your tests. With unit tests they may invalidate a few, but that’s okay because they were cheap to begin with.


Orders of magnitude matter, and if you have a testing pyramid instead of a testing ice cream cone, there can be up to two orders of magnitude difference between the number of unit tests and integration tests.

If you start with unit tests, then the integration tests are just verifying the plumbing. That only changes when the architecture changes, which is hopefully a lot less than how often the requirements change substantially.


Not to mention most unit tests are utterly useless in reality and test things we know to be true (1 + 1-level nonsense), not real edge cases.

The logic that usually gets ignored in unit tests is the logic that actually needs to be tested, but it's skipped because it is too difficult and might involve a few trips to the database, which makes it tricky (in some scenarios you need valid data to get a valid test result, but you cannot just grab a copy of production data to run some tests).

And then there is the problem of testing-related code, packages and artifacts being deployed to production, which is really gross in my mind and bloats everything further.

A team I've worked on resorted to building actual endpoints to trigger test code that lives alongside other normal code (basically not a testing framework), so that they could trigger tests and "prove the system works" by testing against production data at runtime.


Your message is just a collection of ad-hoc points with no structure, context or justification for any of them.


The "message" is a response to the last paragraph.

>The reason a lot of Java or C# code is written with all these abstractions is because it aids unit testing.

That is the justification for talking about testing. Code is being ripped apart to make it easier to test, while the tests that are used as a justification for ripping apart the code are low quality, as 99% of the work in unit testing is thinking of and setting up the test case, not the actual test code.


"Copy code if it's used twice" is terrible advice. You are creating a landmine for future maintainers of your code (often yourself, of course). Someone will inevitably change only one of the two versions at some point in the future and then you're going to have to rely on tests to catch the issue - except that your tests will probably also reflect the duplication and you'll also forget to change the 2nd test.

The only possible justification for duplicating code would be that creating an appropriate abstraction is harder. Given that there are generally economies of testing when you factor out common code, that's usually just not true.

"Duplication is evil" is a more reliable mantra.


It's a rule of thumb, not a hard rule. If I have something I need to use in a separate project, I'm copying it. I'm not going to write a library just so I can import it into 2 different projects.

Yes, it means stuff needs to be changed in 2 places. Yes, it means someone can change one and not the other. But it also means that each thing can be maintained on its own without worrying about the other. In the early stages you don't know how much can truly be reused and whether you're just cornering yourself. I've had scenarios in the past where we've written an abstraction around some common code and then the third application we want to use it in just does not fit the model we initially thought of. Could we change the library? Yes, obviously. But are we going to face the same issue on the 4th project to use this library? Probably. It's a large maintenance load. At some point you end up making breaking changes to the library and you're committed to either maintaining multiple major versions, or maintaining an abstraction that is supposed to work for every scenario, which can be a huge time sink.

There are tradeoffs to be made. I'd rather lose the maintenance burden of a library when consumers have vastly different needs and just take the hit of having to do a Sourcegraph search for usages of some code. This search would need to be done to find all consumers of the code anyway if it were a library, so the end result is rarely different in my experience.


Imo, the correct rule is "copy if the two uses are likely to be modified separately" and "create one method if they likely have to change at the same time".


Excellent advice! I wish more programmers paid attention to this rather than just the 2-3 rule. The 2-3 rule tends to create unintentional tight coupling between things, which becomes an iron bar that is even more evil to rectify.


>Someone will inevitably change only one of the two versions at some point in the future and then you're going to have to rely on tests to catch the issue

It works both ways

Someone modifies code that's used in both places and breaks the other thing.


It is important to look carefully into the functional context where that abstraction is used.

If you are looking, for example, into System Integration, Data Integration, ETL and so on, not using a canonical format from the beginning will get you into near-quadratic growth in mappings between sources and targets (N sources times M targets).

https://www.bmc.com/blogs/canonical-data-model/

https://www.enterpriseintegrationpatterns.com/CanonicalDataM...
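As a hedged Java sketch of that pattern (types and formats invented for illustration): with N source formats and M target formats, point-to-point integration needs N×M mappers, but routing through one canonical type needs only N+M.

```java
// Hypothetical sketch of a Canonical Data Model: every source maps into
// CanonicalOrder, and every target maps out of it, so adding a new source
// or target costs one mapper instead of one per counterpart.
class Canonical {
    static class CanonicalOrder {
        final String id;
        final long cents; // amounts normalized to integer cents

        CanonicalOrder(String id, long cents) {
            this.id = id;
            this.cents = cents;
        }
    }

    // one mapper per source format (N total) ...
    static CanonicalOrder fromLegacyCsv(String[] row) {
        return new CanonicalOrder(row[0], Math.round(Double.parseDouble(row[1]) * 100));
    }

    // ... and one per target format (M total), instead of N*M direct pairs
    static String toJson(CanonicalOrder o) {
        return "{\"id\":\"" + o.id + "\",\"cents\":" + o.cents + "}";
    }
}
```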


I think the test pyramid still has legs. Write both.

I do agree a lot of abstractions in C#/Java seem to be testing implementation stuff leaking into the abstraction layer. A lot of inversion of control in these languages seems to exist purely to allow unit testing, which is kind of crazy.

Personally I prefer the "write everything in as functional a style as possible, then you'll need less IoC/DI". This can be done in C# and Java too, especially the modern versions.
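A minimal Java sketch of that style, with invented names: when the logic is a pure function over its inputs, the test needs no container, interfaces, or mocks.

```java
// Hypothetical "functional core" sketch: the discount rule depends only on
// its arguments, so it is trivially unit-testable without any IoC/DI setup.
import java.time.DayOfWeek;
import java.time.LocalDate;

class Discounts {
    // Pure function: same inputs always produce the same output.
    static double weekendDiscount(double price, LocalDate date) {
        DayOfWeek d = date.getDayOfWeek();
        boolean weekend = (d == DayOfWeek.SATURDAY || d == DayOfWeek.SUNDAY);
        return weekend ? price * 0.9 : price;
    }
}
```

The impure edges (clock, database, HTTP) stay in a thin shell that you cover with integration tests, while the core logic needs nothing injected.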


I have a general rule:

Once is an incident. Deal with it.

Twice is a co-incident. Deal with it. But keep an eye out for it...

Third time? Ok, this needs properly sorting out.


Mr Bond, they have a saying in Chicago: 'Once is happenstance. Twice is coincidence. The third time it's enemy action' - Goldfinger


I got handed off a little online customer service chat application at a previous job, it had been written by someone I put at a similar skill level as mine but with different personality traits. One of his personality traits was to code to the spec and not consider "what if it changes".

This online chat had two functionalities: chat with a worker, and leave a message for a worker with suggestions as to what to look at in response. No connection between these two functionalities was specified, and so my friend had written it without one; it was difficult, short of doing a full rewrite, to get state information from one part of the application to the other (this was written in jQuery).

Anyway, 6 months+ down the line it got respecified: now it needed shared state between the two parts of the application, which meant either a significant rewrite or hacks, so hacks were chosen. Ugly hacks, but they worked. (I think ugly hacks were definitely the correct choice here, because the chat application was almost completely scrapped a year later in favor of a bot.)

After I was done I asked, "But why write it like that? It was specified that no state was needed between the two parts." "Yeah, but it should be obvious that was going to change; they would keep wanting to add functionality to it and probably share state between the two communication channels."

tldr: there are some potential changes that seem more likely than others, and the architecture should take those potential changes into consideration.


Wow, I always believed this but was too scared to admit it, because it's not fashionable to think this way.



