100,000 e2e selenium tests? Sounds like a nightmare

columbo · on May 14, 2014

> I would definitely choose a comprehensive suite of automated unit tests over a comprehensive suite of end-to-end/system tests any day of the week.

Huh, after working on very large applications with tens of thousands of unit tests I'd choose the opposite.

I've found little value from unit tests. They're sometimes good enough for catching the lowest common errors. With very complex applications that involve a great deal of user input they wind up only being a CRUD layer test.

I've gone the salesforce route, instead of using selenium all my "unit tests" are executed against the rest api.

In order to test "Deleting a product from the system" I have to create a user, assign the user as an administrator, log off of the root admin, log in as the new user, create a product, delete a product, get the product list and verify the product isn't on the list.

Typically this would be answered in a unit test like this:

    p = productDao.create("My New Product")
    assert true p.delete()

This type of unit test really doesn't give me much confidence that the entire system is working.

ymmv

jameskilton · on May 14, 2014

No-one ever said that integration tests and unit tests are mutually exclusive.

Integration test the happy paths, small integration suite that proves everything works together.

Unit test the individual units of functionality or behavior, whichever you're more comfortable with. Unit tests aren't supposed to tell you that the whole system works. Unit tests exist for two main reasons: Confidence that you can refactor your code without breaking functionality, and proving that your components are doing what they're supposed to do. Yes you technically can accomplish this with just an integration test suite but you will find refactoring bugs much harder to track down because the test failures aren't telling you exactly what went wrong.

You can never have full confidence in a system with just one suite of tests. You really need multiple suites. And when you TDD it all the way down (Integration test first, then work through the stack TDD-ing each part, when done Integration test and all unit tests should pass, and feature is done!) it's not a chore but actually quite enjoyable.

falcolas · on May 14, 2014

In my experience, if your integration testing is only happy path testing, you're doing integration testing wrong.

Integration testing should, at a minimum, include:

    * Combinatorial testing
      * i.e. testing all possible input combinations for every pair of two inputs
    * Fuzz testing
    * Limit testing
    * Success path testing
    * Failure case testing

Integration testing will discover more component interaction bugs than unit testing alone, and doing only happy path testing is doing it wrong.

On the other hand, I wholly believe that testing done by the developer alone will never catch as many bugs as testing done by both the developer and a QA engineer to the mix will. They think differently than we do, and as such are adept at finding our blind spots.

jameskilton · on May 14, 2014

We may have a wording definition issue here. I've heard multiple definitions of "integration tests". I usually use them where some people use "acceptance tests", aka, top-down be-a-user type testing (selenium in Firefox, for example). Sorry if there was any confusion there.

Agreed having a QA team also go through the app is great for finding things that tests can't catch, like usability errors, or crazy edge cases you didn't think about, etc. I have gotten to work with a good QA team and it's amazing the things they find.

brazzy · on May 14, 2014

The problem is that refactoring definitely can also cause components to not do what they're supposed to do outside the happy path.

jameskilton · on May 14, 2014

Then you're not refactoring. Refactoring is defined as changing how the code is implemented without affecting the functionality / behavior of said code.

Also, Unit tests are where you ensure the non-happy-paths are functional (error handling, input robustness, etc).

Retric · on May 14, 2014

That's a ridiculous definition, that's like suggesting coding is turning requirements into bug free computer code. Refracting is attempting to replace working code with code better suited to your long term goals.

twic · on May 14, 2014

The standard definition of refactoring is that it doesn't change behaviour. As Martin Fowler put it:

Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.

You correctly identified a consequence of this in your later comment:

> In your definition if you introduce or remove a bug your not refactoring.

That is indeed the case.

And also correctly observed that:

> realistically for any sufficiently large change bugs can and will be both added and removed.

If a change adds or removes a bug, it was not a pure refactoring. It may have been an attempt at a pure refactoring, and it may have been otherwise successful, but the introduction of a change in behaviour means that it was not purely a refactoring. This isn't necessarily a bad thing - i'd rather bugs be removed than not! - but it's possible to distinguish the refactoring and bug-fixing aspects of the change. Ideally, i would like to see those aspects formally separated, for example into separate commits in source control. But even if not, we can at least use the terminology correctly and precisely.

Retric · on May 14, 2014

The term refactoring dates back at least to the 80's well before TDD showed up and it was used as a shorthand for cleaning up code without the focus on tiny changes. More importantly bugs are often related to side effects such as how long a method takes as such using even one more or one less cycle any any code path prevents code from being 'pure' in your definition as it may add or remove a bug as would changing it's memory footprint etc.

Granted, that may seem pedantic but if you look closely you realise nobody uses your 'ideal' definition in practice.

PS: Feel free to use / introduce new terms such as 'Pure refactoring' but understand they don't change what refactoring actually means. As to popularity wikipedia's "does not modify its conformance to functional requirements" suggests that's the commonly understood definition.

jameskilton · on May 14, 2014

I don't see how your definition of refactoring and mine differ.

Retric · on May 14, 2014

In your definition if you introduce or remove a bug your not refactoring. More basically for things like simulations or graphics you may trade accuracy for speed so your code may behave slightly differently, but be much faster. The important bit with refactoring is the goal is not fixing a specific bug, however realistically for any sufficiently large change bugs can and will be both added and removed.

Edit: "Typically, refactoring applies a series of standardised basic micro-refactorings, each of which is (usually) a tiny change in a computer program's source code that either preserves the behaviour of the software, or at least does not modify its conformance to functional requirements." http://en.wikipedia.org/wiki/Code_refactoring

jameskilton · on May 14, 2014

I feel we may be getting into bike-shedding territory here. When I'm refactoring, as part of the TDD Red/Green/Refactor cycle, I'm changing how code is implemented, usually to improve the design and readability, such that the functionality stays the same e.g. all of my tests pass without touching them. Anything I gain from the refactoring is bonus on top.

Refactoring without introducing bugs is the #1 reason people do TDD vs Test After. But if you don't trust the test suite to catch bugs introduced when refactoring then I'd argue the test suite itself probably isn't that useful and needs work.

On a side note I believe the term "refactoring" has been way overused these days. You can change code without it being "refactoring". When I've seen people use the term "refactoring" (and I've caught myself multiple times on this) they mean rewriting, for in most cases this rewriting means changing the code and changing the tests. If you have to change the tests because of changes to the code, that's not refactoring, that's just changing code.

brazzy · on May 14, 2014

Well, then by that definition you are not refactoring either, and in fact nobody is refactoring. Because any change can unintentionally change the behavior. That's what we call a bug, and them bugs don't care about your definition.

And Unit tests alone cannot ensure that the non-happy-paths are functional.

jlu · on May 14, 2014

@jameskilton you seem to be very insightful in regard of testing, mind if I ask a couple of questions in the future privately? Thanks.

jameskilton · on May 14, 2014

I'm always happy to discuss testing! My personal email is in my profile.

jlu · on May 14, 2014

Thanks, much appreciated.

dreamfactory2 · on May 14, 2014

> TDD it all the way down (Integration test first, then work through the stack TDD-ing each part, when done Integration test and all unit tests should pass, and feature is done!)

Interesting. I think this is probably right and also the reverse of what most people try to do (they start with units and work up)

ryanjshaw · on May 14, 2014

I find this approach has a number of advantages:

- it forces you to flesh out your conceptual/paper design in a deliverables-oriented way and makes it obvious where the gaps are, which gives you a good idea of how ready you are to wrap up design and start development, and in turn can be used to create your work breakdown structure and task prioritisation; sometimes it can result in identifying simplifications upfront (saving you time down the line)

- when you have a good idea of the scope and complexity of a solution like this, you're less likely to waste time on the less important stuff

- if you're the tech lead on a team, giving your developers the high-level components and tests means they're more likely to get things right (or at least limit the damage they do when they get things wrong)

Without component-level unit tests or design, you can easily dive into a solution without thinking it through and really regret it.

switch007 · on May 14, 2014

>I've found little value from unit tests.

Not even when re-factoring? Refactoring code without unit tests makes me extremely anxious and I try to avoid it.

columbo · on May 14, 2014

Yeah it all depends; unit tests can make refactoring even more difficult (especially if you heavily rely on mocking).

I've done major refactoring (moving from one language to another) and the testing was done entirely on the rest api. With several thousand tests I felt very confident that the user experience and expected behavior remain unchanged.

Honestly I don't hate unit testing and do I use it on a regular basis, the article was rather heavy handed.

twic · on May 14, 2014

Refactoring code without tests is definitely scary. But those tests don't need to be unit tests.

(caveat: i understand "unit test" to mean a test which tests a single class, or sometimes a very small number of classes, with any collaborators replaced with stubs or mocks)

In my experience, bigger tests - what my colleagues call integration tests, which might involve 3-30 classes and often the database - give me a lot more confidence in my changes.

Tests are only a useful safety net for changes which are entirely confined to the thing they test. If you're changing the way some particular method is implemented, say from looping over a collection to mapping a lambda over it, then the change is confined to that class, and a unit test can help. But if you're changing the method's contract, say from taking a collection of objects to taking one at a time, then the change takes in that class and its clients, and the unit test is useless - it tests that the method does something that it should no longer do! Yes, you can rewrite the unit test, but that does nothing to reassure you that the overall behaviour hasn't changed. You're starting from scratch.

The thing is, in my experience, valuable refactorings tend to take in more than one class. Sometimes much more than one class. I want to be able to make those refactorings with a safety net. Unit tests can't give me that.

MaybiusStrip · on May 14, 2014

I think it depends how tightly your business logic and your view are bound and how well your business logic is abstracted. I use Angular at work and 90% of the bugs we get can be caught by a unit test. On a rare occasion something will be mis-labeled on the markup and the two-way-binding "fails" silently, but usually our bugs are errors in the logic.

Whenever you fix a bug in your code, try to cover it with a unit test. If you can't, ask yourself whether it might be because the code is poorly abstracted.

I think there are definitely varying degrees of value in unit tests. Your example is definitely not super valuable, but unit tests on higher level methods can save a lot of trouble. When I code I want to rest assured that the different pieces I am using work. I want assurance that this class I'm about to use has an API that does exactly what it says. If I want to change modify the class, I want to see exactly what assumptions I'm breaking that users of this class might be making, and I want to be able to do this in under a second. This can have a drastic impact on developer velocity. I don't know of any other way of doing this than unit tests.

matwood · on May 14, 2014

Whenever you fix a bug in your code, try to cover it with a unit test. If you can't, ask yourself whether it might be because the code is poorly abstracted.

This, big time. Whenever I get a bug report I first write a test that fails because of the bug. Only then can I a) be sure I understand the bug and why it happened and b) be sure I fixed the bug and finally c) prevent a regression in the future.

Even for people who are not TDD or big on tests in general can make great use out of writing tests to verify and fix bugs.

meric · on May 14, 2014

Also it's much less tedious when developing a web app to test your code using unit tests, rather than doing curl commands and/or actually launching the frontend to test the API manually and trying to glean errors from the web inspector.

brown9-2 · on May 14, 2014

Typically this would be answered in a unit test like this:

This would be a really bad way to test all of "create a user, assign the user as an administrator, log off of the root admin, log in as the new user, create a product, delete a product, get the product list and verify the product isn't on the list".

You would be ignoring 80% of the flow.

The goal of unit testing is to test each piece in isolation and to provide confidence that when you have to assemble a lot of little pieces of code to form a larger piece of functionality, that each of those little pieces is working as it's contract says it should.

If your integration test containing 8 steps failed, would you know which piece of the puzzle caused the failure from the integration test report itself?

Of course, you could go the route of testing both small pieces of functionality and also testing large pieces (the integration tests), so that you can have overlapping levels of confidence.

briantakita · on May 14, 2014

> I've gone the salesforce route, instead of using selenium all my "unit tests" are executed against the rest api.

I've taken the same route for apis and for client side testing.

On my current project, I'm creating a rich client. I have an extensive jasmine test suite with edge case and race condition testing.

I created a library called jasmine-flow, to organize the tests into flows. This allows edge case testing while reducing duplicate test setup. In my case test suite time was reduced by 10x.

https://github.com/btakita/jasmine-flow

opendais · on May 14, 2014

I agree with you.

In my experience, the value in unit tests is in refactoring. I'd only write them as I refactor [to confirm the functionality matches before/after the refactor].

Most of the modifications we do to common libraries, etc. involve higher level rewrites that would break the unit tests anyway and the only way to confirm things still function is integration tests. :/

lmm · on May 14, 2014

Does deleting products ever break? I wouldn't bother with any tests, unit or functional, for a code path that doesn't have any logic in it - if it compiled then it's almost certainly correct.

Where testing is useful is when there's complex logic. And such logic is much easier to test at the unit level.

jameshart · on May 14, 2014

In most real systems, deletes of core business objects like 'product' are really only soft deletes; there are caching layers; deleting things has knock-on consequences for other things (if you delete a product when an instance of it is in a customer's shopping basket, what happens?) - it doesn't seem illogical to want to test that when you 'delete' a product, it disappears from the product catalog... but then you'd want to do other tests to verify it remains in a shopping basket, and is still visible in purchase history, and so on.

lmm · on May 14, 2014

Fair enough - but in that case we're really not talking about the kind of test the grandparent described, the one that looks like

    p = productDao.create("My New Product")
    assert true p.delete()

jameshart · on May 14, 2014

The grandparent was pointing out that a unit test like that doesn't give you any confidence in your system precisely because it fails to exercise the broader context - so while it tells you that 'yup, you deleted it', it doesn't tell you that 'other code is now treating it as deleted'. GP was arguing that a unit test was not useful but a functional test was.

You, on the other hand, argued that simple deletion didn't need ANY tests, unit or functional - I was responding with examples that demonstrated that functional integration tests for deletion are perfectly valid.

lmm · on May 15, 2014

You need to compare apples to apples. An integration test that tests some logic is better than a unit test that doesn't test any logic, duh. But if you're not testing any logic then the test is useless either way. And if you are testing something like cache invalidation, then you can do that just as well, probably better, at the unit level.

columbo · on May 14, 2014

Actually yes! The gist of the story was deleting a product was not removing it from the product list because of query caching. It was something you would only have found when running against the site itself as query caching doesn't exist at unit or integration testing time. That was an interesting one. Our rest layer tests found it, although even still it took awhile to figure out (how the #@!$ is it still in the list?)

pfg · on May 14, 2014

I can imagine deleting products may be failing due to some UI change. Maybe a button isn't visible in certain IE versions. Maybe the JS code executing the backend request is failing in some browsers. That's where e2e tests are useful.

lmm · on May 14, 2014

Those are purely failures in the UI layer. To the extent that "unit testing" a web frontend is possible, you could catch those problems with a pure unit test that mocked out the backend, it wouldn't have to be an end-to-end test.

sp332 · on May 14, 2014

Unit test are more useful for refactoring, making sure the system works the same before & after.

dasil003 · on May 14, 2014

> Now if we could just convince SalesForce to be more like Airbus and not fly a complete plane (or 50,000 planes) to test everything every-time they make a change...

I agree with the overall thrust of the article. Unit testing is indispensable, especially for me as a rubyist where there is basically nothing checked before runtime. But I find this conclusion that SalesForce is "doing it wrong" to be just as arrogant and presumptuous as DHH's blustery approach.

Now I don't know anything about SalesForce's architecture, but one thing I do know is that no system of that size exhibits the same ideals that can be maintained by a small team of brilliant developers working on the bleeding edge of some greenfield project with a set of stakeholders that fits in a single room. That's not to say that SalesForce couldn't use unit tests, but just that it's the height of arrogance for us as outsiders to presume one way or the other.

Scaling software development isn't solved by strict adherence to the dogma of idealistic methodologies any more than scaling traffic is solved by plugging in a "web-scale" nosql database into your Heroku. The reality is that scaling is done by removing bottlenecks over and over until the system that remains is an organic result reached through evolution rather than any architect's brilliant grand design.

noir_lord · on May 14, 2014

> But I find this conclusion that SalesForce is "doing it wrong" to be just as arrogant and presumptuous as DHH's blustery approach.

Considering this is a business that is spending vast sums of money on doing this I think you are spot-on, yes we all joke about businesses throwing good money after bad but someone had to sit down and defend this approach.

hugs · on May 14, 2014

Selenium creator and Sauce founder here. Not enough automated e2e is also a nightmare. (Ask me about my experience helping to fix HealthCare.gov sometime!) As usual, the right answer is: a little bit of every kind of test, depending on the risks involved. The article starts (and ends) quite hyperbolic, but comes to the same "everything in moderation" conclusion.

ffreire · on May 14, 2014

I'd actually be extremely interested in hearing more about your experiences with HealthCare.gov. I'm working on implementing test infrastructure for my current employer and would love to learn from folks more experienced than myself about the process!

hugs · on May 14, 2014

Thanks for taking the bait. :-) I don't have time to write about it today, but hope to someday. Since this won't be the last time the government creates a website, the government needs to get good at this stuff. The tech team did an amazing job rescuing the site, and I hope the geeky lessons (especially from the testing perspective) get shared someday.

ffreire · on May 14, 2014

I'll keep my eyes peeled, then!

crb002 · on May 14, 2014

I spent a year doing this for a Fortune 500. When you have a fractured back end there is no other option. Here is a talk I gave to our local Ruby users group. Code works out of the box on Nitrus.io, https://github.com/chadbrewbaker/LiterateFunctionalExplorati...

1) Use the page object model. http://martinfowler.com/bliki/PageObject.html

2) Think about your functional tests like tours through a city. http://ptgmedia.pearsoncmg.com/images/9780321803023/samplepa...

3) As your tests become more advanced abstract them locally to two simple questions. What page object did I come from, and what action did I do on that page to get to this page?

twic · on May 14, 2014

This article manages to spend quite a lot of time talking about its author's hurt feelings before it gets to anything that resembles an argument. When it does, it's this:

The real problem with end to end tests is that when end to end tests fail, most of the time you have no idea what went wrong so you spend a lot of time trying to find out why.

I've worked on big systems with predominantly (in some cases, exclusively) high-level test coverage. I can confirm that this indeed a thing. Unit tests very rarely fail in such a way that the cause of the failure is not immediately obvious. High-level tests often do. Not most of the time, but often.

Here's the thing, though. When a high-level test fails without an obvious cause, that is something that unit tests would not have caught at all.

Obvious causes are things that are localised defects in individual parts. Your product listing is showing in the wrong order? The error is in the sorting code. Orders in Japan are being billed with prices off by a factor of about a hundred? The error is in the currency conversion code. Unit tests would catch those errors and make their location blindingly obvious. High-level tests catch them, and make their location sufficiently obvious; obvious enough that pinpointing the problem is not a significant speed bump.

Non-obvious causes are things that are defects in interactions between multiple, often distant, parts. Only every second FTP'd file is triggering emails to users? Turns out the error handling code in the parser is disabling the scheduled refresh checking job, which is then not triggering the batch job to recalculate the numbers, which is then not feeding new data to the email job. Every single part is working as intended, but the composition of those parts is not. That's a tough problem, and when the high-level test catches it, you will have hours and hours of the wrong kind of fun unravelling the strands of cause and effect to pinpoint the problem. But unit tests would not have caught it at all.

enos_feedler · on May 14, 2014

Why do we have to choose one or the other? Its ridiculous to stack these test methodologies against each other. You need both for different reasons. Unit tests verify the function of a unit. System-level tests verify the integration of units with other units. Instead of writing an exhaustive list of both, you learn over time the kinds of tests you need for each scenario.

asdavey1 · on May 14, 2014

I tend to agree - but if I could redraw that triangle it would be more like a trapezium. Not enough emphasis is given to system tests - and it shows in the tooling and the community mentality.

On my current project no one would dream of adding a new feature without unit tests - but everyone turns a blind eye to integration and system tests because "they're hard".

giulianob · on May 14, 2014

That's the point of the testing pyramid at the end of the article. My rule of thumb is this: if you can test something at the unit testing level with mocks and feel confident about the behavior then do so. Otherwise, start moving up the pyramid (isolated integration tests > end of end integration tests > UI tests).

The higher you get the more time it takes for tests to run, the more difficult it is to identify where a failure is, the more brittle those tests will be because of inter-dependencies, and the more complex it is to write the test. The difficult part is understanding your system well enough to know where your unit tests aren't sufficient.

jonknee · on May 14, 2014

Your unit tests aren't going to catch browser bugs which sounds like the big reason there are so many "tests" (it's 7,500 tests across a lot of browsers). I bet they have a lot of unit tests too for components that aren't on the user facing web app.

The slide "Why Salesforce loves WebDriver" explains it perfectly. It's a feature--not a bug--that they experience latency (users do too and latency is realllly important for web apps).

It is a lot of VMs, but Salesforce is a $32B company entirely based in the cloud, a few million bucks of metal to test their breadwinner seems cheap.

briantakita · on May 14, 2014

> but I’d much rather replace those with even higher level system tests through Capybara or similar

I've been doing this for years. It's been working out very well. I'm glad test abstraction scope is being talked about now.

twic · on May 14, 2014

Oh cripes, me too! And i'm glad that there's a genuine plurality of views.

Test scope has been talked about for years, but almost always in the form of sermons by fervent believers in unit testing, castigating those who dare to lean more heavily on system and integration tests for their dangerous heresy.

Occasionally, these sermons admit the existence of a "testing pyramid", but take note of the necessary shape of that edifice: it's unit tests, but with a decorative cap of other kinds of test inexorably dwindling in volume as they reach higher levels. I'm not interested in this all this alleged pyramid science. Talk to me about testing towers, testing dolmens, then we can have a genuine conversation.

matlock · on May 14, 2014

I think this shows a different understanding than for example we follow at Codeship.

Tests aren't there to test our code, but to make our Users happy. e2e tests make sure we don't break features and are able to validate new features don't kill the most common workflows. Thus we can innovate quickly, build new features and refactor. This makes our users happy. Unit tests, controller tests or others are mainly an optimisation as writing an e2e test for every failure state is not an option.

It's a different understanding of why we test, but makes sure we focus on the right things with it.

msoad · on May 14, 2014

I'm doing a lot of end-to-end testing recently. We have 100% coverage in unit level but every time we run end-to-end tests, a new issue pops up.

To my eye, all those unit tests are expect(true).toBeTrue() and no more!

binarydogs · on May 14, 2014

I find it strange that anyone would opt to move entirely away from one type of testing at all.

The benefits between unit tests and e2e system tests are varied and surely that's the main benefit?

asaikali · on May 14, 2014

I think the difference between an Airbus Airplane and a Software System is that once Airbus chooses a material that it meets it characteristics it does not go back an change that material. And the material does not change by itself. So the airbus test pyramid works because the bottom is stable.

In my personal experience in software systems is that the bottom is not stable you never end up selecting a material and having component that never changes after you build it. All layers of an application tend to change as the application changes and user needs change. Therefore you can't really say that oh we tested the bottom of the pyramid and know for sure that it works.

Writing end to end tests for application is quite hard work and requires a lot of though to design an application that can tested both at the unit level, the component level, and the system level.

Even though end to end testing is very hard it's value is massive as an industry we should be focused on lowering the cost of end to end testing rather than saying that unit is good enough.

joshdance · on May 14, 2014

From my experience, we don't need to discuss if we should use unit tests, or integration tests. We just actually need to write tests. They are often left to last, never completed or poor. If you have an exhaustive set of either, good for you!

dominotw · on May 14, 2014

Oh the weekly HN unit tests suck/unit tests are design tool discussion.