Sunday, January 31, 2016

The value and the limits of unit testing

Swyfft’s website (https://swyfft.com) is now live – for a certain definition of “live”, in this case, meaning the coastal counties of Alabama. Alabama is where we hope to get some good initial sales metrics, before we expand out to more states.

When I joined Swyfft back in April, depending on how you looked at it, the website was maybe 90% done, and maybe only about 10% done. I’m sure you’ve heard the old developers’ saying, “The first 90% of a project takes 90% of the time. The last 10% takes the other 90%.” And that was certainly true here.

In addition to adding more features, one of the biggest things we’ve been working on since April is automating our testing process. My strong preference, whenever it’s feasible, is to rely heavily on pure unit tests, with all dependencies mocked out. It’s easier to isolate the code under test when you do it that way, it’s easier to get the tests running on your build server, and the tests themselves run a lot faster.

Unfortunately, that approach has limited utility at Swyfft. The main reason is that our system relies heavily on over a million rows of metadata, spread over 20 or 30 tables. Many of our most critical tests can be summarized as, “Does this code interact correctly with this metadata?” And by definition, that’s the sort of thing you can’t do with pure unit tests.

Consequently, I’ve split up our automated tests into roughly three categories:

  1. Unit tests. These don’t talk to the real database, and any dependencies are strictly mocked out (we use Moq for this).
  2. Integration tests. The entry point for these tests can be anywhere from the API controller in the website, down to the lowest-level repository, but the key is that they actually talk to the real database – though other dependencies (such as Authorize.net, our payment processor, or IMS, our agency management system) are generally still mocked out.
  3. Acceptance tests. The line between these and integration tests is a little fuzzy sometimes, as we still use the same testing framework for both (we recently switched from MSTest to xUnit). But basically, the idea is that we don’t mock anything out, and each test exercises the system from front to back, usually in defined flows. We use Selenium to drive a web browser for tests that involve the website.
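
To make the first category concrete, here is a minimal sketch of what "strictly mocked out" looks like. It is written in TypeScript with a hand-rolled fake rather than C# and Moq, and every name in it (`PaymentGateway`, `PolicyPurchaser`, `FakeGateway`) is hypothetical, invented for illustration and not taken from the Swyfft codebase.

```typescript
// Hypothetical names throughout; this illustrates the pattern only.

// The dependency the code under test talks to (think: a payment processor).
interface PaymentGateway {
  charge(amountCents: number): boolean;
}

// The code under test. It receives its dependency through the constructor,
// which is what makes it possible to swap in a fake.
class PolicyPurchaser {
  private readonly gateway: PaymentGateway;

  constructor(gateway: PaymentGateway) {
    this.gateway = gateway;
  }

  // Returns "bound" when the charge succeeds, "declined" otherwise.
  purchase(premiumCents: number): "bound" | "declined" {
    if (premiumCents <= 0) {
      throw new RangeError("premium must be positive");
    }
    return this.gateway.charge(premiumCents) ? "bound" : "declined";
  }
}

// A hand-rolled stand-in for the real gateway: it never touches the network,
// returns a canned result, and records every call so a test can inspect it.
class FakeGateway implements PaymentGateway {
  readonly charges: number[] = [];
  private readonly result: boolean;

  constructor(result: boolean) {
    this.result = result;
  }

  charge(amountCents: number): boolean {
    this.charges.push(amountCents);
    return this.result;
  }
}
```

A unit test would build a `PolicyPurchaser` around a `FakeGateway`, call `purchase`, and assert on both the return value and the recorded `charges`. The integration and acceptance categories keep the same test shape and differ only in what stays real: the database in the former, everything in the latter.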

This works. Mostly.

Don’t get me wrong – our testing infrastructure is critical to Swyfft, and we couldn’t have gotten where we are without it. The thousand or so tests that we run on each build are an invaluable safety net. But there are plenty of things we’re going to need to improve. A few of them:

  1. Selenium is susceptible to random failures. About half of our test runs will trigger at least one unrepeatable Selenium test failure. We’ve tried different browser drivers through Selenium, and Chrome is maybe the least problematic, but every one I’ve tested seems to have similar problems, just to different degrees. Those random failures significantly decrease the value of having a build automation server: a rejected build from Jenkins tells you very little when you know there’s a 50% chance it was rejected because of something that isn’t really a problem in your code.
  2. The whole thing is really slow. I prefer my test automation suite to run in under 60 seconds. Any more than that, and running the tests becomes something you do when you think you’re done coding, rather than something you do repeatedly as you’re ginning up a new feature or performing a quick refactor. And unfortunately, our tests take about 15 minutes to run on a fast machine. You can work around the slowness, and we do, but it really interferes with getting real-time feedback.
  3. Keeping the DB in sync on our build automation server is a PITA. We could regenerate the DB for each run, but it takes about 15 minutes just to load the metadata, and the test runs are long enough already.
  4. We’ve done almost nothing about JavaScript unit testing. The site is simple enough so far that we can get by with UI-level automation testing, but as the site gets more and more complex, that’s not going to be sufficient. We could probably get rid of half of our problematic Selenium tests if we were able to verify the JavaScript (or rather, TypeScript) code at one or two levels lower down the stack.
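
As a rough illustration of what testing "lower down the stack" could mean for the last item, here is a minimal sketch in TypeScript. It assumes the form logic can be pulled out of the page into a plain function; `AddressForm`, `validateAddress`, and the validation rules themselves are hypothetical and not taken from the actual site.

```typescript
// Hypothetical form model; the real site's fields may differ.
interface AddressForm {
  street: string;
  city: string;
  zip: string;
}

// Pure function: no DOM, no browser, no network. It can be exercised
// directly by a test runner in milliseconds, without Selenium.
function validateAddress(form: AddressForm): string[] {
  const errors: string[] = [];
  if (form.street.trim() === "") {
    errors.push("street is required");
  }
  if (form.city.trim() === "") {
    errors.push("city is required");
  }
  // Accepts a 5-digit ZIP or ZIP+4 (an assumed rule, for illustration).
  if (!/^\d{5}(-\d{4})?$/.test(form.zip.trim())) {
    errors.push("zip must be 5 digits or ZIP+4");
  }
  return errors;
}
```

With logic in this shape, a browser-driven test only has to confirm that the page wires the function up correctly, and the individual validation cases can move into fast tests that never open a browser.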

I’d be interested in hearing how other folks have solved these problems. Please let me know!
