Would A Unit Test Have Stopped the Google "Malware" Bug?
A recent glitch in Google's search engine categorised every website in the index as a source of malware. Bloggers were quick to offer their opinions about the fault. TDD proponents, of course, claimed that the problem would never have occurred if Google's engineers had written unit tests (although it must be said that there is no evidence that they did not). That prompted a perfectly reasonable response in the comments to one such article: programmers cannot write tests for problems they have not thought of, so writing unit tests would not have helped.
It is true that programmers cannot write tests for situations they have not thought of, but it does not follow that writing tests would not help avoid this kind of situation.
When teaching Test-Driven Development, I have observed that few programmers are skeptical enough about the code they write. They focus on success cases, ignore edge cases and error cases, try to bang out as many features as possible, and don't notice that those features are incomplete or will not work in some situations.
For example, as an introductory exercise I sometimes get students to write a class that extracts comments from Java source code. I give the students the definition of comment syntax from the Java language spec and ask them to implement a number of features. The first is to parse a string of Java source into a list of strings holding the comment text, without the comment delimiters. Further features do more sophisticated processing of the parsed comments. I am very insistent that the features are listed in priority order: earlier features are much more valuable than later ones. A solution that implements feature 2 but does not fully implement feature 1 is worthless.
I have found that, despite my insistence, most students only partially implement the first feature before starting work on later ones. Their code can parse single-line and multi-line comments, but they ignore the fact that comment delimiters can appear in string literals, even though the language spec explicitly describes this case and, more surprisingly, even though their own test code contains many string literals with embedded comment delimiters that are obviously not being treated as comments.
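For illustration, here is a minimal sketch of the kind of skeptical test that would expose that gap. The CommentExtractor class and its extractComments(String) method are hypothetical names of my own, not part of the exercise handout, and the expected strings assume the text between the delimiters is returned verbatim:

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

// Sketch only: CommentExtractor and extractComments are assumed names
// for the class the students are asked to write.
public class CommentExtractorTest {
    private final CommentExtractor extractor = new CommentExtractor();

    @Test
    public void extractsBlockAndLineComments() {
        List<String> comments = extractor.extractComments(
            "/* first */ int x = 1; // second\n");
        assertEquals(List.of(" first ", " second"), comments);
    }

    @Test
    public void ignoresDelimitersInsideStringLiterals() {
        // The language spec calls this case out explicitly: "/*" inside a
        // string literal does not start a comment.
        List<String> comments = extractor.extractComments(
            "String s = \"/* not a comment */\"; // real comment\n");
        assertEquals(List.of(" real comment"), comments);
    }
}
```

A solution that passes only the first test has not fully implemented feature 1.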
I guess this is why people often say that testers and programmers think differently. Programmers think about how code works; testers think about how it might not.
As programmers get more experienced with writing automated tests, they get better at thinking about how their code might fail. That's because they get plenty of opportunity to observe where their tests are insufficient and think about how to avoid such problems in the future. Practices that have helped me become more skeptical about the code I write and better at writing tests to capture that skepticism include:
- Test Triangulation (a small sketch follows this list)
- Ping-Pong Pairing
- Writing a test to demonstrate a defect before diagnosing the cause
- Writing a unit or integration test to demonstrate the cause of a defect before fixing it
- Reflecting on how well our test strategy is working in heartbeat retrospectives
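To make the first of those concrete: triangulation means adding a second test with different data, so that a hard-coded answer that satisfied the first test is no longer enough and the implementation is forced to generalise. A minimal sketch, reusing the hypothetical CommentExtractor names from above:

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

// Triangulation sketch: the second test varies the data so a canned
// return value cannot satisfy both tests.
public class TriangulationSketchTest {
    private final CommentExtractor extractor = new CommentExtractor();

    @Test
    public void extractsASingleLineComment() {
        assertEquals(List.of(" one"),
            extractor.extractComments("int x = 1; // one\n"));
    }

    @Test
    public void extractsADifferentComment() {
        // Returning a hard-coded List.of(" one") would pass the first
        // test but fail this one, forcing real parsing.
        assertEquals(List.of(" two"),
            extractor.extractComments("int y = 2; // two\n"));
    }
}
```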
Having tests won't, by itself, catch unexpected defects. Adding "just one unit test" won't avoid a systemic failure like the one Google suffered, because many units contribute to each system feature. But writing tests, and reflecting on that practice as an integral part of the programmer's job, makes it more likely that programmers will think of unusual failure cases, know what kind of test is most appropriate to exercise each case, and protect the system against them.
If you have any practices that help you hone your skepticism, please leave a comment about them.