The problem with test documentation

The Agile Manifesto explicitly values “working software over comprehensive documentation.” In testing, this means that actual testing is valued over test documentation. I would have put it this way: focus on the quality of the product, not on the quality of the documentation of the product.

We can probably all agree that it’s more fun to test and explore software than to write documentation. But it would be too radical to skip writing documentation altogether!

I think, however, that we testers should be more critical about the documentation we actually do produce, and that we should look for ways to improve it.

The problem with documentation is not that too much time is spent writing it instead of actually testing the product. The problem is that the quality of the documentation is often just not good enough.

Most organizations have standards for test documentation and require their testers to write specific types of documents based on mandatory templates.

Templates are generally a good thing, but they can be problematic if you limit your writing process to “filling in the gaps.” A good document contains information that is useful to the reader, so the most important step in the writing process is finding out what information is actually needed.

Not all sections of a template are equally useful in all contexts (or useful at all), and very often you need to document things for which the template has left no space.

But finding out what to document is not trivial. Basically, it requires that you know the context of the project you are working on. Determining context can be difficult if you are new to a project, but asking questions will give you answers that help you define it.

Here are some questions which I find useful:

  • Who will be reading the document? How will they be reading it? The challenge is to include content which the actual readers will find useful and to structure it so that they can find that information.
  • What information are they looking for? Try to get people to answer in concrete terms rather than in abstract vocabulary. The problem with written documentation is that stakeholders will often not read it – most testers seem to prefer exploring systems on their own to reading documents, and managers will often not have time to read everything, checking only the headline and certain details. But if readers find what they are looking for, chances are they will read the document.
  • What kind of analysis do I need to carry out on the system to better understand it and test it? Writing a document about the system can often help me understand it. I will discover knowledge that is missing and knowledge I didn’t see initially. The writing process is part of the analysis.
  • Are there regulatory requirements which specify that testing should be documented to a certain level of detail and in a particular way? In that case, the test documentation is actually a product of the project.
  • Should the document assist knowledge transfer once I’m no longer on the project? In that case, the big question is what kind of knowledge should be “transferred.”

I checked the section about documentation in Kaner, Bach and Pettichord’s Lessons Learned in Software Testing a few days ago. They have a better and longer list of context-free questions to ask about documentation which will help you find out what kind of documentation you need. They also list a few references to other lists of useful questions, so I suggest you take a look at that too.

Are there any specific types of documentation which are particularly useful? Indeed, there are:

  • Mind maps are becoming increasingly popular with testers, but ‘old style’ software diagrams like swimlane diagrams, state diagrams, and flow charts still work well. The problem with mind maps is that they are often personal to the author of the map and not very easy to read.
  • Test scripts can be very useful in the initial stages of knowledge transfer: a script can help another tester find out how a certain procedure is performed in the system. However, a script will not by itself tell the tester anything about the context of the script, and this is something which is often missed: knowledge about a system under test is much more than knowing how to do things.
  • Check lists are actually much more useful than the name implies: a check list will list things to verify but, unlike a script, will not specify in detail how to verify them. That information has to be available elsewhere, e.g. in user manuals.
  • I always look for a document describing the system context in a top-down manner: What is the system doing, for whom, and how? If it isn’t there, I don’t mind writing that document myself.
  • A catalogue of tools used in testing is often also very useful. Test tools are often not well documented (or not documented at all), and that can be a hurdle for new testers when they come aboard a project. A well-written “tips and tricks for testing system X” will get them up to speed faster and can act as a platform for sharing knowledge about testing specific parts of the system. I like Word documents for their self-containedness, but a wiki could be better in many situations – the important thing is that such a document is actively maintained.

What are your preferred test documents?

Acceptance tests are not enough!

Acceptance testing is a key method in Agile. One way of defining acceptance tests is Gojko Adzic’s “Specification by Example” paradigm, which has gained quite a bit of momentum lately. I personally found it both refreshing and appealing when I heard him present it at Agile Testing Days 2009, and I also found his book Bridging the Communication Gap a nice read.

Photo: Gojko Adzic demonstrating communication gaps in his keynote presentation at Agile Testing Days 2009

I’m sceptical of the concept of acceptance testing. Not because verification of agreed functionality is not a good thing, but because it tends to shift attention to verification instead of exploration.

This will shift attention from testing to problem prevention. Is that bad, you ask? Isn’t it better to prevent problems than to discover them?

Well, most people think “why didn’t I prevent this from happening?” when problems do happen. Feelings of regret are natural in that situation, and that feeling can lead you into thinking you should improve your problem prevention. And maybe you should, but more examples aren’t going to do it!

Real testing is still necessary.

To explain why, I’ll consult one of the great early 20th-century mathematical philosophers: Kurt Gödel – in particular, his first incompleteness theorem. It says that no consistent system of axioms whose theorems can be listed by an “effective procedure” is capable of proving all truths about the arithmetic of the natural numbers.

What does this mean to us?

It means that we will never be able to list everything that can be done with a particular set of data.

A specification is a kind of listing of “valid things to do” with data, and thus Gödel’s theorem teaches us that there is infinitely more to a system than any long list of requirements can capture. This also applies when the requirements are listed as examples.

If you’re in the business of delivering products of only “agreed quality” to a customer, you may be all right verifying only the things which are explicitly agreed. If something goes wrong you can always claim: “It wasn’t in the specification!”

But if you’re striving for quality in a broader sense, verifying that the system works according to specifications is never going to be enough.

Gojko has made a good contribution to agile. Examples can be useful and efficient communication tools, and if they are used correctly they can help make users and other interested parties better aware of what’s going on on the other side. His contribution can help bridge a communication gap. It can also produce excellent input for automated unit tests.

Just don’t let it consume your precious testing time. The real testing goes far beyond verification of documented requirements!

If you want to learn more about this, I recommend you sign up for one of the Rapid Software Testing courses offered by James Bach and Michael Bolton.

Photo: Michael Bolton with one of the many interesting and challenging test exercises at the Rapid Software Testing course in London, November 2010, offered by Electromind

Covering test coverage

Rolf Østergaard (@rolfostergaard) suggested on Twitter, when I posted my previous blog entry, that instead of counting defects and tests we should take a look at test coverage. Certainly!

Mathematically, coverage relates the size of an area fully contained in another area to the size of that other area. We could calculate the water coverage of the Earth, or even how much of a floor a carpet covers. Coverage can be expressed as a percentage.

But coverage is also a qualitative term. For example a book can cover a subject, or a piece of clothing can give proper (or improper!) body coverage.

So what is test coverage? Well, the term is often used to somehow describe how much of a system’s functionality is covered by testing.

Numbers are powerful and popular with some people, so a quantified coverage number would be nice to have. One such number is code coverage, which is calculated by dividing the number of code lines which have been executed at least once by the total number of code lines in a program.

Another measurement relies on the business requirements for the system being registered and numbered, and on tests being mapped to the requirements which they test. A suite of tests can then be said to cover a certain proportion of the requirements.
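Spelled out, the two calculations amount to something like the following – a sketch only, since exact definitions vary between coverage tools and organisations:

    \[
    \text{code coverage} = \frac{\text{lines executed at least once}}{\text{total executable lines}} \times 100\%
    \]
    \[
    \text{requirement coverage} = \frac{\text{requirements with at least one mapped test}}{\text{total documented requirements}} \times 100\%
    \]

So if, say, 600 of 1,000 executable lines are touched by the test runs, code coverage is 60% – whatever that number then actually tells you.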

Numbers can hint at something interesting. E.g. if your unit tests exercise only 10% of the code, and it tends to be the same 10% in all of them, chances are that something important is missing from the unit tests. Or you could even have a lot of dead legacy code. Similarly, if you find that you only actually test functionality in a few of the documented business requirements: could the uncovered requirements be just noise?

No matter what, a coverage number can only give hints. It cannot give certainty.

Let’s imagine we can make a drawing of the functionality of a system, like a map. Everything on the map would be intended functionality; everything outside would be unwanted. Let’s make another simplification and imagine for the moment that the map is the system, not just an image of it. Here is an example of such a simple system:

Drawing of a system being tested: some tests verify valid functionality of the system, while others verify that there are no functions in the system which should not be there. Note that the tests are points.

The blue area is the system. The red spots are checks carried out as part of testing. Some of the checks are within the system, others are outside it. The ones within are expected to pass, the ones outside are expected to fail.

Note that there is no way to calculate the test coverage of this imaginary system. Firstly, because the area outside the system is infinite, and we can’t calculate coverage against an infinite area. Secondly, because the checks don’t have an area – they are merely points – so any coverage we calculate is vanishingly small.
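Put slightly more formally (a sketch, treating the system as a bounded region S of the plane and each check as a single point): the area – the measure μ – of any finite set of points is zero, so the calculated coverage cannot be anything but zero, no matter how many checks we add:

    \[
    \text{coverage} = \frac{\mu\left(\{p_1, p_2, \dots, p_n\}\right)}{\mu(S)} = \frac{0}{\mu(S)} = 0
    \]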

Ah, you may argue, my tests aren’t composed of points but are scripts: They are linear!

Actually, a script is not a linear entity – it’s just a connected sequence of verification points – but even if it were linear, it wouldn’t have an area: lines are one-dimensional.

But, you may object, my system is not a continuous entity: it is discrete and consists only of the features listed in the requirements document.

Well that’s an interesting point.

The problem is that considering only documented requirements will never cover all functionality. Think about the 2.2250738585072012e-308 problem in Java’s string-to-floating-point conversion. I’m certain no requirement document for a system implemented in Java ever listed this particular number as a specifically valid (or invalid) entry in input fields or on external integrations. The documents probably just said the system should accept floating-point numbers for certain fields. However, a program which stops responding because it enters an infinite loop is obviously not acceptable.
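For illustration, here is a minimal sketch of the kind of call that was affected (the class name is my own invention; on a JVM with the fix applied, it simply prints the parsed value and exits):

    // Minimal illustration of the 2.2250738585072012e-308 parsing problem.
    // On unpatched JVMs, Double.parseDouble never returned for this input
    // because the decimal-to-binary conversion kept oscillating between two
    // neighbouring double values; on a patched JVM it just prints the number.
    public class BorderlineDoubleDemo {
        public static void main(String[] args) {
            String input = "2.2250738585072012e-308"; // just below the smallest normal double
            double value = Double.parseDouble(input);
            System.out.println(value);
        }
    }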

A requirement document is always incomplete. It describes how you hope the system will work, yet there’s more to a system than can be explicitly described by a requirements document.

Thus any testing relying explicitly on documented requirements cannot be complete – or have a correctly calculated coverage.

My message to Rolf Østergaard is this: If a tester makes a coverage analysis of what he has done, remember that no matter how the coverage is measured, any quantified value will only give hints about the testing. And if he reports 100% coverage and looks satisfied, I strongly suggest you start looking into what kind of testing he has actually done. It will probably be flawed.

Intelligent testing assists those who are responsible for quality in finding out how a system is actually working, it doesn’t assure quality.

Thanks to Darren McMillan for helpful review of this post.

The Communicative Power of Counting

Michael posted the following two comments on Twitter shortly after I published this post:

There’s nothing wrong with using numbers to add colour or warrant to a story. Problems start when numbers *become* the story.

Just as the map is not the territory, the numbers are not the story. I don’t think we are in opposition there.

I agree, we’re not in opposition. Consider this post an elaboration of a different perspective – inspired by Michael’s tweets.


Michael Bolton posted some thought-provoking tweets over the last few days:

Trying to measure quality into a product is like measuring height into a basketball player.

Counting yesterday’s passing test cases is as relevant to the project as counting yesterday’s good weather is to the picnic

Counting test cases is like counting stories in today’s newspaper: the number tells you *nothing* you need to know.

Michael is a Tester with a capital T, and he is correct. But in this blog post I’ll be in opposition to Michael – not to prove that he’s wrong, and not out of disrespect, but to make the point that while counting will not make us happy (or good testers), it can be a useful activity.

Numbers illustrate things about reality. They can also illustrate something about the state of a project.

A number can be a very bold statement with a lot of impact. The following (made-up) statement illustrates this: “The test team executed 50 test cases and reported 40 defects. The defect reporting trend did not decline over time. We estimate there is an 80% probability that there are still unfound critical defects in the system.”

80%? Where did that come from? And what are critical bugs?

Actually, the exact number is not that important. Probabilities are often not correct at all, but people have learnt to associate the word “probability” with a certain meaning: it tells us something about a possible future (200 years ago it had a static meaning, by the way, but that’s another story).

But that’s okay: If this statement represents my gut feeling as a tester, then it’s my obligation to communicate it to my manager so he can use it to make an informed decision about whether it’s safe to release the product to production now.

After all, my manager depends on me as a tester to help him take these decisions. If he disagrees with me and says “oh, but only a few of the defects you found are really critical,” then that’s fine with me – he may have a much better view of what’s important with this product than I have as a test consultant – and in any case, he’s the one taking the responsibility. And if he rejects the statement, we can go through the testing and the issues we found together. I’ll be happy to do so. But often managers are too busy to do that.

Communicating test results in detail is usually easy, but assisting a project manager in making a quality assessment is really difficult. The fundamental problem is that as testers, by the time we’ve finished our testing, we have only turned known unknowns into known knowns. The yet unknown unknowns are still left for future discovery.

Test leadership is to a large extent about leading testers into the unknown, mapping it as we go along and discovering as much of it as possible. Testers find previously unknown knowledge. A talented “information digger” can also contribute by turning “forgotten unknowns” into known unknowns. (I’ll get around to defining “forgotten unknowns” in a forthcoming blog entry; for now you’ll have to believe that it’s something real.)

Counting won’t help much there. In fact, it could lead us off the discovery path and into a state of false comfort, which will lead to missed discoveries.

But when I have an important message which I need to communicate rapidly, I count!