The many are smarter than the few: How crowds can forecast quality

This is a blog post which I’ve had underway since early May. It is about a new way of assessing quality. Let’s start with how we normally work:

Testers usually work alone or in small teams checking and exploring functionality, finding bugs, issues, and other artifacts. These artifacts do not by themselves say anything about the quality of the whole product; instead, they document things which are relevant to quality.

In this blog, I’ll propose a different kind of testing, one which is organised in a way which is radically different from traditional testing – and which can produce a quite different type of result.

My inspiration is the 2004 book by James Surowiecki: ‘The Wisdom of Crowds’ with the subtitle ‘Why the many are smarter than the few’. In the book, Surowiecki presents a thought-provoking claim: while some individuals are very good problem solvers or excellent forecasters, a diverse crowd of ordinary people can, under the right conditions, do better than almost any single individual.

Surowiecki explains this in a very convincing manner and the book is an enlightening read. I find Surowiecki’s thoughts a welcome diversion from what most people seem to be concerned about these days: the performance of the individual. Too often, we forget that most good solutions are not invented or implemented by any single person, but by groups of people. And that the performance of a team often depends more on the composition of the team than on the individuals in it.

As a tester, I enjoy working alone as well as in teams, but reading Surowiecki’s book made me think of ways to apply his thoughts to make quality assessments of a different kind than those traditional testing can make.

Let me start the description with an example of a question which traditional testing cannot easily answer, but which I think a new kind of assessment can:

A client approaches us with a product which is still under development and therefore not yet on the market. The client tells us that he needs a holistic quality assessment of the product and he asks us to provide the answer to a very simple question: Will it be a good product?

Though I can produce a wealth of information about a product I’m testing, answering this question is not possible by ordinary testing alone. I may be able to form an opinion about the product based on my testing, and I can communicate this to my client, but it will always be a fundamentally subjective view.

And there is no practical way of assessing whether my view of the product matches that of the collective intelligence of the population of users of the future product. An expert in the field of forecasting product successes may do better than me, but in principle he may be just as wrong as I am – and the worst thing is that we will not know whether he’s right or wrong.

Humans are actually very good at coming up with answers to open-ended questions: Quality is something that everyone tends to have an opinion about! But while a single human can (and according to Surowiecki will) make interpretation errors, Surowiecki points out that in a crowd, individual errors tend to even out, provided they are made independently of each other. Aggregated opinions can therefore be a very reliable prediction of the quality of the finished product.
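
The evening-out of errors can be illustrated with a small simulation. This is only a sketch under stated assumptions, not something taken from Surowiecki's book: the 'true' quality score of 7.0, the crowd size of 50 and the model of each opinion as the true score plus independent random noise are all made up for illustration.

```python
import random
import statistics


def simulate_crowd(true_score, crowd_size, noise_sd, seed=0):
    """Return one guess per member: the true score plus independent random error."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, noise_sd) for _ in range(crowd_size)]


true_score = 7.0  # hypothetical "real" quality on a 1-10 scale (assumption)
guesses = simulate_crowd(true_score, crowd_size=50, noise_sd=2.0)

# Error of the aggregated opinion (the plain average of all guesses).
crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - true_score)

# Errors of the individual members, for comparison.
individual_errors = [abs(g - true_score) for g in guesses]
mean_individual_error = statistics.mean(individual_errors)
best_individual_error = min(individual_errors)

print(f"crowd error:           {crowd_error:.2f}")
print(f"mean individual error: {mean_individual_error:.2f}")
print(f"best individual error: {best_individual_error:.2f}")
```

The averaged guess can never be further from the truth than the average member is, but it is not guaranteed to beat the best member, and in real life we would not know in advance who the best member is. If all members share the same bias, for instance because they have influenced each other, that bias does not average out.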

The crowd does not have to be a team of experts. Surowiecki points out that rather than focusing on maximizing the individual members’ domain knowledge and level of experience, the crowd should be put together to be as diverse as possible.

Obviously we have to supply some information about the product to the group – they can’t make up their minds about quality without knowing something about the product. Someone has to collect this information and provide it to the group members. This is an important task for a ‘moderator’.

In the ideal situation, we will provide all available information to the group: Prototypes, design documents, concept descriptions, ideas, diagrams – even code! The idea is to allow each individual member of the crowd to use his own heuristic in making up his mind about the question.

But that won’t work in practice. Asking all group members to read everything is just not effective. Besides, the documentation could lead them in wrong directions: They will focus on the most easily accessible parts and will avoid information which takes a little work to get to.

So the moderator will have to make a ‘flat’ (as opposed to hierarchical) binder of different information from the product. What should it contain?

When I was learning sketching and drawing, I was introduced to the problem of drawing leaves on a tree or hair on an animal. I was taught a trick, which is to draw every 100th leaf or every 10,000th hair accurately. It will then look correct to the viewer.

I suggest making the ‘information collection’ in the same way: Pick some documents, some diagrams, some code, some tests. Or even, pick some pages from some documents.

The idea is that the crowd members actually don’t need to see everything – they only need enough to formulate an opinion. And they should each see different things, so we can be reasonably sure that they will form their opinions about the system from different angles.
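
One way the moderator could hand out different samples to different members is sketched here. It is only an illustration: the function name build_binders, the artifact names and the choice of three items per member are assumptions made for the example, not part of any established method.

```python
import random


def build_binders(artifacts, members, items_per_member, seed=None):
    """Give every member an independent random sample of the available artifacts."""
    rng = random.Random(seed)
    # random.sample picks distinct items, so no binder contains duplicates.
    return {member: rng.sample(artifacts, items_per_member) for member in members}


# Hypothetical information collection (all names are made up).
artifacts = [
    "design document, pages 3-5",
    "login flow diagram",
    "prototype screenshot",
    "concept description",
    "payment module source file",
    "test report, week 12",
]
members = ["member-1", "member-2", "member-3", "member-4"]

binders = build_binders(artifacts, members, items_per_member=3, seed=42)
for member, binder in binders.items():
    print(member, "->", binder)
```

Because each sample is drawn independently, two members will usually see overlapping but not identical material, which is the point: nobody needs to see everything, and different members see different things.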

How about questions – what questions should we ask? We will have to ask them in a way so answers can be aggregated into a combined result. We may want to ask them to give a score, which can then be averaged or in other ways analysed.
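
As a minimal sketch of such an aggregation, assume each member answers 'Will it be a good product?' with a score from 1 to 10. The scale, the example votes and the function name aggregate_scores are assumptions made for illustration.

```python
import statistics


def aggregate_scores(scores):
    """Combine individual scores into a small summary of the crowd's opinion."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        # The median is less sensitive to a few extreme votes than the mean.
        "median": statistics.median(scores),
        # A large spread signals that the crowd disagrees (needs at least 2 votes).
        "spread": statistics.stdev(scores),
    }


votes = [7, 8, 5, 9, 6, 7, 4, 8]  # hypothetical secret votes on a 1-10 scale
summary = aggregate_scores(votes)
print(summary)
```

Reporting the spread alongside the average seems useful: a mean of 6.75 from a crowd that largely agrees tells the client something different from the same mean produced by a crowd split between very low and very high scores.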

Surowiecki points out some important pitfalls that should be avoided. I’ll focus on what is often referred to as groupthink. This is what happens when a group of people turns out to be ‘dumber’ than its individual members. A sure way to induce groupthink is to let some members of the crowd influence other members, e.g. by putting a very charismatic person in the role of chairman or manager of the group. Surowiecki refers to several examples of how groupthink has led to wrong decisions, and it is obvious that if we want to make an assessment which can be trusted, we have to avoid it at all costs.

So ‘voting’ should be secret, and we should generally prevent members from communicating with each other. If we do allow them to communicate, we should moderate the communication to ensure that individual members are not allowed to influence the opinions of other members.

Is crowd involvement in testing a new thing? I think so. I don’t think the concept has been described before.

On the other hand, many beta test programs have traits of it.

But whereas crowd-based quality assessments (or forecasts) can take place at any point in the development process, beta testing by definition takes place on an almost finished version of the product. And beta test programs produce the same types of results as ordinary testing: Bugs and other artifacts.

Holistic crowd testing is not an efficient bug factory. Its power is its ability to answer holistic questions about a product under development.

I’d like to set up a workshop at a forthcoming software testing conference where the idea can undergo further development. Let me know if you’re interested in participating, and I’ll let you know when and where.