Antifragility by Testing?

”There are two classes of things […] One class of things that gain from disorder, and one class of things that are harmed by disorder.”

Nassim Nicholas Taleb, author of the bestseller ”The Black Swan”, is out with a new book: ”Antifragile: How to Live in a World We Don’t Understand”. He gave a lecture at the London School of Economics on December 6th 2012 during his book tour. The lecture is available as a podcast here.

”Technology is inherently fragile.”

The words are Nassim Taleb’s, and the statement should not surprise any tester. Testers can find bugs in even the best pieces of computer software; it is only a question of having useful testing heuristics, how much effort we put in, and, of course, observation skills.

”In psychology, people talk about post traumatic disorder. But very few talk about post traumatic growth.”

I am a big fan of Nassim Taleb for his original philosophical thinking and his ability to think and speak clearly about subjects that are very complex and sometimes even counterintuitive.

Taleb has a lot to teach us in testing, and it is very obvious to me that fragility is something that we should start looking for.

”The difference between the cat and the washing machine […] is you need to fix [the washing machine] on every occasion. The organic self-heals.”

Computer systems do not self-heal – they are inherently fragile.

Nassim Nicholas Taleb (photo: Bloomberg)

But let’s step back for a moment and take a broader look at the systems that incorporate the computers and the software: organizations using information technology to run their business, communities using IT to stay connected, factories with machinery, workers, managers and computers to run them. Can any of these systems be described as antifragile?

”You should never let an error go to waste.”

My question is ”Can testing be applied in such a way that it not only detects fragility, but instead facilitates the development of anti-fragility?”

I believe that the answer is yes. And yes, there are antifragile systems out there that incorporate computers and IT.

Please consider a recent test activity you participated in. Now think about the system that was building the product you were testing (my apologies for using manufacturing terms here): the people, the project teams, the organization. Such a system is organized into layers, and while the bottom layer (where the technology is) is usually inherently fragile, were some of the higher-level layers perhaps antifragile?

This is where I see the role of testing coming in:

”It’s not about trial and error – it’s about trial and small error.”

In this very statement, Nassim Taleb, in my humble opinion, speaks clearly about what testing is about. An antifragile system for developing products grows stronger when testers find problems: not only does it learn from experience, it also prepares itself for things that are worse than what it has experienced.

Put another way: the antifragile software project does not just fix the bugs it encounters. The antifragile software project fundamentally eliminates problems based on knowledge from the small problems testers find.

So my message to project and program managers is this: Don’t hire testers to find the defects. Hire great testers to ensure your projects experience many small problems, allowing them to grow stronger and build better products. If your project systems are antifragile by structure, leadership and management, the bugs found by testers will not only stay out of production; the overall quality of the product will also be better!

And that, to me, is where the real business value of testing is!

Thanks to Jesper L. Ottosen for very constructive reviews of and comments on drafts of this blog post.


12 thoughts on “Antifragility by Testing?”

  1. Hi Anders,

    Thanks for writing this post. The topic is interesting!

    You wrote:
    “My question is ‘Can testing be applied in such a way that it not only detects fragility, but instead facilitates the development of anti-fragility?’

    I believe that the answer is yes.”

    Do you have an example where this belief would be true? My first thought was a case where a tester tells a programmer what kind of tests he would start with. But after thinking about it for a while, I don’t think this would be antifragile. Secondly, I thought about a case where a tester tries to find bugs in the software and the programmers try to team up against those bug reports. This might be in the antifragile category, and it might even work with some people, but at first it sounds hard to implement. It even sounds dangerous, like something that could harm the understanding of the value of testing.

    However, I have seen programmers and testers work together to protect themselves from the client. Would that be antifragile to you? At first, that also sounds hard to implement in a constructive manner.

    You also wrote:
    “You should hire great testers to ensure your projects experience many small problems, allowing them to grow stronger and build better products…”

    What do you mean by testers ensuring the project has small problems? Would any “small” problem suffice (obviously, we could philosophize about the meaning of “small”), or would the problem need to be of a certain kind? Could you give an example of such a problem that will lead to antifragility? I am also wondering if we could draw a heuristic line (or lines) somewhere for when a problem is too big to cause antifragility.

    I think retrospective (or any kind of situation where the work is analysed) could be seen as a catalyst for potentially antifragiling (that doesn’t sound like a real word) the product and/or the team. A good retrospective could give input for further planning and/or design.

    I’d like to see comments and ideas around the topic as it’s rather new to me. Sounds like something I would like to explore a lot more.

    Best regards,
    Jari

    1. Hi Jari, thanks for commenting. I’m glad you find the topic interesting.

      The best example I know of an antifragile development project is the Apollo project to land men on the moon. Although at least one test did have fatal consequences, Apollo 13 showed that the team had not only learnt how to handle the actual problems they had seen during testing; they were also able to turn a true black swan in a fragile system (the spacecraft) into a grey swan: a lost mission, but no fatal consequences. In fact, the result was so convincing that the top manager of the project (President Nixon) declared the mission a success despite the fact that they didn’t land on the moon. See for example this article: http://spectrum.ieee.org/aerospace/space-flight/apollo-13-we-have-a-solution

      In software, we’re usually not very good at doing things in antifragile ways. I’m afraid I can’t come up with a well-known example.

      I agree with you that any kind of teaming up to protect teams from clients is a bad idea, and might also indicate fragility. The same would apply to your example where “programmers are teaming up against” bug reports. The key word here is “against”: to be antifragile, you have to work with your testers, not against them. Testers and developers can compete with each other, but not work against each other.

      Ask yourself: Is a process more like something inside a washing machine, or is it more like a cat? The cat is the antifragile one.

      “Small” is anything that doesn’t destroy something. Taleb uses the following example: If you jump 10 times from a wall 1 m high, you will stress your body in a non-destructive way and your organism will grow stronger. But if you jump once from a wall 10 m high, you’ll probably die. Antifragility only works to a certain level.

      So in a software project, it’s often better to find 10 medium-severity bugs than 1 showstopper. Why? Because while showstoppers are always fixed, they are usually “only fixed”. Nobody cares why they showed up in the first place. The 10 medium-severity bugs are different: not every one of them gets that kind of attention, but in my experience, programmers working on them start thinking and looking for patterns. They start anticipating other problems, and often fix them before you get a chance to test for them.

      But we could also say that any bug found in testing is a “small problem”. At least compared to bugs found in production, which can be very “big problems”, even if they’re trivial.

      I agree very much that a retrospective generally supports antifragility.

      Anders

    2. Great post and great comment. I agree that we can induce some antifragility by bringing in experts early. An example I have seen is using expert “crowd-sourced” (should it be “community-sourced”?) exercises to detect and then address behaviours in an exploratory mode. They catch many failure points that conventional tests and testers ignore because of the latter’s fixation on requirement conformity. I’d think this works especially well with e-commerce and social sites. On second thought, I feel that even testing in production is designed to increase antifragility by inducing milder stressors on the system?

    1. I’m glad you like the article. Automated checks are more like washing machines than cats to me, so by themselves, they cannot support antifragility. Thanks for the reference to your blog, which I will read later.

  2. Some cautionary notes here.

    On trial and (small) error: Taleb is careful to note that there are different aspects to “small” when we’re thinking about error. In software, the error itself may be small and easy to fix (say, a one-byte typo), but the consequences can be monstrous (your spacecraft augers into the surface of the planet). Similarly, you can make a whopping mistake in your conception of something (say, completely misunderstanding how some process works, and programming it based on your misunderstanding). Yet if the problem is discovered as part of the tinkering processes of exploratory development and testing, it can be fixed before the product is released, and thus the consequences of the error are small. Taleb talks about this in his essay on the Fourth Quadrant, which also appears in the second edition of The Black Swan.

    On fragility vs. robustness vs. antifragility: The opposite of “fragile” is not “robust” or “resilient” or “strong”. As Taleb points out, robust things merely stand up to randomness, turbulence, disruption, and stress. Anti-fragile things benefit from stress and perturbation. This leads to a key distinction between two different approaches to testing. Confirmatory testing (or checking) emphasizes repetition and anticipated problems. A confirmatory approach probes for reliability in the sense of consistency, reinforcing the idea that what we knew before remains the same. But we live in a changing, variable, complex, surprising (and human!) world. An exploratory approach emphasizes tinkering, experimentation, galumphing, and unanticipated problems. An exploratory approach probes for reliability in the sense of adaptability to the unexpected. As Anders points out, this makes the system of development anti-fragile, benefiting from mild stress. But, alas, it doesn’t make the technology itself robust; at best, it pushes the very fragile in the direction of the somewhat-less-fragile.

    Finally, I disagree with Michael, above: an automated test does not check that the bug has been fixed; it can’t do that. An automated test (an automated check, really) at best speeds up our ability to check that the program’s behaviour conforms to some specific expectation. That expectation is fragile.

    However, automation CAN be used in the service of an exploratory approach when we combine it with variation (especially randomization) and a high-speed oracle. So certain kinds of automation can help to drive discoveries; that’s anti-fragile.

    —Michael B.
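One way to picture the point above about combining automation with randomization and a high-speed oracle is the minimal sketch below. It is an illustration under stated assumptions, not a method taken from the post or the comments: `median_under_test` is a hypothetical, deliberately naive function standing in for the system under test, `random_input` and `explore` are hypothetical helpers, and Python’s `statistics.median` plays the role of the fast oracle. Because the inputs vary on every trial, the automation can surface problems that nobody wrote a specific check for.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence


def median_under_test(xs: Sequence[int]) -> float:
    """Hypothetical implementation under test (deliberately naive).

    It always returns the upper middle element, which is wrong for
    even-length inputs, where the median is the mean of the two middle values.
    """
    ordered = sorted(xs)
    return ordered[len(ordered) // 2]


def random_input(rng: random.Random) -> List[int]:
    """Hypothetical generator: a varied, non-empty list of small integers."""
    length = rng.randint(1, 8)
    return [rng.randint(-10, 10) for _ in range(length)]


def explore(
    candidate: Callable[[Sequence[int]], float],
    oracle: Callable[[Sequence[int]], float],
    trials: int = 1000,
    seed: Optional[int] = 0,
) -> List[List[int]]:
    """Run candidate and oracle on randomized inputs; collect every disagreement."""
    rng = random.Random(seed)
    surprises = []
    for _ in range(trials):
        xs = random_input(rng)
        if candidate(xs) != oracle(xs):
            surprises.append(xs)
    return surprises


if __name__ == "__main__":
    found = explore(median_under_test, statistics.median)
    print(f"{len(found)} surprising inputs, e.g. {found[:3]}")
```

Run as a script, it reports how many generated inputs made the candidate and the oracle disagree. In this sketch every disagreement is an even-length list, a pattern that points a curious tester toward the underlying misconception rather than toward a single failing case. The sketch is only as good as the oracle: it inherits whatever expectations `statistics.median` embodies.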

    1. Thank you very much, Michael, for your clarifying and cautionary notes!

      Best,
      Anders

  3. Anders – Thank you for this insightful post. I think the Apollo example you gave is an excellent example of what Taleb identified as deep redundancy: redundancy in the engineering, technical, and management practices of the time. That allowed the team to quickly shift the mission and save the astronauts.

    The concepts he presents in that book are very deep and I believe strike at the core of what we should be paying attention to in software development and testing.
