Introducing STPA – a new Test Analysis Technique

At the core of innovation in IT is someone getting the idea of connecting existing services and data in new ways to create new and better services. The old wisdom behind it is this:

The Whole is Greater than the Sum of its parts
– Aristotle

There is a flipside to this type of innovation, because the opposite also holds: the whole can become more problematic than the sum of all the known risks.

My experience as a tester and test manager is that projects generally manage risks in individual subsystems and components quite well.

But I have on occasions found that we have difficulty imagining and properly taking care of things that might go wrong when a new system is connected to the infrastructure, subjected to real production data and actual business processes, and exposed to the dynamics of real users and the environment.

Safety, Accidents and Software Testing

Some years ago, while researching, I came across the works of Dr. Nancy Leveson and found them very interesting. She approaches the problem of making complex systems safe differently than most.

Leveson is a professor of aeronautics and astronautics at MIT and the author of Safeware (1994) and Engineering a Safer World (2011).

In the 2011 book, she describes her Systems-Theoretic Accident Model and Processes – STAMP. STAMP gives up the idea that accidents are the result of simple chains of causal events and instead treats safety as an emergent property of a system.

I read the book a while ago, but have only recently managed to begin translating her ideas to software testing.

It actually took a tutorial and some conversations with both Dr. Leveson and her colleague Dr. John Thomas at the 5th European STAMP/STPA workshop in Reykjavik, Iceland in September to completely wrap my head around these ideas.

I’m now working on an actual case and an article, but have decided to write this blog as a teaser for other testers to look into Leveson’s work. There are quality resources freely available which can help testers (I list them at the end of this blog).

The part of STAMP I’m looking at is the STPA technique for hazard analysis.

According to Leveson, hazard analysis can be described as “investigating an accident before it occurs”. Hazards can be thought of as a specific type of bug, one with potentially hazardous consequences.

STPA is interesting to me as a tester for a few reasons:

  • As an analysis technique, STPA helps identify potential causes of complex problems before business, human, and societal assets are damaged.
  • One can analyze a system and figure out how individual parts need to behave for the whole system to be safe.
  • This means that we can test parts for total systems safety.
  • It works top-down and does not require access to knowledge of all implementation details.
  • Rather, it can even work on incomplete models of a system that’s in the process of being built.

To work, STPA requires a few assumptions to be made:

  • The complete system of human and automated processes can be modeled as a “control model”.
  • A control model consists of interconnected processes that issue control actions and receive feedback/input.
  • Safety is an emergent property of the actual system, including users and operators; it is not something that is “hardwired” into the system.

I’d like to talk a bit about the processes and the control model. In IT we might think of the elements in the control model as user stories consisting of descriptions of actors controlling or triggering “something”, which in turn produces some kind of output. The output is fed as input either to other processes or back to the actor.

The actual implementation details should be left out initially. The control structure is mainly a model of the interconnections between user stories.

Once the control model is sufficiently developed, the STPA analysis itself is a two-step activity in which one iterates through each user story in the control structure to figure out exactly what is required of it individually to make the whole system safe. I won’t go into detail here about how it works, but I can say that it’s actually surprisingly simple – once you get the hang of it.
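To give a feel for the shape of such a model, here is a minimal sketch in Python. It is my own toy representation – not Leveson’s or Thomas’s notation – and the processes, control actions, and feedback items are invented. It only hints at the first step, enumerating candidate unsafe control actions using the four standard categories from STPA; the second step of finding causal scenarios is not shown.

```python
# A minimal, invented sketch of a control structure for STPA step 1.
from dataclasses import dataclass, field

# The four standard ways a control action can be unsafe (from STPA):
UCA_TYPES = [
    "not provided when needed",
    "provided when it is unsafe",
    "provided too early, too late, or out of sequence",
    "applied too long or stopped too soon",
]

@dataclass
class Process:
    """A controller or controlled process in the control structure."""
    name: str
    control_actions: list = field(default_factory=list)  # actions this process issues
    feedback: list = field(default_factory=list)         # feedback/input it receives

# A toy control structure with two interconnected "user stories":
operator = Process("Operator",
                   control_actions=["deploy release", "halt trading"],
                   feedback=["monitoring dashboard"])
algorithm = Process("Trading algorithm",
                    control_actions=["submit order"],
                    feedback=["market data feed"])

def uca_candidates(processes):
    """Step 1: enumerate candidate unsafe control actions for the analyst to assess."""
    for process in processes:
        for action in process.control_actions:
            for uca_type in UCA_TYPES:
                yield f"{process.name}: '{action}' {uca_type}"

# Each printed line is a question to ask: is this hazardous, and in what context?
for candidate in uca_candidates([operator, algorithm]):
    print(candidate)
```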

Dr. John Thomas presented an inspiring tutorial on STPA at the conference.

Safety in IT

I have mentioned Knight Capital Group’s new trading algorithm on this blog before as it’s a good example of a “black swan project” (thanks to Bernie Berger for facilitating the discussion about it at the first WOTBLACK workshop).

Knight was one of the more aggressive investment companies on Wall Street. In 2012 they developed a new trading algorithm, which was tested using a simulation engine. However, the deployment of the algorithm to the production environment turned out to be unsafe: although it was only meant to be used in testing, the simulation engine was deployed and started in production, resulting in fake data being fed to the trading algorithm. After 45 minutes of running this system on the market (without any kind of monitoring), Knight Capital Group was effectively bankrupt. Although no persons were harmed, the losses were massive.

Commonly, only some IT systems are considered “safety critical” because they have the potential to cause harm to someone or something. Cases like that of Knight Capital indicate to me that we need to expand this perspective and consider safety a property of all systems that are critical to a business, society, the environment, or individuals.

Safety is relevant to consider whenever there is a risk that significant business, environmental, human, personal, or societal assets can be damaged by actions performed by a system.

STAMP/STPA and the Future of Testing

So, STPA offers a way to analyze systems. Let’s get this back to testing.

Software testing relies fundamentally on testers’ critical thinking abilities to imagine scenarios and generate test ideas using systematic and exploratory approaches.

This type of testing is challenged at the moment by

  • Growing complexity of systems
  • Limited time to test
  • Problems performing in-depth, good coverage end-to-end testing

DevOps and CD (continuous delivery) attempt to address these issues, but they also amplify the challenges.

I find that, as professional testers, we are more and more often trapped in frustrating “races against the clock” because of the innovation of new and more complex designs.

Rapid Software Testing seems to be the only sustainable testing methodology out there that can deal with this, but we still need to get a good grip on the complexity of the systems we’re testing.

Cynefin is a set of theories which are already helping testers embrace new levels of complexity in both projects and products. I’m actively using Cynefin myself.

STAMP is another set of theories that I think are worth looking closely at. Compared to Cynefin, STAMP embraces a systems-theoretic perspective and offers processes for analyzing systems and identifying the component-level requirements that are necessary for safety. If phrased appropriately, these requirements are direct equivalents of test ideas.
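As a small, hypothetical illustration of that last point, a component-level safety constraint derived with STPA can be read almost verbatim as a test idea. The wording below is my own, loosely inspired by the Knight Capital case described above, not taken from any published analysis.

```python
# Hypothetical example of turning an unsafe control action (UCA) into a
# safety constraint and then into a test idea. Wording is mine, for
# illustration only.

uca = ("Deployment process provides the 'start simulation engine' action "
       "in the production environment")

safety_constraint = ("The simulation engine must never be started in the "
                     "production environment")

test_idea = ("Try to deploy a release that includes the simulation engine to "
             "production and verify that the pipeline rejects it or that the "
             "engine refuses to start there")

for label, text in [("UCA", uca),
                    ("Safety constraint", safety_constraint),
                    ("Test idea", test_idea)]:
    print(f"{label}: {text}")
```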

STAMP/STPA has been around for more than a decade and is already in wide use in engineering. It is solid material from one of the world’s leading engineering universities.

At the Vrije Universiteit in Amsterdam, the Netherlands, people are teaching STPA to software testing students.

The automobile industry is adopting STPA rapidly to manage the huge complexity of interconnected systems with millions of lines of code.

And there are many other cases.

If you are curious to know more, I suggest you take a look at the resources below. If you wish to discuss this or collaborate with me on it, please write to me on Twitter @andersdinsen or by e-mail, or join me at the second WOTBLACK workshop in New York on December 3rd, where we might find time to talk about this and other emerging ideas.

Resources

Thanks to John Thomas and Jess Ingrassellino for reviewing drafts of this blog post. Errors you may find are mine, though.

This photo shows machinery in an Icelandic geothermal power plant. Water heated to 300 deg C by the underground magma flows up and drives turbines and produces warm water for Reykjavik.

I’ll be playing and (talking about) failing at ConTEST New York

I’m really looking forward to ConTEST in New York on November 29th – December 1st.

I will be presenting in two sessions at the conference: One on play, which I’ll do with Jess Ingrassellino, and one sharing my experiences performing great testing by embracing failure.

I know what you’re thinking: “I don’t play at work; I work, and I certainly don’t fail at my job.”

I appreciate that. Really!

But we also know that people who play well perform better, and that the best way of learning is through failure. In these turbulent times, playing and maintaining a readiness for learning seem more important than ever.

I think that soon HR people will want to read about failures, not successes, in resumes. People will reflect, talk and care about failures more than successes. We need to create a positive brand out of the failures of course, i.e. share narratives about what we have learned – and might still be learning.

Apart from that, I can’t tell you much about my talk on failure yet, as I’m still thinking about how to structure it and which of my own failures I will be sharing. They keep popping up, and deciding which ones I’ll start with, go through, and end with is difficult.

Jess and I did our session on play for the first time at the CounterPlay conference in Aarhus, Denmark, in March, then a few days later in Copenhagen, so I can share some more on that.

One of the good things about Denmark is that we have a culture that generally values play.

We now even have wide support for more play in the parliament, which is currently working on changing legislation to stop kindergartens from having agendas fully focused on learning (see this article in Jyllands-Posten: http://jyllands-posten.dk/indland/ECE9963902/ny-paedagogik-efter-20-aar-leg-skal-afloese-laering-i-daginstitutioner/). They are putting free play back at the top for our children. The decision is backed by strong research showing that children who play freely perform better when they grow up.

I spent my time in kindergarten in a forest, where we played and explored all day long. I like going back to that particular forest from time to time and feeling like “little Anders” again.

I take this as a reminder that we benefit from re-connecting to our inner playful child from time to time. It makes us happy, but it also makes us better performers – even when problems queue up and we need to be okay with the risk of failing.

The session at ConTEST will be a safe place to play. We will introduce participants to musical exercises that everybody can perform.

Jess has a doctorate in music education and is a virtuoso violinist, and we will hear her play her beautiful instrument and teach us to perform in ways we probably thought we could not.

ConTEST has allocated us one hour, and we will make sure we have time to engage in conversation about the good things we find in playing – conversations which you can take with you and continue at work.

A tester who participated in our workshop when we did it in Copenhagen recently came back to me about his experience:

“I didn’t get exactly what happened…”
“But you seemed to enjoy it?”
“Yeah!”

And that’s really all Jess and I ask of you: engage and enjoy.

You may not feel you “get it”, but that’s part of playing: Performing without having to necessarily “get it”.

I hope you’ll join me at ConTEST!

 

Lacking a photo of me playing, here are my sons Jens and Troels playing with our poodle Terry in a forest.

 

With Cynefin, I can justify skepticism about inappropriate approaches and co-create better ones

As testers we need to better understand and be explicit about problems in testing that don’t have known, clear, or obvious solutions. Cynefin can help by transforming the way we, our teams, and our stakeholders think about testing problems.

Ben Kelly and James Christie have written very good blogs about Cynefin and testing. Liz Keogh was one of the first to write about Cynefin in development. At the bottom of this post, I have included a video with David Snowden and a link to an article I found interesting when I read it.

With this blog post I’m sharing elements of my own understanding of Cynefin and why I think it’s important. I think of Cynefin itself as a conceptual framework useful for comprehending dynamic and complex systems, but it is also a multi-faceted “tool” which can help create context-dependent conceptual frameworks, both tacit and explicit, so that we can better solve problems.

But before diving into that (and in particular explaining what a conceptual framework is), I’d like to share something about my background.

Product design and the historic mistakes of software development

I studied product design at university in the early 90s. Creating new and innovative products does not follow obvious processes. Most engineering classes taught us methods and tools, but the product design classes were different.

We were taught to get into the field, study real users in their real contexts, develop understandings of their problems, come up with prototypes and models of product ideas, and then try out these prototypes with the users.

When I discussed an early draft of this post with James Christie, he mentioned that one of the historic mistakes of software development has been the assumption that it is a manufacturing process, whereas in reality it is far more like research and development. He finds it odd that we called it development while at the same time refusing to believe that it really was a development activity.

SAFe, “the new black” in software delivery, is a good example of how even new methodologies in our industry are still based on paradigms rooted in knowledge about organizing manufacturing. “The Phoenix Project”, a popular novel about DevOps, states on the back cover that managing IT is similar to factory management.

What I was taught back in the 90s still helps me when I try to understand why many problems remain unsolved despite hard work and many attempts to solve them. I find that sometimes the wrong types of solutions are applied – solutions which don’t take into consideration the true nature of the issues we are trying to get rid of, or the innovations we’re trying to make.

Knight Capital Group, a testing failure

The case of Knight Capital Group is interesting from innovation, risk, and software testing perspectives alike, and I think it exemplifies the types of problems we get when we miss the complexity of our contexts.

Knight Capital Group was one of the more aggressive investment companies on Wall Street. In 2012 they developed a new trading algorithm. The algorithm was tested using a simulation engine, I assume to assure stakeholders that the new algorithm would generate great revenues.

The testing of the algorithm was not enough to ensure revenues, however. In fact, the outcome of deploying the algorithm to production was enormous losses and the eventual collapse of the company after only 45 minutes of trading. What went wrong?

The U.S. Securities and Exchange Commission (SEC):

[…] Knight did not have a system of risk management controls and supervisory procedures reasonably designed to manage the financial, regulatory, and other risks of market access […] Knight’s failures resulted in it accumulating an unintended multi-billion dollar portfolio of securities in approximately forty-five minutes on August 1 and, ultimately, Knight lost more than $460 million […]

But let’s assume a testing perspective.

I think it’s interesting that the technical root cause of the accident was that a component designed to be used to test the algorithm by generating artificial data was deployed into production along with the algorithm itself.

This test component generated a stream of random data about worthless stock and was of course never supposed to run in production.

I find it strangely fascinating that the technical component that caused the accident was designed for testing.

Why didn’t someone ensure that the deployment scripts excluded the testing components?

Was it software testing that failed? It is not uncommon for software testing to be entirely focused on obvious, functional, and isolated performance perspectives of the system under test.

The component did its job: it helped test the new product. The testing strategy (probably undocumented), however, obviously did not consider possible side effects of the component.
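To make the question about the deployment scripts slightly more concrete, here is a hypothetical sketch of the kind of pre-deployment guard that could assert that test-only components never reach a production artifact. The module names and the hard-coded manifest are invented for illustration; this is not how Knight’s actual pipeline worked.

```python
# Hypothetical pre-deployment check: fail the pipeline if any test-only
# component is packaged for a production environment.

import sys

TEST_ONLY_COMPONENTS = {"market_simulator", "fake_order_feed"}

def check_artifact(packaged_modules, target_environment):
    """Return the test-only modules that are about to reach production."""
    if target_environment != "production":
        return []
    return sorted(TEST_ONLY_COMPONENTS & set(packaged_modules))

if __name__ == "__main__":
    # In a real pipeline the module list would come from the build manifest.
    violations = check_artifact(
        packaged_modules=["trading_algorithm", "market_simulator"],
        target_environment="production",
    )
    if violations:
        print(f"Refusing to deploy: test-only components present: {violations}")
        sys.exit(1)
    print("Artifact is clean for production.")
```

Whether anyone writes such a check, of course, depends on someone first imagining the risk.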

I think Cynefin could have helped.

Cynefin transforms thinking

Let’s imagine we’re test managers at Knight and that we choose to use Cynefin to help us develop the testing strategy for the new algorithm. 

David Snowden talks about Cynefin as a ‘sensemaking tool’. If you had engaged Knight’s management, financial, IT-operations, and development people in a facilitated session with a focus on risks and testing, I’m pretty sure the outcome would have been the identification of the type of risk that ended up bringing down the company – and that it would either have been prevented by explicitly testing the deployment process, or operations and finance would have put the necessary “risk management controls and supervisory procedures” in place.

I think so because I have observed how Cynefin sessions, with their brainstorming, are great for forming strategies to deal with the problems, issues, challenges, opportunities, etc. that we are facing. They help people talk seriously about the nature of problems and issues, break them down into smaller chunks that we can work with, and escalate the things that require escalation.

Cynefin seems to be effective at breaking the traditional dominance of boxed, linear, and causal thinking that prevents the solving of anything but the simplest problems.

My interpretation of what is happening is that Cynefin helps extend the language of those participating in sessions.

Decision makers at Knight Capital did not think about possible negative outcomes of the testing software. They had a simplistic view of their business risks. Cynefin could have helped them by extending their ‘sensemaking’ to more complex risks than those they were focusing on.

In the following I’ll dive a bit more into why I understand the sensemaking part of Cynefin to be a language-extending tool.

Language and Conceptual Frameworks

Language is an every-day thing that we don’t think much about.

Yet it is the very framework which contains our thinking.

While we can know things we cannot express (tacit knowledge), we cannot actively think outside the frame language creates.

Many philosophers have thought about this, but here I’d like to refer to physicist Niels Bohr (1885-1962) who in several of his lectures, articles, and personal letters talks about the importance of language in science.

Science is, in a way, about sensemaking through knowledge gathering, and – paraphrasing poetically from my memory – Bohr describes language as the string that suspends our knowledge above a void of endless experience.

In “The Unity of Science”, a lecture given at Columbia University, New York, in 1954, Bohr introduces language as a “conceptual framework”:

“[it] is important […] to realize that all knowledge is originally represented within a conceptual framework adapted to account for previous experience, and that any such frame may prove too narrow to comprehend new experiences.”

And:

“When speaking of a conceptual framework, we merely refer to an unambiguous logical representation of relations between experience.”

Bohr was one of the fathers of quantum physics, which is more than a set of new laws about nature. It introduced new and complementary concepts like uncertainty and non-deterministic relations between events. The extension was made for quite practical purposes, namely the comprehension of observations, but it has turned out to be quite useful:

“By means of the quantum mechanical formalism, a detailed account of an immense amount of experimental evidence regarding the physical and chemical properties of matter has been achieved.”

The rest is history, so to speak.

This is relevant to software testing and Cynefin because I think that the conceptual frameworks based on the thinking developed during industrialism are far from capable of explaining what is going on in software development and therefore also in testing.

Further, Cynefin seems to be an efficient enabler to create extensions to the old thinking frameworks in the particular contexts in which we use it.

Cynefin and software testing

Software development does not generally follow simple processes. Development is obviously a human, creative activity. Good software development seems to me to be much more like a series of innovations intended to enable someone to do things in better ways.

Testing should follow that.

But if language limits us to different types of linear and causal thinking, we will always be missing the fact that there is generally no simple, algorithmic, or even causal connection between the stages of (1) understanding a new testing problem, (2) coming up with ideas, and (3) choosing solutions which are effective, socially acceptable, possible to perform, safe, and useful.

Experienced testers know this, but knowledge is often not enough.

James Christie added in his comments to the early draft mentioned above that, as testers, with Cynefin we can better justify our skepticism about inappropriate and simplistic approaches. Cynefin can make it less likely that we will be accused of applying subjective personal judgment.

I would like to add that the extended conceptual framework which Cynefin enables for us, our teams, and our stakeholders furthermore allows us to discover new and better approaches to problem solving.

David Snowden on Cynefin

This video is a very good, quick introduction to Cynefin. Listen to David Snowden himself explain it:

 

I personally found this article from 2003 a very good introduction to Cynefin:

The new dynamics of strategy: Sense-making in a complex and complicated world (the linked page contains a link to download the article)

 

After 15 years as a freelancer, I dare to doubt myself

These days it is 15 years since I took the leap and went freelance. I haven’t regretted it!

I have been hired as the expert who is supposed to make the complicated simple and solve problems. Mostly on long contracts, but always as a free agent. I actually love it!

The consulting job takes plenty of love: love for the problems that need solving and love for the people who have the problems. Yes, and for the customer. More tedious things come with it too: contracts, invoicing… that sort of thing. They are part of the game.

Also part of the game is an expectation of performance: that we can step in quickly and “deliver the goods” – without missing a beat.

Humility is actually incredibly important. Because – hand on heart – consultants are far from perfect and certainly not infallible.

The specialist role and the expectation of flawless performance must never lead to walking around with your nose in the air. I get a little embarrassed when I occasionally meet another consultant with an attitude suggesting they are universal experts who always know best.

I think I am reasonably good at avoiding that attitude. It helps that I am regularly reminded of some of the mistakes I have made. After 15 years in the role, I have lost count of how many times I have failed at a task. Embarrassing, but true. And now I have said it!

The classic embarrassing situation for me as a tester is a “bug slip”: the customer wants it tested that the system we are working on shows exactly a certain result, and I am hired to document the quality of the system before we go into production with it.

I am the testing expert and have insight into the technology and the project. I carry out the order. It looks fine. We stick to the plan. All is well.

But then a report comes in about a defect in production – an obvious problem, at that, which I simply overlooked when I tested.

In a situation like that it is not pleasant to be in my shoes. Phew, I remember every single time it has happened, and it is more than once! Unfortunately, it comes with the tester’s job that it happens. I try to align expectations about it, but it is never fun.

That situation, and other failures I have had a part in, have taught me that while using one’s experience and expertise is important, it is also important to be able to doubt oneself. Yes, doubt: to know that expertise is often far from enough to guarantee success.

Sometimes it is precisely the expertise that stands in the way of doing a good job.

A general thing I have been thinking about a bit (but not thought through) is that we should all become better at improvising. That is, improvise honestly and get good at it: fail in a controlled way, observe what we can learn – and do better, fail a little less, evaluate, do much better.

In other words, become better at not letting ourselves be blinded by earlier good experiences – and thereby missing the obvious.

In any case, I believe it is a quality when I, as a consultant, bring doubt along to work – as a good friend who helps me do my best. And I believe it is a quality when I share that doubt in a constructive way, so that together we can use it to do our best.

Expertise and experience are still important. But we must never forget doubt.

By the way, I feel ready to take on 15 more years. Maybe I’ll see you out there! And don’t be surprised if I am the expert who doubts.

The Art of Doubting

As a software tester, it is my job to question things. Questioning involves doubt, but is that doubt of a certain kind? Perhaps; let’s call it ‘good doubt’.

On Monday May 15th 2017, I facilitated a philosophical, protreptic salon in Copenhagen about the art of doubting. The protreptic is a dialogue or conversation whose objective is to make us aware of and connect us to personal and shared values.

Doubt is interesting for many reasons. Self-doubt is probably something we all have and can relate to. But there seems to be value in a different kind of doubt than that with which we doubt ourselves.

Doubt is related to certainty. Confidence can be calculated statistically, and that seems to be the opposite of doubt.

Science almost depends on doubt: Even the firmest scientific knowledge is rooted in someone formulating a hypothesis and proving it by doubting it and attempting to prove it wrong.

Even religion, faith, seems to be related to doubt.

It is always interesting to examine the origins of a word. The Danish and German words “tvivl” and “Zweifel” have the same meaning as the English “doubt”, and all relate to two: duo, zwei, to.

That appears to indicate that when we doubt we can be in “two minds”, so to speak.

So is doubt a special type of reflection, “System-2”, or slow thinking?

The protreptic is always about the general in terms of the personal. We examine our relations to doubt.

“What is it that our doubt wants or desires for us?” was one of my protreptic questions during the salon.

We circled a lot around that particular question. Finding an answer was difficult and we came back to self-doubt, which can be difficult to live with. Self-doubt can even harm our images, both the external ones and those that are internal to ourselves.

Leaders are usually expected not to have self-doubt: a prime minister risks losing the next election if he doubts his own decisions and qualities. A CEO who doubts her own actions will drive the share value of the company down.

But there is a good doubt, and good doubt seems to be of a helpful nature.

Good leadership requires having the courage to doubt. It seems to help us act wisely and based on our experiences.

During the salon, my personal image of doubt changed. In the beginning I thought of doubt as a kind of cognitive function, perhaps a process I had to go through. Doubting could even be an event.

But by the end of the salon, my image of doubt had changed into that of a good friend walking with me through my life. Continuously present, if I want him.

With that image we found an answer to the question: Doubt is my friend. A friend who wants my actions to be driven not only by my instincts or simple gut feelings. A friend who helps me shape my actions by my values.
