The latest delta variant COVID-19 outbreak in Sydney has made me think a lot about risk management and the fragility of systems, whether those systems are technological, political, or societal.
I remember thinking during last year’s outbreak how lucky I was to live in Australia, with “great leadership” whose international border closures and lockdowns prevented the pandemic from claiming many more lives than it otherwise could have.
Fast forward to June 2021 and said “great leadership” is looking like little more than geographical luck: being a continent, Australia can very easily prevent things (and people) from getting in or out by closing its borders.
The delta strain of the virus caught Australia unprepared. What follows is my attempt at an analogy between the handling of the pandemic and technology systems, through the lens of a Black Swan event. Like all analogies, this one too will be imperfect.
What Is a Black Swan?
A black swan is an unpredictable event that is beyond what is normally expected of a situation and has potentially severe consequences. Black swan events are characterized by their extreme rarity, severe impact, and the widespread insistence they were obvious in hindsight. — Investopedia
Based on the definition above, we can agree that the COVID-19 virus and the ensuing pandemic are an example of such a Black Swan event. The pandemic triggered a number of procedures and policies aimed at controlling and overcoming the outbreak, and some countries did that better than others.
Can risk be managed?
This points to a fundamental flaw in the way we approach risk management: we cannot say with any reliability that a remote event is more likely than any other. What we can focus on instead is whether a system is more or less fragile than any other should such an event happen. — Paraphrased from Antifragile.
And in the event of an outbreak of a new, more contagious virus strain, we can say that our political and health systems in Australia are particularly fragile.
Australia secured 300,000 doses of the AstraZeneca vaccine — which arrived in February 2021 — plus the ability to manufacture a further 50 million doses onshore.
In contrast, by November 2020 the Australian Government had only secured 10 million doses of the Pfizer vaccine, and it wasn’t until 2021 that it tried to secure a further 10 million doses. Talk about putting most of your eggs in one basket!
Banking on AstraZeneca to vaccinate the vast majority of the population
Any plan or strategy that relies on a single solution to accomplish its goal creates a single point of failure. Any disruption to this weak link puts the whole endeavour at risk.
Fear mongering
AstraZeneca was reported to have caused blood clots in a small percentage of people. As a result, our government prevented people younger than 50 years of age from getting the shot, reserving it exclusively for older citizens.
The problem is that this was a knee-jerk reaction, driven by a failure to assess the actual risk.
To put it in perspective, the risk of blood clots due to AstraZeneca is about 0.0031%, or 3.1 cases per 100,000 people in the under-50 age group. Compare that to the risk of developing blood clots for women taking contraceptives, roughly 0.3%, or about 1 in 300 cases, which is around a hundred times higher. Nobody is out there yelling that contraceptives are dangerous; yet we basically told the population the shot was not safe.
The Delta variant
This backfired big time. When the delta variant showed its ugly face in Australia, Sydney (and later other cities) went into lockdown due to the low percentage of the population who had been vaccinated.
What about Pfizer? Well, we didn’t order enough shots, so they were reserved for front-line and emergency workers (as they should be!). Seeing this mess, the government backflipped and said the AstraZeneca vaccine was now safe for the younger demographic.
Too little, too late. People did not want the AstraZeneca shot. They no longer trusted the government’s judgement. Conspiracy theorists rejoiced.
What does this have to do with tech?
Have you ever worked in a team where you only ever tested and accounted for the happy path? Where you relied on your 3rd party services to be available 100% of the time? Where your users always behaved in predictable ways? Where your systems were never under attack? Yup, me too.
This isn’t that different from what’s happening in Australia. It’s what happens when we build systems and policies that are fragile and fail to account for Black Swans.
It’s not about preventing Black Swans from happening. It’s about being ready for when they do.
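Take the 3rd-party question from above: being ready can be as mundane as a timeout and a fallback. Here’s a minimal TypeScript sketch of what not assuming 100% availability might look like; the shipping endpoint, the 2-second timeout, and the flat-rate fallback are all made-up assumptions for illustration, not anyone’s real API:

```typescript
// Hedged sketch: call a (hypothetical) 3rd-party shipping service without
// assuming it is always up, fast, or well-behaved.
async function getShippingQuote(orderId: string): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 2_000); // do not wait forever

  try {
    const res = await fetch(`https://example-shipping.test/quote/${orderId}`, {
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`upstream returned ${res.status}`);
    const body = (await res.json()) as { price: number };
    return body.price;
  } catch {
    // Unhappy path: the provider is down, slow, or returned something unexpected.
    // Degrade gracefully instead of failing the whole checkout.
    return 9.95; // assumed flat-rate fallback
  } finally {
    clearTimeout(timer);
  }
}
```

Nothing fancy: a timeout, a status check, and a fallback. That’s already more than the happy path.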
Robustness vs. Antifragility
Antifragility is beyond resilience or robustness. The resilient resists shocks and stays the same; the antifragile gets better — Nassim Nicholas Taleb, in Antifragile
We talked a lot about fragile systems, so you might be led to believe we should strive to build robust systems instead. This is where I like the idea of antifragility as the true opposite of fragility, with robustness sitting somewhere between the two.
Robust Systems stay the same; Antifragile systems get better
As an example, imagine you run an e-commerce site with seasonal traffic. You have a few options when that traffic invariably ramps up around the holidays:
you can deny service to new customers once your system is under stress, thus preserving its integrity and the experience of users who managed to get in first;
or you can ramp up your infrastructure resources to match the load, thus improving your system’s ability to handle additional stress.
If we adopt the terminology we introduced earlier, the first option makes your system robust. The second makes it antifragile.
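Here’s a rough TypeScript sketch of both options. The names (admitRequest, desiredReplicas) and the numbers are purely illustrative, not any particular platform’s API:

```typescript
// Option 1 (robust): shed load once the system is under stress, preserving
// the experience of the users who got in first. The system stays the same.
let inFlight = 0;
const maxInFlight = 500; // illustrative capacity limit

function admitRequest(): boolean {
  if (inFlight >= maxInFlight) return false; // deny service to new customers
  inFlight += 1;
  return true;
}

// Option 2 (antifragile): treat the traffic spike as a signal and add capacity,
// so the system comes out of the holidays able to handle more than before.
function desiredReplicas(
  currentReplicas: number,
  avgCpuPercent: number,
  targetCpuPercent = 60
): number {
  return Math.max(1, Math.ceil(currentReplicas * (avgCpuPercent / targetCpuPercent)));
}

console.log(desiredReplicas(4, 90)); // 6: scale out to absorb the extra load
```

The first option caps the damage; the second grows the system in response to the stressor.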
We can see another example at play in the process of developing and shipping new software products as part of a team:
a robust process prevents changes from being deployed in order to preserve the stability of the system. It fears change.
the antifragile process embraces change and ships daily, or multiple times a day, so that both the team and the system improve and can better handle stressors over time.
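As a toy illustration, here are the two processes written as deploy policies in TypeScript. The Change shape and the thresholds are assumptions I made up; real pipelines will differ:

```typescript
// A pending change, reduced to the few attributes each policy cares about.
interface Change {
  testsGreen: boolean;
  linesChanged: number;
  insideReleaseWindow: boolean; // robust teams often batch work into scheduled windows
}

// Robust: protect stability by gating every deploy on a release window.
const robustPolicy = (c: Change): boolean => c.testsGreen && c.insideReleaseWindow;

// Antifragile: ship any small, green change right away. Each deploy is a small
// stressor that exercises the pipeline and the team, so both improve over time.
const antifragilePolicy = (c: Change): boolean => c.testsGreen && c.linesChanged < 400;

const change: Change = { testsGreen: true, linesChanged: 120, insideReleaseWindow: false };
console.log(robustPolicy(change));      // false: wait for the next window
console.log(antifragilePolicy(change)); // true: small and green, ship it
```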
Conclusion
If you’ve stayed with me so far, I thank you. This post took a while to put together, and I hope the real-world example from the pandemic makes the point that preventing rare events should not be the goal. Let us instead focus on being ready and having the right structures in place to respond to such events when they do happen.
Let us be antifragile.
If you have a question for me or a suggestion for the newsletter, please submit it here and I’ll address as many as possible in coming issues.