Chaos Engineering: the Point of Adding Bugs on Purpose

Chaos engineering is a kind of contradiction: it deliberately attacks the very system it is meant to protect in order to make that system more resilient and more secure. How does it work, and how can introducing errors on purpose help secure a digital environment? Understanding this discipline can lead to substantial improvements in how a platform withstands failures and attacks.

What is it?

The concept of chaos engineering is based on four principles defined by Netflix. In order, they are: defining a “steady state”, a measurable output of the system that indicates normal behavior; hypothesizing that this steady state will continue; introducing variables that reflect real-world events, such as crashed servers or severed network connections; and trying to disprove the hypothesis by looking for a difference in behavior.
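The four steps can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions, not Netflix's tooling: a hypothetical three-replica service behind a load balancer, where the steady-state metric is the fraction of successful requests and the real-world event is one replica crashing. The control group runs unchanged, the experimental group loses a replica, and the hypothesis is considered disproved if the two groups' success rates differ by more than an assumed tolerance of 1%.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model (hypothetical) of a service with several replicas behind a load balancer."""
    replicas: int = 3
    retries: int = 0                        # extra attempts on a different replica
    down: set = field(default_factory=set)  # indices of crashed replicas

    def handle(self, rng: random.Random) -> bool:
        # Route the request to 1 + retries distinct replicas; it succeeds if any of them is up.
        attempts = rng.sample(range(self.replicas), k=min(self.replicas, 1 + self.retries))
        return any(r not in self.down for r in attempts)


def steady_state(cluster: Cluster, rng: random.Random, requests: int = 1000) -> float:
    """Step 1: the steady-state metric, here the fraction of successful requests."""
    return sum(cluster.handle(rng) for _ in range(requests)) / requests


def run_experiment(retries: int, tolerance: float = 0.01, seed: int = 42):
    rng = random.Random(seed)
    control = Cluster(retries=retries)
    experimental = Cluster(retries=retries)
    # Step 2: hypothesis: the success rate is the same in both groups, within tolerance.
    # Step 3: introduce a real-world event (one replica crashes) in the experimental group only.
    experimental.down.add(rng.randrange(experimental.replicas))
    control_rate = steady_state(control, rng)
    experimental_rate = steady_state(experimental, rng)
    # Step 4: try to disprove the hypothesis by looking for a difference between the groups.
    holds = abs(control_rate - experimental_rate) <= tolerance
    return control_rate, experimental_rate, holds


for retries in (0, 1):
    control_rate, experimental_rate, holds = run_experiment(retries)
    print(f"retries={retries}: control={control_rate:.3f} "
          f"experimental={experimental_rate:.3f} hypothesis holds={holds}")
```

With no retries, the crashed replica should drag the experimental group's success rate down to roughly two thirds and disprove the hypothesis, exposing a weakness; with one retry on a different replica, both groups should stay at full success. The experiment, rather than an assumption, is what reveals which configuration is actually resilient.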

Through a series of tests, characteristics of the infrastructure such as availability, security, and performance are assessed. The goal is to uncover and resolve weaknesses in distributed systems in order to bolster the recovery capabilities of the entire system. In short, it means building structures that withstand extreme conditions.

Resilience and “antifragility”

Chaos engineering can only be understood through the concept of “antifragility”, a term coined by Nassim Nicholas Taleb. Antifragility is the precursor concept of chaos engineering and is, in turn, based on resilience. Resilience is the ability to absorb disturbances, which are caused by stressors, or stress factors, that trigger destabilization.

Resilience is a concept widely used in the study of living organisms (ecology, physiology, psychology, etc.) and refers to the ability to actively overcome problems and adapt to the situation. “Antifragility” goes beyond resilience because it implies that the system evolves: rather than merely recovering from the stress it has been subjected to, it grows from it and becomes better able to cope with new failures.

Panda Adaptive Defense is a tool that keeps a close eye on the principles of antifragility and adds resilience to the company, while increasing visibility into the state of the corporate network.

The Simian Army

Taking all this into account, large companies such as Netflix and Amazon see chaos engineering as a way to test their infrastructure and make their systems more mature, more robust and more evolved: in short, more resilient. Because analyzing and correcting every problem by hand, over and over and at an ever-growing scale, is very difficult, they rely on heuristic strategies that prioritize decisions aimed simply at resolving problems.

Netflix, for example, uses its own suite of applications, called the Simian Army, to test the resilience of its infrastructure. The Simian Army includes more than a dozen stressors that test the system in various ways. Security Monkey, for example, is just one “piece” of the Simian Army: it brings a security strategy based on chaos engineering to cloud-computing platforms.

How can chaos engineering help companies?

The first question is, why should a company consider using chaos engineering?

Implementing a strategy based on chaos engineering helps build antifragility into a platform and can also help meet the control objectives and requirements of PCI-DSS during audits. Any company could therefore benefit greatly from adding a tool such as Security Monkey to its security strategy.

This would require a “chaosification” of the platform in a controlled manner, through actions such as disabling SG (Security Group) rules, modifying files at random, opening listeners on random ports, injecting malicious traffic into the VPC (Virtual Private Cloud), and randomly killing running processes. The list of ways to wreak havoc could go on.
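As an illustration of the last action, the sketch below runs a controlled “kill a random process” drill on a single machine. It assumes that local Python subprocesses stand in for service processes and that a naive supervisor loop is the recovery mechanism under test; `start_worker`, `supervise` and `run_kill_drill` are hypothetical helpers, and a real platform would target instances or containers through its own orchestration layer.

```python
import random
import subprocess
import sys


def start_worker() -> subprocess.Popen:
    """Hypothetical stand-in for a service process: a Python child that just sleeps."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def chaos_kill_random(workers, rng):
    """Chaos action: terminate one running worker chosen at random."""
    victim = rng.choice(workers)
    victim.kill()
    victim.wait()  # reap the child so its exit status is recorded
    return victim


def supervise(workers):
    """Recovery mechanism under test: restart every worker that has exited."""
    restarted = 0
    for i, proc in enumerate(workers):
        if proc.poll() is not None:  # poll() returns None while the process is running
            workers[i] = start_worker()
            restarted += 1
    return restarted


def run_kill_drill(n_workers=3, seed=7):
    rng = random.Random(seed)
    workers = [start_worker() for _ in range(n_workers)]
    try:
        victim = chaos_kill_random(workers, rng)
        restarted = supervise(workers)
        all_alive = all(p.poll() is None for p in workers)
        return victim.pid, restarted, all_alive
    finally:
        # Clean up: the drill must not leave stray processes behind.
        for p in workers:
            p.kill()
            p.wait()


pid, restarted, healthy = run_kill_drill()
print(f"killed pid {pid}; restarted {restarted} worker(s); all workers alive: {healthy}")
```

The drill passes only if the supervisor notices the dead process and brings the pool back to full strength. If `supervise` were missing or broken, `all_alive` would come back false, which is exactly the kind of weakness this controlled chaos is meant to surface before a real failure does.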

This tool (or strategy) provides deeper visibility into the consequences of attacks, which can then be used to improve defenses. In the long run, this is the basis of a more mature and reliable system, one capable of recovering from attacks and reducing losses in the face of a serious security incident, something that should be mandatory for any high-availability service.
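To make this visibility concrete, here is a minimal sketch of a game-day check for the Security Group tampering described earlier. It assumes the rules are modeled in memory (the `Rule` class and the `BASELINE` policy are hypothetical, not a cloud provider's API), that detection means comparing the live rules against an approved baseline, roughly the kind of drift check a configuration monitor such as Security Monkey performs, and that recovery means reverting to that baseline.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Rule:
    """Hypothetical ingress rule: protocol, destination port, source CIDR."""
    protocol: str
    port: int
    cidr: str


# Hypothetical approved policy: HTTPS open to the world, SSH only from the internal network.
BASELINE = frozenset({
    Rule("tcp", 443, "0.0.0.0/0"),
    Rule("tcp", 22, "10.0.0.0/8"),
})


def inject_chaos(rules, rng):
    """Chaos action: either disable an approved rule or open SSH to the whole internet."""
    changed = set(rules)
    if rng.random() < 0.5:
        changed.discard(rng.choice(sorted(changed)))
    else:
        changed.add(Rule("tcp", 22, "0.0.0.0/0"))
    return frozenset(changed)


def detect_drift(current, baseline):
    """Detection under test: approved rules that vanished and rules nobody approved."""
    return {"missing": baseline - current, "unexpected": current - baseline}


def run_game_day(seed):
    """Inject one change, check that it is detected, then revert to the baseline."""
    rng = random.Random(seed)
    current = inject_chaos(BASELINE, rng)
    drift = detect_drift(current, BASELINE)
    detected = bool(drift["missing"] or drift["unexpected"])
    if detected:
        current = BASELINE  # recovery: restore the approved configuration
    return drift, detected, current == BASELINE


for seed in range(3):
    drift, detected, recovered = run_game_day(seed)
    print(f"seed={seed} missing={set(drift['missing'])} "
          f"unexpected={set(drift['unexpected'])} detected={detected} recovered={recovered}")
```

The value lies in the result: if `detected` ever came back false, the chaos action would have revealed a blind spot in monitoring before a real attacker could exploit it.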
