Illumio Blog
March 14, 2017

Security is No Match for Human Error

Chris Westphal

Find me on:

"To err is human, to persist in error is diabolical." 
— Georges Canguilhem

Recently we saw a major disturbance in the Force. A large part of Amazon Web Services (AWS), the world’s largest cloud service provider, was taken down by human error when its S3 storage service in the Northern Virginia (US-EAST-1) region went offline. A single admin’s simple mistake had a widespread impact on companies that run their applications in AWS and on everyone who consumes AWS-hosted SaaS apps, including thousands of enterprises and millions of consumers.

“Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.” So wrote AWS in its summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region.

You’d think that AWS is a well-oiled machine where this kind of thing doesn’t happen, yet a single mistyped command took large chunks of critical infrastructure offline for hours.

This should serve as a warning that no one, however big or sophisticated, is immune to errors or their impact. Imagine how easily the same thing could happen in your own environment.

HUMANS MAKE MISTAKES

Let’s face it, humans aren’t perfect. We make mistakes. Maybe we’re distracted, short on sleep, rushing to the next meeting, or zoning out after too much mundane repetition. Almost anything can cause us to make an error.

Think about the hundreds, thousands, or millions of lines of configuration and policy that you’ve created or will create and ultimately must manage. As the number of policies grows, so does the potential for error. What will be the impact of those errors? How will you know when something’s been misconfigured?
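One partial answer to that last question is to let a computer check the rules. The sketch below assumes a hypothetical, simplified rule format; `Rule`, `lint`, and the /16 breadth threshold are illustrative choices, not any product’s schema. It flags three common slips: an allow rule with a very broad source range, an allow rule that opens every port, and an exact duplicate of an earlier rule.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """A hypothetical, simplified firewall rule (not any vendor's schema)."""
    name: str
    source: str               # source range in CIDR notation, e.g. "10.0.0.0/24"
    dest_port: Optional[int]  # None stands for "any port"
    action: str               # "allow" or "deny"


def lint(rules: List[Rule], min_prefixlen: int = 16) -> List[str]:
    """Return human-readable findings for rules that look like likely mistakes.

    The min_prefixlen threshold is an illustrative assumption tuned for IPv4.
    """
    findings = []
    seen = {}  # (network, port, action) -> name of the first rule with that key
    for rule in rules:
        net = ip_network(rule.source)
        if rule.action == "allow" and net.prefixlen < min_prefixlen:
            # A shorter prefix is a larger range; /0 means "every address".
            findings.append(f"{rule.name}: allows a very broad source range {rule.source}")
        if rule.action == "allow" and rule.dest_port is None:
            findings.append(f"{rule.name}: allows any destination port")
        key = (net, rule.dest_port, rule.action)
        if key in seen:
            findings.append(f"{rule.name}: duplicates {seen[key]}")
        else:
            seen[key] = rule.name
    return findings


rules = [
    Rule("web-in", "10.1.2.0/24", 443, "allow"),
    Rule("oops", "0.0.0.0/0", 22, "allow"),  # e.g. a mistyped source opening SSH to everyone
    Rule("any-port", "10.1.3.0/24", None, "allow"),
    Rule("web-in-copy", "10.1.2.0/24", 443, "allow"),
]
for finding in lint(rules):
    print(finding)
```

A check like this runs the same way on every change, which is exactly where a tired human reviewer tends to slip. Real policy engines model far more (protocols, destinations, rule ordering), so treat this only as an illustration of the idea.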

WHAT'S THE IMPACT?

The recent AWS incident was caused by an error in infrastructure configuration, but it could just as easily have been an error in security policy. Let’s think about that for a minute. When the infrastructure mistake was made, we saw the impact relatively quickly: storage went down and service was disrupted, a pretty clear sign of trouble.

With a misconfigured security policy, would the impact be as immediate and visible? What are the signs that something is wrong?

When someone takes advantage of a hole in the security fabric, we often don't see the effects until it's too late.

A misconfigured security policy is close to a silent killer. You could be sitting vulnerable for days, months, or even years without knowing it, and once the hole has been exploited, the damage is done. That impact can be far more significant than the downtime we saw with the AWS incident.


HOW CAN YOU REDUCE RISK?

If it’s humans who make mistakes, why not reduce our dependence on humans? After all, that’s why we built computers, right?

There are things humans are good at that computers may never master, but computers are good at simplifying complex problems and at automating mundane, repetitive tasks so that humans don’t have to do them.

Let’s consider two ways to use technology to reduce our dependence on humans and, with it, the potential for error:

  1. Reduce complexity by reducing the number of policies: This is all about reducing the opportunity for error. Fewer policies created mean fewer opportunities to get one of those policies wrong. Fewer policies to manage means that when it comes time to add something new (like a firewall rule or ACL), there are fewer things that can break in the process. It’s also easier to evaluate and assess a smaller list of existing policies to make sure they’re the right ones to provide the desired level of protection.

    Getting to a smaller set of policies is easier said than done, but you can get there either by manually reassessing and reevaluating your current policies or by looking at solutions whose approach to policy is easier for humans to work with. For example, this could mean a solution that handles the complexity and details behind the scenes and presents a more human-understandable way to create policy. Hint: you’re not going to get this from your next-generation firewall (NGFW) vendors.

  2. Automation to reduce the potential for human error: There are some things computers do better than humans. Computers never oversleep, never take a day off, and never get caught daydreaming, and they carry out the same task the same way every time. Computers are great tools for automation. Automation brings simplicity for humans, which means fewer repetitive, mundane tasks and fewer opportunities to make mistakes. It takes care of the boring tasks so you can focus on more complex and interesting projects.

    Automation can be used to scale or react to changes by managing policies so humans don’t have to. Pushing policy to hundreds or thousands of devices? Automate it. Need to update policy as your application grows? Automate it. Need to move workloads and make sure everything remains secure? Automate it.
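To make the two ideas concrete, here is a minimal sketch assuming a hypothetical label-based policy model. The names `Workload`, `LabelRule`, `compile_rules`, and `rules_by_host` are illustrative inventions, not Illumio’s or any vendor’s API. A human writes and reviews one rule in terms of roles; code expands it into the concrete per-host allow entries, and regenerates them when a workload is added. It stops short of actually pushing rules to devices, since that depends on whatever enforcement point you use.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Workload:
    name: str
    ip: str
    labels: FrozenSet[str]  # e.g. frozenset({"role:web"})


@dataclass(frozen=True)
class LabelRule:
    """Hypothetical human-level rule: src_label workloads may reach dst_label ones on port."""
    src_label: str
    dst_label: str
    port: int


def compile_rules(policy: List[LabelRule], workloads: List[Workload]) -> List[Tuple[str, str, int]]:
    """Expand label rules into concrete (src_ip, dst_ip, port) allow entries."""
    concrete = set()  # a set drops duplicates produced by overlapping rules
    for rule in policy:
        sources = [w for w in workloads if rule.src_label in w.labels]
        dests = [w for w in workloads if rule.dst_label in w.labels]
        for src, dst in product(sources, dests):
            if src != dst:  # skip a workload talking to itself
                concrete.add((src.ip, dst.ip, rule.port))
    return sorted(concrete)


def rules_by_host(concrete: List[Tuple[str, str, int]]) -> Dict[str, List[Tuple[str, int]]]:
    """Group entries by destination: the inbound rules each host would need."""
    per_host: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for src_ip, dst_ip, port in concrete:
        per_host[dst_ip].append((src_ip, port))
    return dict(per_host)


workloads = [
    Workload("web1", "10.0.0.11", frozenset({"role:web"})),
    Workload("web2", "10.0.0.12", frozenset({"role:web"})),
    Workload("db1", "10.0.1.21", frozenset({"role:db"})),
]
policy = [LabelRule("role:web", "role:db", 5432)]  # the one rule a human writes and reviews

print(compile_rules(policy, workloads))
# The application grows: a new web server gets the same label; the policy is untouched.
workloads.append(Workload("web3", "10.0.0.13", frozenset({"role:web"})))
print(rules_by_host(compile_rules(policy, workloads)))
```

The design choice worth noting is that the number of rules a person has to get right stays at one while the generated entries grow with the application, which is the combination of fewer policies and more automation argued for above.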

It’s always better to learn from the mistakes of others, and this is a perfect opportunity to take a step back and set a goal of reducing the potential for errors in your environment. Making things easier for humans, by reducing the number of security policies and leveraging automation, will go a long way toward that goal. And fewer errors mean better security in any environment.

Topics: Cloud Computing, Adaptive Security

Share this post: