This article was originally published on InfoQ.
“The maxim ‘Nothing prevails but perfection,’ may be spelled PARALYSIS.”―Winston S. Churchill
“The tragedy of life is not that man loses but that he almost wins.”―Heywood Broun
“Oops!...I did it again.”―Britney Spears
Enterprise security teams are charged with maintaining the “perfect” set of security policies. Because the pursuit of perfection takes time, they often become known as “the department of slow.” At the same time, “to err is human…”
As Winston Churchill said, the pursuit of perfection is paralyzing. Even if the perfect security policy is achieved, application changes, data center changes, migrations, and policy changes erode that perfection like water flowing over a rock, and eventually a crack will appear. Thanks to virtualization, public cloud, and constant application delivery, the probability of an error being introduced has never been higher.
With the stakes so high, and jobs on the line, you can bet that every change in security policy is contemplated, scrutinized, re-scrutinized, and planned before finally being executed. As a firewall team manager for one of our customers said to me, “Some people only need to get the right answer sometimes. In my business, my team and I need to get the right answer EVERY time.”
So why is it so hard to get the right answer every time? Let’s examine how a security policy is created and is ultimately implemented.
Changes to a security policy are usually preceded by a compelling event. Examples of compelling events include:
- New data center being rolled out
- Additional infrastructure being added (for instance, new servers introduced into an existing data center)
- Changes to an existing application
- Introduction of a new application
Understanding the Complexity
To understand the complexity of adding a new security policy, let’s home in on what it takes to add and secure a new application in a data center. In this example, each tier of the application will be isolated in order to limit exposure and reduce the surface area of attack to bad actors.
What is the perfect security policy?
When this application is being deployed, the development team will send a ticket to the security team that might look like this:
“We are deploying a new order-expediting application. It is a three-tier application: The top tier runs Apache on port 443 and it needs to be available to the sales teams and the order processing teams in headquarters. The top tier talks to a processing tier on port 5339 using TCP, and it talks to a MySQL database.”
The security team then takes this description and determines which data center the application will be deployed in, where each tier will be placed within that data center, what network changes and new VLANs will be needed, and, based on where the tiers land, what firewall changes are required to accommodate the new application.
The security team then prepares a schedule of all of the changes that need to be made. This includes new VLANs, IP address changes, firewall rule changes—often impacting multiple firewalls—changes to their automation tools, and changes to their switching infrastructure. The schedule of changes needs to be approved by a change review board.
Finally, the output for one firewall might look something like this:
Consider that this is for just one firewall. Then consider how many firewalls and switches with ACLs it takes to permit all of the needed connections. Finally, remember that the permit/deny rules, written in terms of IP addresses, ports, and protocols, look nothing like the original plain-English ticket.
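To make the translation gap concrete, here is a hypothetical sketch (not the firewall output referred to above) of how the ticket's plain-English requirements might expand into address-level permit rules. All IP addresses and the generic `permit ... eq port` syntax are invented for illustration, and the sketch assumes that the processing tier, not the web tier, is the one talking to MySQL on its default port, 3306.

```python
from itertools import product

# Hypothetical addresses for illustration only; the ticket gives none.
HQ_USERS = ["10.1.10.0/24", "10.1.20.0/24"]  # sales and order-processing subnets at HQ
WEB_TIER = ["10.2.1.11", "10.2.1.12"]         # Apache hosts
APP_TIER = ["10.2.2.21", "10.2.2.22"]         # processing-tier hosts
DB_TIER = ["10.2.3.31"]                       # MySQL host


def permit(srcs, dsts, port, proto="tcp"):
    """Expand one plain-English requirement into one rule per source/destination pair."""
    return [f"permit {proto} {s} {d} eq {port}" for s, d in product(srcs, dsts)]


rules = (
    permit(HQ_USERS, WEB_TIER, 443)      # HQ users -> Apache on 443
    + permit(WEB_TIER, APP_TIER, 5339)   # Apache -> processing tier on TCP 5339
    + permit(APP_TIER, DB_TIER, 3306)    # processing tier -> MySQL (assumed default port)
    + ["deny ip any any"]                # explicit default deny at the end
)

for rule in rules:
    print(rule)
print(f"{len(rules)} rules")
```

Even this toy version yields eleven lines for a few sentences of intent, and every address in it is a place where a typo can hide. Real rule sets often group hosts into subnets or object groups, which shrinks the count but not the gap between the ticket and the rules.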
Given the fundamental disconnect between how a security policy is described and how the security policy is implemented, the question has to be asked: “Is the perfect security policy even achievable? And if it is possible, what is the probability of introducing an error once the security policy is inevitably changed?”
For that matter, what is the perfect security policy?
The “ideal” security policy marries the current state (running context) of all workloads in a data center, the applications that those workloads take part in, the environment the applications run in (for instance, development, PCI, production), and the minimum set of ports that need to be open to make the application work.
Such a policy would reduce the exposure of every workload and every application to the bare minimum and prevent threats from spreading east-west, between workloads inside the data center. Finally, if the application evolved, changed, or was migrated to a new data center, the security policy would be portable to whatever infrastructure the application landed in.
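The ideal policy can be sketched as rules written against a workload's context (application, environment, and tier) instead of its address. The class names, labels, and the `allows` check below are hypothetical; they illustrate the idea under the assumption that every workload carries accurate labels, and the rule admitting HQ users to the web tier is omitted for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    name: str
    ip: str    # current address; changes when the workload moves
    app: str   # application the workload takes part in
    env: str   # environment, e.g. "development", "PCI", "production"
    role: str  # tier within the application


@dataclass(frozen=True)
class Policy:
    app: str
    env: str
    allowed: frozenset  # minimal set of (src_role, dst_role, port) tuples

    def allows(self, src: Workload, dst: Workload, port: int) -> bool:
        # Default deny: both ends must belong to this application and environment...
        if any(w.app != self.app or w.env != self.env for w in (src, dst)):
            return False
        # ...and the role-to-role port must be explicitly listed. The IP is never consulted.
        return (src.role, dst.role, port) in self.allowed


policy = Policy(
    app="order-expediting",
    env="production",
    allowed=frozenset({("web", "processing", 5339), ("processing", "db", 3306)}),
)
web = Workload("web-1", "10.2.1.11", "order-expediting", "production", "web")
proc = Workload("proc-1", "10.2.2.21", "order-expediting", "production", "processing")

print(policy.allows(web, proc, 5339))                              # True
print(policy.allows(web, proc, 22))                                # False: port not listed
print(policy.allows(web, replace(proc, ip="172.16.5.9"), 5339))    # True: moved, same decision
print(policy.allows(web, replace(proc, env="development"), 5339))  # False: wrong environment
```

Because the decision never reads `ip`, moving a workload to another data center or cloud leaves the policy unchanged, which is the portability property described above.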
Knowing the Risks
For a firewall team in the pursuit of achieving the perfect security policy, the risks of getting an imperfect answer include:
- Exposure to failing an audit
- Bringing down one or more applications
- Exposure to bad actors
- Bringing down an entire network
Consider the time and planning that a firewall team needs to go through to develop the best possible, most accurate security policy under internal pressures that include:
- Teams that need to spin up new applications
- Management saying the business needs to go faster
- Pressures to migrate apps to public cloud
- Security controls that are available in one data center and not another
- Constantly evolving applications
To make matters worse, the number of changes, coupled with the rate of change, only increases the probability of a configuration error.
If every three-tier application requires 15 firewall rules (five for each tier), and an organization has 1,000 applications, then 15,000 firewall rules seems (relatively) reasonable. Now consider that 30 percent of the applications change in any given year, and that applications need to be promoted through development, staging, and production environments and possibly migrated to a public cloud. Also consider that the same application might live in multiple locations or data centers, and because of data residency requirements, some countries need additional policies that restrict data access to within the country.
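The rule arithmetic in this paragraph can be made explicit. The sketch assumes that every rule belonging to a changed application has to be revisited, and `deployments_per_app` is a hypothetical multiplier for the same application living in several environments or locations; the article gives no figure for it.

```python
def rule_load(apps, rules_per_app, change_rate, deployments_per_app=1):
    """Return (rules in place, rules revisited in a year).

    deployments_per_app is a hypothetical multiplier for copies of the same
    application in several environments or data centers.
    """
    changed_apps = round(apps * change_rate)
    in_place = apps * rules_per_app * deployments_per_app
    revisited = changed_apps * rules_per_app * deployments_per_app
    return in_place, revisited


print(rule_load(1_000, 15, 0.30))     # (15000, 4500)
print(rule_load(1_000, 15, 0.30, 3))  # (45000, 13500), e.g. dev, staging, production
```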
I like to use the following equation to determine a team’s probable exposure:
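((a * r) + (c * r)) * e
where a is the number of applications, r is the average number of rules per application, c is the number of applications that will change in the coming year, and e is the team’s error rate.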
Doing the Math
To evaluate your organization’s exposure to risk, and the probability that some of your rules contain errors, write down the following data:
- The number of applications in your data center
- The average number of ACLs or firewall rules that you have per application
- The number of applications that will be updated in the coming year due to migration, changes by internal teams, consolidation, etc.
Next, determine your team’s error rate by dividing the number of errors your team makes by the number of changes it makes. According to Ray Panko, Professor of IT Management at the University of Hawaii, the error rate for complex actions is 5 percent. Perhaps, though, your team makes only one error for every 100 changes; your error rate would then be 1 percent.
Finally, apply the equation to discover your team’s probable exposure:
- Multiply the number of applications by the number of rules per application.
- Multiply the number of applications that will change in the next year by the number of rules per application.
- Add those two numbers together and multiply the sum by your error rate.
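For example, consider a company with 500 applications averaging seven rules each, where 20 percent of those applications (100) will be updated in the coming year and the team’s error rate is 2 percent. Its likely exposure is ((500 * 7) + (100 * 7)) * 0.02 = 4,200 * 0.02 = 84 manual configuration errors.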
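The three steps and the 500-application example can be checked with a small function. The function names are hypothetical; the math is the multiply, multiply, add, and scale procedure listed above.

```python
def error_rate(errors: int, changes: int) -> float:
    """Errors per change, e.g. 1 error in 100 changes -> 0.01."""
    return errors / changes


def probable_exposure(apps: int, rules_per_app: float,
                      changing_apps: int, rate: float) -> float:
    """Expected number of manual configuration errors."""
    existing = apps * rules_per_app            # step 1: applications x rules per application
    changing = changing_apps * rules_per_app   # step 2: changing applications x rules per application
    return (existing + changing) * rate        # step 3: add, then multiply by the error rate


print(error_rate(1, 100))  # 0.01

# Worked example: 500 apps, 7 rules each, 20 percent (100 apps) changing.
changing = round(500 * 0.20)
print(round(probable_exposure(500, 7, changing, 0.02)))  # 84 at a 2 percent error rate
print(round(probable_exposure(500, 7, changing, 0.05)))  # 210 at Panko's 5 percent rate
```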
Now consider that as the number of applications goes up, the probability of getting something wrong goes up too, and it only takes one error for a hacker to win.
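The point that one error is enough can be quantified under a simplifying assumption the article does not make: each change independently has the same chance `rate` of being wrong. The probability that at least one change is wrong is then 1 - (1 - rate)^changes, which climbs quickly with volume.

```python
def p_any_error(rate: float, changes: int) -> float:
    """Probability that at least one of `changes` independent changes contains an error."""
    return 1.0 - (1.0 - rate) ** changes


# Assumed scenarios: a single change, 100 changes, and the 4,200 rules
# (3,500 existing plus 700 changing) of the 500-application example.
for rate, changes in [(0.01, 1), (0.01, 100), (0.02, 4_200)]:
    print(f"rate={rate:.0%} changes={changes:>5}: "
          f"P(at least one error) = {p_any_error(rate, changes):.3f}")
```

At a 1 percent error rate, 100 changes already give roughly a 63 percent chance of at least one mistake; at the scale of the 500-application example the chance is effectively certain.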
We live in the age of “the hack”, where every day there are stories about organizations being infiltrated, with hackers stealing sensitive information. Enterprises are forced to make difficult decisions:
- Ignore the problem and run their data centers “wide open,” so that once something is inside the data center it can go anywhere; or
- Confront the problem by driving enforcement and segmentation deeper into the data center.
Clearly, driving segmentation deeper into the data center is needed to mitigate risk. The problem is that deeper segmentation requires the manual changes described above, and more of them.
Without more granular security policies, organizations risk ending up in the news as the latest to be compromised.
Let’s face it, humans make errors. That will never change. So the best way to reduce the probability of configuration errors is to decrease the number of human touch points. In a world where teams cannot afford to ever get the wrong answer, the algorithm is the answer.
The disconnect between descriptive security policies and their implementation is acute, but there is a further problem: IT has evolved from the manual racking of servers to a world of automation, where compute instances can be created, joined to applications, and removed from service in a matter of minutes. Anyone with a credit card can spin up a new workload or application in Amazon Web Services or Microsoft Azure. Contrast those minutes with the time it takes to manually provision subnets, zones, and VLANs within a network.
Morgan Stanley realized that the only way to secure applications in an automated world was to decouple security from the underlying network, then use algorithms to solve the computing problem. Jim Rosenthal, COO of Morgan Stanley, said in the Wall Street Journal that using automated security, tied to the context of the workload rather than the network, massively reduced the number of firewall rules.
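A minimal sketch of decoupling policy from the network, assuming a hypothetical inventory that maps each workload to its current IP and labels: the intent is written once in labels, and code recomputes the address-level rules whenever workloads move or scale. This is not Morgan Stanley's system or any vendor's algorithm; it only illustrates the general approach.

```python
from itertools import product

# Hypothetical inventory: workload -> (current IP, labels). In practice this would
# come from an orchestration or asset system; here it is a plain dict.
inventory = {
    "web-1": ("10.2.1.11", {"app": "orders", "env": "prod", "role": "web"}),
    "web-2": ("10.2.1.12", {"app": "orders", "env": "prod", "role": "web"}),
    "proc-1": ("10.2.2.21", {"app": "orders", "env": "prod", "role": "processing"}),
    "proc-2": ("10.2.2.22", {"app": "orders", "env": "prod", "role": "processing"}),
    "db-1": ("10.2.3.31", {"app": "orders", "env": "prod", "role": "db"}),
}

# Intent is written once, in terms of labels rather than addresses.
intent = [("web", "processing", 5339), ("processing", "db", 3306)]


def compile_rules(inventory, intent, app, env):
    """Derive address-level permit rules from label-based intent and the current inventory."""
    def ips(role):
        return sorted(
            ip for ip, labels in inventory.values()
            if labels.get("app") == app and labels.get("env") == env
            and labels.get("role") == role
        )

    return [
        (src, dst, "tcp", port)
        for src_role, dst_role, port in intent
        for src, dst in product(ips(src_role), ips(dst_role))
    ]


before = compile_rules(inventory, intent, "orders", "prod")

# Migrate db-1 to a new address and scale out the web tier; the intent is untouched.
inventory["db-1"] = ("172.16.5.9", inventory["db-1"][1])
inventory["web-3"] = ("10.2.1.13", {"app": "orders", "env": "prod", "role": "web"})
after = compile_rules(inventory, intent, "orders", "prod")

print(len(intent), len(before), len(after))  # 2 6 8
```

The intent stays at two entries while the compiled rules grow from six to eight, and no person edited an address to get there.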
It’s becoming increasingly difficult for manual security to keep up with automation. Even if it could keep pace, the complexity of creating and provisioning security rules means the probability of configuration errors would keep rising. It’s clear that a more automated approach is needed.