As organizations increasingly shift focus to threat detection and response, there’s one issue that seems to get worse over time: alert triage. Prioritizing security alerts has been a critical function for security teams since the early 1990s — so why does it remain such a challenge decades later?
Security teams are overwhelmed by daily alerts from security information and event management (SIEM), endpoint detection and response (EDR), and other detection tools, mostly due to the sheer volume of alerts generated, which is growing all the time, along with a rise in low-quality alerts and false positives. Many teams find they can only analyze a fraction of the thousands of alerts that come in each day, leaving threats that often go unnoticed for months.
This problem is believed to have played a part in Target’s massive data breach in 2013. The company’s security team reportedly failed to take action after detecting potentially malicious activity. Experts speculated that Target had been receiving hundreds of alerts every day that were mistakenly dismissed as false.
Alert triaging has become even more challenging over the past decade as the attack surface has grown. While Windows and Linux server threats were once the main focus of attackers, cloud, mobile and Internet of Things technologies have given rise to their own threat detection needs, leading to new massive piles of alerts. Alert triaging is foundational, yet almost every organization struggles to manage it efficiently.
What’s standing in the way? Here are four issues:
Issue 1: Today’s environments are more diverse.
In the past, when everyone used the same server operating system, one firm could create a flowchart for its alert response, target, and history, and it could apply to other organizations with the same system. This made it easier for security teams to address the volume of alerts that were similar.
But these days, security tools and IT environments can range from advanced deception tools and machine learning, to traditional IT, cloud and industrial IoT technologies. One organization may still use on-premises mainframes, while another may be cloud-native, which makes having a unified approach impossible.
Issue 2: Many advanced threats don’t appear in alerts.
Even when you can successfully identify, analyze, and act on an alert, you must also account for the threats that you can’t see. Ransomware, for example, clearly announces itself, but the process for other crimeware (and especially for stealthy state-sponsored threats) isn’t as obvious.
Sophisticated threat actors who can easily go undetected are on the rise. According to the Council on Foreign Relations (CFR), state-sponsored attacks have increased 122% from 2014 to 2018, and attackers have found ways to evade detection by hiding in encrypted traffic, which the Electronic Frontier Foundation now says comprises more than half of total traffic on the Web. These cybercriminals leverage SSL tunnels to sneak malware into the corporate network, hide commands, control traffic, and exfiltrate data.
Even achieving alert triage perfection won’t make detection and response capabilities truly world-class because attackers will always find ways to adapt to new detection tools and methods. This is forcing security teams to actively hunt for threats, rather than merely gathering (and then clearing) alerts.
Issue 3: Automation and machine learning tools have limits.
To create alert playbooks, some security teams lean on security orchestration, automation, and response (SOAR) technology. But too often, they buy these tools without understanding the prerequisites or even the reasoning behind their implementation. I’ve seen a security leader authorize the purchase of a SOAR tool to avoid coding a playbook, only to discover that the first task after implementation was to define his playbook in the tool via Python.
Furthermore, many detection technologies produce a litany of false positives and fail to completely cover monitored assets. I’ve worked with several security professionals who have grown frustrated with new anomaly-based systems that produce vague and untraceable warnings. In some cases, machine learning (ML)-based systems have actually produced more false positives than legacy tools do.
While security teams increasingly rely on ML technology to solve alert problems, I’ve also seen many reports that ML-produced alerts can be more difficult to confirm in real security operations, as they often lack transparency and context. For instance, an ML tool may spot a behavioral outlier, but it’s unclear what the client should do next, where it should look, or who it should ask.
Issue 4: If not robots, then what?
“Security is a process, not a product.” This quote from Bruce Schneier still stands. In theory, the solution is simple: Hire a small but skilled security team to define, refine, and implement mature alert response processes and customize tools and workflows to address alert overload. But in practice, the cost of assembling an expert team makes this option less feasible for many organizations. What we need is a combination of human brains confirming the alerts that are prepared by the machines in a way that’s optimal for human decision.
To put this into practice, security teams should apply business context to alerts (such as the implications of an alert’s severity), conduct alert triage steps outside of IT, and address the alerts presented by the system. Otherwise, you run the risk of missing important alerts and becoming the next massive data breach to dominate headlines.
Anton is a recognized security expert in the field of log management, SIEM and PCI DSS compliance. He is the author of several books and serves on advisory boards of several security startups. Before joining Chronicle, Anton was a research vice president and Distinguished … View Full Bio