According to the Verizon Data Breach Investigations Report, more than half of all malware goes undetected for months; in 2016, that detection gap was 224 days.
Today, we have two ways to detect anomalous behavior on the network. One way is through baselining and deductive algorithms, which try to determine when an anomaly exists based on deviations from "normal" traffic flows and patterns. The other way is to map network activity to a malware behavioral model, matching known malware stages against a pattern of infiltration. This approach uses algorithms based on abductive reasoning.
Both are effective in limited ways.
The first approach has two problems. The first is that it relies on baselining that often takes as long as 6 weeks, so the baseline can absorb any malicious traffic that is already lodged in the network and treat it as normal. The second is that its reasoning is based on deductive analytics, which looks for evidence to support a particular hypothesis.
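The baselining weakness is easy to see in a minimal sketch. The example below assumes a simple z-score detector over hourly outbound traffic for one host; the traffic figures, the 3.0 threshold, and the function names are all hypothetical and chosen only for illustration, not taken from any particular product.

```python
from statistics import mean, pstdev

def build_baseline(samples):
    """Summarize a training window as (mean, population standard deviation)."""
    return mean(samples), pstdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical outbound megabytes per hour for one host.
clean_window = [10, 12, 11, 9, 10, 11, 12, 10]
# Same host, but malware was already exfiltrating data while the baseline was built.
tainted_window = clean_window + [95, 102, 98, 97]

clean_baseline = build_baseline(clean_window)
tainted_baseline = build_baseline(tainted_window)

exfil_burst = 100  # a later exfiltration burst, in megabytes per hour
print(is_anomalous(exfil_burst, clean_baseline))    # True
print(is_anomalous(exfil_burst, tainted_baseline))  # False
```

The same 100 MB burst is flagged against the clean baseline but passes unnoticed against the tainted one, because the malicious traffic has already been folded into the definition of normal.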
The second approach ignores baselining and instead watches for behavioral patterns that match a multi-stage model of the steps malware must go through to accomplish its purpose. It uses abductive reasoning algorithms to develop a hypothesis on the basis of collected evidence. This approach is much truer to the nature of malware behavior and has a much better chance of catching malicious code before it becomes a breach.
But both of these approaches are inadequate to defend against the current and most advanced strains of malware. These modern strains are polymorphic and metamorphic in nature. They not only change their profiles and shapes upon entry to the network, they also obscure the typical pathways by morphing the stages of their development. It is much more difficult to detect the presence of malware today than it was just a year ago.
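A rough sketch of the stage-matching idea follows. It assumes a five-stage model and a simple score (the share of stages seen in the right order, with gaps allowed for missed observations); the stage names, event labels, and scoring rule are hypothetical and stand in for whatever model a real engine would use.

```python
# Hypothetical multi-stage model; real behavioral models will differ.
STAGES = ["recon", "delivery", "exploit", "command_and_control", "exfiltration"]
STAGE_INDEX = {name: i for i, name in enumerate(STAGES)}

def longest_stage_chain(events):
    """Length of the longest sequence of events that advances through STAGES.

    Events are assumed to be in time order. Stages may be skipped, since
    sensors can miss a step, but the order must never go backwards.
    """
    indices = [STAGE_INDEX[e] for e in events if e in STAGE_INDEX]
    best = [1] * len(indices)
    for i in range(len(indices)):
        for j in range(i):
            if indices[j] < indices[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)

def infection_score(events):
    """Support for the hypothesis 'this host is infected', from 0.0 to 1.0."""
    return longest_stage_chain(events) / len(STAGES)

# Hypothetical time-ordered observations for two hosts.
host_a = ["recon", "dns_lookup", "delivery", "command_and_control", "exfiltration"]
host_b = ["exfiltration", "recon"]

print(infection_score(host_a))  # 0.8
print(infection_score(host_b))  # 0.2
```

The score starts from the evidence and asks how well an infection explains it. Host A shows four stages in order and scores high; host B shows two stage-like events in the wrong order and scores low, even though no traffic baseline was ever built for either host.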
This is where predictive analytics comes in.
To get a more focused picture of an infection, you need both contextual and adjacent data to dial up the resolution. You need an aggregate of social media and enterprise social network data, sentiment analysis from the wild, a broad stream of aggregated and curated threat intelligence, and the actual NetFlow and syslog data from the subject network. And you need a predictive analytics engine to make sense of it all.
For all that to happen, you also need a huge chunk of compute power, advanced machine learning and cognitive pipelining, which is a requisite natural language processing technique. Combining and contextualizing all of this data requires a terabyte-sized data lake, a terabyte-sized data warehouse, and some very fast and clever algorithms to make sense of the sea of data.
But all of it is now doable. Several companies are working right now on combining these components with advanced machine learning, and they expect to bring them to market in the next few months.
The result will be an automated process that can replace the tedious research steps that descriptive and predictive analytics require. This in turn will speed up the security analyst's ability to discover the real malicious traffic among the anomalous outliers.
The impact on current cybersecurity defenses will be threefold. First, the ability to detect sophisticated malware before it can settle into the network will increase dramatically, and will include the ability to detect adaptive behaviors like feints and pattern randomization. Second, we will see a new capability to forecast future threats and to detect, with high confidence, new threats that have never been seen before. Third, it will reduce our dependence upon scarce and expensive human resources to analyze the results of cybersecurity monitoring.
The missing link beyond discovery and identification is the enhanced or artificial intelligence component required to reduce that dependence on human heuristics even further and produce much more accurate results. In extensive testing at MIT's CSAIL, an AI and supervised machine learning engine investigating cybersecurity threat data reduced the time it takes to analyze a stream of threat data by 85%, with a 10x increase in accuracy.
It is inevitable that the entire information security industry will need to retool to address these ever-changing threats, and the only way to identify them in any useful way is through advanced, high-speed, real-time analysis of big data by trained non-human actors.
By leveraging these technologies in combination, we will be able to advance quickly to dissipate much of the asymmetry in the attacker/defender dynamic and actually begin to get a leg up on the bad guys.
In the near future, there will be no cybersecurity without predictive analytics.