Having a large, varied dataset for training AI models is essential to actually improving cybersecurity with the new technology, experts tell CRN. And not everyone has the data.
The rush for cybersecurity vendors to tap into generative AI is in full swing — as anyone who attended last week’s RSA Conference, and perused the many booths touting the technology, will confirm.
When it comes to generative AI and security products, “everybody’s putting it in — whether it’s meaningful or not,” said Ryan LaSalle, a senior managing director and North America lead for Accenture Security, during an interview at RSAC 2023.
As with so many things in technology today, data will be the biggest factor in determining whether the applications are truly useful, experts told CRN.
Put another way, using generative AI in a way that really makes a difference for security will require data: a lot of it, and from many different sources. And not everyone has that data.
[Related: Here’s What 15 Top CEOs And Cybersecurity Experts Told Us At RSAC 2023]
At the RSA Conference in San Francisco, an array of companies announced new or updated security tools that utilize large language models (LLMs) such as OpenAI’s GPT-4.
LLMs are the underlying technology behind ChatGPT, OpenAI’s massively popular chatbot, and models such as GPT-3 and GPT-4 have been trained on vast quantities of publicly available data.
To reach their full potential for usefulness in cybersecurity, however, LLMs must actually be tailored to security with training on additional data sources. “The way this becomes super valuable is on large datasets [of] security data,” said Robert Boyce, a managing director and global lead for cyber resilience services at Accenture, in an interview.
That’s why Microsoft, when unveiling its generative AI-powered Security Copilot in late March, emphasized that the tool uses GPT-4 in combination with the tech giant’s own security-focused AI model. Indeed, Microsoft, according to Boyce, “has the dataset to be able to make it impactful.”
Microsoft rival Google, of course, is another company with both generative AI technology and abundant data. But a company such as Accenture has the advantage of being able to work with both tech giants, and many others, to leverage security data that can help improve its use of generative AI.
The IT consulting giant last week announced an expansion of its security partnership with Google Cloud, in part around the use of Google’s new security-specific large language model, Sec-PaLM.
“For us and what we’re doing — whether it’s partnering with Microsoft, Google, across the board — I think we will have one of the largest security datasets that has heterogeneous data,” Boyce told CRN. “Microsoft has Microsoft data, [but] we will have data from Microsoft, from Palo [Alto Networks], from CrowdStrike, from you name it.”
And that, he said, is where things become “more interesting.”
Boyce is not the only one thinking along such lines. At RSAC last week, cybersecurity vendor SentinelOne debuted a generative AI-powered threat hunting tool, dubbed Purple AI, along with a new security-focused data lake that the tool can work with. Purple AI "sits on top of the data lake so that it has access to all the data that you put in," said Tomer Weingarten, co-founder and CEO of SentinelOne, in an interview with CRN.
Whether it’s a firewall, email security product or identity security tool, “now Purple can answer questions from all of these different sources,” he said. “That’s where it becomes just immensely powerful. And it’s going to get better as we train it more.”
SentinelOne is “experimenting” with multiple large language models, including GPT-4 and Google’s Flan-T5, and is exploring others such as LLMs from Anthropic and Cohere, Weingarten said. But for generative AI to make a real difference in cybersecurity, the LLM is not as important as the training data available, he said.
“I think it really is more about training the algorithms, and less about the algorithms themselves,” Weingarten said.
For Accenture, using generative AI in combination with varied security datasets is a massive opportunity, Boyce said. Doing so will allow its cybersecurity specialists to “start thinking more proactively — predicting what we don’t know yet,” he said.
That means that Accenture should be able to use the technology to “find things that we haven’t even thought about — these ‘unknown unknowns’ that we’ve been talking about for years, but that we’ve never been able to figure out how to [find],” Boyce said.
In other words, with a variety of security datasets and generative AI to more easily interact with them, "we can ask better questions," he said.
“I don’t think we’re asking the right questions of our security data now, as a community. We don’t even know what to be asking,” Boyce said. “We’re just asking stuff that we already know. ‘Are these IOCs present in this dataset?’ Important — but not going to get us to a protection strategy that’s adding a really high level of confidence for your cyber resilience.”
All of the generative AI hype aside, Boyce has no doubt that for cybersecurity, the technology is going to be “super disruptive.” And the types of use cases that have surfaced in cybersecurity so far are just the beginning.
“I don’t think we’ve even started to think about what the possibilities are,” he said.
Good Data Required
Sam King, CEO of application security vendor Veracode, agreed — saying that even with the rush by so many vendors to tout generative AI in their products, “I think it does have the potential to have broad applicability” in the security sphere.
The week before RSAC, Veracode debuted a product that uses generative AI to provide remediation suggestions for code security flaws. The company plans to explore additional areas for generative AI going forward, she said.
Generally speaking, cybersecurity is “a good area of application for generative AI — because we have a lot of data, and we’ve got to sort through a lot of data,” King said.
At the same time, “what you need to think through is, what is the problem area you’re applying it to?” she said. “Do you have a good dataset for that problem area? Can you train it on good data? And go from there.”