Using AI to identify cybercrime masterminds – Sophos News


Online criminal forums, both on the public internet and on the “dark web” of Tor .onion sites, are a rich resource for threat intelligence researchers. The Sophos Counter Threat Unit (CTU) has a team of dark web researchers who collect intelligence from and interact with dark web forums, but combing through these posts is time-consuming and resource-intensive, and it’s always possible that things are missed.

As we strive to make better use of AI and data analysis, Sophos AI researcher Francois Labreche, working with Estelle Ruellan of Flare and the Université de Montréal and Masarah Paquet-Clouston of the Université de Montréal, set out to see if they could approach the problem of identifying key actors on the dark web in a more automated way. Their work, originally presented at the 2024 APWG Symposium on Electronic Crime Research, has recently been published as a paper.

The approach

The research team adapted a framework that criminologists Martin Bouchard and Holly Nguyen developed to separate professional criminals from amateurs in an analysis of the criminal cannabis industry, and combined it with social-network analysis. With this, they were able to connect accounts posting in forums to exploits of recent Common Vulnerabilities and Exposures (CVEs), either based upon the naming of the CVE or by matching the post to the CVEs’ corresponding Common Attack Pattern Enumerations and Classifications (CAPECs) defined by MITRE.

Using the Flare threat research search engine, they gathered 11,558 posts made by 4,441 individuals between January 2015 and July 2023 on 124 different e-crime forums. The posts mentioned 6,232 different CVEs. The researchers used the data to create a bimodal (two-mode) social network that connected CAPECs to individual actors based on the contents of the actors’ posts. In this initial stage, they narrowed the dataset by eliminating, for instance, CVEs that have no assigned CAPECs, as well as overly general attack methods that many threat actors use (and the posters who discussed only those general-purpose CVEs). This filtering ultimately whittled the dataset down to 2,321 actors and 263 CAPECs.
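To make the network-building step concrete, here is a minimal Python sketch of how posts might be turned into an actor-to-CAPEC network with the two filters described above. It is an illustration under stated assumptions, not the researchers' code: the actor names, the placeholder CVE and CAPEC identifiers, and the `GENERIC_CAPECS` list are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data. In the research, the CVE-to-CAPEC mapping comes from
# MITRE and the posts come from the Flare search engine.
CVE_TO_CAPECS = {
    "CVE-0000-0001": {"CAPEC-A", "CAPEC-B"},
    "CVE-0000-0002": {"CAPEC-B"},
    "CVE-0000-0003": set(),              # no CAPEC assigned, so it is dropped
    "CVE-0000-0004": {"CAPEC-GENERIC"},  # maps only to a general-purpose pattern
}
GENERIC_CAPECS = {"CAPEC-GENERIC"}       # assumed list of overly general patterns

posts = [
    ("alice", "CVE-0000-0001"),
    ("alice", "CVE-0000-0002"),
    ("bob", "CVE-0000-0003"),
    ("bob", "CVE-0000-0004"),
    ("carol", "CVE-0000-0002"),
]


def build_actor_capec_network(posts, cve_to_capecs, generic_capecs):
    """Return {actor: set of CAPECs}, the edges of a two-mode network."""
    edges = defaultdict(set)
    for actor, cve in posts:
        capecs = cve_to_capecs.get(cve, set()) - generic_capecs
        if capecs:  # skip posts whose CVE has no usable CAPEC
            edges[actor] |= capecs
    return dict(edges)


def invert(actor_to_capecs):
    """Return {CAPEC: set of actors}, the same edges seen from the other side."""
    capec_to_actors = defaultdict(set)
    for actor, capecs in actor_to_capecs.items():
        for capec in capecs:
            capec_to_actors[capec].add(actor)
    return dict(capec_to_actors)


network = build_actor_capec_network(posts, CVE_TO_CAPECS, GENERIC_CAPECS)
capec_side = invert(network)

# Average number of CAPECs per actor and actors per CAPEC in the toy network.
mean_capecs_per_actor = sum(len(c) for c in network.values()) / len(network)
mean_actors_per_capec = sum(len(a) for a in capec_side.values()) / len(capec_side)
```

In this toy data, "bob" drops out of the network entirely because his posts mention only a CVE with no CAPEC and a CVE tied to a general-purpose pattern, which mirrors how posters of only general-purpose CVEs were excluded.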

The research team then used the Leiden community detection algorithm to cluster the actors into communities (“Communities of Interest”) with a shared interest in particular attack patterns. At this stage, eight communities stood out as relatively distinct. On average, individual actors were connected to 13 different CAPECs, while CAPECs were linked with 118 actors.

Color key for Figure 1a, above

Figure 1: Bimodal actor-CAPEC networks, colored according to Communities of Interest; the CAPECs are shown in red for clarity

Pinpointing the key actors

Next, key actors were identified based on the expertise they exhibited in each community. Three factors were used to measure level of expertise:

1)  Skill Level: This was based on MITRE’s assessment of the skill required to use a CAPEC: ‘Low,’ ‘Medium,’ or ‘High.’ To avoid underestimating actors’ skills, the researchers took the highest skill level among all the scenarios related to each attack pattern, and did this for every CAPEC associated with the actor. To establish a representative skill level, they used the 70th percentile value from each actor’s list of CAPECs and their associated skill levels. (For example, if John Doe discussed 8 CVEs that MITRE maps to 10 CAPECs, of which 5 are rated High by MITRE, 4 Medium, and one Low, his representative skill level would be considered High.) Choosing this percentile value ensured that only actors with over 30 percent of their values equivalent to “High” would be classified as actually highly skilled.

OVERALL DISTRIBUTION OF SKILL LEVEL VALUES

| Skill Level Value | CAPECs | % of Skill Level Values among all values in actors’ list |
|---|---|---|
| Low | 118 (44.87%) | 57.71% |
| Medium | 66 (25.09%) | 24.14% |
| High | 79 (30.04%) | 18.14% |

SKILL LEVEL VALUES PROPORTION STATISTICS

| Skill Level Value | Average proportion of members in the list of actors | Median | 75th percentile | Std |
|---|---|---|---|---|
| High | 29.07% | 23.08% | 50.00% | 30.76% |
| Medium | 36.12% | 30.77% | 50.00% | 32.41% |
| Low | 33.74% | 33.33% | 66.66% | 31.72% |

Figure 2: A breakdown of the skill-level assessments of the actors analyzed in the research

2)  Commitment Level: This was quantified by the proportion of ‘in-interest’ posts (posts relating to the set of related CAPECs that make up the actor’s Community of Interest) relative to an actor’s total posts. Actors who had three or fewer posts were disregarded, reducing the set to be evaluated to 359 actors.

3)  Activity Rate: The researchers added this element to the Bouchard/Nguyen framework to quantify each actor’s activity level in forums. It was measured by dividing the number of posts with a CVE and corresponding CAPEC by the number of days of the actor’s activity on the relevant forums. Activity rate turns out to be inversely related to the skill level at which threat actors operate. More highly skilled actors have been on the forums for a long time, so their relative activity rate is much lower, despite their having significant numbers of posts.
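The three measures above can be sketched in a few lines of Python. This is a rough illustration only: the numeric coding of skill levels (1 to 3) matches the scale implied by the statistics in this article, but the nearest-rank percentile method is an assumption (the article does not say how the 70th percentile was computed), and the function names and the sample numbers for commitment and activity rate are hypothetical.

```python
import math

SKILL_VALUES = {"Low": 1, "Medium": 2, "High": 3}
SKILL_LABELS = {value: label for label, value in SKILL_VALUES.items()}


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (assumed method; other methods interpolate)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def skill_level(capec_skill_labels, pct=70):
    """Representative skill: the 70th percentile of an actor's CAPEC skill values."""
    values = [SKILL_VALUES[label] for label in capec_skill_labels]
    return percentile_nearest_rank(values, pct)


def commitment(in_interest_posts, total_posts):
    """Share of an actor's posts that fall in their Community of Interest, in %."""
    return 100.0 * in_interest_posts / total_posts


def activity_rate(cve_capec_posts, active_days):
    """Posts with a CVE and corresponding CAPEC per day of forum activity."""
    return cve_capec_posts / active_days


# The John Doe example from the text: 5 High, 4 Medium, 1 Low.
john_doe = ["High"] * 5 + ["Medium"] * 4 + ["Low"]
john_skill = SKILL_LABELS[skill_level(john_doe)]

# An actor with exactly 30 percent High values does not reach "High".
borderline = ["High"] * 3 + ["Medium"] * 7
borderline_skill = SKILL_LABELS[skill_level(borderline)]

# Hypothetical numbers for the other two measures.
example_commitment = commitment(in_interest_posts=9, total_posts=10)
example_rate = activity_rate(cve_capec_posts=40, active_days=160)
```

Under the nearest-rank assumption, John Doe comes out as High, while the borderline actor with exactly 30 percent High values comes out as Medium, which matches the "over 30 percent" threshold described in the text.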

DESCRIPTIVE STATISTICS OF SAMPLE

| Measure | Mean | Std | Min | Median | 75th percentile | Max |
|---|---|---|---|---|---|---|
| Length of Skill Level values list | 99.42 | 255.76 | 4 | 25 | 85 | 3449 |
| Skill Level (70th percentile value) | 2.19 | 0.64 | 1 | 2 | 3 | 3 |
| Number of posts (CVE with CAPEC) | 14.55 | 31.37 | 4 | 6 | 10 | 375 |
| % commitment | 36.68 | 29.61 | 0 | 25 | 50 | 100 |
| Activity time (days) | 449.07 | 545.02 | 1 | 227.00 | 690.00 | 2669.00 |
| Activity rate | 0.72 | 1.90 | 0.002 | 0.04 | 0.20 | 14.00 |

Figure 3: A breakdown of the skill, commitment, and activity rate scores for the sample group

As shown above, the sample for the identification of key actors consisted of 359 actors. The average actor had 36.68% of posts committed to their Community of Interest and had a skill level of 2.19 (‘Medium’). The average activity rate was 0.72.

COMMUNITIES OF INTEREST (COI) OVERVIEW

| Community | Community of Interest | Nodes | CAPEC | Actors | % one timers | Mean out-degree per actor | Std (out-degree) | Mean number of specialized posts | Std (posts) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Privilege escalation | 544 | 19 | 525 | 65.14 | 4 | 7.11 | 2 | 4.76 |
| 1 | Web-based | 497 | 26 | 471 | 71.97 | 5 | 12.98 | 3 | 18.33 |
| 2 | General / Diverse | 431 | 103 | 328 | 56.10 | 14 | 33.15 | 7 | 24.89 |
| 3 | XSS | 319 | 10 | 309 | 71.52 | 2 | 1.18 | 1 | 1.46 |
| 4 | Recon | 298 | 55 | 243 | 51.44 | 61 | 9.04 | 3 | 6.99 |
| 5 | Impersonation | 296 | 25 | 271 | 54.61 | 12 | 7.88 | 3 | 5.49 |
| 6 | Persistence | 116 | 22 | 94 | 41.49 | 26 | 25.76 | 5 | 7.96 |
| 7 | OIVMM | 83 | 3 | 80 | 85.00 | 1 | 0.31 | 1 | 1.62 |

Figure 4: The relative scores of actors grouped into each Community of Interest

14 needles in a haystack

Finally, to identify the truly key actors (those with a high enough skill level, commitment, and activity rate to mark them as experts in their domains), the researchers used the K-means clustering algorithm. Using the three measurements created for each actor’s relationship with CAPECs, the 359 actors were grouped into eight clusters whose members had similar levels of all three measurements.
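For readers unfamiliar with K-means, the following Python sketch shows the general technique applied to (skill, commitment, activity rate) triples. It is not the researchers' implementation: the min-max scaling and the deterministic "farthest-first" choice of starting centroids are assumptions made here for a small reproducible example, and the six actors are invented. The research itself clustered 359 actors into eight clusters.

```python
import math


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no single measure dominates distances."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def farthest_first(points, k):
    """Deterministic starting centroids: begin with the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical actors as (skill level, % commitment, activity rate).
actors = [
    (3, 95.0, 0.30), (3, 90.0, 0.25),   # high skill, high commitment
    (1, 20.0, 0.05), (1, 25.0, 0.04),   # low skill, low commitment
    (2, 20.0, 10.0), (2, 15.0, 11.0),   # mid skill, very high activity rate
]
labels, centroids = k_means(min_max_scale(actors), k=3)
```

Scaling matters in a sketch like this because commitment runs from 0 to 100 while skill runs from 1 to 3; without some rescaling, distances would be driven almost entirely by commitment.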

Cluster chart showing distributions of accounts by activity rate, skill level, and perceived commitment

OVERVIEW OF CLUSTERS

| Cluster | Bouchard & Nguyen framework * | Centroid [Skill; Commitment; Activity] | Number of actors | % of sample population |
|---|---|---|---|---|
| 0 | Amateurs | [2.00; 22.47; 0.11] [Mid; Low; Discrete] | 143 | 39.83 |
| 1 | Pro-Amateurs | [2.81; 97.62; 5.14] [High; High; Short-lived] | 21 | 5.85 |
| 2 | Professionals | [2.96; 90.37; 0.28] [High; High; Active] | 14 | 3.90 |
| 3 | Pro-Amateurs | [2.96; 25.32; 0.12] [High; Low; Discrete] | 86 | 23.96 |
| 4 | Amateurs | [1.05; 24.32; 0.05] [Low; Low; Discrete] | 43 | 11.98 |
| 5 | Average Career Criminals | [1.86; 84.81; 0.50] [Low; High; Active] | 36 | 10.02 |
| 6 | Pro-Amateurs | [2.38; 18.46; 10.67] [Mid; Low; Hyperactive] | 5 | 1.39 |
| 7 | Amateurs | [1.95; 24.51; 4.14] [Mid; Low; Hyperactive] | 11 | 3.06 |

Figure 5: An analysis of the eight clusters with scoring based on the methodology from the framework developed from the work of criminologists Martin Bouchard and Holly Nguyen; as described above, activity rate was added as a modification to that framework. Note the low number of truly professional actors, even among the dataset of 359

One cluster of 14 actors was graded as “Professionals”: key individuals, the best in their field, with high skill and commitment and a low activity rate, again because of the length of their involvement with the forums (an average of 159 days) and a post rate that averaged about one post every 3-4 days (consistent with the cluster’s centroid activity rate of 0.28 posts per day). They focused on very specific Communities of Interest and did not post much beyond them, with a commitment level of 90.37%. There are inherent limitations to the analysis approach in this research, primarily because of its reliance on MITRE’s CAPEC and CVE mapping and the skill levels assigned by MITRE.

Conclusion

The research process includes defining problems and seeing how various structured approaches might lead to greater insight. Derivatives of the approach described in this research could be used by threat intelligence teams to develop a less biased approach to identifying e-crime masterminds, and Sophos CTU will now start examining the outputs of this analysis to see whether they can shape or improve our existing human-led research in this area.

 

 


