Clustering and K-Means

Identify Outlier Access with “Clustering and K-Means”

It’s the gift that keeps on giving: Gurucul’s #MachineLearningMadness sessions at Black Hat USA 2018! We continue to roll out the details of the machine learning models that Gurucul Risk Analytics uses to detect and stop insider threats, data exfiltration, privileged access abuse, fraud and more.

Gurucul Machine Learning Model: Clustering and K-Means

How does the Clustering and K-Means machine learning model work, and what does it do? This model groups data into clusters. K-means assigns each data point to the nearest of k cluster centers, recomputes each center as the mean of its members, and repeats until the assignments settle. In our case, it looks at different variables to determine who has access to what. It looks at activity: log patterns, HR attributes and behavior patterns. It also looks at access: what do you have access to, and what should you have access to?
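To make the clustering idea concrete, here is a minimal sketch of the textbook k-means procedure in plain Python. It is not Gurucul's implementation. The user names, the two features (weekly logins to a finance system and to an admin console) and the simple deterministic initialization are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    # Deterministic start for this sketch: k points spread evenly through the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical features per user: (finance-system logins/week, admin-console logins/week)
users = {
    "alice": (20.0, 1.0), "bob": (22.0, 0.0), "carol": (19.0, 2.0),
    "dave": (1.0, 15.0), "erin": (0.0, 18.0), "frank": (2.0, 16.0),
}
names = list(users)
centroids, labels = kmeans([users[n] for n in names], k=2)

# Collect user names by cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
```

With this toy data the finance-heavy users and the admin-heavy users end up in separate clusters, even though no job title or department was supplied.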

Dynamic Peer Grouping is a mechanism within Gurucul Risk Analytics that detects dynamic clusters using real-time data and short-term data groups. Gurucul Risk Analytics implements machine-learning-defined peer groups for unsupervised and supervised algorithms. These are clusters of users put into multiple contiguous groups based on attributes derived through analytics, rather than simply relying on static attributes about the users or accounts. Gurucul Risk Analytics then computes baseline behavior models for each of these dynamic peer groups and uses them for comparative outlier calculation when a dynamic peer group member executes anomalous events. Self-learning, self-training machine learning algorithms continually update these dynamic baselines.
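One simple way to picture a comparative outlier calculation against a peer-group baseline is a z-score: how many standard deviations a member's activity sits from the group mean. This is a sketch under that assumption, not the scoring Gurucul Risk Analytics actually uses. The metric (daily file downloads) and the numbers are hypothetical.

```python
from statistics import mean, pstdev

def peer_z_score(value, peer_values):
    """Distance of `value` from the peer-group mean, in standard deviations."""
    mu = mean(peer_values)
    sigma = pstdev(peer_values)
    if sigma == 0:
        # All peers behave identically: any deviation at all is unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma

# Hypothetical baseline: daily file downloads for members of one peer group.
peer_downloads = [10, 12, 11, 9, 13]

score_typical = peer_z_score(12, peer_downloads)   # close to the baseline
score_unusual = peer_z_score(60, peer_downloads)   # far outside the baseline
```

Recomputing the baseline as new activity arrives is what keeps such a comparison current as peer groups change.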

Use Case: Identify Outlier Access

Clustering and K-Means can be used for traditional role mining: cleaning up access by providing additional visibility into which access is actually being used. The average user has more than 100 entitlements, which can be very difficult to manage manually. With the Clustering and K-Means machine learning model, we can detect access outliers by analyzing what is going on within dynamic peer groups of users.
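As a rough illustration of access outlier detection inside one peer group, the sketch below flags entitlements that a user holds but that only a small share of the group holds. The entitlement names, the users and the 25% threshold are hypothetical, and the rule is a deliberately simple stand-in for whatever the product computes.

```python
from collections import Counter

def outlier_entitlements(peer_group, threshold=0.25):
    """Flag entitlements held by a user but by fewer than `threshold` of the group."""
    counts = Counter(e for ents in peer_group.values() for e in set(ents))
    size = len(peer_group)
    flagged = {}
    for user, ents in peer_group.items():
        rare = sorted(e for e in set(ents) if counts[e] / size < threshold)
        if rare:
            flagged[user] = rare
    return flagged

# Hypothetical peer group: user -> set of entitlements.
peer_group = {
    "alice": {"gl_read", "gl_post", "reports"},
    "bob": {"gl_read", "gl_post", "reports"},
    "carol": {"gl_read", "reports"},
    "dan": {"gl_read", "gl_post", "reports"},
    "eve": {"gl_read", "reports", "payroll_admin"},
}
flagged = outlier_entitlements(peer_group)
```

Here only one of the five users holds `payroll_admin`, so that entitlement is surfaced for review, while the widely shared ones are not.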

Let’s look at an example.

On a lovely Saturday afternoon, the company access data shows an employee from IT working on your production finance system. This is seemingly an outlier activity for an IT employee, as it’s not typical for someone in this role to be accessing a production finance system, much less on a Saturday afternoon. So, is this risky activity? At the exact same time on the same day, you also have a business analyst accessing and working on that same production finance application.

If we examine these two access activities individually, we might perceive a problem. Yet if we combine these two access data points dynamically, the situation may appear less risky. Read on.

Now, let’s add another person, a financial analyst from the Finance organization, who is also accessing the same production finance application on the same Saturday. We now have three different people, from different work groups, all accessing the production finance system at the same time and on the same day. So, what’s going on?

What’s most likely taking place in this scenario is these employees are working together to perform a system upgrade or are resolving a production issue occurring in the financial system. From a real-world viewpoint, where we can examine traditional static data attributes such as job title or department number, these three employees would not be considered a relevant peer group. From a behavioral analytics standpoint, these three employees do comprise a dynamically generated peer group, as there is system data logging their actions of accessing the same production finance system at the same time.
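The Saturday scenario can be sketched as a grouping of users by shared resource and time window. This is only an illustration of the idea of a behavior-derived peer group. The event tuples, the one-hour window and the user labels are assumptions, and the real model works from far richer log data.

```python
from datetime import datetime, timedelta

def co_access_groups(events, window=timedelta(hours=1)):
    """Group users who touched the same resource within `window` of each other."""
    by_resource = {}
    for user, resource, ts in sorted(events, key=lambda e: e[2]):
        groups = by_resource.setdefault(resource, [])
        # Extend the latest group if this event falls inside the window,
        # otherwise start a new group for this resource.
        if groups and ts - groups[-1]["last_seen"] <= window:
            groups[-1]["users"].add(user)
            groups[-1]["last_seen"] = ts
        else:
            groups.append({"users": {user}, "last_seen": ts})
    return {r: [g["users"] for g in gs] for r, gs in by_resource.items()}

# Hypothetical access log: (user, resource, timestamp)
events = [
    ("it_admin", "prod-finance", datetime(2018, 8, 11, 14, 0)),
    ("business_analyst", "prod-finance", datetime(2018, 8, 11, 14, 5)),
    ("financial_analyst", "prod-finance", datetime(2018, 8, 11, 14, 20)),
    ("intern", "prod-finance", datetime(2018, 8, 13, 9, 0)),
]
groups = co_access_groups(events)
```

The three Saturday users land in one group despite having different departments, while the access two days later forms a separate group.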

Dynamic peer groups are clusters of users that are created as Gurucul Risk Analytics ingests log data, in near real time, all internal to the machine learning algorithms. Dynamic peer groups are fairly transient, yet they can be retained for future reference.

What are the Benefits of Clustering and K-Means?

The benefits of the Clustering and K-Means machine learning model in Gurucul Risk Analytics are numerous. A key feature is the ability to flag questionable access and then remediate or revoke it. We know that most user identities are over-provisioned, and if those identities are compromised, tremendous damage can occur.

Clustering and K-Means helps to refine, resolve and reduce false positives. By employing Clustering and K-Means machine learning together with dynamic peer grouping technology, Gurucul Risk Analytics can reduce false positives tenfold compared to the use of static groups from directories like Active Directory.

If you need help reducing false positives with risk analytics, contact us for details.
