ABCs of UEBA: O is for OUTLIER

Outlier Detection in UEBA: Discover how User and Entity Behavior Analytics (UEBA) identifies outliers to detect insider threats, prevent data breaches, and enhance cybersecurity with machine learning.

The goal of User and Entity Behavior Analytics (UEBA) is to detect and stop known and unknown threats. The key to predicting unknown threats is to monitor user and entity behavior – to recognize when that behavior starts being anomalous, and then to ascertain whether that anomalous behavior is risky. The term “outlier” is used to describe data that is outside the norm, different from expected behavior. And as we’ve already learned, context is required to determine if outlier behavior is malicious – or just abnormal. This post will share machine learning models used by Gurucul UEBA to detect miscreant outliers.

Clustering and K-Means ML Model

The Clustering and K-Means machine learning model groups data into clusters. It looks at different variables to determine who has access to what. It looks at activity logs, HR attributes and behavior patterns. It also looks at access: what do users have access to? What should users have access to? Gurucul UEBA uses this model to detect outlier access.

Gurucul uses Clustering and K-Means for what we affectionately call Dynamic Peer Grouping. This is a mechanism to detect dynamic clusters using real-time data and short-term data groups. Gurucul UEBA implements machine-learning-defined peer groups for unsupervised and supervised algorithms. These are clusters of users put into multiple contiguous groups based on attributes derived through analytics, rather than simply relying on static attributes about the users or accounts. An example would be: everyone in a supermarket at 10:00am on a Saturday. The static group would be Saturday morning supermarket employees – as they are known members of this group. The dynamic peer group would be shoppers and employees, as shoppers are dynamically added to the group as they shop.

Gurucul computes behavior baselines for dynamic peer groups and leverages them for comparative outlier calculation when a dynamic peer group member executes anomalous events. Let’s say someone robs the supermarket – that is outlier and risky behavior.
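To make the idea of a comparative outlier calculation concrete, here is a minimal sketch. It is not Gurucul's implementation: it assumes the baseline is simply the mean and standard deviation of one numeric behavior across a peer group, and the function name `peer_outlier_score` and the download counts are hypothetical.

```python
from statistics import mean, pstdev

def peer_outlier_score(peer_values, observed):
    """Distance of `observed` from the peer-group mean, in standard deviations."""
    baseline = mean(peer_values)
    spread = pstdev(peer_values)
    if spread == 0:
        # Every peer behaves identically, so any deviation at all is extreme.
        return 0.0 if observed == baseline else float("inf")
    return abs(observed - baseline) / spread

# Hypothetical daily download counts for the members of one dynamic peer group.
peer_downloads = [12, 15, 11, 14, 13, 16, 12]

typical_score = peer_outlier_score(peer_downloads, 14)  # close to the baseline
unusual_score = peer_outlier_score(peer_downloads, 90)  # far outside it
print(f"typical: {typical_score:.2f}, unusual: {unusual_score:.2f}")
```

A member whose score is well above the rest of the group is a candidate outlier. Whether that outlier is risky still depends on context.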

How Clustering and K-Means Detects Outlier Access

Clustering and K-Means can be used for traditional role mining – to clean up access by providing additional visibility to access that is being used. The average user has more than 100 entitlements and that can be very difficult to manage manually. Through the use of the Clustering and K-Means machine learning model, we can detect access outliers by analyzing what’s going on with dynamic peer groups of users.
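As an illustration only, the sketch below runs a plain K-Means (Lloyd's algorithm) over made-up entitlement vectors and reports the user who sits farthest from their cluster center. The user names, the five entitlement columns and the hand-picked starting centroids are all assumptions for this example; a production model would use many more features and a more careful initialization.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm. Returns the final centroids and a label per point."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical entitlements: [finance_app, hr_app, prod_db, code_repo, ci_server]
users = {
    "fin_1": [1, 0, 0, 0, 0],
    "fin_2": [1, 1, 0, 0, 0],
    "fin_3": [1, 0, 0, 0, 0],
    "dev_1": [0, 0, 0, 1, 1],
    "dev_2": [0, 0, 1, 1, 1],
    "dev_3": [0, 0, 0, 1, 1],
    "dev_4": [1, 1, 0, 1, 1],  # a developer who also holds finance and HR access
}

names = list(users)
points = [users[n] for n in names]
centroids, labels = kmeans(points, [users["fin_1"], users["dev_1"]])

# The farther a user is from their cluster center, the less their access resembles their peers'.
distance = {n: math.dist(p, centroids[l]) for n, p, l in zip(names, points, labels)}
farthest = max(distance, key=distance.get)
print(farthest, round(distance[farthest], 2))
```

The farthest user is a starting point for an access review, not proof that the access is wrong.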

Let’s look at an example.

On a Sunday afternoon, the company access data shows an employee from IT working on your production finance system. This is an outlier activity for an IT employee, as it’s not typical for someone in this role to be accessing a production finance system, much less on a Sunday afternoon. Is this activity risky? Let’s say you have a business analyst accessing and working on that same production finance system at the exact same time.

If we examine these two access activities individually, we might perceive a problem. Yet, if we combine these two access data points dynamically, the situation may appear to be less risky. What’s most likely taking place in this scenario is these employees are working together to perform a system upgrade or resolve a production issue. From a real-world viewpoint, where we can examine traditional static data attributes such as job title or department number, these employees would not be considered a relevant peer group. From a behavioral analytics standpoint, these employees do comprise a dynamically generated peer group, as there is system data logging their actions of accessing the same production finance system at the same time.
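A rough way to picture how such a group could be derived from logs: bucket access events by resource and hour, and treat users who share a bucket as a dynamic peer group. This is a simplified sketch, not how Gurucul builds its groups. The event tuples, user names and the fixed one-hour bucket are assumptions.

```python
from collections import defaultdict
from datetime import datetime

def dynamic_peer_groups(events):
    """Group users who touched the same resource during the same clock hour."""
    groups = defaultdict(set)
    for user, resource, timestamp in events:
        hour_bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        groups[(resource, hour_bucket)].add(user)
    # A lone user is not a peer group, so keep only buckets with company.
    return {key: users for key, users in groups.items() if len(users) > 1}

# Hypothetical access log for a Sunday afternoon.
events = [
    ("it_admin", "prod-finance", datetime(2024, 1, 7, 14, 5)),
    ("biz_analyst", "prod-finance", datetime(2024, 1, 7, 14, 20)),
    ("dev_1", "code-repo", datetime(2024, 1, 7, 14, 10)),
]

groups = dynamic_peer_groups(events)
for (resource, hour), members in groups.items():
    print(resource, hour, sorted(members))
```

Fixed hourly buckets miss pairs of events that straddle the top of the hour. A sliding window would handle that at the cost of a little more code.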

Dynamic peer groups are clusters of users that Gurucul UEBA creates in near real time as it ingests log data, entirely within the machine learning algorithms. Dynamic peer groups are fairly transient, yet they can be retained for future reference. By combining Clustering and K-Means machine learning with dynamic peer grouping technology, Gurucul UEBA can cut false positives tenfold compared with using static groups from directories like Active Directory.

Linear Regression ML Model

The Linear Regression machine learning model compares a user’s online activities on one axis with user accounts that have similar entitlements on the other axis. Events that stray from the norm are questionable. This is one method used by Gurucul UEBA to identify anomalous outlier activity.

Linear Regression is also used to detect privileged access abuse. This ML model compares the activities of “privileged account” identities with authorized access and account activities of other user accounts with similar sets of entitlements. This model zeroes in on any events that have strayed from what’s believed to be normal behavior for that user account.
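One simple way to read "strayed from normal" in regression terms is a large residual: fit a line through accounts' entitlement counts and activity levels, then flag the accounts that land far from the line. The sketch below assumes exactly that. The account names, the two chosen features and the threshold of two standard deviations are hypothetical, not Gurucul's actual model.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least squares for a single feature. Returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residual_outliers(accounts, threshold=2.0):
    """Names of accounts whose activity is far from what the fitted line predicts."""
    names = list(accounts)
    xs = [accounts[n][0] for n in names]
    ys = [accounts[n][1] for n in names]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [n for n, r in zip(names, residuals) if spread and abs(r) / spread > threshold]

# Hypothetical accounts: (number of entitlements, privileged actions per day)
accounts = {
    "admin_1": (10, 20), "admin_2": (20, 40), "admin_3": (30, 61),
    "admin_4": (40, 79), "admin_5": (50, 100), "admin_6": (60, 121),
    "admin_7": (70, 139), "admin_8": (80, 160),
    "admin_9": (45, 250),  # far more activity than similarly entitled accounts
}

flagged = residual_outliers(accounts)
print(flagged)
```

With only a handful of accounts, a single extreme value also drags the fitted line toward itself, so real deployments would fit on a longer baseline period.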

How Linear Regression Identifies Privileged Access Abuse

Advanced tools let authorized system administrators change configurations within a corporate network. However, just because a system administrator can run these tools doesn’t mean they should. Gurucul UEBA can identify and send an alert when an admin is discovered running a non-approved application. Gurucul examines and classifies web traffic. When a non-approved application is being run by an unauthorized user account, the event is flagged. In this way, Gurucul quickly detects when a system administrator is running a non-approved application and alerts your investigative team to initiate action to prevent possible damage to critical company data assets.

This is a very powerful capability, especially in cloud computing, where there’s far less monitoring of user account activity. Gurucul UEBA can catch system administrators in the act of elevating their own or others’ account privileges in the cloud, and identify where they are misusing those privileges. Linear Regression is perfect for Google G-Suite Admin monitoring. You can catch system administrators who create accounts, elevate privileges, and make company documents public so they’re available for download by anyone from anywhere. This type of outlier behavior is definitely anomalous and very risky.

Abnormal PowerShell Command Execution ML Model

The Abnormal PowerShell Command Execution machine learning model tracks all activity that could grant users elevated access, and it identifies abnormally frequent system access or bypass attempts. It uses clustering and frequency analysis to detect outlier behavior.

PowerShell is a tool commonly used by system administrators in Windows environments to run many commands quickly. Unfortunately, it’s also a tool that hackers use to undertake nefarious activity at the command line. PowerShell is increasingly used by cybercriminals as part of their attack tool chain, mainly for installing backdoors, downloading malicious content and lateral movement. These actions may go undetected because the activity can look normal, yet Gurucul UEBA is able to detect and root out these exploitive actions using clustering and frequency analysis.

Let’s look at some examples:

A system administrator elevates the privileges on his regular user account to perform administrative work. However, he shouldn’t be using his regular user account to execute administrative tasks. Clustering exposes these activities as outlier-type behavior for that user.
Someone who is not a system administrator attempts to execute a PowerShell command. Regular users should not have administrative access. Clustering would identify this as outlier behavior for a regular user account.
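The second example can be approximated with a very small frequency check: how often has anyone in this user's role run the program before? This sketch stands in for the clustering step with a simple per-role count, which is a deliberate simplification. The role names, process names, counts and the 5% cutoff are all hypothetical.

```python
from collections import Counter

def rare_for_role(history, role, process, min_share=0.05):
    """True if `process` makes up less than `min_share` of the role's past activity."""
    counts = history.get(role, Counter())
    total = sum(counts.values())
    return total == 0 or counts[process] / total < min_share

# Hypothetical process-launch history, aggregated per role.
history = {
    "sysadmin": Counter({"powershell.exe": 120, "cmd.exe": 80}),
    "sales": Counter({"outlook.exe": 300, "excel.exe": 200, "powershell.exe": 1}),
}

sales_is_outlier = rare_for_role(history, "sales", "powershell.exe")
admin_is_outlier = rare_for_role(history, "sysadmin", "powershell.exe")
print(sales_is_outlier, admin_is_outlier)
```

The same launch is routine for one role and an outlier for another, which is the point of comparing a user against peers rather than against a global rule.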

How Abnormal PowerShell Command Execution Detects Fileless Malware

Fileless malware is malicious code that exists only in memory. Because this type of malware never gets installed on the target computer’s hard drive, it doesn’t exist as a file, so it eludes intrusion prevention systems and antivirus programs. Users’ systems typically become infected with fileless malware by visiting malicious websites. Malvertisements are well-known fileless malware offenders. Fileless malware commonly abuses PowerShell to conduct backdoor activities.

Traditional antivirus and anti-malware tools aren’t looking for fileless malware attacks. They aren’t designed to stop this type of attack, so they can’t find it. You need something better. The Gurucul Abnormal PowerShell Command Execution machine learning model will identify unusual spikes in PowerShell processes. It will detect if someone who is not a system administrator attempts to execute a PowerShell command. It will recognize outlier behaviors such as a regular user (whose account suddenly has elevated administrative privileges) cruising around your network and probing into servers and vulnerability management scans. If a server has not been scanned in a while and it suddenly begins doing odd things, such as attempting to communicate with IP addresses that aren’t normal, this is outlier behavior. Gurucul UEBA will detect this outlier behavior and will track that server closely to ensure that it has not been compromised by fileless malware.

Given the well-documented exploits in which hackers use PowerShell, it’s imperative that you track all PowerShell command-line processes that are running in your Windows environments. The Abnormal PowerShell Command Execution machine learning model in Gurucul UEBA can detect whether you’re a victim of a fileless malware attack. Gurucul compares current behavior, using frequency and clustering, to previously baselined behavior to detect fileless malware attacks. This is extremely difficult to do without the power of big data, clustering and analytics.
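Comparing current frequency to a baseline can be sketched in a few lines. The example below assumes the baseline is a list of hourly PowerShell process counts for one host and flags an hour that exceeds the baseline mean by more than three standard deviations. The counts, the function name `is_frequency_spike` and the threshold are hypothetical, and the clustering half of the model is left out.

```python
from statistics import mean, pstdev

def is_frequency_spike(baseline_counts, current_count, threshold=3.0):
    """True if `current_count` exceeds the baseline mean by more than `threshold` deviations."""
    average = mean(baseline_counts)
    # Fall back to 1.0 so a perfectly flat baseline does not divide by zero.
    spread = pstdev(baseline_counts) or 1.0
    return (current_count - average) / spread > threshold

# Hypothetical PowerShell process launches per hour on one server.
baseline = [4, 6, 5, 7, 5, 6, 4, 5]

quiet_hour = is_frequency_spike(baseline, 6)
busy_hour = is_frequency_spike(baseline, 40)
print(quiet_hour, busy_hour)
```

Only upward deviations are flagged here, since the concern in this section is an unusual burst of PowerShell activity.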

Outlier Categorical ML Model

We close this post with the aptly named Outlier Categorical machine learning model. This model uses a Bayesian layer to classify categorical data and gives the relative probability of an occurrence being an outlier based on prior observations. The Outlier Categorical Model takes into account previously observed behavior patterns, and will automatically flag anything outside of the norm.

Let’s look at an example. Let’s say you have a rule to detect logins after midnight, but someone logs in 1 or 2 seconds before midnight. A fixed rule cannot catch this kind of near miss, but the Outlier Categorical Model would pick it up. You would assign a category of “late night” to logins between midnight and 4:00am. The model assigns each transaction a probability between 0 and 1, which is mapped to a score between 0 and 100, where a high score indicates a high likelihood that the transaction is an outlier. A user logging in at 1 second to midnight will have a very high score and thus get flagged as anomalous by this model.
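Here is one way the probability-to-score mapping could look, as a sketch under stated assumptions rather than the actual model. It assumes logins are bucketed into four time-of-day categories, estimates how likely each category is for a user from their history with add-one (Laplace) smoothing, and scores a login as 100 times the probability of not seeing that category. The category boundaries other than “late night” and the login history are hypothetical.

```python
from collections import Counter

CATEGORIES = ["late night", "early morning", "business hours", "evening"]

def time_category(hour):
    """Map an hour of the day (0-23) to a coarse category."""
    if hour < 4:
        return "late night"
    if hour < 8:
        return "early morning"
    if hour < 18:
        return "business hours"
    return "evening"

def outlier_score(history_hours, login_hour, alpha=1.0):
    """Score from 0 to 100; higher means the login category is rarer for this user."""
    counts = Counter(time_category(h) for h in history_hours)
    category = time_category(login_hour)
    # Smoothed estimate of how likely this category is, given past observations.
    likelihood = (counts[category] + alpha) / (len(history_hours) + alpha * len(CATEGORIES))
    return round(100 * (1 - likelihood))

# Hypothetical history: this user has only ever logged in during the working day.
history = [9] * 60 + [10] * 30 + [14] * 10

print(outlier_score(history, 9))   # a routine morning login
print(outlier_score(history, 23))  # just before midnight
print(outlier_score(history, 0))   # just after midnight
```

Because the score depends on how rare the category is for this user, a login just before midnight scores as high as one just after it, even though only the latter would trip a fixed "after midnight" rule.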

How the Outlier Categorical Model Detects Merchant Fraud

The Outlier Categorical Model detects changes in transaction behavior patterns like:

Multiple transactions from a device or location not seen before
Transactions that suddenly spike in amount – for example, going from $1,000 transaction amounts to $25,000

The Outlier Categorical Model looks at past behavior to identify fraudulent transactions. With the Outlier Categorical Model, you perform machine learning training on a dataset. There is no need to retrain the model after initial training is completed. Prediction happens in real time on incoming data.

Let’s look at an example. A rogue process changes merchant and credit card details. This is basically an unauthorized way of changing the bank account and credit details of a merchant. Such changes are typically made by a customer support representative or account manager. In this case, a rogue process periodically changes the credit card and bank account information for merchants, and then changes it back a short time after certain transactions have gone through. These types of attacks are very difficult to detect without the Outlier Categorical Model.

The benefit of the Outlier Categorical Model is that it can reliably detect unknown unknowns. It examines merchant transactions by analyzing changes to a bank account – like the adding or removing of credit cards – along with other important categorical data like devices and location. It then gives a probability score for how likely a flagged transaction is to be a true positive, that is, genuinely fraudulent.

Here’s the Outlier Categorical Model secret sauce: it analyzes transactions as a pattern instead of treating them as single events. It establishes a relationship between a sequence of events instead of looking at each event individually. Why is this so important? This gives the benefit of doing rich analytics to detect patterns across resources, multiple applications, and/or categorical fields instead of doing analysis on a single resource. This is why the Outlier Categorical Model can detect unknown unknowns. Rules don’t find the deviation in patterns.
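To show what treating events as a pattern can mean in practice, the sketch below counts how often one event type follows another in past sessions, then scores a new session by its rarest transition. This is an illustrative stand-in, not the Outlier Categorical Model itself. The event names and session data are invented for the merchant example above.

```python
from collections import Counter

def transition_counts(sequences):
    """Count how often each event is immediately followed by each other event."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts

def rarest_transition_share(counts, sequence):
    """Share of history held by the sequence's least common transition (0.0 = never seen)."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 1.0  # a single event has no transitions to judge
    total = sum(counts.values())
    return min(counts[pair] / total for pair in pairs)

# Hypothetical historical sessions for merchant accounts.
normal_sessions = (
    [["login", "payout", "logout"]] * 50
    + [["login", "update_bank", "logout"]] * 2
)
counts = transition_counts(normal_sessions)

routine = ["login", "payout", "logout"]
suspicious = ["login", "update_bank", "payout", "update_bank", "logout"]

routine_share = rarest_transition_share(counts, routine)
suspicious_share = rarest_transition_share(counts, suspicious)
print(round(routine_share, 2), suspicious_share)
```

Every individual event in the suspicious session has been seen before. It is the order, a bank detail change wrapped around a payout, that has never occurred, and that is what a per-event rule would miss.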

Has this piqued your interest in learning more about how Gurucul UEBA can detect and stop outlier miscreants on your network? Contact us for a product demonstration. We are here to help!
