Aruna Rajasekhar, Lead Data Scientist, Gurucul
Jan 17, 2018
A SIEM captures a huge amount of data. Security analysts then need to review this data to find abnormalities in user behavior. In data science, this is sometimes called a needle-in-a-haystack problem, and it cannot be solved just by throwing a bunch of humans at it to review each log entry. Enter machine learning (ML). If we know what we’re looking for, then we can sit down and write a set of detection rules. For example, flag all traffic coming in from a certain IP, or flag users who download more than XX bytes of a SharePoint data resource after hours. But if you don’t know what to look for, or where or when, then how can you come up with a rule to look for it? How does one build a whitelist/blacklist around this? How does one harness a third party’s threat intelligence, or a SIEM, to find something when we don’t know what we’re looking for?
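To make the rule-writing case concrete, here is a minimal sketch of the two example rules mentioned above. Everything in it is an assumption for illustration: the `Event` fields, the blocked IP, the byte limit standing in for "XX bytes", and the business-hours window are hypothetical and not taken from any real SIEM.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical placeholder values, chosen only for illustration.
BLOCKED_IPS = {"203.0.113.7"}         # an IP from the documentation range
DOWNLOAD_LIMIT_BYTES = 500_000_000    # stands in for the article's "XX bytes"
BUSINESS_HOURS = range(8, 18)         # 08:00 to 17:59 counts as business hours


@dataclass
class Event:
    """One log entry; the field names are assumed, not a real SIEM schema."""
    user: str
    source_ip: str
    resource: str
    bytes_downloaded: int
    timestamp: datetime


def apply_rules(event: Event) -> list[str]:
    """Return the names of every hand-written rule the event violates."""
    reasons = []

    # Rule 1: flag all traffic coming in from a known-bad IP.
    if event.source_ip in BLOCKED_IPS:
        reasons.append("blocked_ip")

    # Rule 2: flag large SharePoint downloads outside business hours.
    after_hours = event.timestamp.hour not in BUSINESS_HOURS
    if (
        event.resource == "sharepoint"
        and after_hours
        and event.bytes_downloaded > DOWNLOAD_LIMIT_BYTES
    ):
        reasons.append("large_after_hours_download")

    return reasons
```

Each rule encodes something already known to be suspicious. An event that matches neither rule passes silently, however unusual it is for that particular user, which is exactly the gap the questions above point at.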
In the world of anomaly detection, this is called “an unknown unknown”. As the data scientists behind predictive security analytics solutions, we need to catch the anomaly whose signature we do not have. We use ML to handle this. More specifically, this involves an unsupervised or semi-supervised outlier detection algorithm. This technique builds the user’s profile over a period of days or weeks and, at detection time, compares the current behavior against the established baseline for the user, or for their peers, to recognize an abnormality. This baseline user profile is actually a collection of profiles: one might think of it as a profile for every attribute in the user’s data record. Hence an abnormality in any attribute in the data for the user will stand out against their established baseline. This shifts the focus from just looking at individual events to monitoring overall behaviors.
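The per-attribute baseline described above can be sketched in a few lines. This is a simplified illustration, not the algorithm any particular product uses: a plain z-score against the user's own history stands in for the outlier detection step, and the attribute names, the sample values, and the threshold of 3.0 are all assumptions.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Build one (mean, standard deviation) profile per attribute.

    `history` is a list of per-day records for a single user, each a dict
    mapping attribute name to a numeric value. At least two days are
    required, because the sample standard deviation needs two points.
    """
    baseline = {}
    for attr in history[0]:
        values = [day[attr] for day in history]
        baseline[attr] = (mean(values), stdev(values))
    return baseline


def find_anomalies(baseline, observation, threshold=3.0):
    """Return {attribute: z-score} for attributes far from the baseline."""
    anomalies = {}
    for attr, (mu, sigma) in baseline.items():
        value = observation[attr]
        if sigma == 0:
            # The attribute never varied in the history, so any change
            # at all is treated as a deviation.
            if value != mu:
                anomalies[attr] = float("inf")
            continue
        z = abs(value - mu) / sigma
        if z > threshold:
            anomalies[attr] = z
    return anomalies


# Hypothetical five-day history for one user.
history = [
    {"logins": 10, "mb_downloaded": 100},
    {"logins": 12, "mb_downloaded": 120},
    {"logins": 11, "mb_downloaded": 110},
    {"logins": 9, "mb_downloaded": 90},
    {"logins": 10, "mb_downloaded": 105},
]
baseline = build_baseline(history)

# Login count is typical, but download volume is far outside the baseline.
today = {"logins": 11, "mb_downloaded": 900}
print(sorted(find_anomalies(baseline, today)))
```

No rule about download volume was ever written here. The attribute is flagged only because it departs from what this user normally does, and the same comparison could be run against a baseline built from the user's peers instead.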
UEBA solutions based on machine learning consolidate and analyze user information to establish normal behavior, highlight deviations from that behavior, and assign risk scores to those deviations accordingly. Frequently, solutions like UEBA escalate expected, or well understood, patterns of anomalous behavior. However, because UEBA applies advanced analytics to all available data captured in an environment, it can also detect changes in patterns, and new patterns, as they emerge: the unknown unknowns. This is not possible with a simple rules-based solution.
From a data scientist’s perspective, traditional SIEMs are only part of the solution. This is because they remain reliant on a rules-only approach. Such an approach will only detect the anomalies the platform “knows about” (the known unknowns) and is targeted to detect, while the unknown unknowns, the damaging ones, slip by. As a result, SIEMs on their own cannot ensure a holistic security environment. To learn more about implementation of UEBA in Gurucul Risk Analytics, click here.