A man runs through the airport. Is he trying to catch a flight or fleeing from authorities? Context is key to understanding behavior. And behavior is key to predicting risk.
The biggest challenge enterprises face today is the flood of alerts their analysts must deal with: too many alerts, with too little context to characterize what they’re seeing. As a result, actionable intelligence goes unacted upon. Because the volume of alerts is overwhelming, many security teams resort to randomly sampling them, and this approach risks missing significant threats that require immediate attention.
The single most critical factor in differentiating anomalous behavior from risky behavior is context. An IT administrator logs in on a Sunday afternoon at 1pm. This is anomalous behavior for this user, but is it risky behavior? If we examine the behavior of his peers, we see they are also logging into the system at the same time. If we look further into his behavior, we can see that he is downloading and installing operating system updates. Without the additional context of his peers’ behavior and his own activity, we might have raised an unnecessary alert for that 1pm Sunday login.
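To make the peer comparison concrete, here is a minimal Python sketch. It assumes login history is available as (weekday, hour) slots for the user and for his peer group, and it uses a hypothetical 5% frequency threshold; the function names are illustrative, not part of any product API. A login is called anomalous when it is rare for the user, and it is escalated for review only when it is also rare among peers.

```python
from collections import Counter

Slot = tuple[str, int]  # (weekday, hour), e.g. ("Sun", 13)

def slot_frequency(history: list[Slot], slot: Slot) -> float:
    """Fraction of past logins that fell in the given (weekday, hour) slot."""
    if not history:
        return 0.0
    return Counter(history)[slot] / len(history)

def classify_login(user_history: list[Slot], peer_history: list[Slot],
                   login: Slot, threshold: float = 0.05) -> str:
    """Hypothetical rule: rare for the user is anomalous; rare for peers too is worth review."""
    if slot_frequency(user_history, login) >= threshold:
        return "normal"
    if slot_frequency(peer_history, login) >= threshold:
        return "anomalous, but consistent with peers"
    return "anomalous and unusual for peers: review"

# Illustrative data only: the admin usually logs in on weekday mornings,
# while his peers routinely patch systems on Sunday afternoons.
admin = [("Mon", 9), ("Tue", 9), ("Wed", 10), ("Thu", 9), ("Fri", 10)]
peers = [("Mon", 9), ("Sun", 13), ("Sun", 13), ("Tue", 10), ("Sun", 13)]

print(classify_login(admin, peers, ("Sun", 13)))  # anomalous, but consistent with peers
```

Activity context (for example, that the session is installing OS updates) would be a further signal layered on top; this sketch covers only the peer dimension.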
Rules No Longer Apply
Machine learning is a force multiplier. Rules-based detection alone cannot keep pace with the increasingly complex demands of threat and breach detection, primarily because rules are based on what (little) we know about the data, and in turn they generate excessive alerts. Because humans cannot predict what future cyber-attacks will look like, we can’t write rules for those scenarios. In contrast, machine learning and statistical analysis can uncover anomalies lurking in datasets that humans would not otherwise recognize or detect. In particular, these techniques can leverage useful, predictive cues that are too noisy and too high-dimensional for either humans or traditional software to link together.
Context Is Critical
What makes cyber threats difficult to detect is context, or more accurately, the lack of it. The mind-numbing volume of log files and security tool outputs typically sits in standalone, siloed data sources. These rich sources of intelligence are rarely correlated or linked with one another to achieve a greater understanding of what access and activities have taken place.
Risk scoring in combination with rich context is a prerequisite for successful predictive security analytics. Security teams need to be able to examine access patterns and behaviors in a way that allows them to see important relationships between multiple sets of activities, possibly taking place in different locations concurrently. This is where data science and machine learning are invaluable. Organizations need to know who, what, where, and why, and they must know it in near real time. Machine learning overcomes the seemingly insurmountable challenge of linking mountains of dissimilar, disconnected data sources.
Add Network and User Context
One of the biggest pain points of most modern organizations is their inability to conclusively tie an event to a specific user account. Gurucul solves this problem by making user account/device metadata the very first input into its systems. User account and entity profiles can be built using logs from siloed systems such as firewalls, anti-virus, and DLP. Gurucul combines identity and network-based alerting to arm SOC teams with an end-to-end picture of an incident to answer essential questions, such as:
- Which user account triggered the incident?
- What device was this from?
- What part of the network was this from?
- Who is the owner of the device/subnet?
- Is the behavior of this user account normal, relative to peer accounts?
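The first four questions amount to enriching an alert with identity and network context. Below is a minimal Python sketch, assuming three hypothetical lookup tables (IP-to-device leases, device-to-account logons, and subnet ownership) that in practice would be fed from sources such as DHCP, directory, and asset inventory logs. It illustrates the idea only and is not Gurucul’s implementation; the last question requires a behavioral baseline and is not modeled here.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnrichedAlert:
    alert_id: str
    user_account: Optional[str]
    device: Optional[str]
    subnet: Optional[str]
    subnet_owner: Optional[str]

# Hypothetical lookup tables; real values would come from DHCP leases,
# logon events, and network/asset documentation.
DHCP_LEASES = {"10.1.4.23": "LAPTOP-0042"}          # IP -> device
DEVICE_LOGONS = {"LAPTOP-0042": "frank"}            # device -> logged-on account
SUBNET_OWNERS = {ipaddress.ip_network("10.1.4.0/24"): "Finance IT"}

def enrich(alert_id: str, src_ip: str) -> EnrichedAlert:
    """Answer: which account, which device, which subnet, and who owns it."""
    addr = ipaddress.ip_address(src_ip)
    device = DHCP_LEASES.get(src_ip)
    user = DEVICE_LOGONS.get(device) if device else None
    subnet, owner = None, None
    for net, net_owner in SUBNET_OWNERS.items():
        if addr in net:  # membership test on ipaddress network objects
            subnet, owner = str(net), net_owner
            break
    return EnrichedAlert(alert_id, user, device, subnet, owner)

print(enrich("A-1", "10.1.4.23"))
```

Each lookup can fail independently, so the fields are optional: an alert from an unknown IP still carries whatever subnet context is available.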
In addition to monitoring how identities are both used and managed, organizations should examine other critical data sources within their computing environment to provide context beyond who and what. Example data sources include network access, event and flow data, DLP data, syslogs, vulnerability scanning data, log files from IT applications, and HR records. In many cases, this data may already be consolidated in a log management or SIEM solution. This vast array of data, combined with information on how identities are being used by both humans and machines, creates a rich source of context that can be mined using threat analytics and anomaly detection. When we further view identity as a threat plane, hundreds of identity attributes can become inputs to machine learning models that predict and prevent security threats.
Rich Context Lowers False Positives
Too many existing SIEM and logging systems use a security approach that delivers a high volume of false positives. What’s necessary, and what a mature User and Entity Behavior Analytics (UEBA) solution provides, is context. Even when an alert fires on something an employee has never done before, or that no one on their team has ever done before, a lot of context is available. As a result, a SOC analyst gains a deeper understanding of any alert, along with critically valuable additional feedback and insight.
True machine learning thrives on large big data repositories, which offer greater processing capacity and data variety than legacy infrastructure. The richer and more inclusive the data sources, the greater the potential to provide accurate context for risk scores, with fewer false positives. Solutions that do not draw from big data are restricted in processing capacity and in data variety.
A Single View of Access and Activity Is Mandatory
To be armed with comprehensive predictive security analytics in their environment, security teams need a UEBA solution that offers actionable intelligence through machine learning. The UEBA solution must be able to create a single, unified view of users’ and entities’ access and activity throughout the network, on-premises, in the cloud, and across all mobile devices they interact with. It must be able to self-learn and self-train, so that it can exclude false positives based on feedback and leverage the rich context provided by big data. Machine learning thrives on context sourced from big data in both rich volume and broad variety, which is especially essential within hybrid environments. Machine learning ‘on a siloed diet’ produces stilted results and is simply counter-productive.
Gurucul UEBA uses a metadata-driven data format, which allows the system to map to any data source (online or offline, internal or external) and to pull information into a data lake, regardless of the data format. Customers have an open choice of data lake: Hadoop, Cloudera, Hortonworks, Amazon EMR, etc. They can choose a preferred big data product or use one provided by Gurucul at no charge.
The More Data Sources and the More Data Ingested, the Better
Ingesting more data broadens the view of the activities and behaviors for each user account or entity by drawing them together in context, and it increases the learning ability of the machine learning engine. Gurucul links together data from disparate cross-channel sources on the basis of identity, either a person or an entity. The person could have any sort of role: a customer, an employee, a customer service representative, a cashier, a medical billing agent, an investment broker, etc. Every data record is associated in some way with a specific identity, which helps build a baseline of behavior and activity for that identity.
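Linking records on the basis of identity can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical identity map that resolves each source’s own account name to one canonical identity; the record fields (`source`, `account`, `action`, `hour`) are invented for the example, and the "baseline" is deliberately tiny. Records that cannot be resolved are kept aside rather than silently dropped, since an unknown account can itself be a signal.

```python
from collections import Counter, defaultdict

# Hypothetical identity map: each source names the same person differently.
IDENTITY_MAP = {
    ("vpn", "fjones"): "frank.jones",
    ("hr", "E1043"): "frank.jones",
    ("app", "frank.jones@example.com"): "frank.jones",
}

def link_by_identity(records: list[dict]) -> tuple[dict[str, list[dict]], list[dict]]:
    """Group records from disparate sources under a canonical identity."""
    linked: dict[str, list[dict]] = defaultdict(list)
    unresolved: list[dict] = []
    for rec in records:
        identity = IDENTITY_MAP.get((rec["source"], rec["account"]))
        if identity is None:
            unresolved.append(rec)  # keep for review instead of discarding
        else:
            linked[identity].append(rec)
    return dict(linked), unresolved

def build_baseline(events: list[dict]) -> Counter:
    """A toy baseline: how often each (action, hour) pair occurs for one identity."""
    return Counter((e["action"], e["hour"]) for e in events)

records = [
    {"source": "vpn", "account": "fjones", "action": "vpn_login", "hour": 8},
    {"source": "app", "account": "frank.jones@example.com", "action": "open_report", "hour": 9},
    {"source": "hr", "account": "E1043", "action": "role_change", "hour": 14},
    {"source": "vpn", "account": "fjones", "action": "vpn_login", "hour": 8},
    {"source": "vpn", "account": "svc_unknown", "action": "vpn_login", "hour": 2},
]

linked, unresolved = link_by_identity(records)
baseline = build_baseline(linked["frank.jones"])
print(baseline.most_common(1))  # [(('vpn_login', 8), 2)]
```

In a real deployment the identity map itself is the hard part (account correlation across systems); here it is simply given.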
Numerous Factors Determine Context
The following are examples of how UEBA uses context. If a user, Frank, is logging in, is this the time he normally logs in? Is he logging in from the laptop he always uses? Is he logging in from the geolocation, based on his IP address, that he normally uses? Is the system he is trying to access one he normally uses? Answering all these questions gives us context, which either increases or decreases the level of trust the SOC places in this identity’s behavior. It’s no longer just, “Yes, this is Frank because he gave the right username and password.” It’s also, “Do these things look right? Should more information be required to validate that it’s Frank (e.g., step-up authentication)?” Or should a SOC analyst use this data to understand the broader context through UEBA risk scoring and flag it as a potentially risky connection? In addition, it’s not uncommon for an outsider threat to first manifest itself within an environment as an insider threat, due to account compromise or account hijacking. SOC teams must constantly assess these potential scenarios by leveraging context.
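The questions about Frank can be turned into a simple contextual risk score that drives the allow, step-up, or review decision described above. The sketch below is a minimal Python illustration under loudly stated assumptions: the profile fields, the weights, and the thresholds (30 and 60) are hypothetical and hand-picked, whereas a UEBA system would learn baselines and weights from data rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """What is normal for one identity (hypothetical fields)."""
    usual_hours: set[int]
    usual_devices: set[str]
    usual_countries: set[str]  # geolocation derived from source IP addresses
    usual_systems: set[str]

@dataclass
class Login:
    hour: int
    device: str
    country: str
    system: str

# Hypothetical, hand-picked weights for each contextual deviation.
WEIGHTS = {"hour": 10, "device": 30, "country": 35, "system": 25}

def context_risk(profile: Profile, login: Login) -> int:
    """Add a weight for every piece of context that does not match the profile."""
    checks = {
        "hour": login.hour in profile.usual_hours,
        "device": login.device in profile.usual_devices,
        "country": login.country in profile.usual_countries,
        "system": login.system in profile.usual_systems,
    }
    return sum(WEIGHTS[name] for name, matches in checks.items() if not matches)

def decide(score: int) -> str:
    """Map a score to an action; thresholds are illustrative only."""
    if score < 30:
        return "allow"
    if score < 60:
        return "step-up authentication"
    return "flag for SOC review"

frank = Profile(usual_hours=set(range(8, 19)), usual_devices={"LAPTOP-0042"},
                usual_countries={"US"}, usual_systems={"crm"})
print(decide(context_risk(frank, Login(9, "LAPTOP-0042", "US", "crm"))))     # allow
print(decide(context_risk(frank, Login(9, "UNKNOWN-PC", "US", "crm"))))      # step-up authentication
print(decide(context_risk(frank, Login(3, "UNKNOWN-PC", "RO", "payroll"))))  # flag for SOC review
```

Note that a correct password contributes nothing to the score; the outcome depends entirely on how well the circumstances of the login match Frank’s normal behavior.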
Validate Identity with Context
While two-factor and multi-factor authentication are important, we must now incorporate the broader, richer context in which we validate trust in an identity. A username and password is one method to validate an identity and gain trust. But there is additional, critical activity context to consider around the circumstances of access to an asset. For example:
- What time is the employee using it?
- How often does the employee log into the application, and how normal is this login?
- Where is he logging in from?
- What machine is he logging in from?
- What type of data is he accessing and how long is he using it for?
Context details like these help security teams truly determine whether the individual using an identity is legitimate. UEBA acts as a force multiplier by assessing large amounts of access and activity data and delivering predictive risk scores that help security analysts focus their attention.
Behavior Context Predicts Risk
Security leaders should assume attackers are inside their networks and must be detected and shut down. The most effective way to detect them and to identify high risk is through their behavior. What we need to look for, and see, going forward is the unknown, most often from the insider side. The question is: what is their behavior, and what is the relative risk of that behavior? Yet within a growing sea of digital exhaust, the scope of the challenge now lies well beyond manual human capabilities. Without a precise predictive behavior analytics solution, driven by advanced machine learning and drawing from big data for context, predicting risky behavior and stopping unknown attacks becomes simply impossible.
Machine learning is the essential component of a successful, best-of-breed UEBA solution. Drawing from big data for context across all environments, machine learning provides the mechanism for managing a broad range of issues. It also serves to identify the “red herrings”: to discover what’s going on and what’s different or out of the ordinary. Advanced machine learning is designed to recognize that a behavior that is different is not necessarily bad or risky, and it eliminates such red herrings from consideration.