ABCs of UEBA: A is for Analytics | Gurucul Risk Analytics

Welcome to our new blog series: ABCs of UEBA. This is not a blog series for dummies. This is a view into what makes up UEBA from A to Z, from start to finish, thoroughly, and in detail.

So, let’s start! Analytics is the engine that fuels User and Entity Behavior Analytics (UEBA). In our case, we are specifically talking about predictive security analytics.

Analytics Defined

For UEBA, analytics is the scientific process of transforming user and entity behavior data into risk-prioritized intelligence, for the purpose of driving business action. It’s the application of data science to create user and entity behavior baselines from historical access and activity. Once behavior baselines are established (from a minimum of 3 months of historical data), UEBA uses analytics to monitor user and entity behavior in real time, for the purpose of predicting and detecting anomalous activity. Real time is the key here: analytics ingests massive amounts of data and provides insight into what’s actually going on with users and entities in your organization, as it’s happening. The output of security analytics is a single risk score for every user and entity. That risk score provides actionable intelligence on potentially risky situations in real time, so organizations can take corrective action.

Let’s take an example.

A user logs in at 9am on Monday. That user’s risk score is a 10. Risk scores range from 1 to 100, where 100 is the most extreme level of risk. As that user repeats logging in every day at 9am, his risk score moves from 10 to 9 to 8, as he demonstrates consistent behavior. But, when he logs in at midnight, his risk score jumps because this is anomalous behavior for this user. Taking into consideration all his other online activity – applications he normally accesses, files he works on, people he emails, what his peers are doing, etc. – analytics can begin to paint a picture of just how risky this activity is by this user. Analytics takes access and activity data feeds from multiple sources for every user and entity, and generates risk scores for each, in real-time, based on behavior.
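To make the login example concrete, here is a minimal sketch in Python. It is not Gurucul's scoring logic. The function names, the one-hour tolerance, the one-point decrease, and the 40-point jump are all assumptions chosen only to reproduce the 10, 9, 8 pattern described above, and the sketch looks at login time alone rather than the user's wider activity.

```python
# Toy model of the login example. Every threshold and step size here is
# a made-up assumption for illustration, not a vendor value.

def hour_gap(a, b):
    """Distance between two hours on a 24-hour clock (23 and 1 are 2 apart)."""
    d = abs(a - b) % 24
    return min(d, 24 - d)


def update_risk(score, login_hour, usual_hour, tolerance=1, jump=40):
    """Lower the score by 1 for a login near the usual hour, raise it by `jump` otherwise."""
    if hour_gap(login_hour, usual_hour) <= tolerance:
        return max(1, score - 1)      # consistent behavior, floor of 1
    return min(100, score + jump)     # anomalous login, cap of 100


def replay(logins, usual_hour=9, start=10):
    """Return the starting score followed by the score after each login."""
    scores = [start]
    for hour in logins:
        scores.append(update_risk(scores[-1], hour, usual_hour))
    return scores


# Two more 9am logins, then a midnight login (hour 0).
print(replay([9, 9, 0]))  # [10, 9, 8, 48]
```

A real system would weigh the midnight login against everything else known about the user before deciding how far the score should move.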

The input for UEBA analytics is access and activity data. The processing power for UEBA analytics is machine learning models and data science. The output for UEBA analytics is risk-prioritized intelligence (risk scores).

Analytics Needs Data

The more data analyzed, the better. The key is to look at every possible access and activity feed so you can connect the dots across applications, systems, groups, devices, and more, and effectively root out truly risky anomalous behavior. If someone logs in at midnight, is that person doing a system update? Or has his account been compromised, and is this a data exfiltration attempt? Examining context across the entire environment is the only way to identify truly aberrant behavior. You don’t want to be generating false positive alerts because you didn’t realize the infrastructure team was doing an upgrade. The data is there. It just needs to be part of the analytics pool of data. The best UEBA solutions ingest the most data feeds out-of-the-box, and have the best machine learning models to do that data justice.
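Here is a small sketch of why that extra context matters, assuming a hypothetical change calendar feed sits alongside the authentication feed. The field names (`team`, `start`, `end`, `time`), the sample users, and the two outcome labels are invented for illustration. A real deployment would weigh many more signals than a single calendar lookup.

```python
from datetime import datetime

# Hypothetical feeds: one scheduled change window and two midnight logins.

def in_change_window(event, change_windows):
    """True if the event falls inside a scheduled window for the same team."""
    return any(
        w["team"] == event["team"] and w["start"] <= event["time"] <= w["end"]
        for w in change_windows
    )


def triage(event, change_windows):
    """Label an off-hours login using the change calendar as context."""
    return "expected" if in_change_window(event, change_windows) else "investigate"


windows = [
    {
        "team": "infrastructure",
        "start": datetime(2024, 1, 13, 23, 0),
        "end": datetime(2024, 1, 14, 3, 0),
    }
]
admin_login = {"user": "alice", "team": "infrastructure", "time": datetime(2024, 1, 14, 0, 5)}
sales_login = {"user": "bob", "team": "sales", "time": datetime(2024, 1, 14, 0, 5)}

print(triage(admin_login, windows))  # expected
print(triage(sales_login, windows))  # investigate
```

Both logins happen at the same minute. Only the second data feed tells them apart.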

Below are some of the data types that Gurucul UEBA can ingest: Document Repository, Data Loss Prevention (DLP), Authentication, Source Code Management systems, HR and Administration system data, Mobile Computing data, Network and Infrastructure data, data from Security Information and Event Management (SIEM) systems, Access Control Systems (Badging), data from cloud applications, data from database systems, directory and LDAP data, VPN data, data from file storage systems, data from Identity Management systems, data from Privileged Access Management systems, threat and vulnerability systems, social media data, case management systems, EMR and EHR data, and financial data. You need to be able to connect the dots across multiple data silos to see the full picture of what is happening in your environment.

Machine Learning Powers Analytics

As data is ingested, analytics leverages machine learning algorithms to process data in real time. Gurucul UEBA uses a number of analytical techniques and machine learning models to predict and detect threats, including: Link Analysis (Soundex, Fuzzy Logic), Feature Analysis (Principal Component Analysis), Behavior Analysis (Supervised, Unsupervised, Semi-supervised, Deep learning, Bayesian Network, Sentiment Analysis, Classification), Peer Group Analysis (Cluster, K-means), Outlier Analytics (LOF, K-NN) and risk scoring (Decay, Basel Index). There will be an entire blog about Machine Learning later in this series, so stay tuned for that!
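As a taste of one technique on that list, here is a minimal sketch of K-NN outlier scoring within a peer group. It illustrates the general idea only and is not Gurucul's model. The two features (logins per day, files touched per day), the sample users, and the choice of k=2 are assumptions. Each user is scored by the average distance to his nearest peers, so a user who sits far from everyone else gets a high score.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each named point by its mean distance to its k nearest neighbors.

    `points` maps a name to a tuple of numeric features and must hold
    more than k entries. Larger scores mean the point is more isolated.
    """
    scores = {}
    for name, p in points.items():
        dists = sorted(
            math.dist(p, q) for other, q in points.items() if other != name
        )
        scores[name] = sum(dists[:k]) / k
    return scores


# Hypothetical peer group: (logins per day, files touched per day).
peers = {
    "ann": (10, 5),
    "bob": (11, 6),
    "cy": (9, 5),
    "dee": (10, 4),
    "eve": (60, 40),
}

scores = knn_outlier_scores(peers)
most_unusual = max(scores, key=scores.get)
print(most_unusual)  # eve
```

In practice the features would be scaled first, so that one large-valued feed does not dominate the distance.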

Analytics Requires a Big Data Platform

Analytics is designed to operate on big data. Choosing the right UEBA big data lake can make or break your ROI. If you’re a Splunk user, you know what we’re talking about! Paying those excessive Events Per Second (EPS) data charges is painful. Be sure to select a UEBA platform that can run on your choice of big data platform. Some vendors will require you to use their big data lake, even if you have your own of the same flavor (e.g. Cloudera, Hadoop, Hortonworks). This is because they have customized the data lake platform to the extent that you must use their version. As a result, you’re paying even more (translation: too much) for data processing and storage. Gurucul has decoupled our analytics from the big data platform so you can run Gurucul UEBA on your choice of big data platform. And if you don’t have a big data lake, we’ll give you ours for free (it’s Hadoop, btw).

Gurucul made this decision because we knew our clients’ underlying data layer could change at any time. And we wanted to be able to support any data lake, which is how we’ve always been positioned. Further, we are not dependent on a specific format for data ingestion. Hortonworks and Cloudera, for example, use different formats for data ingestion. Even if they come up with a new, combined data format as a result of the merger, it doesn’t matter to us. There is no change on our part. We are data independent. In other words, we can ingest data from any source, in any format.

Analytics Generates Risk-Prioritized Intelligence

The output of analytics is risk-prioritized intelligence. Analytics is the math and science that provides data-based insight for you to make informed decisions. And those decisions can be automated and orchestrated for optimum effect in environments where you are looking at millions of events (or activities) per second.

Gurucul’s Risk Engine uses various contributing factors to calculate risk scores, including historical behavior, user context across multiple resources, type of anomaly, access level, and resource and model risk ratings. It aggregates these scores to provide an early indication of risk, which is what gives the platform its predictive capability.
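One way to picture this kind of aggregation is the sketch below. It is an illustration under stated assumptions, not the Risk Engine's actual formula. The field names (`severity`, `resource_weight`, `age_days`), the 7-day half-life, and the way contributions are combined are all hypothetical. Each anomaly's contribution is weighted by a resource rating, decays as it ages, and is combined so the final score stays on the 1 to 100 scale.

```python
def decayed(value, age_days, half_life_days=7.0):
    """Halve `value` for every `half_life_days` that have passed."""
    return value * 0.5 ** (age_days / half_life_days)


def aggregate_risk(anomalies, half_life_days=7.0):
    """Combine anomaly contributions into a single 1-100 risk score.

    Each anomaly is a dict with hypothetical fields:
      severity         0.0-1.0, how unusual the behavior was
      resource_weight  0.0-1.0, risk rating of the resource involved
      age_days         days since the anomaly was observed
    """
    remaining = 1.0  # share of the "no risk" case still left
    for a in anomalies:
        c = decayed(a["severity"] * a["resource_weight"], a["age_days"], half_life_days)
        c = min(max(c, 0.0), 1.0)   # keep each contribution within 0..1
        remaining *= 1.0 - c        # several anomalies compound, never exceed 1
    return max(1, round(100 * (1.0 - remaining)))


fresh = {"severity": 0.5, "resource_weight": 1.0, "age_days": 0}
week_old = {"severity": 0.5, "resource_weight": 1.0, "age_days": 7}

print(aggregate_risk([fresh]))         # 50
print(aggregate_risk([week_old]))      # 25
print(aggregate_risk([fresh, fresh]))  # 75
```

The same anomaly counts for half as much after a week, and two fresh anomalies together score higher than either one alone without running past 100.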

It’s all about managing risk and making risk-based decisions in a world where zero-day threats can take down entire networks in seconds. Machine learning and advanced security analytics provide a way to analyze large volumes of data and predict anomalous behavior, which can help prevent large-scale fraud and detect unknown threats. Contact Gurucul for more information on how to start your UEBA program today.

Antonyms to Analytics: Rules, Queries, Signatures, SIEM

Next: ABCs of UEBA: B is for Behavior