Identify Good UEBA Data with “Feature Analysis”

Gurucul is sharing details on our most effective machine learning models. Up next is a critical precursor to any User and Entity Behavior Analytics (UEBA) deployment.

Gurucul Machine Learning Model: Feature Analysis

How does the Feature Analysis machine learning model work, and what does it do?  It examines data sets and uncovers the features that advanced machine learning scenarios can use to separate good data from bad data.

Feature Analysis illustrates how two groups of samples (e.g., true-positive samples and false-positive samples) can be separated by using certain features.  This refined view helps users make better feature selection choices.
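To make the idea of separating two groups concrete, here is a minimal sketch in Python. It is not Gurucul's implementation: the scoring method (a pairwise AUC rescaled to a 0 to 1 range), the function names, and the sample feature names are all assumptions chosen for illustration.

```python
from itertools import product


def separation_score(group_a, group_b):
    """Score from 0 to 1 for how well one numeric feature separates two groups.

    Uses the pairwise AUC: the probability that a value drawn from group_a
    exceeds one drawn from group_b (ties count half). A score of 0 means the
    groups overlap completely; 1 means the feature splits them cleanly.
    """
    wins = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    auc = wins / (len(group_a) * len(group_b))
    return abs(auc - 0.5) * 2


def rank_features(true_positives, false_positives):
    """Rank feature names by separation score, best first.

    Both arguments are lists of dicts mapping feature name -> numeric value.
    """
    names = true_positives[0].keys()
    scores = {
        name: separation_score(
            [row[name] for row in true_positives],
            [row[name] for row in false_positives],
        )
        for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical alert samples; the feature names are illustrative only.
true_positives = [
    {"failed_logins": 9, "session_minutes": 30},
    {"failed_logins": 7, "session_minutes": 45},
    {"failed_logins": 8, "session_minutes": 20},
]
false_positives = [
    {"failed_logins": 1, "session_minutes": 40},
    {"failed_logins": 2, "session_minutes": 25},
    {"failed_logins": 0, "session_minutes": 35},
]

ranking = rank_features(true_positives, false_positives)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this made-up sample, failed_logins separates the two groups cleanly while session_minutes barely separates them at all, so a user choosing features would keep the first and question the second.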

When it comes to predicting the level of your success with big data, analysts like to say, “Garbage in, garbage out.”  If you’re not ingesting the correct data, you won’t get good results.  In fact, you may not get any meaningful results.  When it comes to behavior-based security analytics, it’s all about the quality of the data.

Use Case: UEBA Data Preparation

Gurucul Risk Analytics recommends, right out of the box, which features you should be using for your UEBA implementations.  It analyzes each data set, and each feature within that data set, to determine whether the data is worth further investigation.  An attribute as simple as a name, for example, is not something you can meaningfully run analytics on.
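As a rough illustration of what a per-feature check can look like, the Python sketch below labels each column of a data set with a simple verdict. The heuristics, the 0.95 threshold, and the column names are assumptions made for this example; they are not the rules Gurucul Risk Analytics applies.

```python
def profile_feature(values, id_ratio=0.95):
    """Label one column of raw values using simple, assumed heuristics.

    id_ratio is a hypothetical threshold: a text column whose share of
    distinct values exceeds it is treated as an identifier, such as a name.
    """
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "empty"            # nothing was ever captured
    distinct = len(set(present))
    if distinct == 1:
        return "constant"         # no variation, so nothing to learn from
    all_text = all(isinstance(v, str) for v in present)
    if all_text and distinct / len(present) > id_ratio:
        return "identifier-like"  # nearly every value is unique
    return "usable"


# Hypothetical VPN log columns; names and values are illustrative only.
columns = {
    "user_name": ["alice", "bob", "carol", "dave", "erin", "frank"],
    "event_type": ["connect"] * 6,
    "login_hour": [9, 9, 10, 23, 9, 10],
    "device_id": [None] * 6,
}

report = {name: profile_feature(vals) for name, vals in columns.items()}
for name, verdict in report.items():
    print(f"{name}: {verdict}")
```

Here the name column is flagged as identifier-like, the event type never varies, the device ID was never captured, and only the login hour is left as a candidate for analytics.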

To use a real example, we recently had a customer who had collected and stored VPN data for the past three years.  However, their data collection method had never been set up to monitor key events, and no one had realized that the log data collected over those three years was meaningless.  Using the Feature Analysis machine learning algorithm, Gurucul Risk Analytics was able to quickly point out the issue, enabling the customer to perform upstream fixes to their data collection methods.  Feature Analysis ensures that you are capturing and focusing on data that will yield meaningful context over the long term.

In another example, let’s say you have a log file containing thousands of IP addresses and are wondering whether a critical anomaly, or set of anomalies, is present in that list.  Applying Feature Analysis to this data first tells us whether the data set will support a credible analysis.  If it will, further analyses may reveal any anomalies that exist in the data.

Here’s the deal: you don’t want to be chasing the wrong attributes.  Analyzing bad data makes your UEBA deployment ineffective and wastes time that you could spend on solving other issues.  Gurucul Risk Analytics will tell you which attributes will provide you with quality results.

What are the Benefits of Feature Analysis?

This machine learning model is a key differentiator for Gurucul Risk Analytics.  If your data is bad, how can you learn from it?  How can you focus on the most problematic issues?  Our Feature Analysis machine learning model lasers in on the right data to learn from so you get optimum results.

With Feature Analysis, our goal is to find the smallest set of the available features such that the fitted model will reach its maximal predictive value. Why? Firstly, minimizing the number of features we include lowers the complexity of the model, which in turn reduces variance and the risk of overfitting. Secondly, lower dimensional data takes a lot less computation time. Finally, in practice, models based on smaller sets of variables are frequently also more interpretable.
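One common way to look for a small feature set with maximal predictive value is greedy forward selection, sketched below in Python. This is an assumption about how such a search could be done, not a description of Gurucul's algorithm. The nearest-centroid classifier, the leave-one-out scoring, and the sample data are stand-ins chosen to keep the example self-contained.

```python
def centroid_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        sums, counts = {}, {}
        # Build per-class feature sums from every sample except the held-out one.
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if i == j:
                continue
            acc = sums.setdefault(other_label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += other[f]
            counts[other_label] = counts.get(other_label, 0) + 1

        def distance(lbl):
            # Squared distance from the held-out row to the class centroid.
            return sum((row[f] - sums[lbl][k] / counts[lbl]) ** 2
                       for k, f in enumerate(features))

        if min(sums, key=distance) == label:
            correct += 1
    return correct / len(rows)


def forward_select(rows, labels, candidates, score=centroid_accuracy):
    """Greedily add the feature that most improves the score.

    Stops as soon as no remaining feature improves on the best score so far,
    which keeps the selected set as small as this greedy search can make it.
    """
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        trial = {f: score(rows, labels, selected + [f]) for f in remaining}
        winner = max(remaining, key=lambda f: trial[f])
        if trial[winner] <= best:
            break
        selected.append(winner)
        remaining.remove(winner)
        best = trial[winner]
    return selected, best


# Hypothetical labeled samples; feature names and values are illustrative only.
rows = [
    {"failed_logins": 9, "session_minutes": 30, "login_hour": 9},
    {"failed_logins": 7, "session_minutes": 45, "login_hour": 14},
    {"failed_logins": 8, "session_minutes": 20, "login_hour": 11},
    {"failed_logins": 1, "session_minutes": 40, "login_hour": 10},
    {"failed_logins": 2, "session_minutes": 25, "login_hour": 15},
    {"failed_logins": 0, "session_minutes": 35, "login_hour": 12},
]
labels = ["tp", "tp", "tp", "fp", "fp", "fp"]

selected, best = forward_select(
    rows, labels, ["failed_logins", "session_minutes", "login_hour"]
)
print(selected, best)
```

On this toy data a single feature already classifies every held-out sample correctly, so the search stops there and leaves the other two features out. Greedy selection is a heuristic and can miss feature combinations that only help together, which is one reason to treat its output as a recommendation rather than a final answer.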
