The primary function of most current Security Information and Event Management (SIEM) products is to collect and ingest data, primarily logs, from across the entire network. While the core purpose of the SIEM has traditionally been logging, data retention, and compliance, SIEMs have evolved over the last decade to focus more on identifying an ever-increasing number of complex security threats. Although analytics capabilities have mostly been bolted onto the traditional SIEM, the primary focus remains the correlation of seemingly disparate events, making it easier for security teams to manually spot patterns that may represent advanced persistent threats.
However, most SIEM vendors sell threat models as add-ons, and each rule-based machine learning (ML) threat detection model takes predetermined pieces of data from specific sources as its inputs. If enough of the rules are met, or if a specific condition is reached, a security event is generated. In a nutshell, each model acts as a flowchart with a fixed set of inputs that determine whether a known attack is identified. While hidden from the user, these models point security teams toward individual threats, so they can investigate whether those threats are connected and potentially part of an overall attack campaign.
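The "flowchart with a fixed set of inputs" idea can be sketched in a few lines of Python. This is only an illustration of the general pattern, not any vendor's implementation: the class names, field names, and thresholds below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Rule:
    name: str
    predicate: Callable[[Event], bool]


@dataclass
class RuleModel:
    """A fixed-input detection model: a list of rules plus a match threshold."""
    name: str
    required_fields: List[str]  # the only inputs the model can consume
    rules: List[Rule]
    min_matches: int

    def evaluate(self, event: Event) -> bool:
        # An event that lacks any required field cannot be evaluated at all.
        if any(field not in event for field in self.required_fields):
            return False
        matches = sum(1 for rule in self.rules if rule.predicate(event))
        return matches >= self.min_matches


# Hypothetical password-guessing model with three rules, two of which must match.
brute_force = RuleModel(
    name="password_guessing",
    required_fields=["failed_logins", "window_minutes", "source_ip_known"],
    rules=[
        Rule("many_failures", lambda e: e["failed_logins"] >= 5),
        Rule("short_window", lambda e: e["window_minutes"] <= 30),
        Rule("unknown_source", lambda e: not e["source_ip_known"]),
    ],
    min_matches=2,
)

alert = brute_force.evaluate(
    {"failed_logins": 8, "window_minutes": 10, "source_ip_known": True}
)
```

Here `alert` is true because two of the three rules match. Any field outside `required_fields` is ignored, which is the limitation the fixed-input design implies.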
Many SIEM vendors have recently added User and Entity Behavioral Analytics (UEBA) or Network Traffic Analysis (NTA) to their offerings. These capabilities rely on ingesting additional events, network traffic or flow data, which are only correlated, but not cross-analyzed to determine if they are associated, resulting in the possibility of a larger number of false positives.
What does that mean? It means that a separate set of rules (i.e., a machine learning model or flowchart) is needed for each type of ingested dataset in order to trigger an alert. Traditional rule-based correlations do not consolidate data or trigger by analyzing multiple pieces of data in unison, which puts the additional burden on security teams to determine whether there is an ongoing attack and how severe it is.
The traditional school of thought has been that SOC teams should feed the SIEM every log and data source generated by their entire enterprise. There are two problems with this mindset:
First, the more data the SIEM ingests, the more alerts it creates, which increases the number of false positives and makes it harder for security teams to prioritize what to chase down and what to ignore. Estimates of the log data collected daily vary wildly across enterprises, ranging anywhere from 4 GB to 70 TB per day.
Second, the licensing cost for a SIEM dramatically increases over time, often unpredictably, due to ingesting additional data. This is because most SIEM vendors charge their customers based on the amount of data ingested and collected, which also means the cost of storage and computing increases, whether physical or cloud, as the volume of data goes up. Unfortunately, CISOs often feel penalized for wanting to protect their organizations using their current SIEMs, resulting in program tradeoffs against already tight budgets.
One other challenge is that typical rule-based ML engines and siloed analytics mean that, when it comes to UEBA or even NTA, an event is triggered even if the behavior is “normal” for your organization. Not all unusual behavior is risky behavior, but when each behavior is analyzed in a silo, the system must flag it even if it is a false positive.
An example is a salesperson unsuccessfully attempting to log in to their mobile app 8 times a day while driving. That may trigger a password-guessing alert, but is it a threat? Now, if the login attempts come from a geolocation outside the salesperson’s territory of 3 states in the United States, and there are exactly 5 attempts every 30 minutes for 8 hours, that is more indicative of machine-generated password guessing and is worthy of investigation. Most current SIEMs do not easily tie all that data together, nor can they adapt by learning that “Steve,” the sales rep in Minnesota, follows this pattern often enough that the alert’s priority should be lowered and its threshold changed.
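A rough sketch of how the two signals in this example (geolocation versus territory, and the regularity of the attempts) might be combined is shown below. The function name, the priority labels, the `min_windows` cutoff, and the three-state territory are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import List, Set


@dataclass
class LoginWindow:
    failures: int  # failed logins observed in one 30-minute window
    state: str     # US state the attempts were geolocated to


def assess_failed_logins(
    windows: List[LoginWindow], territory: Set[str], min_windows: int = 4
) -> str:
    """Return a hypothetical priority: 'low', 'review', or 'investigate'."""
    if not windows:
        return "low"
    # Signal 1: any attempt from outside the user's normal territory.
    outside = any(w.state not in territory for w in windows)
    # Signal 2: identical failure counts across many windows suggest automation.
    counts = [w.failures for w in windows if w.failures > 0]
    regular = len(counts) >= min_windows and pstdev(counts) == 0
    if outside and regular:
        return "investigate"
    if outside or regular:
        return "review"
    return "low"


territory = {"MN", "WI", "IA"}  # assumed three-state sales territory

# Irregular failures inside the territory: the distracted salesperson.
steve = [LoginWindow(2, "MN"), LoginWindow(1, "MN"),
         LoginWindow(3, "WI"), LoginWindow(2, "MN")]

# Exactly 5 failures in each of 16 windows (8 hours) from outside the territory.
bot = [LoginWindow(5, "TX") for _ in range(16)]

steve_priority = assess_failed_logins(steve, territory)
bot_priority = assess_failed_logins(bot, territory)
```

With these inputs, `steve_priority` is `"low"` and `bot_priority` is `"investigate"`. Neither signal is conclusive on its own, which is why the sketch escalates only when both are present.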
The result is that security teams suffer serious burnout and may miss a real attack campaign altogether.
These concerns force organizations to sometimes pick and choose which data is ingested by their SIEM, based on security coverage, adding more pressure on security teams to investigate and validate alerts. Threat actors actively exploit these gaps in coverage to hide their attacks even more effectively.
The above limitations of current SIEM and Extended Detection and Response (XDR) solutions have drastically inhibited security teams from gaining the necessary visibility to identify and respond to an attack before it can impact an organization. Beyond penalizing customers with higher licensing fees for more data ingestion, current solutions have proven inadequate in the following ways when it comes to data ingestion and the volume of alerts:
Ingesting data from a new application, device, or updated schema, interpreting that data, and determining the security-relevant context specific to that data source requires a new data parser or changes to existing data parsers. The organization must then hire someone who can build and maintain these parsers, pay for professional services, or wait for the SIEM vendor to publish one, which can take months.
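To illustrate why a schema update forces parser changes, here is a minimal sketch of a hand-written parser. The log format, the field names, and the function `parse_v1` are invented for this example and do not correspond to any real product.

```python
import re
from typing import Dict, Optional

# Hypothetical log format:
#   2024-01-01T12:00:00Z user=alice action=login result=fail src=10.0.0.5
_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) user=(?P<user>\S+) action=(?P<action>\S+) "
    r"result=(?P<result>\S+) src=(?P<src_ip>\S+)$"
)


def parse_v1(line: str) -> Optional[Dict[str, str]]:
    """Normalize one log line into named fields, or return None if it does not match."""
    match = _PATTERN.match(line.strip())
    return match.groupdict() if match else None


ok = parse_v1(
    "2024-01-01T12:00:00Z user=alice action=login result=fail src=10.0.0.5"
)

# An updated schema that renames "src" to "source_ip" no longer matches,
# so the event is silently dropped until someone updates the parser.
broken = parse_v1(
    "2024-01-01T12:00:00Z user=alice action=login result=fail source_ip=10.0.0.5"
)
```

The first call returns a dictionary of normalized fields, while the second returns `None`. A single renamed field is enough to break ingestion for that source.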

Since most solutions rely on correlation rules that are essentially fixed flowcharts, detecting an attack becomes extraordinarily difficult if the threat actor has altered any malicious code, or has obfuscated the attack campaign by spreading its actions over a long period of time, to evade models of known attacks or attack patterns. Rule-based correlations can neither adapt to the organization nor handle these variants. Security teams must gather the right threat intelligence and tweak the models or build custom models to handle any new variants or attacks. Relying on vendors in this case is generally a losing battle.
A rule-based system takes a fixed set of inputs. It cannot make use of any other data set unless it is relevant to the existing model. Therefore, any other data source, even a useful one, falls by the wayside. This is very short-sighted: an attack may occur at an endpoint, for example, but earlier detection of the attack comes from identifying threat activity and anomalous behavior across network, user, entity, cloud, and application data sources. A dynamic and open machine learning model that utilizes all data sources improves visibility, learns more about the organization, and can handle new attacks and variants more effectively.
Just about every SIEM and XDR solution claims unified analytics. All that means is that the analytics are on a single platform. They generate their own events, and the correlation engine simply pairs them on the same dashboard. However, there is no understanding of how the various analytics-based events are associated or if they are truly part of the same attack. This puts a huge burden on already overwhelmed security teams to sift through more data.

The Gurucul REVEAL Security Analytics Platform is designed differently from traditional SIEM and XDR platforms. The platform is built to take in as much data as possible from as many sources as possible, without penalizing customers through ingestion-based licensing.
Gurucul’s automated data ingestion engine interprets any data source, normalizes the data, and extracts security-relevant context to monitor, analyze, and learn as more data is ingested. Over time this negates the need for a custom parser. Gurucul has hundreds of parsers already developed. With enterprise-wide data that extends into any cloud environment, security teams no longer have to trade off visibility against limits on data.
Gurucul’s core analytics and machine learning engine delivers true dynamic, not rule-based, ML that goes a step further with full enterprise risk engine functionality. The platform learns from the data and prioritizes the findings by constantly evaluating and calculating the actual risk posed to the specific organization.
Gurucul starts with unsupervised learning to create a baseline of the typical user and entity behaviors in an environment. Then supervised learning gives analysts the opportunity to tag and classify alerts to decide whether they are important for calculating risk in the organization. While this takes place immediately out of the box, Gurucul also offers the ability to customize models on the fly.
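The general two-stage idea, a learned baseline followed by analyst feedback, can be sketched as follows. This is not Gurucul's algorithm: it swaps in a simple per-user z-score for the unsupervised baseline and a per-user threshold adjustment as a stand-in for supervised learning. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def build_baseline(history: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Stage 1 (no labels): learn each user's typical value and spread."""
    return {user: (mean(values), pstdev(values)) for user, values in history.items()}


def anomaly_score(baseline: Dict[str, Tuple[float, float]], user: str, value: float) -> float:
    """Distance from the user's own baseline, in standard deviations."""
    mu, sigma = baseline[user]
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


class AlertTuner:
    """Stage 2 (analyst labels): adjust each user's alert threshold from feedback."""

    def __init__(self, default_threshold: float = 3.0, step: float = 1.0) -> None:
        self.default_threshold = default_threshold
        self.step = step
        self.thresholds: Dict[str, float] = {}

    def threshold_for(self, user: str) -> float:
        return self.thresholds.get(user, self.default_threshold)

    def is_alert(self, user: str, score: float) -> bool:
        return score >= self.threshold_for(user)

    def label(self, user: str, is_threat: bool) -> None:
        current = self.threshold_for(user)
        # Benign labels raise the threshold; confirmed threats lower it (floor of 1.0).
        if is_threat:
            self.thresholds[user] = max(1.0, current - self.step)
        else:
            self.thresholds[user] = current + self.step


# Hypothetical daily failed-login counts for one user.
baseline = build_baseline({"steve": [6, 8, 7, 9, 8, 10]})
score = anomaly_score(baseline, "steve", 14)

tuner = AlertTuner()
before_feedback = tuner.is_alert("steve", score)
tuner.label("steve", is_threat=False)
tuner.label("steve", is_threat=False)
after_feedback = tuner.is_alert("steve", score)
```

In this run the same score raises an alert before feedback and no longer does after two benign labels. This mirrors the earlier "Steve" example, where a recurring harmless pattern should stop generating high-priority alerts.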
Just as important is the number of models Gurucul uses to narrow in on all sorts of attacks and variants. Gurucul uses third-party and internal threat intelligence to calculate a dynamic risk score, while other solutions simply aggregate third-party scores into a single static risk score, which rarely changes over time.
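The contrast between a static aggregate and a dynamic score can be illustrated with a toy calculation. The formula below (an equal blend of the third-party average with the strongest internally observed severity, decayed by a 24-hour half-life) is an assumption chosen for clarity, not a description of how any product computes risk.

```python
from typing import List, Tuple


def static_score(third_party_scores: List[float]) -> float:
    """Static aggregate: a plain average that ignores what happens internally."""
    return sum(third_party_scores) / len(third_party_scores)


def dynamic_score(
    third_party_scores: List[float],
    observations: List[Tuple[float, float]],  # (observed_at_hours, severity 0-100)
    now_hours: float,
    half_life_hours: float = 24.0,
) -> float:
    """Blend the static aggregate with time-decayed internal observations."""
    base = static_score(third_party_scores)
    recent = 0.0
    for observed_at, severity in observations:
        age = max(0.0, now_hours - observed_at)
        # Each observation loses half its weight every half_life_hours.
        recent = max(recent, severity * 0.5 ** (age / half_life_hours))
    return 0.5 * base + 0.5 * recent


feeds = [30.0, 50.0]          # hypothetical third-party scores
observed = [(0.0, 80.0)]      # one internal detection at hour 0, severity 80

static = static_score(feeds)
at_detection = dynamic_score(feeds, observed, now_hours=0.0)
one_day_later = dynamic_score(feeds, observed, now_hours=24.0)
two_days_later = dynamic_score(feeds, observed, now_hours=48.0)
```

The static score stays at 40 throughout, while the dynamic score is 60 at detection, 40 a day later, and 30 after two days, so it moves with what is actually being observed in the environment.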
Gurucul’s machine learning capabilities improve detection and response by allowing security teams to chain models together to visualize the entire attack chain and associated campaign, drastically reducing both MTTD/MTTR and false positives.
Gurucul’s machine learning and risk calculation models are open, transparent, and customizable. This allows security teams to review and modify, if desired, how results are derived, which increases analysts’ confidence in the results and subsequent automated responses.

The effectiveness of a security program is more dependent than ever on having full visibility into the entire enterprise. Threat actors continue to take advantage of blind spots to hide their activity and delay detection as long as possible. Current solutions are inherently flawed in that they limit the amount of data and/or are too cost-prohibitive to help organizations achieve the level of visibility needed to detect and respond to an attack effectively and rapidly. Gurucul’s purpose-built, analytics-driven platform addresses the shortcomings of existing solutions, which often struggle in more complex hybrid environments.