Machine learning offers new ways to ID abuse by trusted users
Leslie K. Lambert
CSO | May 30, 2016
In my last post I discussed how machine learning could be used to detect phishing-based account compromise attacks using a real-world use case from the recent Verizon Data Breach Digest. This time I’ll examine how to detect insider threats using similar techniques.
The example I’ve chosen involves an organization in the middle of a buyout that was using retention contracts to prevent employee attrition. To find out what other employees were being offered, a middle manager acquired IT administrator credentials from a colleague and friend. He used these credentials to access the company’s onsite spam filter and spy on the CEO’s incoming email. The abuse didn’t stop there. The same credentials were also used to browse sensitive file shares and conduct other unauthorized actions.
This scenario is chock-full of information security issues. We have social engineering taking place, plus unauthorized and inappropriate use of privileged credentials to access files, including the confidential email archive on a spam-filtering appliance.
So, why wasn’t the company able to detect this activity until an after-the-fact forensic investigation, despite having ample data that could have directly revealed it?
First, it’s apparent that the victim organization was unaware of this IT administrator’s illicit activities and was not monitoring the access patterns and behaviors of elevated-privilege accounts. While it’s not clear what the administrator’s specific job function was, we know the access privileges assigned to his user account were wide-reaching and very powerful, running the gamut from file shares to the email archives on the spam-filtering infrastructure. Those privileges were also poorly configured, since he was able to traverse several different types of systems with a single set of credentials.
What makes insider threats like this one so difficult to detect is a lack of context. The mind-numbing volume of log files and security-tool output typically sits in standalone, siloed data sources. Rarely are these rich sources of intelligence compared with one another to build a fuller picture of what access and activity has taken place.
Instead, we need to be able to examine these access patterns and behaviors in a way that allows us to see important relationships between multiple sets of activities — possibly taking place in different locations, all at the same time. This is where data science and machine learning can help.
In this case, machine learning could have been applied to the data already in hand. Doing so would likely have revealed suspicious activities: access to files that belonged to others, how and where those files were being moved or copied, and atypical access to the spam-filtering infrastructure and confidential email archives.
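To make the idea concrete, here is a minimal sketch of the simplest form of such behavioral baselining: profile which systems each account historically touches, then flag accesses that fall outside that baseline. The account and host names are hypothetical placeholders, not details from the breach, and a production system would use statistical models over far richer features than this toy example.

```python
from collections import defaultdict

# Hypothetical historical access-log records: (account, host) pairs.
baseline_events = [
    ("it_admin", "fileshare-01"),
    ("it_admin", "patch-server"),
    ("mgr_jdoe", "hr-portal"),
]

# New activity to score against the learned baseline.
new_events = [
    ("it_admin", "fileshare-01"),    # consistent with past behavior
    ("it_admin", "spam-filter-01"),  # never accessed before
    ("it_admin", "exec-fileshare"),  # never accessed before
]

def build_profiles(events):
    """Build a per-account baseline: the set of hosts it normally touches."""
    profiles = defaultdict(set)
    for account, host in events:
        profiles[account].add(host)
    return profiles

def flag_novel_access(profiles, events):
    """Flag any access to a host absent from the account's baseline."""
    return [(a, h) for a, h in events if h not in profiles.get(a, set())]

profiles = build_profiles(baseline_events)
alerts = flag_novel_access(profiles, new_events)
print(alerts)  # [('it_admin', 'spam-filter-01'), ('it_admin', 'exec-fileshare')]
```

Even this crude first-seen heuristic would have surfaced the administrator credentials suddenly touching the spam filter and executive file shares; real machine-learning systems extend the same principle with probabilistic scoring rather than hard set membership.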
IT technology and know-how have moved well beyond verifying the simple heartbeats of applications and infrastructure servers. We need to know who, what, where and why. Machine learning overcomes the seemingly insurmountable challenge of linking mountains of dissimilar, disconnected data sources. Being unaware of the online activities within an organization, and failing to monitor access credentials vigilantly, amounts to neglecting our basic responsibilities as security professionals. It demonstrates a lack of due care for the organizations we support.
In Lockheed Martin’s Cyber Kill Chain model, the “Exploitation” phase is where defenders have an opportunity to catch an attack in progress by systematically examining the rich data sets that already exist inside the organization. What better way to do this than with machine-learning techniques such as link analysis, which enable us to proactively detect and prevent persistent threats?
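One way to picture link analysis here: treat consolidated logs as a graph of account-to-resource edges, group resources into zones, and surface accounts whose single set of credentials bridges zones that should be administratively separate. The zone labels and names below are illustrative assumptions for the sketch, not part of the Kill Chain model or the breach report.

```python
from collections import defaultdict

# Hypothetical mapping of resources to administrative zones.
zone = {
    "fileshare-01": "file-services",
    "exec-fileshare": "file-services",
    "spam-filter-01": "mail-security",
    "hr-portal": "hr",
}

# Account -> resource edges observed across consolidated log sources.
edges = [
    ("it_admin", "fileshare-01"),
    ("it_admin", "spam-filter-01"),
    ("it_admin", "exec-fileshare"),
    ("mgr_jdoe", "hr-portal"),
]

def zones_per_account(edges, zone):
    """Collect the set of zones each account's activity touches."""
    touched = defaultdict(set)
    for account, resource in edges:
        touched[account].add(zone[resource])
    return touched

def bridging_accounts(edges, zone, threshold=2):
    """Accounts whose one set of credentials spans multiple zones."""
    return sorted(a for a, zs in zones_per_account(edges, zone).items()
                  if len(zs) >= threshold)

print(bridging_accounts(edges, zone))  # ['it_admin']
```

In the buyout scenario, exactly this kind of cross-zone linkage — one credential touching both file services and the mail-security appliance — is the relationship that siloed, per-system logs cannot expose on their own.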
In my next post, I’ll look at how machine learning can detect data exfiltration attempts when remote access malware has breached an organization’s network security defenses.