Igor Kozlov

Data Scientist


Igor Kozlov, Data Scientist, Bell Canada

Igor Kozlov received his PhD from McGill University, Canada. He has co-authored 9 research articles in 3 different fields, including computational studies of data from the LHC (the biggest experiment in human history). He currently works as a Data Scientist in Cyber Security at Bell Canada. He is always happy to share his passion for all things science (data, computer, natural, applied, fundamental).


Discussion: Detection engineering

This is a Q&A session.


Q&A and discussion for the malware block, hosted and moderated by Jared Atkinson. Questions will be gathered from the audience during the four prior talks.

Discussion.

Talk: Data Science way to deal with advanced threats.


Is your SOC flooded with false positives, yet you are afraid to raise rule thresholds because doing so would let advanced attackers stay under the radar? Are your SOC analysts overwhelmed by the volume of data they must go through to give an initial assessment of a security event? In this talk we will share Data Science methods that proved successful in addressing these challenges in our corporate setup. Specifically, we will cover combining Unsupervised and Supervised Learning (Elastic and Scikit-Learn) with advanced visualizations that provide a "light speed" deep dive into anomaly triage and environment monitoring (a Python and Plotly dashboard). We will demonstrate how all this was used to detect distributed credential attacks that stayed under the radar of other solutions, while saving our analysts time.

The talk will start with an explanation of the flexibility that the Machine Learning (ML) approach brings compared to the static rule-based one. Specifically, the rule-based approach suffers when thresholds change over time and/or vary from one monitored entity (corporation/user/server/website/etc.) to another. This leads either to attackers being able to "stay under the radar" or to analysts being flooded with false positives. (Throughout the talk, we will follow a credential attack example, T1078, for illustrative purposes, and we will explain how the suggested approach generalizes to other MITRE ATT&CK TTPs.)
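The threshold problem can be illustrated with a minimal sketch. The entity names, counts, static threshold, and z-score cutoff below are all made up for illustration and are not taken from the talk; the per-entity z-score is only a stand-in for "historical profiling", not the method used in production.

```python
from statistics import mean, stdev

# Hypothetical hourly failed-login counts per monitored entity (made-up numbers).
history = {
    "busy_portal": [900, 950, 1010, 980, 940, 1000],
    "quiet_server": [2, 3, 1, 2, 4, 2],
}
current = {"busy_portal": 1020, "quiet_server": 40}

STATIC_THRESHOLD = 500  # assumed single global rule applied to every entity


def static_alert(count, threshold=STATIC_THRESHOLD):
    """Static rule: alert when the count exceeds one global threshold."""
    return count > threshold


def profile_alert(count, past, z_cutoff=3.0):
    """Per-entity rule: alert when the count deviates strongly from
    that entity's own history (simple z-score)."""
    mu = mean(past)
    sigma = max(stdev(past), 1e-9)  # guard against a zero standard deviation
    return (count - mu) / sigma > z_cutoff


static_alerts = {e: static_alert(c) for e, c in current.items()}
profile_alerts = {e: profile_alert(c, history[e]) for e, c in current.items()}

print("static :", static_alerts)
print("profile:", profile_alerts)
```

In this toy data the static rule fires on the busy portal, whose volume is normal for it, and misses the quiet server, whose count jumped far above its own baseline. The per-entity profile reverses both outcomes.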

The first part of our response to this challenge is to use Unsupervised ML for anomaly detection, which builds a historical profile of each source and outputs a measure of how far a given observable deviates from the "norm". This can be done in a number of ways, but we currently use the Elastic ML component. Given the recent license change announced by Elastic, we note that Elastic ML can be replaced with free open source solutions, for example Python and the Scikit-Learn ML library.
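As a rough sketch of the open source route, the snippet below fits a Scikit-Learn `IsolationForest` on one source's history and scores new observations. The choice of `IsolationForest`, the two features, and the synthetic data are assumptions made for illustration; the abstract does not say which algorithm or features are used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical history for one source: each row is one hour,
# columns are [failed_logins, distinct_usernames] (synthetic data).
history = rng.poisson(lam=[5, 2], size=(500, 2))

# Fit an unsupervised model on the source's own past behaviour.
model = IsolationForest(n_estimators=100, random_state=0)
model.fit(history)

# Score a typical hour and an hour with a burst of failures.
observations = np.array([[5, 2], [200, 150]])
scores = model.decision_function(observations)  # lower means more anomalous
labels = model.predict(observations)            # -1 = outlier, 1 = inlier

for obs, score, label in zip(observations, scores, labels):
    print(obs, round(float(score), 3), label)
```

The `decision_function` value plays the role of the "measure of deviation": it can be stored per source and per time bucket, then passed on to later stages instead of a hard yes/no alert.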

This is not the end of the story, as advanced attackers understand that their activity is being monitored and use automation tools to bypass detections. Even though the first part of our solution considerably reduces the number of entities one needs to analyze (roughly from millions to tens of thousands in our environment), this is still too many for our analysts to review. The second part therefore consists of tracking anomalies corresponding to various attackers across various log sources and leveraging Supervised ML to aggregate risk. Again, a number of options are available, but we specifically use the free open source Scikit-Learn ML library.
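One way to picture the risk aggregation step is a classifier that takes per-log-source anomaly scores for an entity and outputs a single risk value used for ranking. The sketch below assumes three hypothetical log sources, synthetic labels standing in for past analyst verdicts, and `LogisticRegression` as the model; none of these details come from the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training set: one row per entity, columns are anomaly scores
# in [0, 1] from three hypothetical log sources (e.g. VPN, web auth, email).
X = rng.random((400, 3))
# Synthetic labels standing in for past analyst verdicts (1 = malicious):
# here an entity is "malicious" when its combined anomaly score is high.
y = (X.sum(axis=1) > 1.8).astype(int)

clf = LogisticRegression().fit(X, y)

# New entities to triage (hypothetical names and scores).
candidates = {
    "entity_a": [0.9, 0.9, 0.9],  # anomalous in every log source
    "entity_b": [0.1, 0.1, 0.1],  # quiet everywhere
    "entity_c": [0.9, 0.1, 0.2],  # anomalous in a single source only
}
probabilities = clf.predict_proba(np.array(list(candidates.values())))[:, 1]
risk = dict(zip(candidates, probabilities))

# Analysts would review entities from highest to lowest aggregated risk.
ranked = sorted(risk, key=risk.get, reverse=True)
print(ranked)
```

Ranking by aggregated risk lets analysts start with entities that look suspicious across several log sources rather than working through every individual anomaly.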

Finally, we arrive at the last challenge: how can analysts monitor an environment abundant with anomalies from ML models that are not easily interpretable, and with an overabundance of data coming from various types of logs? We address this issue by providing a front-end written in Python and built on a Plotly dashboard (we use only free open source components, although Plotly also has a commercial offering). It allows analysts to interactively monitor the security environment and provide prompt initial triage for any of the anomalies. It includes a novel (to our knowledge) way to succinctly visualize the most pertinent features of a large number of events surrounding a potential incident (weighted-chains).

We conclude our presentation with a demonstration of our approach based on real, though anonymized, data. The data is a subsample of one of the distributed attacks that our solution detected and that all other solutions available to us missed. Additionally, we show why analysts performing triage reported saving time on ticket processing.