Every year organizations generate more data, and security teams are expected to make sense of not just a greater volume of data from the myriad of log sources that exist in corporate environments, but new sources of logs and data as well. In this talk we look at the data scientist methodology and some of the statistical and machine learning techniques available to defenders of corporate infrastructure. After explaining the strengths and weaknesses of the different techniques we will walk through analyzing some data and spend some time explaining the python code and what would be needed to scale the code from analyzing hundreds of thousands of data points to tens of millions. This is not a talk about SIEM, and related technologies. SIEM is good at collecting logs to a central location and performing on the fly inspection and correlations, but rarely has the ability to engage in deeper statistical analysis, or employ machine learning techniques.
A white paper, slides and code will be prepared for this presentation.