Fully understanding a botnet often requires researchers to go beyond standard reverse-engineering practices and examine the malware’s network traffic. That traffic can show how the malware’s activity evolves over time. However, it is often neglected in malware research because of time constraints and publication pressure.
This workshop aims to overcome those constraints by presenting a workflow for quickly analyzing malicious traffic. The data science approach builds on open-source tools (Wireshark/Tshark, Bash with GNU parallel) and Python libraries (IPython, mitmproxy, pandas, matplotlib). During the workshop, participants will complete hands-on technical labs using datasets from our recent botnet investigation. They will learn how to quickly find patterns, plot graphs, and interpret the data meaningfully. Although the exercises focus on botnet data, the tools and skills apply to many other contexts. To help participants get the most out of the workshop, it is designed so that they can easily replicate the data-analysis environment at home and run similar analyses on their own traffic data.
Workshop Outline
The workshop is divided into three sections. The first presents the background participants need before starting the practical technical labs. The second focuses on analyzing the botnet’s C&C traffic in pcap files. The third emphasizes graphing and the use of the mitmproxy library to analyze decrypted traffic.
- Introduction
- Lab 1 – Extract SOCKS Traffic with Wireshark
- Lab 2 – Extract SOCKS Traffic with Tshark
- Introduction to Jupyter Notebook and its shell integration (xargs, parallel)
- Lab 3 – Search in mitmproxy logs
- Lab 4 – Manipulate Dataframes with Pandas
- Lab 5 – Graph the Data using Plotly
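As a rough illustration of the kind of pattern-finding the labs build toward, the sketch below counts records per destination and per hour from tab-separated text, the default layout of a `tshark -T fields` export. The sample rows, the two-column layout, and the `count_per_hour` name are hypothetical and are not taken from the workshop datasets. It uses only the Python standard library rather than pandas, so it runs without extra installs.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample: epoch timestamp <TAB> destination IP, one record per line.
SAMPLE = (
    "1500000000.10\t203.0.113.5\n"
    "1500000100.20\t203.0.113.5\n"
    "1500003700.00\t203.0.113.5\n"
    "1500003800.50\t198.51.100.7\n"
)


def count_per_hour(tsv_text):
    """Count records per (UTC hour bucket, destination IP)."""
    counts = Counter()
    for epoch, dst in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        ts = datetime.fromtimestamp(float(epoch), tz=timezone.utc)
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), dst)] += 1
    return counts


hourly = count_per_hour(SAMPLE)
for (hour, dst), n in sorted(hourly.items()):
    print(hour, dst, n)
```

In pandas, the same aggregation would be a `groupby` on the hour and destination columns followed by `size()`, which is more convenient once the data needs to be filtered or plotted.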
Tools
Because time is short, we ask participants to download and install Wireshark locally on their computers (https://www.wireshark.org/download.html) during the introduction. For the other tools (Tshark, Bash, GNU parallel, the Anaconda distribution, mitmproxy, pandas, NumPy, Plotly), we will provide a hosted environment with the tools installed and the scripts, data, and exercises available.