Python and Machine Learning: How to use algorithms to create yara rules with a malware zoo for hunting

Retour à la liste des conférenciers et sessions

Machine learning can be useful for helping analysts and reverse engineers. This presentation will explain how to transform data to use machine-learning algorithms to categorize a malware zoo. To cluster a set of (numerical) objects is to group them into meaningful categories. We want objects in the same group to be closer (or more similar) to each other than to those in other groups. Such groups of similar objects are called clusters. When data is labeled, this problem is called supervised clustering. It is a difficult problem but easier than the unsupervised clustering problem we have when data is not labeled. All our experiments have been done with code written in Python and we have mainly used scikit-learn. With the dataset the Zoo, we present how to use unsupervised algorithms on labeled datasets to validate the model. When the model is finalized, the resulting clusters can be used to automatically generate yara rules in order to hunt down the malware.