Yuriy Arbitman

Data Scientist

Yuriy Arbitman, Data Scientist, Imperva

As a data scientist at Imperva, I develop machine learning solutions for various cyber security projects. I'm fascinated by the wonders that data science and machine learning bring to the world. The wealth of open-source frameworks enables us to build systems today at a scale and with an ease that were unthinkable just several years ago. I have worked in the hi-tech industry in Israel for more than 20 years, and I am lucky to have worked for several great companies in engineering, management and research positions. I hold an M.Sc. in Computer Science from the Weizmann Institute in Israel.

Discussion: Detection & Response Block

This is a Q&A session. Moderators will take audience questions both remotely and on-site via sli.do.

Hosted panel discussion and Q&A.

Talk: Obfuscation classification via Machine Learning

Talks will be streamed on YouTube and Twitch for free.

In this work we build a machine learning classifier that distinguishes between cleartext and obfuscated code. Starting with JavaScript, we extend our techniques to Python and PHP.

Client-side protection is one of the key pillars in Imperva’s quest to protect its customers from attackers. Obfuscation is a ubiquitous method for hiding malicious code. Being able to distinguish between cleartext JavaScript documents and obfuscated ones is a first but crucial step in this endeavor.

In this work we first survey the variety of methods and techniques used to obfuscate JavaScript code. We analyze 10+ open-source JavaScript obfuscators and show their similarities and differences. For example, all obfuscators employ variable renaming, but the output distributions differ across obfuscators (e.g., in terms of the lengths of the renamed variables).
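To illustrate the variable-renaming observation, here is a minimal Python sketch that compares the length distribution of declared names across code samples. The regular expression, the function names, and the three sample snippets are all assumptions made for illustration; they are not taken from the talk, and a real extractor would more likely rely on a JavaScript parser than on a regex.

```python
import re
from collections import Counter
from typing import List

# Illustrative shortcut: capture names introduced by var/let/const/function.
# This is a hypothetical simplification, not a full JavaScript tokenizer.
DECL_PATTERN = re.compile(r"\b(?:var|let|const|function)\s+([A-Za-z_$][\w$]*)")


def declared_names(source: str) -> List[str]:
    """Return the declared identifiers found by the simplified pattern."""
    return DECL_PATTERN.findall(source)


def name_length_histogram(source: str) -> Counter:
    """Map identifier length -> number of declared names with that length."""
    return Counter(len(name) for name in declared_names(source))


def mean_name_length(source: str) -> float:
    """Average length of declared names, or 0.0 if none were found."""
    names = declared_names(source)
    return sum(len(n) for n in names) / len(names) if names else 0.0


# Hypothetical samples: cleartext, plus two made-up renaming styles.
clear = "function addTotals(price, tax) { const total = price + tax; return total; }"
hex_style = "function _0x3f2a1b(_0x1, _0x2) { var _0x5c7d9e = _0x1 + _0x2; return _0x5c7d9e; }"
short_style = "function a(b, c) { var d = b + c; return d; }"

for label, sample in [("clear", clear), ("hex", hex_style), ("short", short_style)]:
    print(label, dict(name_length_histogram(sample)), mean_name_length(sample))
```

In this toy setup both renaming styles replace meaningful names, yet their length histograms differ sharply from each other, which is the kind of per-obfuscator difference the survey describes.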

This allows us to extract several families of features. Some of them require careful feature engineering, while others are more general and follow well-known NLP techniques. Next, we survey prior art from the literature and discuss several natural approaches to this problem.
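As a rough sketch of what the more general, NLP-style feature family could look like, the Python snippet below computes character n-gram counts and a few simple text statistics. The specific features chosen here (character entropy, whitespace ratio, number of distinct trigrams) and all function names are assumptions for illustration, not the feature set used in the talk.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, a common NLP representation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def generic_features(source: str) -> dict:
    """Language-independent statistics; the feature choice is hypothetical."""
    length = len(source)
    whitespace = sum(ch.isspace() for ch in source)
    return {
        "char_entropy": shannon_entropy(source),
        "whitespace_ratio": whitespace / length if length else 0.0,
        "distinct_trigrams": len(char_ngrams(source, 3)),
    }


print(generic_features("function add(a, b) { return a + b; }"))
```

Because these features look only at raw characters, the same function could in principle be applied unchanged to Python or PHP sources, whereas engineered features such as identifier statistics need language-specific handling.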

Finally, we suggest obfuscator-agnostic methods for building a state-of-the-art machine learning classifier for this problem.

Although we used JavaScript as the starting point of our research, our techniques generalize nicely to additional programming languages. In other languages, as opposed to JavaScript, obfuscation is much stronger evidence of maliciousness. Our techniques are therefore of special interest for those languages.