Using Machine Learning Techniques to Identify Botnet Traffic

Carl Livadas, Bob Walsh, David Lapsley, Tim Strayer

Reviewer

Hannah Sim (hsim034@auckland.ac.nz)

Reference

Livadas, C., Walsh, B., Lapsley, D., Strayer, T.: Using Machine Learning Techniques to Identify Botnet Traffic. In: Proceedings of 2nd IEEE LCN Workshop on Network Security (November 2006).

Keywords

botnet, IRC, command and control, J48, naive Bayes, Bayesian network

Related Papers

Guofei Gu, Roberto Perdisci, Junjie Zhang, and Wenke Lee. "BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection." In Proceedings of the 17th USENIX Security Symposium (Security'08), San Jose, CA, 2008.

Goebel, J. and Holz, T. 2007. Rishi: identify bot contaminated hosts by IRC nickname evaluation. In Proceedings of the First Workshop on Hot Topics in Understanding Botnets (Cambridge, MA).

Gu, G., Porras, P., Yegneswaran, V., Fong, M., and Lee, W. 2007. BotHunter: detecting malware infection through IDS-driven dialog correlation. In Proceedings of the 16th USENIX Security Symposium (Boston, MA, August 06 - 10, 2007).

Mazzariello, C. 2008. IRC Traffic Analysis for Botnet Detection. In Proceedings of the Fourth International Conference on Information Assurance and Security (September 08 - 10, 2008). IAS. IEEE Computer Society, Washington, DC.

Jing Liu, Yang Xiao, Kaveh Ghaboosi, Hongmei Deng, and Jingyuan Zhang, “Botnet: Classification, Attacks, Detection, Tracing, and Preventive Measures,” EURASIP Journal on Wireless Communications and Networking, vol. 2009.

Maryam Feily, Alireza Shahrestani, Sureswaran Ramadass, "A Survey of Botnet and Botnet Detection," Third International Conference on Emerging Security Information, Systems and Technologies, 2009.

Summary

The paper began by introducing the concept of a botnet and explaining how machine learning techniques can be used to detect botnet traffic. The authors divided the detection task into two subtasks: identifying chat traffic, and then identifying which of that chat traffic is likely to be botnet traffic.
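
The two-subtask split can be pictured as a small pipeline in which the second classifier only sees flows that the first one accepts as chat. The sketch below is illustrative only: the feature names (avg_packet_bytes, duration_s), the thresholds, and the toy rule functions are assumptions made for this review and do not come from the paper, where trained classifiers fill these roles.

```python
from typing import Callable, Dict, List

# Hypothetical per-flow feature record; the paper's real feature set differs.
Flow = Dict[str, float]


def classify_flows(
    flows: List[Flow],
    is_chat: Callable[[Flow], bool],
    is_botnet: Callable[[Flow], bool],
) -> List[str]:
    """Label each flow using two classifiers applied in sequence."""
    labels = []
    for flow in flows:
        if not is_chat(flow):
            # Stage 1 rejects the flow, so stage 2 never sees it.
            labels.append("non-chat")
        elif is_botnet(flow):
            labels.append("botnet-chat")
        else:
            labels.append("real-chat")
    return labels


# Toy stand-ins for trained classifiers (thresholds are invented).
def toy_is_chat(flow: Flow) -> bool:
    return flow["avg_packet_bytes"] < 200.0


def toy_is_botnet(flow: Flow) -> bool:
    return flow["duration_s"] > 3600.0


example_flows = [
    {"avg_packet_bytes": 1200.0, "duration_s": 30.0},
    {"avg_packet_bytes": 90.0, "duration_s": 600.0},
    {"avg_packet_bytes": 80.0, "duration_s": 86400.0},
]
example_labels = classify_flows(example_flows, toy_is_chat, toy_is_botnet)
```

One consequence of this arrangement is that any chat flow missed in the first stage can never be flagged as botnet traffic in the second.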

The authors used network traffic traces from a university wireless network, set up their own safe botnet testbed, and tried out several classification schemes. In the first part they compared the performance of J48, naive Bayes and Bayesian network classifiers. A naive Bayes classifier performed best, achieving low false negative and false positive rates on the real IRC traces as well as a low false negative rate on the botnet testbed data. Some J48 and Bayesian network classifiers were successful on the real IRC traces but did not work well on the botnet testbed data. In the second part, none of the classification schemes succeeded in distinguishing between botnet and non-botnet flows, which the authors believe was mainly caused by an overly simple labelling criterion.
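
For readers unfamiliar with the two error measures used in the comparison, the following is a minimal sketch of how false positive and false negative rates are computed from true and predicted labels. The function name and the sample labels are made up for illustration and are not results from the paper.

```python
from typing import List, Tuple


def error_rates(actual: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    True means the flow belongs to the target class (e.g. IRC chat).
    """
    pairs = list(zip(actual, predicted))
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    # Guard against division by zero when a class is absent.
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Invented labels: 4 target flows and 6 non-target flows.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
fpr, fnr = error_rates(actual, predicted)
```

In this made-up sample one of the six non-target flows is wrongly accepted and one of the four target flows is missed, so the false positive rate is 1/6 and the false negative rate is 1/4.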

Evaluation

The paper is well structured and the authors explained what botnets are in appropriate detail. However, I think it would have been better if they had gone into more detail on some other topics, such as the TCP features that were relevant to the model and the SubSeven trojan. They also used a couple of acronyms without first writing them out in full, which could be confusing.

The authors explained their experiments in enough detail that others should be able to repeat them.