Text-Mining Research Group

On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data

In recent years, detecting anomalies in real-world computer networks has become an increasingly challenging task due to the steady growth of high-volume, high-speed and high-dimensional streaming data for which no ground truth information is available. Efficient detection schemes applied to networked embedded devices need to be fast and memory-constrained, and must be able to deal with concept drifts when they occur. Different approaches for unsupervised online outlier detection (OD) have been designed for these circumstances in order to reliably detect malicious activity. In this paper, we introduce a novel framework called PCB-iForest which, in its generalized form, can incorporate any ensemble-based online OD method to function on streaming data. The most popular state-of-the-art online methods are compared against carefully engineered requirements, with an in-depth focus on variants based on the widely accepted isolation forest algorithm, thereby highlighting the lack of a flexible and efficient solution, a gap that PCB-iForest fills. We therefore integrate two variants into PCB-iForest: an isolation forest improvement called extended isolation forest, and a classic isolation forest variant equipped with the functionality to score features according to their contributions to a sample’s anomalousness. Extensive experiments were performed on 23 different multi-disciplinary and security-related real-world datasets in order to comprehensively evaluate the performance of our implementation compared with off-the-shelf methods. The discussion of results, including AUC, F1 score and averaged execution time metrics, shows that PCB-iForest clearly outperformed the state-of-the-art competitors in 61% of cases and achieved even more promising results in terms of the tradeoff between classification performance and computational cost.
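
For readers unfamiliar with the isolation forest algorithm that PCB-iForest builds on, the following is a minimal Python sketch of the classic, batch isolation forest scoring scheme: each tree isolates points with random axis-parallel cuts, and points that are isolated after few cuts (short average path length) receive a score close to 1. This is an illustrative sketch only. It is not the PCB-iForest framework, it has no streaming or concept-drift handling, and the names `c`, `build_tree`, `path_length` and `IsolationForest` are hypothetical choices for this example, not the authors' code or any library's API.

```python
# Minimal sketch of the classic batch isolation forest (illustrative; not PCB-iForest).
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points;
    used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Recursively partition X with random axis-parallel cuts."""
    if depth >= max_depth or len(X) <= 1:
        return {"size": len(X)}
    q = rng.randrange(len(X[0]))              # random feature index
    lo = min(x[q] for x in X)
    hi = max(x[q] for x in X)
    if lo == hi:                              # all values equal: cannot split
        return {"size": len(X)}
    p = rng.uniform(lo, hi)                   # random split value
    left = [x for x in X if x[q] < p]
    right = [x for x in X if x[q] >= p]
    return {"q": q, "p": p,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Edges traversed until x reaches a leaf, plus c(size) for unresolved leaves."""
    if "q" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, X):
        if len(X) < 2:
            raise ValueError("need at least two samples")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(X))
        max_depth = math.ceil(math.log2(self.psi))  # tree height limit
        self.trees = [build_tree(rng.sample(X, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """s(x) = 2 ** (-mean path length / c(psi)); values near 1 suggest an outlier."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))


if __name__ == "__main__":
    data_rng = random.Random(42)
    inliers = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
    forest = IsolationForest(seed=1).fit(inliers)
    print(forest.score([0.0, 0.0]), forest.score([8.0, 8.0]))
```

With this setup, the point far from the Gaussian cloud should receive a noticeably higher score than the central point. The extended isolation forest named in the abstract replaces the axis-parallel cut in `build_tree` with a cut along a randomly oriented hyperplane; operating on a stream and reacting to concept drift, which PCB-iForest addresses, are outside the scope of this sketch.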

The article is also available on the publisher's website.

Keywords: intrusion detection; outlier detection; streaming data; network security; online learning; unsupervised learning; machine learning

Year: 2021

Journal ISSN: 2079-9292


Authors of this publication:

Michael Heigl

E-mail: heigl@kiv.zcu.cz

Michael is currently working as a research associate at the ProtectIT institute of the Deggendorf Institute of Technology and holds a Ph.D. from the University of West Bohemia for his dissertation on machine-learning-enhanced network-based anomaly detection. He specializes in improving outlier detection methods for streaming data applications.

Kumar Ashutosh Anand

Andreas Urmann

Dalibor Fiala

Phone: +420 377 63 2429
E-mail: dalfia@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/~dalfia/

Dalibor is the former research group coordinator, an analyst with CCA Group a.s., and an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in data mining, web mining, information retrieval, informetrics, and information science.

Martin Schramm

Robert Hable

Related Projects:

Data Mining for Computer Networks Security
Authors:	Michael Heigl, Laurin Doerr, Dalibor Fiala
Desc.:	The project investigates novel data mining methods that enhance computer network security by applying advanced outlier detection techniques to streaming data.