Evaluating Feature Encodings for Unsupervised Machine Learning Classification in Automotive Ethernet Network

Categorical attributes such as MAC and IP addresses are an integral part of Ethernet network data and play a crucial role in modern network infrastructure. Because these attributes have high cardinality, representing them poses a considerable performance challenge for machine learning tasks. To better manage such attributes, this work presents new methods for transforming them. Some of these encoding schemes use domain knowledge to limit the number of dimensions that the transformation adds to the data. To assess the classification performance of the proposed encoding schemes, the study uses two Autoencoder deep neural networks for the unsupervised classification task. Applied to Ethernet network data from a real vehicle, these encodings are a novel contribution to feature engineering for analyzing network data with machine learning. The evaluation results show that the proposed techniques have a key impact on classification performance, and that the encoding schemes IE and ISF performed reasonably well in all three attack scenarios for each model.

This is a preprint version of the article.

Keywords: feature encoding, high cardinality, anomaly detection, automotive ethernet, categorical attributes, unsupervised machine learning

Year: 2025

Download: Full text [190 kB]

Authors of this publication:


Michael Heigl


E-mail: heigl@kiv.zcu.cz

Michael is currently a research associate at the institute ProtectIT at the Deggendorf Institute of Technology and holds a Ph.D. degree from the University of West Bohemia for his dissertation on machine learning enhanced network-based anomaly detection. He specializes in improving outlier detection methods for streaming data applications.

Dalibor Fiala


Phone: +420 377 63 2429
E-mail: dalfia@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/~dalfia/

Dalibor is the former research group coordinator, an analyst with CCA Group a.s., and an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in data mining, web mining, information retrieval, informetrics, and information science.