
Evaluating Feature Encodings for Unsupervised Machine Learning Classification in Automotive Ethernet Network
Categorical attributes such as MAC and IP addresses are an integral part of Ethernet network data and play a crucial role in modern network infrastructure. Representing these high-cardinality entities poses a considerable performance challenge for machine learning tasks. To better manage the representation of the categorical attributes found in network data, this work presents new methods for transforming them. Some of these encoding schemes are designed using domain knowledge to limit the number of dimensions introduced into the data during transformation. The study uses two Autoencoder deep neural networks on an unsupervised classification task to assess the classification performance of the proposed encoding schemes. Applying these varied encodings to Ethernet network data from a real vehicle is a novel contribution to feature engineering for analyzing network data with machine learning approaches. The evaluation results show that the proposed techniques have a key impact on classification performance, and that the encoding schemes IE and ISF performed reasonably well in all three attack scenarios for each model.
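To illustrate why high cardinality is a problem, the following minimal Python sketch contrasts the width of a one-hot encoding with a fixed-width, byte-wise encoding of addresses. It is an illustration only and is not the IE or ISF scheme evaluated in the article; the function names (`one_hot_dims`, `ip_to_octets`, `mac_to_bytes`) and the sample addresses are assumptions made for this example.

```python
import ipaddress


def one_hot_dims(addresses):
    """Columns a one-hot encoding would need: one per distinct value."""
    return len(set(addresses))


def ip_to_octets(address):
    """Encode an IPv4 address as four features scaled to [0, 1]."""
    packed = ipaddress.IPv4Address(address).packed  # 4 raw bytes
    return [byte / 255 for byte in packed]


def mac_to_bytes(address):
    """Encode a colon-separated MAC address as six features scaled to [0, 1]."""
    return [int(part, 16) / 255 for part in address.split(":")]


# Hypothetical sample addresses, not taken from the article's dataset.
ips = ["10.0.0.1", "10.0.0.2", "10.0.1.7", "192.168.0.255"]

print(one_hot_dims(ips))                        # grows with every new address
print(len(ip_to_octets(ips[3])))                # fixed width for IPv4
print(len(mac_to_bytes("00:1a:2b:3c:4d:5e")))   # fixed width for MAC
```

The one-hot width grows with every previously unseen address, while the byte-wise width stays constant. This is the kind of dimensionality trade-off the abstract refers to, although the article's own schemes may differ substantially from this sketch.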
This is a preprint version of the article.
Keywords: feature encoding, high cardinality, anomaly detection, automotive ethernet, categorical attributes, unsupervised machine learning
Year: 2025
Full text [190 kB]
Authors of this publication:

Michael Heigl
E-mail: heigl@kiv.zcu.cz

Dalibor Fiala
Phone: +420 377 63 2429
E-mail: dalfia@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/~dalfia/
