Efficient Hierarchical Decomposition of Repetitive Traces for MLDriven Analysis
Conference: MBMV 2025 - 28. Workshop
03/11/0000 - 03/12/2025 at Rostock, Germany
Proceedings: ITG-Fb. 320: MBMV 2025
Pages: Language: englishTyp: PDF
Personal VDE Members are entitled to a 10% discount on this title
Authors:
Knödtel, Johannes; Reichenbach, Marc
Abstract:
In modern computing architectures, the execution of programs often results in highly repetitive and structured linear sequences of microarchitectural states and instruction traces. This is especially true for most signal and image processing task, such as stencil codes, convolution and most ANNs, due to the loop-oriented nature of their codes. When applying machine learning (ML) algorithms to such sequences, especially for tasks like predicting power consumption on specific CPUs, it becomes inefficient to process every repetitive sequence in its entirety. This inefficiency arises from the fact that, beyond a certain point, the architecture reaches a steady state, rendering further processing redundant. This makes finding a repetition representation of such data worthwhile in both the training and execution phases of these models. In this work we present a method to identify hierarchical structures, such as nested loops, and accurately detects nfold repetitions beyond simple duplication. By leveraging rolling hashes, our algorithm achieves a sufficiently high performance in the relevant cases. For large problem sizes, the approach is highly parallelizable, resulting in a significant reduction in runtime. Depending on the input data, it is able to demonstrate excellent scalability, achieving a speedup of over 152 compared to the serial version on a server with 192 physical cores, in one of the examples presented in this paper. This advancement paves the way for more efficient ML-based analysis of program behavior on complex architectures. We offer two open source implementations of this method: A more easily comprehesible Python variant and a highly optimized and parallel C version.