Edinburgh Research Archive

Testing deep neural networks across different computational configurations

dc.contributor.advisor
Rajan, Ajitha
dc.contributor.advisor
Bilen, Hakan
dc.contributor.advisor
Cano Reyes, José
dc.contributor.author
Louloudakis, Nikolaos
dc.date.accessioned
2025-07-15T08:53:42Z
dc.date.available
2025-07-15T08:53:42Z
dc.date.issued
2025-07-15
dc.description.abstract
Deep Neural Networks (DNNs) typically consist of complex architectures and require enormous processing power. Consequently, developers and researchers use Deep Learning (DL) frameworks to build them (e.g., Keras and PyTorch), apply compiler optimizations to improve their inference-time performance (e.g., constant folding and operator fusion), and deploy them on hardware accelerators to parallelize their computations (e.g., GPUs and TPUs). We concisely refer to these aspects as the computational environment of Deep Neural Networks. However, the extent to which the behavior of a DNN model (i.e., output label inference correctness and computation times) is affected when different configurations are selected across the computational environment is overlooked in the literature. For example, if a DNN model is deployed on two different GPU devices, will it give the same predictions, and how will its computation times deviate across the devices? Given that DNNs are deployed in safety-critical domains (e.g., autonomous driving), it is important to understand the extent to which DNNs are affected by these aspects. For that purpose, we present DeltaNN, a tool that allows DNN model compilation and deployment under different configurations, as well as comparison of model behavior across them. Using DeltaNN, we conducted a set of experiments on widely used Convolutional Neural Network (CNN) models performing image classification. We built these models using different DL frameworks, converted them across different DL framework configurations, compiled them with a set of optimizations, and deployed them on GPU devices of varying capabilities.
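The differential-testing idea described above can be sketched in a few lines. The snippet below is an illustrative, self-contained sketch and not the actual DeltaNN API: two hypothetical deployments of the same model produce logits for the same input, and we compare their top-1 labels.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1(logits):
    """Index of the highest-probability class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])

def compare_deployments(logits_a, logits_b):
    """Differential check: do two deployments agree on the top-1 label?"""
    la, lb = top1(logits_a), top1(logits_b)
    return la == lb, la, lb

# Two backends producing slightly different logits for the same image,
# e.g., due to floating-point differences across devices:
agree, la, lb = compare_deployments([0.10, 2.30, 0.50],
                                    [0.12, 2.29, 0.51])
```

In the full setting, the two logit vectors would come from the same compiled model run on two devices (or under two optimization settings), and disagreement flags a configuration-sensitive input.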
Our experiments with different configurations led to two main observations: (1) while DNNs typically generate the same predictions across different GPU devices and compiler optimization settings, this does not hold when utilizing different DL frameworks, and especially when converting from one DL framework to another (e.g., from Keras to PyTorch), a common practice among developers to enable model portability and extensibility; and (2) optimizations are not a panacea for inference-time improvement across different devices, as the same optimization strategies that improve execution times on high-end GPUs were found to degrade them when applied to models deployed on low-end GPUs. To mitigate the faults related to the conversion process, we implemented a framework called FetaFix. FetaFix performs automatic fault detection by comparing a number of aspects across the source and the converted target DNN model, such as model parameters, hyperparameters, and structure. It then applies a number of fault repair strategies related to these aspects and checks how the converted model performs in comparison to its source counterpart. FetaFix was able to repair 93% of the problematic cases identified by DeltaNN. Finally, we explored the effects of faults present in the target hardware acceleration device code on DNN model correctness. Inspired by traditional mutation testing, we built MutateNN, a tool that generates DNN model mutants containing target device code faults. We then injected a number of faults into the target device code of numerous CNN models performing classification and evaluated how these models behaved across different hardware acceleration devices. We observed that faults related to conditional operations, as well as drastic changes in arithmetic types, considerably affected model correctness. We conclude that different configurations of computational environment aspects can affect DNN model behavior.
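The detect-then-repair loop described for conversion faults can be sketched minimally. The names below are hypothetical stand-ins, not the real FetaFix interface: models are represented as flat name-to-parameter dictionaries, divergent tensors are flagged, and repair copies the source values over.

```python
def detect_parameter_faults(source, converted, tol=1e-6):
    """Compare source vs converted model parameters.

    Returns a list of (name, kind) faults: a parameter may be
    'missing' in the converted model or a numerical 'mismatch'.
    """
    faults = []
    for name, src_vals in source.items():
        tgt_vals = converted.get(name)
        if tgt_vals is None:
            faults.append((name, "missing"))
        elif any(abs(a - b) > tol for a, b in zip(src_vals, tgt_vals)):
            faults.append((name, "mismatch"))
    return faults

def repair(source, converted, faults):
    """Repair strategy: restore faulty parameters from the source model."""
    for name, _ in faults:
        converted[name] = list(source[name])
    return converted

# A toy "conversion" that corrupted one bias tensor:
src = {"conv1.weight": [0.5, -0.25], "fc.bias": [0.1]}
tgt = {"conv1.weight": [0.5, -0.25], "fc.bias": [0.7]}
faults = detect_parameter_faults(src, tgt)
tgt = repair(src, tgt, faults)
```

The actual framework compares further aspects (hyperparameters, graph structure) and re-runs inference to confirm that the repaired model matches its source counterpart.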
Our contributions are: (1) an empirical study of how the computational environment affects DNN model behavior, performed with a tool (DeltaNN) implemented specifically for that purpose; (2) a framework (FetaFix) that automatically detects and repairs faults related to model input, structure, and parameters in DNN models converted across DL frameworks; and (3) a utility (MutateNN) that introduces faults into the target code of DNN models associated with deployment on different hardware acceleration devices and evaluates the effects of these faults on model correctness.
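The mutation-testing idea behind the third contribution can be illustrated with a toy kernel. The mutants below are hypothetical examples, not MutateNN's actual operators: a conditional-operator mutant and an arithmetic-type mutant are applied to a ReLU-like function, and each mutant is "killed" if its output diverges from the original.

```python
def relu(xs):
    """Original kernel: elementwise max(x, 0)."""
    return [x if x > 0 else 0 for x in xs]

def relu_cond_mutant(xs):
    """Conditional-operator mutant: '>' replaced by '<'."""
    return [x if x < 0 else 0 for x in xs]

def relu_type_mutant(xs):
    """Arithmetic-type mutant: values truncated to int before the test."""
    return [int(x) if int(x) > 0 else 0 for x in xs]

inputs = [-1.5, 0.5, 2.0]
original = relu(inputs)

# A mutant is killed when its output differs from the original kernel's:
killed_cond = relu_cond_mutant(inputs) != original
killed_type = relu_type_mutant(inputs) != original
```

In the thesis setting, such faults are injected into the compiled target device code rather than Python, and mutant behavior is compared across hardware accelerators; the abstract's observation is that conditional and arithmetic-type mutants of this flavor affected correctness the most.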
en
dc.identifier.uri
https://hdl.handle.net/1842/43675
dc.identifier.uri
http://dx.doi.org/10.7488/era/6207
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
“Assessing Robustness of Image Recognition Models to Changes in the Computational Environment”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in NeurIPS ML Safety Workshop 2022
en
dc.relation.hasversion
“DeltaNN: Assessing the Impact of Computational Environment Parameters on the Performance of Image Recognition Models”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in IEEE ICSME 2023
en
dc.relation.hasversion
“Exploring Effects of Computational Parameter Changes to Image Recognition Systems”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in ArXiv
en
dc.relation.hasversion
“Fault Localization for Buggy Deep Learning Framework Conversions in Image Recognition”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in IEEE/ACM ASE 2023
en
dc.relation.hasversion
“FetaFix: Automatic Fault Localization and Repair of Deep Learning Model Conversions”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in EASE 2025
en
dc.relation.hasversion
“Exploring Robustness of Image Recognition Models on Hardware Accelerators”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in IEEE ICST Mutation 2025
en
dc.subject
software engineering
en
dc.subject
software testing
en
dc.subject
artificial intelligence
en
dc.subject
deep neural networks
en
dc.subject
differential testing
en
dc.subject
fault detection
en
dc.subject
fault repair
en
dc.subject
mutation testing
en
dc.title
Testing deep neural networks across different computational configurations
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Louloudakis2025.pdf
Size:
16.97 MB
Format:
Adobe Portable Document Format
Description:
Revised: typo on page 23
