Edinburgh Research Archive

Testing deep neural networks across different computational configurations

dc.contributor.advisor
Rajan, Ajitha
dc.contributor.advisor
Bilen, Hakan
dc.contributor.advisor
Cano Reyes, José
dc.contributor.author
Louloudakis, Nikolaos
dc.date.accessioned
2025-07-15T08:53:42Z
dc.date.available
2025-07-15T08:53:42Z
dc.date.issued
2025-07-15
dc.description.abstract
Deep Neural Networks (DNNs) typically consist of complex architectures and require enormous processing power. Consequently, developers and researchers use Deep Learning (DL) frameworks to build them (e.g., Keras and PyTorch), apply compiler optimizations to improve their inference-time performance (e.g., constant folding and operator fusion), and deploy them on hardware accelerators to parallelize their computations (e.g., GPUs and TPUs). We concisely refer to these aspects as the computational environment of Deep Neural Networks. However, the extent to which the behavior of a DNN model (i.e., output label inference correctness and computation times) is affected when different configurations are selected across the computational environment is overlooked in the literature. For example, if a DNN model is deployed on two different GPU devices, will it give the same predictions, and how will its computation times deviate across the devices? Given that DNNs are deployed in safety-critical domains (e.g., autonomous driving), it is important to understand the extent to which DNNs are affected by these aspects. For that purpose, we present DeltaNN, a tool that allows DNN model compilation and deployment under different configurations, as well as comparison of model behavior across them. Using DeltaNN, we conducted a set of experiments on widely used Convolutional Neural Network (CNN) models performing image classification. We built these models using different DL frameworks, converted them across different DL framework configurations, compiled them with a set of optimizations, and deployed them on GPU devices of varying capabilities.
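The differential-testing idea described above can be sketched in a few lines. The snippet below is an illustrative, self-contained sketch and not the actual DeltaNN API: two hypothetical deployments of the same model produce logits for the same input, and we compare their top-1 labels.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1(logits):
    """Index of the highest-probability class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda i: probs[i])

def compare_deployments(logits_a, logits_b):
    """Differential check: do two deployments agree on the top-1 label?"""
    la, lb = top1(logits_a), top1(logits_b)
    return la == lb, la, lb

# Two backends producing slightly different logits for the same image,
# e.g., due to floating-point differences across devices:
agree, la, lb = compare_deployments([0.10, 2.30, 0.50],
                                    [0.12, 2.29, 0.51])
```

In the full setting, the two logit vectors would come from the same compiled model run on two devices (or under two optimization settings), and disagreement flags a configuration-sensitive input.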
Our experiments with different configurations led to two main observations: (1) while DNNs typically generate the same predictions across different GPU devices and compiler optimization settings, this does not hold when utilizing different DL frameworks, and especially when converting from one DL framework to another (e.g., from Keras to PyTorch), a common practice among developers to enable model portability and extensibility; and (2) optimizations are not a panacea for inference-time improvement across different devices, as the same optimization strategies that improve execution times on high-end GPUs were found to degrade them when applied to models deployed on low-end GPUs. To mitigate the faults related to the conversion process, we implemented a framework called FetaFix. FetaFix performs automatic fault detection by comparing a number of aspects across the source and the converted target DNN model, such as model parameters, hyperparameters, and structure. It then applies a number of fault repair strategies related to these aspects and checks how the converted model performs in comparison to its source counterpart. FetaFix was able to repair 93% of the problematic cases identified by DeltaNN. Finally, we explored the effects of faults present in the target hardware acceleration device code on DNN model correctness. Inspired by traditional mutation testing, we built MutateNN, a tool that generates DNN model mutants containing target device code faults. We then injected a number of faults into the target device code of numerous CNN models performing classification and evaluated how these models behaved across different hardware acceleration devices. We observed that faults related to conditional operations, as well as drastic changes in arithmetic types, considerably affected model correctness. We conclude that different configurations of computational environment aspects can affect DNN model behavior.
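The detect-then-repair loop described for conversion faults can be sketched minimally. The names below are hypothetical stand-ins, not the real FetaFix interface: models are represented as flat name-to-parameter dictionaries, divergent tensors are flagged, and repair copies the source values over.

```python
def detect_parameter_faults(source, converted, tol=1e-6):
    """Compare source vs converted model parameters.

    Returns a list of (name, kind) faults: a parameter may be
    'missing' in the converted model or a numerical 'mismatch'.
    """
    faults = []
    for name, src_vals in source.items():
        tgt_vals = converted.get(name)
        if tgt_vals is None:
            faults.append((name, "missing"))
        elif any(abs(a - b) > tol for a, b in zip(src_vals, tgt_vals)):
            faults.append((name, "mismatch"))
    return faults

def repair(source, converted, faults):
    """Repair strategy: restore faulty parameters from the source model."""
    for name, _ in faults:
        converted[name] = list(source[name])
    return converted

# A toy "conversion" that corrupted one bias tensor:
src = {"conv1.weight": [0.5, -0.25], "fc.bias": [0.1]}
tgt = {"conv1.weight": [0.5, -0.25], "fc.bias": [0.7]}
faults = detect_parameter_faults(src, tgt)
tgt = repair(src, tgt, faults)
```

The actual framework compares further aspects (hyperparameters, graph structure) and re-runs inference to confirm that the repaired model matches its source counterpart.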
Our contributions are: (1) an empirical study of how the computational environment affects DNN model behavior, performed with a tool (DeltaNN) implemented specifically for that purpose; (2) a framework (FetaFix) that automatically detects and repairs faults related to model input, structure, and parameters in DNN models converted across DL frameworks; and (3) a utility (MutateNN) that introduces faults into the target code of DNN models associated with deployment on different hardware acceleration devices and evaluates the effects of these faults on model correctness.
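The mutation-testing idea behind the third contribution can be illustrated with a toy kernel. The mutants below are hypothetical examples, not MutateNN's actual operators: a conditional-operator mutant and an arithmetic-type mutant are applied to a ReLU-like function, and each mutant is "killed" if its output diverges from the original.

```python
def relu(xs):
    """Original kernel: elementwise max(x, 0)."""
    return [x if x > 0 else 0 for x in xs]

def relu_cond_mutant(xs):
    """Conditional-operator mutant: '>' replaced by '<'."""
    return [x if x < 0 else 0 for x in xs]

def relu_type_mutant(xs):
    """Arithmetic-type mutant: values truncated to int before the test."""
    return [int(x) if int(x) > 0 else 0 for x in xs]

inputs = [-1.5, 0.5, 2.0]
original = relu(inputs)

# A mutant is killed when its output differs from the original kernel's:
killed_cond = relu_cond_mutant(inputs) != original
killed_type = relu_type_mutant(inputs) != original
```

In the thesis setting, such faults are injected into the compiled target device code rather than Python, and mutant behavior is compared across hardware accelerators; the abstract's observation is that conditional and arithmetic-type mutants of this flavor affected correctness the most.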
en
dc.identifier.uri
https://hdl.handle.net/1842/43675
dc.identifier.uri
http://dx.doi.org/10.7488/era/6207
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
“Assessing Robustness of Image Recognition Models to Changes in the Computational Environment”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in NeurIPS ML Safety Workshop 2022
en
dc.relation.hasversion
“DeltaNN: Assessing the Impact of Computational Environment Parameters on the Performance of Image Recognition Models”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in IEEE ICSME 2023
en
dc.relation.hasversion
“Exploring Effects of Computational Parameter Changes to Image Recognition Systems”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in ArXiv
en
dc.relation.hasversion
“Fault Localization for Buggy Deep Learning Framework Conversions in Image Recognition”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in IEEE/ACM ASE 2023
en
dc.relation.hasversion
“FetaFix: Automatic Fault Localization and Repair of Deep Learning Model Conversions”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in EASE 2025
en
dc.relation.hasversion
“Exploring Robustness of Image Recognition Models on Hardware Accelerators”, N. Louloudakis, P. Gibson, J. Cano, A. Rajan, in IEEE ICST Mutation 2025
en
dc.subject
software engineering
en
dc.subject
software testing
en
dc.subject
artificial intelligence
en
dc.subject
deep neural networks
en
dc.subject
differential testing
en
dc.subject
fault detection
en
dc.subject
fault repair
en
dc.subject
mutation testing
en
dc.title
Testing deep neural networks across different computational configurations
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Louloudakis2025.pdf
Size:
16.97 MB
Format:
Adobe Portable Document Format
Description:
Revised: typo on page 23
