Efficient cross-architecture simulation of multicore systems
dc.contributor.advisor
Franke, Bjoern
dc.contributor.advisor
Spink, Tom
dc.contributor.advisor
Topham, Nigel
dc.contributor.advisor
Böhm, Igor
dc.contributor.author
Kristien, Martin
dc.date.accessioned
2024-10-24T14:36:15Z
dc.date.available
2024-10-24T14:36:15Z
dc.date.issued
2024-10-24
dc.description.abstract
Computer systems are continually becoming more complex and powerful in all areas of computing, from high-end servers to embedded devices. Machine virtualisation has become an instrumental technology to manage these vast hardware resources by separating applications running within virtual machines (guests) from the real hardware (hosts). This form of virtualisation is well supported by modern hardware as long as the Instruction Set Architectures (ISAs) of the guest and host machines match. Mismatched (cross-ISA) virtualisation is more challenging, yet it remains an important technology for hardware prototyping and software development.
In this context, Instruction Set Simulators (ISSs) are developed to provide functional cross-architecture simulation. A wide range of techniques can be utilised to achieve high simulation speeds for single-core guest applications. However, multicore support is limited. State-of-the-art tools often make trade-offs between accuracy and speed. The lack of multicore support is exacerbated if the guest application requires a full-system simulation and/or exhibits dynamically generated code.
This thesis presents three contributions to the accuracy, memory efficiency, and simulation speed of multicore cross-ISA simulation. Firstly, it presents a scalable and provably correct scheme for emulating atomic instructions. Most commonly in cross-ISA simulation, Reduced Instruction Set Computer (RISC) type guest atomics, Load-Linked/Store-Conditional (LL/SC), are emulated on Complex Instruction Set Computer (CISC) type host hardware providing a complex Compare-And-Swap (CAS) atomic instruction. Although the semantics of the RISC and CISC atomics are different, ISSs often emulate LL/SC using CAS instructions for improved performance. However, this causes execution inside the simulator to diverge from execution on real hardware. The scheme presented in this thesis faithfully emulates the LL/SC semantics while maintaining scalability to multicore systems.
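The divergence between LL/SC and CAS semantics can be illustrated with a small deterministic model. The sketch below is not the scheme from the thesis; the `Memory` class, its method names, and `run_scenario` are hypothetical and exist only for illustration. It assumes the usual LL/SC rule that any intervening store to the word clears the reservation, whereas CAS compares values only, so a store sequence that restores the original value (the A-B-A pattern) goes unnoticed.

```python
class Memory:
    """One guest memory word plus per-core LL/SC reservations (illustrative model)."""

    def __init__(self, value):
        self.value = value
        self.reserved = set()  # cores currently holding a valid reservation

    def store(self, value):
        # Any store to the word invalidates every outstanding reservation.
        self.value = value
        self.reserved.clear()

    def load_linked(self, core):
        # Read the word and record a reservation for this core.
        self.reserved.add(core)
        return self.value

    def store_conditional(self, core, value):
        # Faithful SC: succeeds only if no store hit the word since load_linked.
        if core not in self.reserved:
            return False
        self.store(value)
        return True

    def store_conditional_via_cas(self, expected, value):
        # CAS-based emulation: succeeds whenever the current value equals the
        # value seen by the earlier load, regardless of intervening stores.
        if self.value != expected:
            return False
        self.store(value)
        return True


def run_scenario(faithful):
    """Core 0 performs LL ... SC while core 1 writes 2 and then restores 1."""
    mem = Memory(1)
    seen = mem.load_linked(core=0)
    mem.store(2)  # core 1 changes the word
    mem.store(1)  # core 1 restores the old value (A-B-A)
    if faithful:
        return mem.store_conditional(core=0, value=seen + 1)
    return mem.store_conditional_via_cas(expected=seen, value=seen + 1)


print(run_scenario(faithful=True))   # SC with reservation tracking
print(run_scenario(faithful=False))  # SC emulated with a value-only CAS
```

In this model the faithful store-conditional fails because core 1 wrote to the word in between, while the CAS-based emulation succeeds, which is the kind of behavioural difference the abstract refers to.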
Efficient use of simulator memory is especially important for interpreter-based ISSs, which enable quick prototyping without extensive engineering efforts and easy integration with instrumentation, profiling, and debugging tools. However, the computational overheads of the Fetch and Decode stages in instruction interpretation significantly increase the overall simulation time. This thesis proposes a number of memory-efficient caching strategies with a focus on memory sharing among multiple simulated cores. The novel schemes exhibit up to 1.57× speedup relative to a state-of-the-art baseline scheme, while requiring only 27% of the cache memory.
Dynamic Binary Translation typically translates and caches multiple guest instructions as a unit, resulting in faster simulation speed. However, code cache maintenance has to account for guest applications modifying their own instructions, necessitating invalidation of cached code. This maintenance mechanism in most ISSs is falsely triggered by applications dynamically generating code in proximity of previously cached code, resulting in needless code invalidation and poor performance. This thesis proposes an improved code tracking scheme that allows optimised guest code execution even when data and code are interleaved by the guest, achieving up to 1.42× speedup relative to a state-of-the-art page-granular code protection mechanism.
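The false triggering described above can be sketched with a toy code cache. This is an assumption-laden illustration, not the tracking scheme proposed in the thesis: the 4 KiB `PAGE_SIZE`, the block names, the addresses, and both helper functions are hypothetical. It contrasts page-granular invalidation with a finer, byte-range check when the guest writes near, but not into, previously translated code.

```python
PAGE_SIZE = 4096  # assumed guest page size

# Hypothetical translated blocks: name -> (start address, end address exclusive)
blocks = {
    "loop": (0x1000, 0x1040),
    "helper": (0x1800, 0x1820),
}


def invalidated_page_granular(blocks, write_addr):
    """Invalidate every cached block that touches the written page."""
    page = write_addr // PAGE_SIZE
    return [
        name
        for name, (start, end) in blocks.items()
        if start // PAGE_SIZE <= page <= (end - 1) // PAGE_SIZE
    ]


def invalidated_range_granular(blocks, write_addr):
    """Invalidate only blocks whose byte range contains the written address."""
    return [
        name
        for name, (start, end) in blocks.items()
        if start <= write_addr < end
    ]


# The guest writes data (or freshly generated code) at 0x1100: same page as
# both cached blocks, but outside either block's byte range.
print(invalidated_page_granular(blocks, 0x1100))   # both blocks are dropped
print(invalidated_range_granular(blocks, 0x1100))  # nothing is dropped

# A genuine self-modification inside "loop" is still caught by the finer check.
print(invalidated_range_granular(blocks, 0x1010))
```

In this toy setup the page-granular check discards both blocks on a write that modifies neither, whereas the range check discards nothing and still catches a real overwrite of `loop`.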
en
dc.identifier.uri
https://hdl.handle.net/1842/42345
dc.identifier.uri
http://dx.doi.org/10.7488/era/5058
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Martin Kristien, Tom Spink, Brian Campbell, Susmit Sarkar, Ian Stark, Björn Franke, Igor Böhm, and Nigel Topham. 2020. Fast and correct load-link/store-conditional instruction handling in DBT systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39, 11 (2020), 3544-3554.
en
dc.subject
simulation
en
dc.subject
hardware virtualisation
en
dc.subject
multicore
en
dc.title
Efficient cross-architecture simulation of multicore systems
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: KristienM_2024.pdf
- Size: 4.61 MB
- Format: Adobe Portable Document Format

