Efficient cross-architecture simulation of multicore systems

Kristien, Martin

dc.contributor.advisor

Franke, Bjoern

dc.contributor.advisor

Spink, Tom

dc.contributor.advisor

Topham, Nigel

dc.contributor.advisor

Böhm, Igor

dc.contributor.author

Kristien, Martin

dc.date.accessioned

2024-10-24T14:36:15Z

dc.date.available

2024-10-24T14:36:15Z

dc.date.issued

2024-10-24

dc.description.abstract

Computer systems are continually becoming more complex and powerful in all areas of computing, from high-end servers to embedded devices. Machine virtualisation has become an instrumental technology for managing these vast hardware resources by separating applications running within virtual machines (guests) from the real hardware (hosts). This form of virtualisation is well supported by modern hardware as long as the Instruction Set Architectures (ISAs) of the guest and host machines match. Mismatched (cross-ISA) virtualisation is more challenging, yet remains an important technology for hardware prototyping and software development. In this context, Instruction Set Simulators (ISSs) are developed to provide functional cross-architecture simulation. A wide range of techniques can be used to achieve high simulation speeds for single-core guest applications. However, multicore support is limited, and state-of-the-art tools often trade off accuracy against speed. The lack of multicore support is exacerbated if the guest application requires full-system simulation and/or exhibits dynamically generated code. This thesis presents three contributions to the accuracy, memory efficiency, and simulation speed of multicore cross-ISA simulation.
Firstly, it presents a scalable and provably correct scheme for emulating atomic instructions. Most commonly in cross-ISA simulation, the atomics of a Reduced Instruction Set Computer (RISC) guest, Load-Linked/Store-Conditional (LL/SC), are emulated on Complex Instruction Set Computer (CISC) host hardware that provides a complex Compare-And-Swap (CAS) atomic instruction. Although the semantics of the RISC and CISC atomics differ, ISSs often emulate LL/SC using CAS instructions for improved performance. However, this causes execution inside the simulator to diverge from execution on the real hardware. The scheme presented in this thesis faithfully emulates the LL/SC semantics while maintaining scalability to multicore systems.
Efficient use of simulator memory is especially important for interpreter-based ISSs, which enable quick prototyping without extensive engineering effort and integrate easily with instrumentation, profiling, and debugging tools. However, the computational overheads of the Fetch and Decode stages of instruction interpretation significantly increase the overall simulation time. This thesis proposes a number of memory-efficient caching strategies with a focus on memory sharing among multiple simulated cores. The novel schemes exhibit up to 1.57× speedup relative to a state-of-the-art baseline scheme, while requiring only 27% of the cache memory.
Dynamic Binary Translation typically translates and caches multiple guest instructions as a unit, resulting in faster simulation. However, code cache maintenance has to account for guest applications modifying their own instructions, which necessitates invalidation of cached code. In most ISSs, this maintenance mechanism is falsely triggered by applications that dynamically generate code in proximity to previously cached code, resulting in needless code invalidation and poor performance. This thesis proposes an improved code tracking scheme that allows optimised guest code execution even when data and code are interleaved by the guest, achieving up to 1.42× speedup relative to a state-of-the-art page-granular code protection mechanism.
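The divergence between LL/SC and CAS-based emulation mentioned in the abstract can be illustrated with a small, single-threaded toy model. This is only a sketch under stated assumptions, not the scheme developed in the thesis: the classes Memory, CasCore and MonitorCore, the per-address write counter, and the address 0x100 are all hypothetical names chosen for illustration. The scenario is the classic A-B-A sequence: a core load-links a value, another agent overwrites it and then restores it, and the core then attempts its store-conditional.

```python
class Memory:
    """Toy guest memory with a per-address write counter (illustrative assumption)."""

    def __init__(self):
        self.cells = {}
        self.versions = {}

    def load(self, addr):
        return self.cells.get(addr, 0)

    def version(self, addr):
        return self.versions.get(addr, 0)

    def store(self, addr, value):
        self.cells[addr] = value
        self.versions[addr] = self.version(addr) + 1


class CasCore:
    """Emulates SC as a compare-and-swap against the value seen by LL."""

    def __init__(self, mem):
        self.mem = mem
        self.link = None  # (address, value observed by LL)

    def load_linked(self, addr):
        value = self.mem.load(addr)
        self.link = (addr, value)
        return value

    def store_conditional(self, addr, new_value):
        link, self.link = self.link, None
        if link is None or link[0] != addr:
            return False
        if self.mem.load(addr) != link[1]:  # only the value is compared
            return False
        self.mem.store(addr, new_value)
        return True


class MonitorCore:
    """Emulates SC with a reservation that any intervening write breaks."""

    def __init__(self, mem):
        self.mem = mem
        self.link = None  # (address, write counter observed by LL)

    def load_linked(self, addr):
        self.link = (addr, self.mem.version(addr))
        return self.mem.load(addr)

    def store_conditional(self, addr, new_value):
        link, self.link = self.link, None
        if link is None or link[0] != addr:
            return False
        if self.mem.version(addr) != link[1]:  # any write since LL fails the SC
            return False
        self.mem.store(addr, new_value)
        return True


def run_aba(core_cls):
    """LL reads 1, another agent writes 2 then 1 again, then SC tries to write 3."""
    mem = Memory()
    mem.store(0x100, 1)
    core = core_cls(mem)
    core.load_linked(0x100)
    mem.store(0x100, 2)  # intervening write by another core
    mem.store(0x100, 1)  # original value restored
    return core.store_conditional(0x100, 3)


print(run_aba(CasCore))      # True: the CAS-style SC succeeds despite the writes
print(run_aba(MonitorCore))  # False: the reservation was broken, so the SC fails
```

In this model the CAS-style emulation lets the store-conditional succeed because the value matches again, whereas LL/SC semantics require it to fail after an intervening write. A real simulator would additionally need host-level atomicity and a scalable way to track reservations across cores, which this sketch does not attempt.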

en

dc.identifier.uri

https://hdl.handle.net/1842/42345

dc.identifier.uri

http://dx.doi.org/10.7488/era/5058

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Martin Kristien, Tom Spink, Brian Campbell, Susmit Sarkar, Ian Stark, Björn Franke, Igor Böhm, and Nigel Topham. 2020. Fast and correct load-link/store-conditional instruction handling in DBT systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39, 11 (2020), 3544-3554.

en

dc.subject

simulation

en

dc.subject

hardware virtualisation

en

dc.subject

multicore

en

dc.title

Efficient cross-architecture simulation of multicore systems

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: KristienM_2024.pdf
Size: 4.61 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection