Edinburgh Research Archive

Instruction scheduling optimizations for energy efficient VLIW processors

dc.contributor.advisor
Cintra, Marcelo
en
dc.contributor.author
Porpodas, Vasileios
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2014-01-06T15:04:38Z
dc.date.available
2014-01-06T15:04:38Z
dc.date.issued
2013-11-28
dc.description.abstract
Very Long Instruction Word (VLIW) processors are wide-issue, statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for their operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitioning the shared hardware resources (e.g., the register file). Such designs further increase the challenges of instruction scheduling, since the compiler has the additional tasks of deciding on the placement of instructions on the corresponding clusters and orchestrating the data movements across clusters. In this thesis we propose instruction scheduling optimizations for energy-efficient VLIW processors. Some of the techniques aim at improving the existing state-of-the-art scheduling techniques, while others aim at using compiler techniques to close the gap between lightweight hardware designs and more complex ones. Each of the proposed techniques targets individual features of energy-efficient VLIW architectures. Our first technique, called Aligned Scheduling, makes use of a novel scheduling heuristic for hiding memory latencies in lightweight VLIW processors without hardware load-use interlocks (Stall-On-Miss, SOM). With Aligned Scheduling, a software-only technique, a SOM processor coupled with non-blocking caches can better cope with cache latencies and can perform closer to the heavyweight designs. Performance is improved by up to 20% across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The rest of the techniques target a class of VLIW processors known as clustered VLIWs, which are more scalable and more energy efficient and operate at higher frequencies than their monolithic counterparts.
The second scheme (LUCAS) is an improved scheduler for clustered VLIW processors that solves the problem of existing state-of-the-art schedulers being very susceptible to the inter-cluster communication latency. The proposed unified clustering and scheduling technique is a hybrid scheme that switches, instruction by instruction, between the two state-of-the-art clustering heuristics, leading to better scheduling than either of them. It generates better performing code than the state of the art for a wide range of inter-cluster latency values on the Mediabench II benchmarks. The third technique (called CAeSaR) is a scheduler for clustered VLIW architectures that minimizes inter-cluster communication by local caching and reuse of already received data. Unlike dynamically scheduled processors, where this can be supported by the register renaming hardware, in VLIWs it has to be done by the code generator. The proposed instruction scheduler unifies cluster assignment, instruction scheduling and communication minimization in a single algorithm, solving the phase ordering issues between all three parts. The proposed scheduler shows an improvement in execution time of up to 20.3% and 13.8% on average across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The last technique applies to heterogeneous clustered VLIWs that support dynamic voltage and frequency scaling (DVFS) independently per cluster. In these processors there are no hardware interlocks between clusters to honor the data dependencies. Instead, the scheduler has to be aware of the DVFS decisions to guarantee correct execution. Effectively controlling DVFS, to selectively decrease the frequency of clusters with slack in their schedule, can lead to significant energy savings. The proposed technique (called UCIFF) solves the phase ordering problem between frequency selection and scheduling that is present in existing algorithms.
The results show that UCIFF produces code that is better than the state of the art and very close to optimal across the Mediabench II benchmarks. Overall, the proposed instruction scheduling techniques either lead to better efficiency on existing designs or allow simpler lightweight designs to be competitive against ones with more complex hardware.
en
dc.identifier.uri
http://hdl.handle.net/1842/8291
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
“UCIFF: Unified Cluster assignment Instruction scheduling and Fast Frequency selection for heterogeneous clustered VLIW cores”, Vasileios Porpodas and Marcelo Cintra, International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2012
en
dc.relation.hasversion
“LUCAS: Latency-adaptive Unified Cluster Assignment and instruction Scheduling”, Vasileios Porpodas and Marcelo Cintra, Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), 2013
en
dc.relation.hasversion
“CAeSaR: unified Cluster-Assignment Scheduling and communication Reuse for clustered VLIW processors”, Vasileios Porpodas and Marcelo Cintra, International Conference on Compilers Architecture and Synthesis for Embedded Systems (CASES), 2013
en
dc.relation.hasversion
“Aligned Scheduling: Cache-efficient Instruction Scheduling for VLIW Processors”, Vasileios Porpodas and Marcelo Cintra, International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2013
en
dc.subject
VLIW
en
dc.subject
Very Long Instruction Word
en
dc.subject
instruction scheduling
en
dc.subject
clustered VLIW
en
dc.title
Instruction scheduling optimizations for energy efficient VLIW processors
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
thesis files.zip
Size:
1.61 MB
Format:
Unknown data format
Description:
Name:
Porpodas2013.pdf
Size:
1.06 MB
Format:
Adobe Portable Document Format
Description:
