Instruction scheduling optimizations for energy efficient VLIW processors
dc.contributor.advisor
Cintra, Marcelo
en
dc.contributor.author
Porpodas, Vasileios
en
dc.contributor.sponsor
Engineering and Physical Sciences Research Council (EPSRC)
en
dc.date.accessioned
2014-01-06T15:04:38Z
dc.date.available
2014-01-06T15:04:38Z
dc.date.issued
2013-11-28
dc.description.abstract
Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled
processors. Instruction scheduling for these processors is performed by the compiler
and is therefore a critical factor for their operation. Some VLIWs are clustered, a design
that improves scalability to higher issue widths while improving energy efficiency and
frequency. Their design is based on physically partitioning the shared hardware resources
(e.g., register file). Such designs further increase the challenges of instruction
scheduling since the compiler has the additional tasks of deciding on the placement
of the instructions to the corresponding clusters and orchestrating the data movements
across clusters.
In this thesis we propose instruction scheduling optimizations for energy-efficient
VLIW processors. Some of the techniques aim at improving the existing state-of-the-art
scheduling techniques, while others aim at using compiler techniques for closing
the gap between lightweight hardware designs and more complex ones. Each of the
proposed techniques targets individual features of energy-efficient VLIW architectures.
Our first technique, called Aligned Scheduling, makes use of a novel scheduling
heuristic for hiding memory latencies in lightweight VLIW processors without hardware
load-use interlocks (Stall-On-Miss, SOM). With Aligned Scheduling, a software-only
technique, a SOM processor coupled with non-blocking caches can better cope with
the cache latencies and it can perform closer to the heavyweight designs. Performance
is improved by up to 20% across a range of benchmarks from the Mediabench II and
SPEC CINT2000 benchmark suites.
The rest of the techniques target a class of VLIW processors known as clustered
VLIWs, which are more scalable and more energy efficient and operate at higher frequencies
than their monolithic counterparts.
The second scheme (LUCAS) is an improved scheduler for clustered VLIW processors
that solves the problem of the existing state-of-the-art schedulers being very
susceptible to the inter-cluster communication latency. The proposed unified clustering
and scheduling technique is a hybrid scheme that performs instruction-by-instruction
switching between the two state-of-the-art clustering heuristics, leading to better
scheduling than either of them. It generates better performing code compared to the
state-of-the-art for a wide range of inter-cluster latency values on the Mediabench II
benchmarks.
The third technique (called CAeSaR) is a scheduler for clustered VLIW architectures
that minimizes inter-cluster communication by local caching and reuse of already
received data. Unlike dynamically scheduled processors, where this can be supported
by the register renaming hardware, in VLIWs it has to be done by the code generator.
The proposed instruction scheduler unifies cluster assignment, instruction scheduling
and communication minimization in a single unified algorithm, solving the phase ordering
issues between all three parts. The proposed scheduler shows an improvement
in execution time of up to 20.3% and 13.8% on average across a range of benchmarks
from the Mediabench II and SPEC CINT2000 benchmark suites.
The last technique applies to heterogeneous clustered VLIWs that support dynamic
voltage and frequency scaling (DVFS) independently per cluster. In these processors
there are no hardware interlocks between clusters to honor the data dependencies.
Instead, the scheduler has to be aware of the DVFS decisions to guarantee correct
execution. Effectively controlling DVFS, to selectively decrease the frequency of clusters
with slack in their schedule, can lead to significant energy savings. The proposed
technique (called UCIFF) solves the phase ordering problem between frequency selection
and scheduling that is present in existing algorithms. The results show that UCIFF
produces better code than the state-of-the-art and very close to the optimal across the
Mediabench II benchmarks.
Overall, the proposed instruction scheduling techniques either lead to better efficiency
on existing designs or allow simpler lightweight designs to be competitive
against ones with more complex hardware.
en
dc.identifier.uri
http://hdl.handle.net/1842/8291
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
“UCIFF: Unified Cluster assignment Instruction scheduling and Fast Frequency selection for heterogeneous clustered VLIW cores” Vasileios Porpodas, and Marcelo Cintra International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2012
en
dc.relation.hasversion
“LUCAS: Latency-adaptive Unified Cluster Assignment and instruction Scheduling” Vasileios Porpodas, and Marcelo Cintra Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), 2013
en
dc.relation.hasversion
“CAeSaR: unified Cluster-Assignment Scheduling and communication Reuse for clustered VLIW processors” Vasileios Porpodas, and Marcelo Cintra International Conference on Compilers Architecture and Synthesis for Embedded Systems (CASES), 2013
en
dc.relation.hasversion
“Aligned Scheduling: Cache-efficient Instruction Scheduling for VLIW Processors” Vasileios Porpodas, and Marcelo Cintra International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2013
en
dc.subject
VLIW
en
dc.subject
Very Long Instruction Word
en
dc.subject
instruction scheduling
en
dc.subject
clustered VLIW
en
dc.title
Instruction scheduling optimizations for energy efficient VLIW processors
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en