Instruction scheduling optimizations for energy efficient VLIW processors

Porpodas, Vasileios

dc.contributor.advisor

Cintra, Marcelo

en

dc.contributor.author

Porpodas, Vasileios

en

dc.contributor.sponsor

Engineering and Physical Sciences Research Council (EPSRC)

en

dc.date.accessioned

2014-01-06T15:04:38Z

dc.date.available

2014-01-06T15:04:38Z

dc.date.issued

2013-11-28

dc.description.abstract

Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for their operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitioning the shared hardware resources (e.g., the register file). Such designs further increase the challenges of instruction scheduling, since the compiler has the additional tasks of deciding on the placement of instructions on the corresponding clusters and orchestrating the data movements across clusters. In this thesis we propose instruction scheduling optimizations for energy-efficient VLIW processors. Some of the techniques aim at improving the existing state-of-the-art scheduling techniques, while others aim at using compiler techniques to close the gap between lightweight hardware designs and more complex ones. Each of the proposed techniques targets individual features of energy-efficient VLIW architectures. Our first technique, called Aligned Scheduling, makes use of a novel scheduling heuristic for hiding memory latencies in lightweight VLIW processors without hardware load-use interlocks (Stall-On-Miss, or SOM). With Aligned Scheduling, a software-only technique, a SOM processor coupled with non-blocking caches can better cope with cache latencies and can perform closer to the heavyweight designs. Performance is improved by up to 20% across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The rest of the techniques target a class of VLIW processors known as clustered VLIWs, which are more scalable, more energy efficient, and operate at higher frequencies than their monolithic counterparts.
The second scheme (LUCAS) is an improved scheduler for clustered VLIW processors that addresses the high sensitivity of the existing state-of-the-art schedulers to the inter-cluster communication latency. The proposed unified clustering and scheduling technique is a hybrid scheme that switches, instruction by instruction, between the two state-of-the-art clustering heuristics, leading to better scheduling than either of them. It generates better performing code than the state-of-the-art for a wide range of inter-cluster latency values on the Mediabench II benchmarks. The third technique (called CAeSaR) is a scheduler for clustered VLIW architectures that minimizes inter-cluster communication by local caching and reuse of already received data. Unlike dynamically scheduled processors, where this can be supported by the register renaming hardware, in VLIWs it has to be done by the code generator. The proposed instruction scheduler unifies cluster assignment, instruction scheduling and communication minimization in a single algorithm, solving the phase ordering issues between all three parts. The proposed scheduler improves execution time by up to 20.3%, and by 13.8% on average, across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The last technique applies to heterogeneous clustered VLIWs that support dynamic voltage and frequency scaling (DVFS) independently per cluster. In these processors there are no hardware interlocks between clusters to honor the data dependencies. Instead, the scheduler has to be aware of the DVFS decisions to guarantee correct execution. Effectively controlling DVFS, to selectively decrease the frequency of clusters with slack in their schedule, can lead to significant energy savings. The proposed technique (called UCIFF) solves the phase ordering problem between frequency selection and scheduling that is present in existing algorithms.
The results show that UCIFF produces better code than the state-of-the-art, and very close to the optimal, across the Mediabench II benchmarks. Overall, the proposed instruction scheduling techniques either lead to better efficiency on existing designs or allow simpler lightweight designs to be competitive against ones with more complex hardware.

en

dc.identifier.uri

http://hdl.handle.net/1842/8291

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

“UCIFF: Unified Cluster assignment Instruction scheduling and Fast Frequency selection for heterogeneous clustered VLIW cores”, Vasileios Porpodas and Marcelo Cintra, International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2012

en

dc.relation.hasversion

“LUCAS: Latency-adaptive Unified Cluster Assignment and instruction Scheduling”, Vasileios Porpodas and Marcelo Cintra, Conference on Languages, Compilers and Tools for Embedded Systems (LCTES), 2013

en

dc.relation.hasversion

“CAeSaR: unified Cluster-Assignment Scheduling and communication Reuse for clustered VLIW processors”, Vasileios Porpodas and Marcelo Cintra, International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2013

en

dc.relation.hasversion

“Aligned Scheduling: Cache-efficient Instruction Scheduling for VLIW Processors”, Vasileios Porpodas and Marcelo Cintra, International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2013

en

dc.subject

VLIW

en

dc.subject

Very Long Instruction Word

en

dc.subject

instruction scheduling

en

dc.subject

clustered VLIW

en

dc.title

Instruction scheduling optimizations for energy efficient VLIW processors

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: thesis files.zip
Size: 1.61 MB
Format: Unknown data format

Name: Porpodas2013.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection