计算机组织_12

上传人:wt****50 文档编号:45877814 上传时间:2018-06-19 格式:PDF 页数:36 大小:72.62KB
返回 下载 相关 举报
计算机组织_12_第1页
第1页 / 共36页
计算机组织_12_第2页
第2页 / 共36页
计算机组织_12_第3页
第3页 / 共36页
计算机组织_12_第4页
第4页 / 共36页
计算机组织_12_第5页
第5页 / 共36页
点击查看更多>>
资源描述

《计算机组织_12》由会员分享,可在线阅读,更多相关《计算机组织_12(36页珍藏版)》请在金锄头文库上搜索。

1、217CS 701 Fall 2008Scheduling for Superscalar if ( d != 0)flag = 1;else flag = 0;f = d - g;Blocks 1 and 4 are equivalent. Moving an Instruction from B2 to B1 (or B3 to B1) is 1-branch speculative. Moving an Instruction from B4 to B2 (or B4 to B3) requires duplication.d = a + b d != 0flag = 1flag = 0

2、f = d - gTF1234235CS 701 Fall 2008Limits on Code MotionAssume that pseudo registers are used in generated code (prior to register allocation). To respect data dependencies: A use of a Pseudo Register cant bemoved above its definition. Memory loads cant be moved ahead of Stores to the same location.S

3、tores cant be moved ahead of either loads or stores to the same location. A load of a memory location can be moved ahead of another load of the same location (such a load may often be optimized away by equivalencing the two pseudo registers).236CS 701 Fall 2008Example (Revisited)block1:ld a,Pr1ld b,

4、Pr2add Pr1,Pr2,Pr3Stallst Pr3,dcmp Pr3,0be block3 block2:mov 1,Pr4st Pr4,flagb block4 block3:st 0,flag block4:ld d,Pr5ld g,Pr6sub Pr5,Pr6,Pr7Stallst Pr7,f In B1 and B4, the number of available registers is irrelevant in avoiding stalls. There are too few independent instructions in each block.237CS

5、701 Fall 2008Global Scheduling Restrictions (in Bernstein/ Rodeh Heuristic)1. Subprograms are divided into Regions. A region is a loop body or the subprogram body without enclosed loops. 2. Regions are scheduled inside-out. 3. Instructions never cross region boundaries. 4. All instructions move “upw

6、ard” (to earlier positions in the instruction order). 5. The original order of branches is preserved.238CS 701 Fall 2008Lesser (temporary) restrictions Include: 6. No code duplication. 7. Only 1-branch speculation. 8. No new basic blocks are created or added.239CS 701 Fall 2008Scheduling Basic Block

7、s in a CFGBasic blocks are visited and scheduled in Topological Order. Thus all of a blocks predecessors are scheduled before it is. Two levels of scheduling are possible (depending on whether speculative execution is allowed or not): 1. When Basic Block A is scheduled,only Instructions in A and blo

8、cksequivalent to A that A dominatesare considered.(Only “useful” instructions areconsidered.)240CS 701 Fall 20082. Blocks that are immediate successors of those considered in(1) are also considered.(This allows 1-branch speculation.)241CS 701 Fall 2008Candidate InstructionsWe first compute the set o

9、f basic blocks that may contribute instructions when block A is scheduled. (Either blocks equivalent to A or blocks at most 1-branch speculative.)242CS 701 Fall 2008An individual Instruction, Inst, in this set of basic blocks may be scheduled in A if: 1. It is located in A. 2. It is in a block equiv

10、alent to A and may be moved across block boundaries.(Some instructions, like calls, cantbe moved.) 3. It is not in a block equivalent to A, but may be scheduled speculatively.(Some instructions, like stores, cant be executed speculatively.)243CS 701 Fall 2008Selecting Instructions to IssueA list of

11、“ready to issue” instructions in block A and in bocks equivalent to A (or 1-branch distant from A) is maintained.All data dependencies must be satisfied and stalls avoided (if possible).N independent instructions are selected, where N is the processors issue-width.But what if more than N instruction

12、s are ready to issue?Selection is by Priority, using two Scheduling Heuristics.244CS 701 Fall 2008Delay HeuristicThis value is computed on a per-basic block basis. It estimates the worst-case delay (stalls) from an Instruction to the end of the basic block.D(I) = 0 if I is a leaf.Let d(I,J) be the d

13、elay if instruction J follows instruction I in the code schedule.D(I) = Max (D(Ji)+d(I,Ji) Ji Succ(I)245CS 701 Fall 2008Example of Delay Valuesblock1: 1. ld a,Pr1 2. ld b,Pr2 3. add Pr1,Pr2,Pr3 4. st Pr3,d 5. cmp Pr3,0 6. be block3(Assume only loads can stall.)123456000011246CS 701 Fall 2008Critical

14、 Path HeuristicThis value is also computed on a per- basic block basis. It estimates how long it will take to execute Instruction I, and all Is successors, assuming unlimited parallelism.E(I) = Execution time for instruction I(normally 1 for pipelined machines) CP(I) = E(I) if I is a leaf.CP(I) = E(

15、I) + Max (CP(Ji)+d(I,Ji) Succ(I)Ji247CS 701 Fall 2008Example of Critical Path Valuesblock1: 1. ld a,Pr1 2. ld b,Pr2 3. add Pr1,Pr2,Pr3 4. st Pr3,d 5. cmp Pr3,0 6. be block3123456122355248CS 701 Fall 2008Selecting Instructions to IssueFrom the Ready Set (instructions with alldependencies satisfied, a

16、nd which will not stall) use the following priority rules: 1. Instructions in block A and blocksequivalent to A have priority overother (speculative) blocks. 2. Instructions with the highest D values have priority. 3. Instructions with the highest CP values have priority. These rules imply that we schedule useful instructions before speculative ones, instru

展开阅读全文
相关资源
正为您匹配相似的精品文档
相关搜索

最新文档


当前位置:首页 > 生活休闲 > 社会民生

电脑版 |金锄头文库版权所有
经营许可证:蜀ICP备13022795号 | 川公网安备 51140202000112号