《计算机组织_12》由会员分享,可在线阅读,更多相关《计算机组织_12(36页珍藏版)》请在金锄头文库上搜索。
1、217CS 701 Fall 2008Scheduling for Superscalar if ( d != 0)flag = 1;else flag = 0;f = d - g;Blocks 1 and 4 are equivalent. Moving an Instruction from B2 to B1 (or B3 to B1) is 1-branch speculative. Moving an Instruction from B4 to B2 (or B4 to B3) requires duplication.d = a + b d != 0flag = 1flag = 0
2、f = d - gTF1234235CS 701 Fall 2008Limits on Code MotionAssume that pseudo registers are used in generated code (prior to register allocation). To respect data dependencies: A use of a Pseudo Register cant bemoved above its definition. Memory loads cant be moved ahead of Stores to the same location.S
3、tores cant be moved ahead of either loads or stores to the same location. A load of a memory location can be moved ahead of another load of the same location (such a load may often be optimized away by equivalencing the two pseudo registers).236CS 701 Fall 2008Example (Revisited)block1:ld a,Pr1ld b,
4、Pr2add Pr1,Pr2,Pr3Stallst Pr3,dcmp Pr3,0be block3 block2:mov 1,Pr4st Pr4,flagb block4 block3:st 0,flag block4:ld d,Pr5ld g,Pr6sub Pr5,Pr6,Pr7Stallst Pr7,f In B1 and B4, the number of available registers is irrelevant in avoiding stalls. There are too few independent instructions in each block.237CS
5、701 Fall 2008Global Scheduling Restrictions (in Bernstein/ Rodeh Heuristic)1. Subprograms are divided into Regions. A region is a loop body or the subprogram body without enclosed loops. 2. Regions are scheduled inside-out. 3. Instructions never cross region boundaries. 4. All instructions move “upw
6、ard” (to earlier positions in the instruction order). 5. The original order of branches is preserved.238CS 701 Fall 2008Lesser (temporary) restrictions Include: 6. No code duplication. 7. Only 1-branch speculation. 8. No new basic blocks are created or added.239CS 701 Fall 2008Scheduling Basic Block
7、s in a CFGBasic blocks are visited and scheduled in Topological Order. Thus all of a blocks predecessors are scheduled before it is. Two levels of scheduling are possible (depending on whether speculative execution is allowed or not): 1. When Basic Block A is scheduled,only Instructions in A and blo
8、cksequivalent to A that A dominatesare considered.(Only “useful” instructions areconsidered.)240CS 701 Fall 20082. Blocks that are immediate successors of those considered in(1) are also considered.(This allows 1-branch speculation.)241CS 701 Fall 2008Candidate InstructionsWe first compute the set o
9、f basic blocks that may contribute instructions when block A is scheduled. (Either blocks equivalent to A or blocks at most 1-branch speculative.)242CS 701 Fall 2008An individual Instruction, Inst, in this set of basic blocks may be scheduled in A if: 1. It is located in A. 2. It is in a block equiv
10、alent to A and may be moved across block boundaries.(Some instructions, like calls, cantbe moved.) 3. It is not in a block equivalent to A, but may be scheduled speculatively.(Some instructions, like stores, cant be executed speculatively.)243CS 701 Fall 2008Selecting Instructions to IssueA list of
11、“ready to issue” instructions in block A and in bocks equivalent to A (or 1-branch distant from A) is maintained.All data dependencies must be satisfied and stalls avoided (if possible).N independent instructions are selected, where N is the processors issue-width.But what if more than N instruction
12、s are ready to issue?Selection is by Priority, using two Scheduling Heuristics.244CS 701 Fall 2008Delay HeuristicThis value is computed on a per-basic block basis. It estimates the worst-case delay (stalls) from an Instruction to the end of the basic block.D(I) = 0 if I is a leaf.Let d(I,J) be the d
13、elay if instruction J follows instruction I in the code schedule.D(I) = Max (D(Ji)+d(I,Ji) Ji Succ(I)245CS 701 Fall 2008Example of Delay Valuesblock1: 1. ld a,Pr1 2. ld b,Pr2 3. add Pr1,Pr2,Pr3 4. st Pr3,d 5. cmp Pr3,0 6. be block3(Assume only loads can stall.)123456000011246CS 701 Fall 2008Critical
14、 Path HeuristicThis value is also computed on a per- basic block basis. It estimates how long it will take to execute Instruction I, and all Is successors, assuming unlimited parallelism.E(I) = Execution time for instruction I(normally 1 for pipelined machines) CP(I) = E(I) if I is a leaf.CP(I) = E(
15、I) + Max (CP(Ji)+d(I,Ji) Succ(I)Ji247CS 701 Fall 2008Example of Critical Path Valuesblock1: 1. ld a,Pr1 2. ld b,Pr2 3. add Pr1,Pr2,Pr3 4. st Pr3,d 5. cmp Pr3,0 6. be block3123456122355248CS 701 Fall 2008Selecting Instructions to IssueFrom the Ready Set (instructions with alldependencies satisfied, a
16、nd which will not stall) use the following priority rules: 1. Instructions in block A and blocksequivalent to A have priority overother (speculative) blocks. 2. Instructions with the highest D values have priority. 3. Instructions with the highest CP values have priority. These rules imply that we schedule useful instructions before speculative ones, instru