"High-Performance Processors" PPT courseware (《高性能处理器》PPT课件.ppt)

Uploaded by: m****  Document ID: 575459497  Uploaded: 2024-08-18  Format: PPT  Pages: 25  Size: 333.50 KB

2024/8/18  USTC CS  AN Hong

Slide 1: Structural hazard caused by memory access
- Instruction fetch and data access both go to the same memory.
- Detection is easy in this case! (Right-half highlight means read, left-half highlight means write.)
[Figure: pipeline diagram, instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time in clock cycles; stages Mem, Reg, ALU, Mem, Reg]

Slide 2: Solution to the structural hazard: stall
- The instruction fetch is delayed by one cycle.
[Figure: the same pipeline diagram with a stall inserted]

Slide 3: Control hazards: what's the problem?
- Pipeline stages: Instruction Fetch, Decode, Execute, Memory Access, Writeback.
- Figure annotations: "Need address here", "Compute address here", "Branch Delay".
- Example sequence, with the branch outcomes labeled T (taken) and NT (not taken):
    bne r2, #0, r3
    add r4,r5,r6
    sub r7,r8,r9
- Example: BEQ rs, rt, offset: if R[rs] = R[rt] then PC <- ...
- Obtain the branch target address as early as possible (branch-address dependence).

Slide 4: Control Hazard Solution #1: Stall
- Stall: wait until the decision is clear.
- Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow.
- Move the decision to the end of decode: saves 1 cycle per branch.
[Figure: pipeline diagram for Add, Beq, Load, with the lost slots marked "Lost potential"]

Slide 5: Control Hazard Solution #2: Predict
- Predict: guess one direction, then back up if wrong.
- Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right about 50% of the time).
- More dynamic scheme: history of 1 branch (about 90%).
[Figure: pipeline diagram for Add, Beq, Load]

Slide 6: Control Hazard Solution #3: Delayed Branch
- Delayed branch: redefine branch behavior (the branch takes place after the next instruction).
- Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the "slot" (about 50% of the time).
- As more instructions are launched per clock cycle, this becomes less useful.
[Figure: pipeline diagram for Add, Beq, Misc, Load]
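The per-branch costs quoted on the three control-hazard slides can be turned into an average CPI. The sketch below is only an illustration: the branch fraction of 0.2 and the helper names `branch_cpi` and `predicted_lost_cycles` are assumptions, not values or names from the slides.

```python
def branch_cpi(branch_fraction, lost_cycles_per_branch, base_cpi=1.0):
    """Average CPI when every branch costs the given number of lost cycles."""
    return base_cpi + branch_fraction * lost_cycles_per_branch


def predicted_lost_cycles(accuracy, penalty_if_wrong=1):
    """Expected lost cycles per branch: 0 if the guess is right, the penalty if wrong."""
    return (1.0 - accuracy) * penalty_if_wrong


BRANCH_FRACTION = 0.2  # hypothetical: one instruction in five is a branch

cpi_stall = branch_cpi(BRANCH_FRACTION, 2)   # Solution #1: 2 lost cycles per branch
cpi_early = branch_cpi(BRANCH_FRACTION, 1)   # decision moved to the end of decode
cpi_static = branch_cpi(BRANCH_FRACTION, predicted_lost_cycles(0.5))   # right about 50%
cpi_dynamic = branch_cpi(BRANCH_FRACTION, predicted_lost_cycles(0.9))  # right about 90%

for name, cpi in [("stall", cpi_stall), ("early decision", cpi_early),
                  ("predict 50%", cpi_static), ("predict 90%", cpi_dynamic)]:
    print(f"{name:15s} CPI = {cpi:.2f}")
```

With this assumed branch fraction, stalling gives a CPI of 1.40 and a 90%-accurate predictor gives 1.02, which is the gap the slides are pointing at.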

Slide 7: Data hazard on r1
    I: add r1,r2,r3
    J: sub r4,r1,r3
- Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it.
- Caused by a "dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Slide 8: Data hazard on r1: read-after-write (RAW) hazard
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
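For the five-instruction sequence on r1, a short sketch can list which consumers read r1 too early when there is no forwarding. It assumes a 5-stage pipeline that reads registers in stage 2 and writes them in stage 5, with the write in the first half of the cycle and the read in the second half, so a consumer three or more instructions behind the producer is safe. The function names and the `safe_distance` parameter are hypothetical.

```python
def parse(instr):
    """Split 'add r1,r2,r3' into (opcode, destination, [sources])."""
    opcode, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[0], regs[1:]


def raw_hazards(program, safe_distance=3):
    """Return (producer, consumer, register) index triples where the consumer
    reads a register before the producer has written it back."""
    parsed = [parse(i) for i in program]
    hazards = []
    for j, (_, _, sources) in enumerate(parsed):
        for src in sources:
            # Look back only at instructions closer than the safe distance,
            # nearest first, and stop at the most recent writer of src.
            for i in range(j - 1, max(-1, j - safe_distance), -1):
                if parsed[i][1] == src:
                    hazards.append((i, j, src))
                    break
    return hazards


PROGRAM = [
    "add r1,r2,r3",
    "sub r4,r1,r3",
    "and r6,r1,r7",
    "or r8,r1,r9",
    "xor r10,r1,r11",
]

print(raw_hazards(PROGRAM))
```

Under these assumptions only `sub` and `and` are flagged; `or` reads r1 in the same cycle in which `add` writes it, and `xor` reads it one cycle later.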

Slide 9: Data hazard on r1: read-after-write (RAW) hazard
[Figure: pipeline diagram of the same five instructions against time in clock cycles; stages IF, ID/RF, EX, MEM, WB]
- Dependencies backwards in time are hazards.

Slide 10: Data hazard solution: forwarding
[Figure: the same pipeline diagram with forwarding paths]
- "Forward" the result from one stage to another.

Slide 11: Forwarding (or bypassing): what about loads?
    lw  r1,0(r2)
    sub r4,r1,r3
- Dependencies backwards in time are hazards.
- There is a data hazard even with forwarding.
- It can't be solved with forwarding; the instruction that depends on the load must be delayed (stalled).

Slide 12: Forwarding (or bypassing): what about loads?
[Figure: the same two instructions, now with a stall shown]

Slide 13: Software scheduling to avoid load hazards
Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd
Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
The compiler optimizes for performance. The hardware checks for safety.

Slide 14: Data Hazard Solution (3): Out-of-Order Execution
- Need to detect data dependences at run time.
- Need for precise exceptions: out-of-order execution, in-order completion.
Stage sequence per instruction (time axis T1 to T12):
    sub $2, $1, $3:     IF ID EX ME WB
    add $14, $5, $4:    IF ID EX ME WB
    sw  $15, 100($6):   IF ID EX ME WB
    and $12, $2, $3:    IF * ID EX ME WB
    or  $13, $6, $2:    IF ID EX ME WB
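The slow and fast sequences on the software-scheduling slide can be compared by counting load-use stalls. This is a minimal sketch, assuming full forwarding so that the only stall is one cycle when an instruction uses a register loaded by the instruction immediately before it. The tuple encoding `(opcode, destination, sources)` and the name `load_use_stalls` are illustrative choices.

```python
# Each instruction is (opcode, destination register or None, source registers).
SLOW = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("ADD", "Ra", ["Rb", "Rc"]), ("SW", None, ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []), ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]
FAST = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []), ("ADD", "Ra", ["Rb", "Rc"]),
    ("LW", "Rf", []), ("SW", None, ["Ra"]), ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]


def load_use_stalls(program):
    """Count instructions that use a register loaded by the instruction just before them."""
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, sources) in zip(program, program[1:]):
        if prev_op == "LW" and prev_dest in sources:
            stalls += 1
    return stalls


print("slow code stalls:", load_use_stalls(SLOW))
print("fast code stalls:", load_use_stalls(FAST))
```

In the slow code, ADD follows the load of Rc and SUB follows the load of Rf, so each waits one cycle; the fast code places an independent instruction after every load that feeds the next computation.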

Slide 15: Data Hazard Solution (4): Data Speculation
- In wide-issue processors, e.g. 8 to 12 instructions per clock cycle:
  - The issue width is larger than a basic block (5 to 7 instructions).
  - Multiple branches: use multiple-branch prediction (e.g. trace cache).
  - Multiple data dependence chains: very hard to execute them in the same clock cycle.
- Value speculation is primarily used to resolve data dependences:
  - in the same clock cycle;
  - for long-latency operations (e.g. load operations).

Slide 16: Data Hazard Solution (4): Data Speculation
- Why is speculation useful? Speculation lets all these instructions run in parallel on a superscalar machine:
    addq $3 $1 $2
    addq $4 $3 $1
    addq $5 $3 $2
- What is value prediction? Predict the values of instructions before they are executed. Compare:
  - Branch prediction eliminates control dependences. The prediction data are just two values (taken or not taken).
  - Value prediction eliminates data dependences. The prediction data are taken from a much larger range of values.
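The three `addq` instructions can be used to sketch why speculation helps. The code below assumes the first operand is the destination (which is what makes the second and third instructions depend on `$3`), unlimited issue width, and one-cycle operations. It ignores the cost of verifying and recovering from a wrong prediction, and `schedule_depth` is a hypothetical helper.

```python
def schedule_depth(program, predicted=()):
    """Number of cycles needed when every instruction issues as soon as its
    non-predicted source registers are available."""
    ready = {}  # register -> cycle in which its producer executes
    depth = 0
    for dest, sources in program:
        # A consumer must wait one cycle past the latest producer it depends on;
        # sources whose values are predicted impose no wait.
        waits = (ready.get(s, 0) for s in sources if s not in predicted)
        start = 1 + max(waits, default=0)
        ready[dest] = start
        depth = max(depth, start)
    return depth


# (destination, sources), assuming destination-first operand order
PROGRAM = [
    ("$3", ["$1", "$2"]),
    ("$4", ["$3", "$1"]),
    ("$5", ["$3", "$2"]),
]

print("without prediction:", schedule_depth(PROGRAM))
print("with $3 predicted: ", schedule_depth(PROGRAM, predicted={"$3"}))
```

Without prediction the two consumers of `$3` wait for the first `addq`, giving two cycles; with `$3` predicted all three can issue in the same cycle.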

Slide 17: Data Hazard Solution (4): Data Speculation
- Value locality: the likelihood of a previously seen value recurring repeatedly within a storage location.
  - Observed in any storage location: registers, cache memory, main memory.
  - Most work focuses on values stored in registers, to break potential data dependences: register value locality.
- Why value prediction?
  - The results of many instructions can be accurately predicted before they are issued or executed.
  - Dependent instructions are no longer bound by the serialization constraints imposed by data dependences.
  - More parallelism can be explored.
  - Predicting values for dependent instructions can lead to beneficial speculative execution.

Slide 18: Redundant instructions
- If the dynamic instances of every static instruction produced during program execution are cached, each result-producing dynamic instruction falls into one of three types:
  - New-result instructions: dynamic instructions that produce a value for the first time (5%).
  - Repeated-result instructions: dynamic instructions whose result equals that of another dynamic instance of the same static instruction (80% to 90%).
  - Derivable instructions: dynamic instructions whose result can be derived from earlier results (5%).
- Redundant instructions: repeated-result instructions and derivable instructions.

Slide 19: Source of value locality (sources of value predictability)
Question: where does value locality occur? How often does the same value result from the same instruction twice in a row?
    Instruction type                                  Value locality?
    Single-cycle arithmetic (e.g. addq $1 $2)         Somewhat
    Single-cycle logical (e.g. bis $1 $2)             Yes
    Multi-cycle arithmetic (e.g. mulq $1 $2)          No
    Register move (e.g. cmov $1 $2)                   Yes
    Integer load (e.g. ldq $1 8($2))                  Yes
    Store with base register update                   No
    FP multiply                                       Somewhat
    FP add                                            Somewhat
    FP move                                           Yes
    FP load                                           Yes
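The question of how often the same instruction produces the same value twice in a row can be measured directly on an execution trace. This is a minimal sketch with a made-up trace of `(instruction address, result)` pairs; the function name `value_locality` and the trace contents are illustrative assumptions, not data from the slides.

```python
def value_locality(trace):
    """Fraction of dynamic results that equal the previous result of the same
    static instruction. The first instance of each instruction is not counted."""
    last = {}  # instruction address -> most recent result
    hits = 0
    total = 0
    for pc, value in trace:
        if pc in last:
            total += 1
            if last[pc] == value:
                hits += 1
        last[pc] = value
    return hits / total if total else 0.0


# Hypothetical trace: 0x10 keeps reloading a constant, 0x14 increments a counter.
TRACE = [
    (0x10, 0), (0x14, 5),
    (0x10, 0), (0x14, 6),
    (0x10, 0), (0x14, 7),
]

print(value_locality(TRACE))
```

The same table of last results is what a simple last-value predictor would consult, so this fraction is also the hit rate such a predictor would achieve on the trace.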

Slide 20: Source of value locality (sources of predictability)
- Data redundancy: text files with white space, empty cells in spreadsheets
- Error checking
- Program constants
- Computed branches
- Virtual function calls
- Glue code: allows calling from one compilation unit to another
- Addressability: pointer tables store constant addresses loaded at run time
- Call contexts: caller-saved/callee-saved registers
- Memory alias resolution: conservative assumptions from the compiler regarding aliasing
- Register spill code

Slide 21: Three generic data hazards
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
- Write After Read (WAR): InstrJ writes an operand before InstrI reads it.
- Called an "anti-dependence" by compiler writers. It results from reuse of the name "r1".
- Can't happen in the DLX 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.

Slide 22: Three generic data hazards
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
- Write After Write (WAW): InstrJ writes an operand before InstrI writes it.
- Called an "output dependence" by compiler writers. It also results from reuse of the name "r1".
- Can't happen in the DLX 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5.
- WAR and WAW hazards will appear in more complicated pipelines.

Slide 23: Summary: factors that affect instruction-level parallelism
- Pipeline CPI = Ideal pipeline CPI + Structural stalls + RAW stalls + WAR stalls + WAW stalls + Control stalls

- Improve the ideal CPI: multiple issue (static/dynamic).
- Overcome the hazards in the pipeline:
  - Structural hazards: caused by resource conflicts.
    - Solution: add resources.
  - Data hazards: caused by RAW, WAW, and WAR.
    - Solutions in software: static scheduling by the compiler, loop unrolling, register renaming, software pipelining.
    - Solutions in hardware: forwarding, register renaming, dynamically scheduled out-of-order execution (scoreboard, Tomasulo's algorithm).
  - Control hazards: caused by branches.
    - Solution: static/dynamic prediction and speculative execution.

Slide 24: Summary: data hazards (also called data dependences)
The data dependences that can exist within a basic block of a program are:
- True data dependence: data flows between the two instructions, so there is a real data dependence.
  - RAW (Read After Write): for instructions i and j, (1) if instruction j uses a result produced by instruction i, then j is RAW-dependent on i; or (2) if j is RAW-dependent on i and instruction k is RAW-dependent on j, then k is RAW-dependent on i.
- False data dependence (also called name dependence): the registers or memory locations that instructions use are called names. Two instructions that use the same name with no data flowing between them have a false data dependence, which takes two forms:
  - WAR (Write After Read): for instructions i and j, if i executes first and the name j writes is a name i reads, then j is WAR-dependent on i (also called anti-dependence).
  - WAW (Write After Write): for instructions i and j, if i and j write the same name, then j is WAW-dependent on i (also called output dependence).

Slide 25: Summary: techniques for exploiting instruction-level parallelism
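The three dependence definitions can be applied mechanically to a pair of instructions. The sketch below assumes the destination-first operand order used in the slide examples and looks only at one pair at a time, ignoring any instruction in between that might overwrite a register; `parse` and `classify` are hypothetical helper names.

```python
def parse(instr):
    """Split 'add r1,r2,r3' into (destination, [sources])."""
    _, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], regs[1:]


def classify(earlier, later):
    """Return the dependences of `later` on `earlier` as a list of labels."""
    dest_i, srcs_i = parse(earlier)
    dest_j, srcs_j = parse(later)
    kinds = []
    if dest_i in srcs_j:
        kinds.append("RAW")  # later reads what earlier writes (true dependence)
    if dest_j in srcs_i:
        kinds.append("WAR")  # later writes what earlier reads (anti-dependence)
    if dest_i == dest_j:
        kinds.append("WAW")  # both write the same name (output dependence)
    return kinds


print(classify("add r1,r2,r3", "sub r4,r1,r3"))  # pair from the RAW slide
print(classify("sub r4,r1,r3", "add r1,r2,r3"))  # pair from the WAR slide
print(classify("sub r1,r4,r3", "add r1,r2,r3"))  # pair from the WAW slide
```

Only the RAW case carries data between the instructions; the WAR and WAW cases come from reusing the name r1, which is why register renaming can remove them.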
