西安交大并行计算理论赵银亮课件第五章

上传人:豆浆 文档编号:6649957 上传时间:2017-08-08 格式:PPT 页数:68 大小:558.50KB
返回 下载 相关 举报
西安交大并行计算理论赵银亮课件第五章_第1页
第1页 / 共68页
西安交大并行计算理论赵银亮课件第五章_第2页
第2页 / 共68页
西安交大并行计算理论赵银亮课件第五章_第3页
第3页 / 共68页
西安交大并行计算理论赵银亮课件第五章_第4页
第4页 / 共68页
西安交大并行计算理论赵银亮课件第五章_第5页
第5页 / 共68页
点击查看更多>>
资源描述

《西安交大并行计算理论赵银亮课件第五章》由会员分享,可在线阅读,更多相关《西安交大并行计算理论赵银亮课件第五章(68页珍藏版)》请在金锄头文库上搜索。

1、Analytical Modeling of Parallel Systems,Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar,To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003.,Topic Overview,Sources of Overhead in Parallel Programs Performance Metrics for Parallel Systems Effect of Granularity o

2、n Performance Scalability of Parallel Systems Minimum Execution Time and Minimum Cost-Optimal Execution Time Asymptotic Analysis of Parallel Programs Other Scalability Metrics,Analytical Modeling - Basics,A sequential algorithm is evaluated by its runtime (in general, asymptotic runtime as a functio

3、n of input size). The asymptotic runtime of a sequential program is identical on any serial platform. The parallel runtime of a program depends on the input size, the number of processors, and the communication parameters of the machine. An algorithm must therefore be analyzed in the context of the

4、underlying platform. A parallel system is a combination of a parallel algorithm and an underlying platform.,Analytical Modeling - Basics,A number of performance measures are intuitive. Wall clock time - the time from the start of the first processor to the stopping time of the last processor in a pa

5、rallel ensemble. But how does this scale when the number of processors is changed of the program is ported to another machine altogether? How much faster is the parallel version? This begs the obvious followup question - whats the baseline serial version with which we compare? Can we use a suboptima

6、l serial program to make our parallel program look Raw FLOP count - What good are FLOP counts when they dont solve a problem?,Sources of Overhead in Parallel Programs,If I use two processors, shouldnt my program run twice as fast? No - a number of overheads, including wasted computation, communicati

7、on, idling, and contention cause degradation in performance.,The execution profile of a hypothetical parallel program executing on eight processing elements. Profile indicates times spent performing computation (both essential and excess), communication, and idling.,Sources of Overheads in Parallel

8、Programs,Interprocess interactions: Processors working on any non-trivial parallel problem will need to talk to each other. Idling: Processes may idle because of load imbalance, synchronization, or serial components. Excess Computation: This is computation not performed by the serial version. This m

9、ight be because the serial algorithm is difficult to parallelize, or that some computations are repeated across processors to minimize communication.,Performance Metrics for Parallel Systems: Execution Time,Serial runtime of a program is the time elapsed between the beginning and the end of its exec

10、ution on a sequential computer. The parallel runtime is the time that elapses from the moment the first processor starts to the moment the last processor finishes execution. We denote the serial runtime by and the parallel runtime by TP .,Performance Metrics for Parallel Systems: Total Parallel Over

11、head,Let Tall be the total time collectively spent by all the processing elements. TS is the serial time. Observe that Tall - TS is then the total time spend by all processors combined in non-useful work. This is called the total overhead. The total time collectively spent by all the processing elem

12、ents Tall = p TP (p is the number of processors). The overhead function (To) is therefore given by To = p TP - TS (1),Performance Metrics for Parallel Systems: Speedup,What is the benefit from parallelism? Speedup (S) is the ratio of the time taken to solve a problem on a single processor to the tim

13、e required to solve the same problem on a parallel computer with p identical processing elements.,Performance Metrics: Example,Consider the problem of adding n numbers by using n processing elements. If n is a power of two, we can perform this operation in log n steps by propagating partial sums up

14、a logical binary tree of processors.,Performance Metrics: Example,Computing the globalsum of 16 partial sums using 16 processing elements . ji denotes the sum of numbers with consecutive labels from i to j.,Performance Metrics: Example (continued),If an addition takes constant time, say, tc and comm

15、unication of a single word takes time ts + tw, we have the parallel time TP = (log n)We know that TS = (n)Speedup S is given by S = (n / log n),Performance Metrics: Speedup,For a given problem, there might be many serial algorithms available. These algorithms may have different asymptotic runtimes a

16、nd may be parallelizable to different degrees. For the purpose of computing speedup, we always consider the best sequential program as the baseline.,Performance Metrics: Speedup Example,Consider the problem of parallel bubble sort. The serial time for bubblesort is 150 seconds. The parallel time for

17、 odd-even sort (efficient parallelization of bubble sort) is 40 seconds. The speedup would appear to be 150/40 = 3.75. But is this really a fair assessment of the system? What if serial quicksort only took 30 seconds? In this case, the speedup is 30/40 = 0.75. This is a more realistic assessment of the system.,

展开阅读全文
相关资源
相关搜索

当前位置:首页 > 行业资料 > 其它行业文档

电脑版 |金锄头文库版权所有
经营许可证:蜀ICP备13022795号 | 川公网安备 51140202000112号