算法和并行编程－金锄头文库

资源描述

《算法和并行编程》由会员分享，可在线阅读，更多相关《算法和并行编程（9页珍藏版）》请在金锄头文库上搜索。

1、算法和并行编程 (Algorithms)1. 在下面三个问题中任选一个，给出相应的并行求解算法，具体说明使用的数据结构和分解方法，并分析算法的时间复杂度。a) 图的连通分支(connected components of graphs)b) 图的极大独立集(Maximal independent set)c) 单源/全源最短路( single (all pair) shortest path)。2. 在下面三个问题中任选一个，用MPI, Pthread或OpenMP中的一种编写出程序。a) 矩阵相乘的并行程序b) n计算c) 求数组的最小值或最大值。算法和并行编程(Algorithms)答案

2、1. 在下面三个问题中任选一个，给出相应的并行求解算法，具体说明使用的数据结构和分解方法，并分析算法的时间复杂度。a)图的连通分支(connected components of graphs)答：对图进行深度优先遍历可以找出连通分支：并行求解算法：将图G的邻接矩阵划分为p个部分，每个部分分配给一个进程。第一步，每个进程Pi计算子图Gi的深度优先生成森林。第二步，生成森林按对合并，直到只剩下一个生成森林。合并方法：假设将A合并到B对A中的每条边(u, v),确定B中两个顶点是否处在同一棵树上，如果不是，则将两棵树合并。算法的时间复杂度：B(n2/p) 答： Perform DFS on t

3、he graph to get Connected Components:The n * n adjacency matrix is partitioned into p blocks.Partition the graph across processors and run independent connected component algorithms on each processor. second, spanning forests are merged pairwise until only one spanning forest remains.Each processor

4、can compute its local spanning forest in time B(n/p).Merging is done by embedding a logical tree into the topology. There are log p merging stages, and each takes time 0(n). Thus, the cost due to merging is 0(n log p).During each mergin g stage, spanning forests are sent between nearest neighbors. R

5、ecall that 0(n) edges of the spanning forest are transmitted.The parallel run time of the connected-component algorithm is/广pforest mergingG ( - + (nlogpj?For a cost-optimal formulation p = O(n / log n).b)图的极大独立集(Maximal independent set)答：图的极大独立集算法的思想：Luby 算法：计算一个图的顶点 V 的最大独立集 I：集合I初始设置为空，候选顶点集合C=V

6、一个唯一的随机数分配给C中每一个顶点，如果某个顶点的随机数小于所有与它相邻顶点的随机数, 则将它加入到I中；更新集合C,除去加入到I中的顶点以及与它们相邻的所有顶点重复至C为空并行求解算法：使用三个数组，一个存储进入最大独立集I中的节点，一个存储候选集C, 一个存储随机数R。将候选集 C在p个进程中划分。每个进程为从C中分配给它的顶点产生一个随机数。当所有顶点都产生随机数后，就可以来确定哪些顶点包含在I中。算法的时间复杂度： O(log|V|)答：Maximal independent set 算法的思想：We use three arrays, each of length n - I

7、, which stores nodes in MIS, C, which stores the candidate set, and R, the random numbers.Partition C across p processors. Each processor generates the corresponding values in the R array, and from this, computes which candidate vertices can enter MIS.The C array is updated by deleting all the neigh

8、bors of vertices that entered MIS.The performance of this algorithm is dependent on the structure of the graph.Parallel MIS algorithms (Lubys algorithm for graph coloring). Luby 算法： each node is in the candidate set C. Each node generates a (unique) random number and communicates it to its neighbors

9、. If a nodes number exceeds that of all its neighbors, it joins set I. All of its neighbors are removed from C. This process continues until C is empty.On average, this algorithm converges after O(log|V|) such steps.c)单源/全源最短路径(single (all pair) shortest path)。答：单源最短路径 Dijkstra 算法与最小生成树算法Prim十分相似，不同

10、点是Dijkstra存储的数组lu,它是从顶点s经Vt中的顶点到达点u 的最小成本，而Prim存储du，它是连接Vt中的某个顶点与u的最小成本。并行化算法的思想：串行的dijkstra算法有两层N次循环，第一层循环必须得按次序执行，不可并行。那么只有将第二层循环并行。第二层循环有两个N次的循环，第一次循环是求dist数组的最小值。复杂度为O (N) 第二次循环是更新dist数组的值，复杂度也是O (N)。输入：数据u(存放在单元B(l)中)；输出：将u广播到数组B的所有单元中去。Begin(1) B(1)-u；(2) for 0 to -1 do(3) for each j:2i+1 WjW

11、2i+1 pardoB(j)B(j-2j)EndforEndforEnd.时间复杂度为O(n2p+nlogp), O (p)个处理器。并行的 dijkstra 算法复杂度为 O (N*(N/p+logp+N/p)=O (N*N/p+Nlogp)答： single shortest path Dijkstra 算法：we use multiple queues, one for each processor. Each processor builds its priority queue only using its own vertices.When process Pi extracts

12、the vertex u 丘 Vi, it sends a message to processes that store vertices adjacent to u.Process Pj, upon receiving this message, sets the value of lv stored in its priority queue to minlv,lu + w(u,v).If a shorter path has been discovered to node v, it is reinserted back into the local priority queue.Th

13、e algorithm terminates only when all the queues become empty. 输入：数据u(存放在单元B(l)中)；输出：将u广播到数组B的所有单元中去。Begin(1) B(1)-u；(2) for 0 to -1 do(3) for each j:2i+1 WjW2i+1 pardoEndforEndforEnd.并行的 dijkstra 算法复杂度为 O (N*(N/p+logp+N/p)=O (N*N/p+Nlogp)答：全源最短路径二维任务划分方法的 Floyd 并行算法：输入:顶点数目n;图的邻接矩阵array;初始化后的路径矩阵pat

14、h到主机(0号进程)(1) 如果是主机(0号进程)，根据数据规模n和处理机数目p来划分任务，按照棋盘式均匀的分解的方法, 发送array和path的数据块到相应的处理机，各处理机在本地存储为jobarra y和jobpath。(2) for ( k = 0; k n; k +) 循环开始。(3) 各个处理器计算当前需要广播的第k行array所在的处理机号。如果在本地，那么从jobarray矩阵中选取出相应的一行数据，存储在temparrayA ，并向同一列处理机广播。各个处理器计算当前需要广播的第k列array和path所在的处理机号。如果在本地，那么从jobarray和jobpath矩阵

15、中选取出相应的一列数据，分别存储在 tem parrayB 和 temppath ，并向同一行处理机广播。(4) 各个处理器执行并行计算:for ( i = 0; i ; i +)for ( j = 0; j ( temparrayBi + temparrayAj ) )jobarrayij = temparrayBi + temparrayA j; pathij = temppathi;(5)跳(2)直至k的循环结束转(6); 各个处理器返回运算结果jobarray和jobpath两个矩阵到主机(0号进程)，主机在将其归并到本地的array 和 path 矩阵。释放空间。并行Floyd算法复杂度分析输出:变换后的矩阵array和path答： all pair shortest path Floyed 算法Floyds Algorithm: Parallel Formulation Using 2-D Block Mapping：Matrix D(k) is divided into p blocks of size (n / 丿p) x (nEach processor updates i

展开阅读全文

算法和并行编程

最新文档