《SolvingLinearSystems》由会员分享,可在线阅读,更多相关《SolvingLinearSystems(70页珍藏版)》请在金锄头文库上搜索。
1、Parallel Programming in C with MPI and OpenMP,Michael J. Quinn,Chapter 12,Solving Linear Systems,Outline,Terminology Back substitution Gaussian elimination Jacobi method Conjugate gradient method,Terminology,System of linear equations Solve Ax = b for x Special matrices Symmetrically banded Upper tr
2、iangular Lower triangular Diagonally dominant Symmetric,Symmetrically Banded,4,2,-1,0,0,0,3,-4,5,6,0,0,1,6,3,2,4,0,0,2,-2,0,9,2,0,0,7,3,8,7,0,0,0,4,0,2,Semibandwidth 2,Upper Triangular,4,2,-1,5,9,2,0,-4,5,6,0,-4,0,0,3,2,4,6,0,0,0,0,9,2,0,0,0,0,8,7,0,0,0,0,0,2,Lower Triangular,4,0,0,0,0,0,0,0,0,0,0,0
3、,5,4,3,0,0,0,2,6,2,3,0,0,8,-2,0,1,8,0,-3,5,7,9,5,2,Diagonally Dominant,19,0,2,2,0,6,0,-15,2,0,-3,0,5,4,22,-1,0,4,2,3,2,13,0,-5,5,-2,0,1,16,0,-3,5,5,3,5,-32,Symmetric,3,0,2,2,0,6,0,7,4,3,-3,5,5,4,0,-1,0,4,2,3,-1,9,0,-5,0,-3,0,0,5,5,6,5,4,-5,5,-3,Back Substitution,Used to solve upper triangular system
4、 Tx = b for x Methodology: one element of x can be immediately computed Use this value to simplify system, revealing another element that can be immediately computed Repeat,Back Substitution,1x0,+1x1,1x2,+4x3,8,=, 2x1,3x2,+1x3,5,=,2x2, 3x3,0,=,2x3,4,=,Back Substitution,1x0,+1x1,1x2,+4x3,8,=, 2x1,3x2
5、,+1x3,5,=,2x2, 3x3,0,=,2x3,4,=,x3 = 2,Back Substitution,1x0,+1x1,1x2,0,=, 2x1,3x2,3,=,2x2,6,=,2x3,4,=,Back Substitution,1x0,+1x1,1x2,0,=, 2x1,3x2,3,=,2x2,6,=,2x3,4,=,x2 = 3,Back Substitution,1x0,+1x1,3,=, 2x1,12,=,2x2,6,=,2x3,4,=,Back Substitution,1x0,+1x1,3,=, 2x1,12,=,2x2,6,=,2x3,4,=,x1 = 6,Back S
6、ubstitution,1x0,9,=, 2x1,12,=,2x2,6,=,2x3,4,=,Back Substitution,1x0,9,=, 2x1,12,=,2x2,6,=,2x3,4,=,x0 = 9,Pseudocode,for i n 1 down to 1 dox i b i / a i, i for j 0 to i 1 dob j b j x i a j, i endfor endfor,Time complexity: (n2),Data Dependence Diagram,We cannot execute the outer loop in parallel. We
7、can execute the inner loop in parallel.,Row-oriented Algorithm,Associate primitive task with each row of A and corresponding elements of x and b During iteration i task associated with row j computes new value of bj Task i must compute xi and broadcast its value Agglomerate using rowwise interleaved
8、 striped decomposition,Interleaved Decompositions,Rowwise interleaved striped decomposition,Columnwise interleaved striped decomposition,Complexity Analysis,Each process performs about n / (2p) iterations of loop j in all A total of n -1 iterations in all Computational complexity: (n2/p) One broadca
9、st per iteration Communication complexity: (n log p),Column-oriented Algorithm,Associate one primitive task per column of A and associated element of x Last task starts with vector b During iteration i task i computes xi, updates b, and sends b to task i -1 In other words, no computational concurren
10、cy Agglomerate tasks in interleaved fashion,Complexity Analysis,Since b always updated by a single process, computational complexity same as sequential algorithm: (n2) Since elements of b passed from one process to another each iteration, communication complexity is (n2),Comparison,Message- passing
11、time dominates,Computation time dominates,Gaussian Elimination,Used to solve Ax = b when A is dense Reduces Ax = b to upper triangular system Tx = c Back substitution can then solve Tx = c for x,Gaussian Elimination,4x0,+6x1,+2x2, 2x3,=,8,2x0,+5x2, 2x3,=,4,4x0, 3x1, 5x2,+4x3,=,1,8x0,+18x1, 2x2,+3x3,
12、=,40,Gaussian Elimination,4x0,+6x1,+2x2, 2x3,=,8,+4x2, 1x3,=,0,+3x1, 3x2,+2x3,=,9,+6x1, 6x2,+7x3,=,24, 3x1,Gaussian Elimination,4x0,+6x1,+2x2, 2x3,=,8,+4x2, 1x3,=,0,1x2,+1x3,=,9,2x2,+5x3,=,24, 3x1,Gaussian Elimination,4x0,+6x1,+2x2, 2x3,=,8,+4x2, 1x3,=,0,1x2,+1x3,=,9,3x3,=,6, 3x1,Iteration of Gaussi
13、an Elimination,Numerical Stability Issues,If pivot element close to zero, significant roundoff errors can result Gaussian elimination with partial pivoting eliminates this problem In step i we search rows i through n-1 for the row whose column i element has the largest absolute value Swap (pivot) th
14、is row with row i,Implementing Partial Pivoting,Without partial pivoting,With partial pivoting,Row-oriented Parallel Algorithm,Associate primitive task with each row of A and corresponding elements of x and b A kind of reduction needed to find the identity of the pivot row Tournament: want to determ
15、ine identity of row with largest value, rather than largest value itself Could be done with two all-reductions MPI provides a simpler, faster mechanism,MPI_MAXLOC, MPI_MINLOC,MPI provides reduction operators MPI_MAXLOC, MPI_MINLOC Provide datatype representing a (value, index) pair,MPI (value,index)
16、 Datatypes,Example Use of MPI_MAXLOC,struct double value;int index; local, global; . local.value = fabs(aji); local.index = j; . MPI_Allreduce (,Second Communication per Iteration,Communication Complexity,Complexity of tournament: (log p) Complexity of broadcasting pivot row: (n log p) A total of n - 1 iterations Overall communication complexity: (n2 log p),Isoefficiency Analysis,