《matrix-vector multiplication》由会员分享,可在线阅读,更多相关《matrix-vector multiplication(119页珍藏版)》请在金锄头文库上搜索。
1、Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Parallel Programming in C with MPI and OpenMPMichael J. QuinnCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Chapter 8Matrix-vector MultiplicationCopyright The McGraw-H
2、ill Companies, Inc. Permission required for reproduction or display.Chapter ObjectivesnReview matrix-vector multiplicaitonnPropose replication of vectorsnDevelop three parallel programs, each based on a different data decompositionCopyright The McGraw-Hill Companies, Inc. Permission required for rep
3、roduction or display.OutlinenSequential algorithm and its complexitynDesign, analysis, and implementation of three parallel programsuRowwise block stripeduColumnwise block stripeduCheckerboard blockCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.210432114312
4、30201341=Sequential Algorithm221504135 9419210413413319231314141114134132114411333171419211913414312331330112411011113413020Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Storing VectorsnDivide vector elements among processesnReplicate vector elementsnVecto
5、r replication acceptable because vectors have only n elements, versus n2 elements in matricesCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Rowwise Block Striped MatrixnPartitioning through domain decompositionnPrimitive task associated withuRow of matrixuE
6、ntire vectorCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Phases of Parallel AlgorithmRow i of AbRow i of Ab ciInner product computationRow i of AbcAll-gather communicationCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or di
7、splay.Agglomeration and MappingnStatic number of tasksnRegular communication pattern (all-gather)nComputation time per task is constantnStrategy:uAgglomerate groups of rowsuCreate one task per MPI processCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Comple
8、xity AnalysisnSequential algorithm complexity: (n2)nParallel algorithm computational complexity: (n2/p)nCommunication complexity of all-gather: (log p + n)nOverall complexity: (n2/p + log p)Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Isoefficiency Analys
9、isnSequential time complexity: (n2)nOnly parallel overhead is all-gatheruWhen n is large, message transmission time dominates message latencyuParallel communication time: (n)nn2 Cpn n Cp and M(n) = n2nSystem is not highly scalableCopyright The McGraw-Hill Companies, Inc. Permission required for repr
10、oduction or display.Block-to-replicated TransformationCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.MPI_AllgathervCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.MPI_Allgathervint MPI_Allgatherv (void *send_buffer,
11、int send_cnt,MPI_Datatype send_type,void *receive_buffer,int *receive_cnt,int *receive_disp,MPI_Datatype receive_type,MPI_Comm communicator)Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.MPI_Allgatherv in ActionCopyright The McGraw-Hill Companies, Inc. Perm
12、ission required for reproduction or display.Function create_mixed_xfer_arraysnFirst arrayuHow many elements contributed by each processuUses utility macro BLOCK_SIZEnSecond arrayuStarting position of each process blockuAssume blocks in process rank orderCopyright The McGraw-Hill Companies, Inc. Perm
13、ission required for reproduction or display.Function replicate_block_vectornCreate space for entire vectornCreate “mixed transfer” arraysnCall MPI_AllgathervCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Function read_replicated_vectornProcess p-1uOpens fil
14、euReads vector lengthnBroadcast vector length (root process = p-1)nAllocate space for vectornProcess p-1 reads vector, closes filenBroadcast vectorCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Function print_replicated_vectornProcess 0 prints vectornExact
15、call to printf depends on value of parameter datatypeCopyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Run-time Expressionn: inner product loop iteration timenComputational time: nn/pnAll-gather requires log p messages with latency nTotal vector elements trans
16、mitted: (2log p -1) / 2log p nTotal execution time: nn/p + log p + (2log p -1) / (2log p )Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.Benchmarking ResultsExecution Time (msec)pPredictedActualSpeedupMflops163.463.41.0031.6232.432.71.9461.2322.322.72.7988.1417.017.83.56112.4514.115.24.16131.6612.013.34.76150.4710