LU Decomposition and Eigenvalue Solving for Large Sparse Matrices (Grusoft.com)

Format: PPT | Pages: 39 | Size: 3.48 MB | Uploaded: 2024-07-30

LU Decomposition and Eigenvalue Solving for Large Sparse Matrices
Chen Yingshi (陈英时), 2013.7.20 (revised 2016.1.9)

Wide applications of sparse matrix solving
- Matrix solving is the core of numerical computation, and sparse matrix solving is one of its key problems: partial differential equations, integral equations, eigenvalue problems, optimization.
- Beyond order ~10,000, treating the matrix as dense is infeasible; the sparse solve is often the resource bottleneck (time, memory, and disk).

Direct methods
Based on Gaussian elimination, i.e., computing the LU factorization of A. A usually undergoes a series of permutations, which can be merged into a left permutation matrix P and a right permutation matrix Q. Basic steps:
1) Symbolic analysis: obtain the permutation matrices P, Q
2) Numeric factorization: PAQ = LU
3) Forward/backward substitution: solve Ly = Pb, then Uz = y, and finally x = Qz

- I.S. Duff, A.M. Erisman, and J.K. Reid. Direct Methods for Sparse Matrices. London: Oxford Univ. Press, 1986.
- J.J. Dongarra, I.S. Duff, et al. Numerical Linear Algebra for High-Performance Computers.
- G.H. Golub, C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.

Sparse matrices are complex and varied
- Basic parameters: symmetry, sparsity, distribution of nonzeros
- Sensitivity: ill-conditioned matrices, condition number
- Varied formats: Harwell-Boeing Exchange Format
- Test sets: Harwell-Boeing Sparse Matrix Collection, UF sparse matrix collection
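The three phases above can be sketched in code. GSS itself is not scriptable here, so as a stand-in this uses SciPy's SuperLU wrapper, which performs ordering/symbolic analysis and numeric factorization in one call and exposes the permutations and triangular factors; the small example matrix is hypothetical.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# A small sparse system A x = b (illustrative values).
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 1.0],
                         [0.0, 1.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])

# splu runs all three phases internally: ordering + symbolic analysis,
# numeric factorization PAQ = LU, and it exposes a reusable solve
# (forward/backward substitution).
lu = splu(A)
x = lu.solve(b)

print(np.allclose(A @ x, b))       # the residual is tiny
print(lu.perm_r, lu.perm_c)        # row/column permutations P, Q
print(lu.L.nnz, lu.U.nnz)          # nonzeros in the triangular factors
```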

Rapid progress of solvers
BBMAT (http://www.cise.ufl.edu/research/sparse/matrices/Simon/bbmat.html): order 38744, with over 40 million entries after factorization.
- 1988: Cray-2 supercomputer, 1000 s
- 2003: 4 GHz CPU, UMFPACK 4, 32.6 s
- 2006: 3.0 GHz, GSS 1.2, 15 s
- 2012: 3.0 GHz, 4 cores, GSS 2.3, 4 s
- 2014: i7, 8 cores, GSS 2.4, 1 s
Drivers: hardware development (CPU, memory, etc.); maturing sparse algorithms (sparse ordering; dense subblocks: multifrontal, supernodal); math libraries (BLAS, LAPACK; netlib.org).

Key points of sparse LU factorization algorithms
According to differences in symbolic analysis and numeric factorization, direct methods fall into the following classes [1,5]:
1) General technique: performs the LU factorization mainly with structures such as the elimination tree; suited to very sparse, unstructured matrices.
2) Frontal scheme: during the factorization, several consecutive rows are merged into one dense subblock (the front), and this block is factorized with high-performance math libraries such as BLAS.
3) Multifrontal method: rows (columns) with identical symbolic structure are merged into multiple dense subblocks, and the factorization proceeds in units of these blocks. Generating, eliminating, assembling, and releasing the blocks requires dedicated data structures and algorithms.
Two key ideas:
1. Zero entries are a dynamic concept, so sparse ordering is needed to reduce fill-in.
2. Dense subblocks: sparse LU (skipping zeros) versus dense LU.
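The elimination step of a frontal scheme can be illustrated with a tiny dense front. The partition and values below are hypothetical; the point is that eliminating the pivot block reduces to one dense Schur-complement update, which is exactly the BLAS level-3 kernel these methods exploit.

```python
import numpy as np

# A front is a small dense block [[F11, F12], [F21, F22]]: F11 holds the
# k pivot rows/columns eliminated now; F22 belongs to later pivots.
rng = np.random.default_rng(0)
k, m = 3, 4
F = rng.standard_normal((k + m, k + m))
F[:k, :k] += 10.0 * np.eye(k)        # keep the pivot block well conditioned

F11, F12 = F[:k, :k], F[:k, k:]
F21, F22 = F[k:, :k], F[k:, k:]

# Eliminating the k pivots is one dense operation:
# the Schur complement S = F22 - F21 * inv(F11) * F12.
S = F22 - F21 @ np.linalg.solve(F11, F12)

# S is the "contribution block" that the frontal/multifrontal method
# passes to the parent front, where it is assembled into the parent.
print(S.shape)
```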

Sparse ordering
The purpose of an ordering algorithm is to reduce the fill-in produced during the LU factorization. Finding the optimal ordering of a matrix is NP-complete (it cannot be solved in reasonable time), and for a specific matrix there is currently no method or metric to decide in advance which algorithm is best. Testing different orderings in practice and comparing the fill-in they produce is therefore the reliable way to choose one.
The main ordering algorithms are minimum degree ordering (MMD, minimum degree ordering algorithm) and nested dissection. Mature libraries include AMD [12], COLAMD, and METIS [13].
Yannakakis M. Computing the minimum fill-in is NP-complete. SIAM J. Algebraic Discrete Methods, 1981, 2:77-79.

Approximate minimum degree ordering; the quotient graph
The approximate minimum degree ordering algorithm (AMD) was proposed around 1996 by Patrick R. Amestoy and coauthors.
Amestoy, P. R., Davis, T. A., et al. An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Applic. 17, 4, 886-905, 1996.
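The effect of a fill-reducing ordering is easy to observe directly. This sketch uses SciPy's SuperLU wrapper on a hypothetical "arrow" matrix (dense first row/column plus a diagonal): keeping the natural order eliminates the dense node first and fills the whole factor, while COLAMD pushes it to the end.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import splu

# Hypothetical arrow-pattern matrix of order n.
n = 200
A = lil_matrix((n, n))
A.setdiag(4.0)
A[0, :] = 1.0            # dense first row
A[:, 0] = 1.0            # dense first column
A[0, 0] = float(n)       # keep the dense pivot well scaled
A = A.tocsc()

def factor_nnz(permc_spec):
    """Nonzeros in L + U under a given column-ordering strategy."""
    lu = splu(A, permc_spec=permc_spec)
    return lu.L.nnz + lu.U.nnz

print(factor_nnz('NATURAL'))   # original order: massive fill-in (~n^2 entries)
print(factor_nnz('COLAMD'))    # fill-reducing order: factors stay sparse
```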

Why dense subblocks (dense matrices)?
See the benchmark at http://eigen.tuxfamily.org/index.php?title=Benchmark

The multifrontal method
Development: introduced by Duff and Reid [2]; analyzed and improved by J.W.H. Liu and others [3]; T.A. Davis developed UMFPACK [4].
Basic algorithm: exploit the structure of the sparse matrix to obtain a series of dense subblocks (fronts), turning the LU factorization into assembly, elimination, and update operations on these fronts.
Advantages of the multifrontal method:
- A front is a dense matrix, so high-performance libraries (BLAS, etc.) can be called directly
- Dense subblocks save index storage
- Better parallelism
Current major solvers: UMFPACK, GSS, MUMPS, PARDISO, WSMP, HSL MA41, etc.

How LU factorization forms fronts
Example: a matrix of order 10; blue dots denote nonzeros, red dots the fill-in produced by the factorization. [Slide figures: the frontal partition of columns a-j and the assembly/elimination/update sequence of the fronts were shown graphically and are not reproducible in text.]

The elimination tree
The elimination tree is the key structure of symbolic analysis. Each of its nodes corresponds to a column (row) of the matrix and is connected only to its parent, defined (in the symmetric case) as parent(j) = min{ i > j : l_ij ≠ 0 }, i.e., the row index of the first subdiagonal nonzero in column j of L.
- J.W.H. Liu. The Role of Elimination Trees in Sparse Factorization. SIAM J. Matrix Anal. Applic., 11(1):134-172, 1990.
- J.W.H. Liu. Elimination structures for unsymmetric sparse LU factors. SIAM J. Matrix Anal. Appl. 14, no. 2, 334-352, 1993.
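The parent definition above leads directly to Liu's classic algorithm for building the elimination tree from the matrix pattern alone, without any numeric factorization. A minimal sketch on a dense NumPy pattern (the tridiagonal test matrix is hypothetical):

```python
import numpy as np

def etree(A):
    """Elimination tree of a symmetric sparse pattern (Liu's algorithm).

    Returns parent[j] = min{ i > j : l_ij != 0 } for the factor L of A,
    using the union-find style traversal with path compression
    ("virtual ancestors"); -1 marks a root.
    """
    n = A.shape[0]
    parent = np.full(n, -1)
    ancestor = np.full(n, -1)
    for j in range(n):
        for i in np.nonzero(A[:j, j])[0]:   # entries a_ij with i < j
            # climb from i toward the current root, compressing the path
            while ancestor[i] != -1 and ancestor[i] != j:
                nxt = ancestor[i]
                ancestor[i] = j
                i = nxt
            if ancestor[i] == -1:
                ancestor[i] = j
                parent[i] = j
    return parent

# Tridiagonal pattern: each column's first subdiagonal nonzero is the
# next column, so the elimination tree is a simple path.
T = 2.0 * np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
parent = etree(T)
print(parent)    # parent of each node; the last node is the root
```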

About GSS
- Developed in standard C; runs on all major platforms
- Faster and more stable than Intel PARDISO
- Hybrid CPU/GPU computing
- Breaks the 32-bit Windows memory limit
- 32 user parameters; supports user-customized modules
Users at universities and research institutes: China Electric Power Research Institute, the University of Hong Kong, Nanjing University, Hohai University, China University of Petroleum, University of Electronic Science and Technology of China, China Three Gorges University, Shandong University.

Why they choose GSS
- Crosslight (industry leader in TCAD simulation): "Hybrid GPU/CPU version, more than 2 times faster than PARDISO, MUMPS and other sparse solvers."
- SoilVision (the most technically advanced suite of 1D/2D/3D geotechnical software): "Much faster than their own sparse solver."
- FEM consulting (a leading research team in the area of the Finite Element Method since 1967): "GSS is faster than PARDISO and provides many custom modules."
- GSCAD (leading building software in China): "GSS provides a user-specific module to deal with ill-conditioned matrices."
- ICAROS (a global turnkey geospatial mapping service provider and state-of-the-art photogrammetric technologies developer): "GSS is faster than PARDISO. Also provides some technical help."

- EPRI (China Electric Power Research Institute): "3-4 times faster than KLU."

GSS: the weighted elimination tree
Based on the elimination tree, GSS estimates the workload of the numeric factorization and divides the LU factorization into many independent tasks, laying the foundation for efficient parallel computation.

GSS: dual-threshold column pivoting algorithm

GSS: hybrid CPU/GPU computing
1) After dividing the LU factorization into many parallel tasks, GSS uses an adaptive strategy to run these tasks on different hardware (CPU, GPU). That is, if the GPU has higher computing power, it automatically runs more tasks; if the CPU is more powerful, GSS gives it more tasks.
2) Furthermore, if the CPU is free (has finished all its tasks) while the GPU is still running a task, GSS splits that task into smaller child tasks and assigns some of them to the CPU, so the CPU resumes computing. This yields higher parallel efficiency.
3) GSS also runs tests to obtain the best parameters for different hardware.

GSS on matrices from the frequency-domain spectral element method
- Fairly dense: about 400,000 unknowns and 1.5 billion nonzeros; GSS needs about 15 GB of memory
- Many right-hand sides must be solved: 32 right-hand sides need 80 seconds?
- Further optimization: with hybrid CPU/GPU computing the numeric factorization takes about 35 s; reuse the symbolic analysis; exploit the special structure of the matrix to further reduce the number of nonzeros (estimated 80/4 = 20 s)
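The multiple right-hand-side point above rests on a general property of direct methods: factor once, then each extra right-hand side costs only a cheap substitution. A sketch with SciPy's SuperLU wrapper as a stand-in (the tridiagonal matrix and the 32 right-hand sides are hypothetical):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import splu

# Factor once ...
n = 1000
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
lu = splu(A)                 # symbolic + numeric factorization, done once

# ... then every additional right-hand side is just forward/backward
# substitution against the stored L and U factors.
B = np.random.default_rng(1).standard_normal((n, 32))   # 32 right-hand sides
X = lu.solve(B)              # no refactorization

print(np.allclose(A @ X, B))
```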

A single LU factorization: symbolic analysis 50 s, numeric factorization 46 s, forward/backward substitution 2.5 s.

Comparative tests
- The test matrices are all from the UF sparse matrix collection.
- PARDISO is from Intel Composer XE 2013 SP1; GSS 2.4 uses CPU-GPU hybrid computing.
- Test machine: Intel Core i7-4770 (3.4 GHz) with 24 GB memory; ASUS GTX 780 graphics card (compute capability 3.5); NVIDIA CUDA Toolkit 5.5; Windows 7 64-bit. Both solvers use default parameters.
- For large matrices that need long computation, GSS 2.4 is nearly 3 times faster than PARDISO. For matrices with short solve times, PARDISO is faster than GSS; one reason is that the complex synchronization between CPU and GPU does need some extra time.

Eigenvalue solving for large sparse matrices
Here A is the large sparse matrix arising in a full-process dynamic simulation program; we solve Ax = λx, where λ is an eigenvalue and x the corresponding eigenvector.
"150 Years Old and Still Alive: Eigenproblems" (1997), Henk A. van der Vorst, Gene H. Golub.
Sparse LU factorization is, in theory, just Gaussian elimination; sparse eigenvalue computation is much closer to pure mathematics:
- The Algebraic Eigenvalue Problem. J.H. Wilkinson.
- Matrix Algorithms, Volume II: Eigensystems. G.W. Stewart.
- G.H. Golub, C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
- Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. Zhaojun Bai et al., 2000.

The classical iterations (each was a slide of its own; the formulas were given on the slides):
- Power iteration, with an intuitive geometric explanation
- Rayleigh quotient iteration
- Subspace iteration
- Krylov subspaces
- The Arnoldi iteration and its basic algorithm
ARPACK: www.caam.rice.edu/software/ARPACK/
- R.B. Morgan. On Restarting the Arnoldi Method for Large Nonsymmetric Eigenvalue Problems.
- R.B. Lehoucq. Analysis and Implementation of an Implicitly Restarted Iteration. PhD thesis.
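The power iteration, the simplest of these, can be sketched in a few lines; the Rayleigh quotient supplies the eigenvalue estimate. The diagonal test matrix is hypothetical.

```python
import numpy as np

def power_iteration(A, tol=1e-10, max_iter=1000):
    """Dominant eigenpair by repeated multiplication: x <- A x / ||A x||.

    Converges toward the eigenvector of the largest-magnitude eigenvalue
    at a rate governed by |lambda_2 / lambda_1|; the Rayleigh quotient
    x^T A x estimates the eigenvalue.
    """
    rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    lam = 0.0
    for _ in range(max_iter):
        y = A @ x
        x_new = y / np.linalg.norm(y)
        lam_new = x_new @ A @ x_new        # Rayleigh quotient
        if abs(lam_new - lam) < tol:
            return lam_new, x_new
        x, lam = x_new, lam_new
    return lam, x

A = np.diag([5.0, 2.0, 1.0])
lam, x = power_iteration(A)
print(round(lam, 6))    # → 5.0
```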

Arnoldi iteration for computing eigenvalues
Krylov decompositions are built from unitary similarity transformations. Note: decompositions based on unitary similarity are backward stable (Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, p. 173).
- The standard Arnoldi decomposition: A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T, with H_k upper Hessenberg.
- The generalized Krylov decomposition: A U_k = U_k B_k + u_{k+1} b^T, in which the similarity transformations Q are unitary and can be accumulated step by step.
(Matrix Algorithms, Volume II: Eigensystems. G.W. Stewart, p. 309.)
Measured: more efficient. In our tests both the iteration count and the running time decrease by about one third (compared with ARPACK).
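The standard Arnoldi decomposition above translates almost line for line into code: orthogonalize A v_j against the basis built so far, and the coefficients form the Hessenberg matrix whose eigenvalues are the Ritz values. A minimal sketch (the diagonal test matrix, starting vector, and step count are hypothetical):

```python
import numpy as np

def arnoldi(A, b, k):
    """k steps of the Arnoldi iteration (modified Gram-Schmidt).

    Builds an orthonormal basis V of span{b, Ab, ..., A^k b} and a
    (k+1) x k upper Hessenberg H with A V_k = V_{k+1} H. Eigenvalues
    of the leading k x k block of H (Ritz values) approximate
    eigenvalues of A.
    """
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):             # orthogonalize against the basis
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:            # breakdown: invariant subspace found
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Test matrix with eigenvalues 1..100: after 30 steps the extreme Ritz
# values already approximate the extreme eigenvalues closely.
A = np.diag(np.arange(1.0, 101.0))
rng = np.random.default_rng(0)
V, H = arnoldi(A, rng.standard_normal(100), 30)
ritz = np.linalg.eigvals(H[:-1, :])
print(max(ritz.real))    # close to 100
```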

Advantages of the Krylov-Schur decomposition
- Easy to pick Ritz values as implicit shifts
- Easy deflation (lock + purge); the basic idea of the deflation operation is similar, just more involved
The Schur decomposition reduces any matrix to an upper triangular matrix whose diagonal holds the eigenvalues, and the eigenvalues along that diagonal can be reordered arbitrarily by unitary similarity transformations. That is, after forming the Rayleigh matrix and computing all the Ritz values, the wanted Ritz values can be moved to the front and the unwanted ones to the back; after the restart, only the selected Ritz values remain in the sequence.
G.W. Stewart. A Krylov-Schur algorithm for large eigenproblems. SIAM J. Matrix Anal. Appl., 23 (2001), pp. 601-614.

Krylov-Schur and restarting
After a restart the matrix B becomes more complex, as shown in the slide figure. For the complete restarted Krylov-Schur algorithm, see Matrix Algorithms, Volume II: Eigensystems, G.W. Stewart, pp. 329-330.

Faster convergence
Verified on several practical examples: notably faster than the widely used ARPACK, typically needing only 60-70% of ARPACK's iterations. Why?
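The reordering step at the heart of a Krylov-Schur restart, moving wanted Ritz values to the leading block of a Schur form via unitary similarity, can be demonstrated on a small dense matrix with SciPy's sorted Schur decomposition. The 8x8 random matrix and the selection criterion (|λ| > 1) are hypothetical.

```python
import numpy as np
from scipy.linalg import schur

# In a Krylov-Schur restart, the small dense Rayleigh matrix is reduced
# to Schur form and its diagonal is reordered so that the wanted Ritz
# values occupy the leading block, which is then kept.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))

# Sort eigenvalues with modulus > 1 (the "wanted" ones) to the front.
T, Z, sdim = schur(B, output='complex', sort=lambda lam: abs(lam) > 1.0)

leading = np.diag(T)[:sdim]
print(sdim)                                   # how many values were selected
print(np.all(np.abs(leading) > 1.0))          # they lead the diagonal
print(np.allclose(Z @ T @ Z.conj().T, B))     # unitary similarity preserved
```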

Latest algorithms: subspace iteration with approximate spectral projection
"FEAST as a Subspace Iteration Eigensolver Accelerated by Approximate Spectral Projection" — P.T.P. Tang, E. Polizzi
- Can compute all eigenvalues inside a specified region of the complex plane
- Mainly for symmetric matrices; extension to the nonsymmetric case is needed
- Based on a spectral projection defined by a Cauchy integral

The shift-invert transformation
The standard shift-invert transformation maps an eigenvalue λ of A to 1/(λ − σ), so the eigenvalues near the shift σ become dominant.
"Matrix transformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems" — K. Meerbergen, D. Roose
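The shift-invert transformation can be exercised directly through SciPy's ARPACK wrapper, which factorizes (A − σI) internally when a shift is supplied; the eigenvalues nearest the shift then converge first. The diagonal test matrix and the shift value are hypothetical.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigs

# Under shift-invert, an eigenvalue lambda of A becomes 1/(lambda - sigma)
# for (A - sigma I)^{-1}, so the eigenvalues closest to sigma dominate and
# the Arnoldi process finds them quickly.
n = 500
A = diags(np.arange(1.0, n + 1), 0, format='csc')   # eigenvalues 1..500

sigma = 300.2
vals = eigs(A, k=3, sigma=sigma, return_eigenvectors=False)
print(np.sort(vals.real))   # the three eigenvalues nearest 300.2: 299, 300, 301
```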

The Cayley transform
"Matrix transformations for computing rightmost eigenvalues of large sparse non-symmetric eigenvalue problems" — K. Meerbergen, D. Roose
Properties of the Cayley transform:
1) A line parallel to the imaginary axis is mapped onto the unit circle
2) Points to the left of that line are mapped inside the unit circle
3) Points to the right of that line are mapped outside the unit circle
Filter polynomials: Matrix Algorithms, Volume II: Eigensystems, G.W. Stewart, p. 317.
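The three mapping properties can be checked numerically with one common scalar form of the Cayley transform, c(z) = (z + σ)/(z − σ) with real σ > 0, for which the dividing line is the imaginary axis (a general line Re z = α is handled by shifting first). The sample points are hypothetical.

```python
import numpy as np

sigma = 2.0
c = lambda z: (z + sigma) / (z - sigma)   # Cayley transform, sigma > 0

on_axis = np.array([-5.0j, -0.3j, 0.0j, 4.0j])       # Re z = 0
left    = np.array([-1.0 + 2.0j, -0.1 - 3.0j, -7.0 + 0.0j])
right   = np.array([0.5 + 1.0j, 3.0 - 2.0j, 10.0 + 0.0j])

print(np.allclose(np.abs(c(on_axis)), 1.0))   # property 1: onto the unit circle
print(np.all(np.abs(c(left)) < 1.0))          # property 2: mapped inside
print(np.all(np.abs(c(right)) > 1.0))         # property 3: mapped outside
```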

Aleksei Nikolaevich Krylov (1863-1945) showed in 1931 how to use sequences of the form b, Ab, A²b, ... to construct the characteristic polynomial of a matrix. Krylov was a Russian applied mathematician whose scientific interests arose from his early training in naval science, which involved the theories of buoyancy, stability, rolling and pitching, vibrations, and compass theories. Krylov served as the director of the Physics-Mathematics Institute of the Soviet Academy of Sciences from 1927 until 1932, and in 1943 he was awarded a "state prize" for his work on compass theory. Krylov was made a "hero of socialist labor," and he is one of a few mathematicians to have a lunar feature named in his honor: on the moon there is the "Crater Krylov."

Walter Edwin Arnoldi (1917-1995) was an American engineer who published this technique in 1951, not far from the time that Lanczos's algorithm emerged. Arnoldi received his undergraduate degree in mechanical engineering from Stevens Institute of Technology, Hoboken, New Jersey, in 1937 and his MS degree at Harvard University in 1939. He spent his career working as an engineer in the Hamilton Standard Division of the United Aircraft Corporation, where he eventually became the division's chief researcher.

He retired in 1977. While his research concerned mechanical and aerodynamic properties of aircraft and aerospace structures, Arnoldi's name is kept alive by his orthogonalization procedure.

Appendix I
Appendix II

References
[1] Rainer Kress. Numerical Analysis. Springer-Verlag, 1991.
[2] I.S. Duff, A.M. Erisman, and J.K. Reid. Direct Methods for Sparse Matrices. London: Oxford Univ. Press, 1986.
[3] J.W.H. Liu. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice. SIAM Rev., 34 (1992), pp. 82-109.
[4] T.A. Davis. A column pre-ordering strategy for the unsymmetric-pattern multifrontal method. ACM Trans. Math. Software, vol. 30, no. 2, pp. 165-195, 2004.
[5] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2002.
[6] G.H. Golub, C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
[7] J.W.H. Liu. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice. SIAM Rev., 34 (1992), pp. 82-109.
[8] Gianna M. Del Corso, Antonio Gulli, Francesco Romani. Fast PageRank Computation via a Sparse Linear System (Extended Abstract).
[9] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS, Boston, 1996.
[10] Y.S. Chen, Simon Li. Application of Multifrontal Method for Doubly-Bordered Sparse Matrix in Laser Diode Simulator. NUSOD, 2004.
[11] 陈英时, 吴文勇, 等. 采用多波前法求解大型结构方程组. 建筑结构, 2007(09). (Chen Yingshi, Wu Wenyong, et al. Solving large structural systems of equations with the multifrontal method. Building Structure, 2007, no. 9.)
[12] 宋新立, 陈英时, 等. 电力系统全过程动态仿真中大型稀疏线性方程组的分块求解算法. (Song Xinli, Chen Yingshi, et al. A block solution algorithm for the large sparse linear systems in full-process dynamic simulation of power systems.)

Many thanks to all the teachers and fellow students!
MAIL: QQ:
