High-Performance Data Mining Techniques and Their Applications (lecture slides)

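Ying Liu (刘莹)
2024/7/20

About the speaker
- 1999/07: B.S., Computer Science and Technology, Peking University
- 2001/12: M.S., Computer Engineering, Northwestern University, USA
- 2005/06: Ph.D., Computer Engineering, Northwestern University, USA
- 2005/06 to 2005/11: Research Associate, Northwestern University, USA
- 2006/01 to present: Associate Professor, School of Information Science and Engineering, Graduate University of the Chinese Academy of Sciences; Research Center on Fictitious Economy and Data Science

Research experience
- NASA: Mass Storage Performance Information System
- U.S. Department of Energy (DOE): Scientific Data Management Integrated Software Infrastructure Center
- Intel: Characterizing Scalable Data Mining Kernels/Primitives on SMPs
- U.S. National Science Foundation (NSF): High-Performance Techniques, Designs and Implementation of Software Infrastructure for Change Detection and Mining (IIS-0536994)

Research experience
- Responsible for a contract research project for the People's Bank of China: research on a personal credit scoring system
- Principal investigator, sub-project of a Natural Science Foundation Innovative Research Group project: research on mining techniques for massive data
- Principal investigator, sub-project of a Natural Science Foundation Key Program project: basic attributes and measurement models of trustworthy software processes
- Principal investigator, Ministry of Education start-up fund for returned overseas scholars: mining traffic data streams from sensor networks
- Principal investigator, President's Fund of the Graduate University of the Chinese Academy of Sciences: research on utility-based data mining theory and techniques

Research results
- High-performance data mining in large-scale scientific simulation
  - A parallel scheme for HOP, a clustering algorithm used in astrophysical simulation
  - Suitable for very large scale scientific simulations; achieved very good speedup
  - Used by the San Diego Supercomputer Center (SDSC)
- Performance evaluation of scalable data mining algorithms
  - Released NU-MineBench, the first benchmark suite for data mining algorithms; downloaded 1,666 times (2005/06/15 to present)
  - Used by Intel

Outline
- Introduction to data mining
- High-performance (parallel/distributed) data mining
- Application examples
  - Cosmological simulation
  - Astronomy
  - Space operation
  - Ecosystem
  - Bioinformatics
- Summary

Data mining
- A technique for automatically extracting hidden, potential, and valuable knowledge from "massive" data
- The mined results (knowledge) are what the user is interested in; they feed management decision support systems
- Characteristics of data mining techniques
  - Massive data
  - Searches historical data automatically
  - Efficient
  - Scalable
  - Fast model updates
  - Highly applicable

Motivation for data mining: the commercial view
- The volume of data collected and stored is too large
  - E-commerce
  - Business transaction data
  - Credit card transactions
  - Insurance
- CPU processing speed grows by 15% per year, which cannot keep up with the growth in data volume
- Better personalized services, advanced customer relationship management, and so on
- Data explosion, knowledge poverty

Motivation for data mining: the scientific computing view
- Massive data (GB/hour)
  - Remote sensing data
  - Telescope sky surveys
  - Gene expression microarrays
  - Scientific simulation
- Helps scientists analyze data in several ways, such as classification and segmentation

Origins of data mining
- An interdisciplinary field
  - Statistical methods
  - Machine learning methods
  - Neural networks
  - Databases
  - Parallel computing
- Traditional methods are limited by
  - Massive data
  - High-dimensional data
  - Heterogeneous data
  - Complex data types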

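Process
[Figure: databases and flat files -> data cleaning and integration -> data warehouse -> selection and transformation -> data mining -> pattern evaluation]

Main data mining techniques
- Clustering
- Anomaly detection
- Classification
- Prediction
- Association rule mining
- Sequential patterns
- Time series

Clustering
- Automatically partitions the data into clusters so that similarity between items in different clusters is minimized and similarity between items within a cluster is maximized (does not depend on predefined classes and needs no training set)
- Applications
  - Pattern recognition
  - Geographic information systems
  - Image processing
  - Gene sequence analysis
  - Cosmological simulation
  - Document clustering
- Common algorithms
  - K-means, BIRCH, DBSCAN, EM

Anomaly detection
- Finds the data items in a data set that differ significantly from normal behavior
- Applications
  - Credit card fraud
  - Medical data analysis
  - Network intrusion detection
- Common algorithms
  - Clustering
  - Statistical-based, distance-based, deviation-based

Classification
- Predicts the class of a new data record from the functional relationship, learned from training data, between the fields of the data and the known classes
- Applications
  - Market forecasting
  - Customer relationship management (CRM)
  - Marketing strategy
  - Credit scoring
- Common algorithms
  - Decision trees
  - Bayesian classification
  - Neural networks
  - K-nearest neighbors

Classification
Weather  Temperature  Humidity  Wind  Exercise (class)
Sunny    Hot          High      No    Unsuitable
Sunny    Hot          High      Yes   Unsuitable
Cloudy   Hot          High      No    Unsuitable

Prediction
- Builds a mathematical model from historical data and predicts the value of one attribute of a new record
- Regression methods: linear and nonlinear curve fitting
- Common algorithms
  - Linear regression
  - Logistic regression
[Figure: regression line on x-y axes, labeled "y = x + " (coefficients lost in extraction)]

Association rules
- Finds frequent itemsets in the data and the interactions among the items within them
- Applications
  - Market basket analysis
  - Product bundle analysis
  - Catalog design
  - Cross-selling
- Common algorithms
  - Apriori
  - DIC
  - FP-growth
- Example: let A be "a cold vortex near Beijing" and B be "precipitation in the Beijing area". A and B occur together with high probability (s = 60%) and P(B|A) is high (c = 75%), so A leads to B.

TID  Items
10   A, B, D
20   A, C, D
30   A, B, E
40   B, E, F
50   A, B, D, E, F

Sequential patterns
- Finds frequent events in time-ordered data, then finds the interactions among the items of the frequent sets
- Applications
  - Telecommunications
  - Marketing
  - DNA sequence analysis
- Common algorithms
  - GSP
  - SPADE
[Figure: timeline: buy PC -> buy printer -> buy ink cartridge -> buy new CPU]
- Customers who buy a new computer are likely to buy a new CPU 9 months later
- Marketing tactic: contact the customer proactively after 9 months with a recommendation, to retain the customer

Time series
- A numeric sequence that changes over time; the analysis covers the periodicity of a sequence, the similarity between different sequences, and forecasting the trend of a sequence
- Applications
  - Stock prices
  - Medical diagnosis
  - Power consumption
  - Traffic flow
[Figure: price plotted against time]

Why High Performance Data Mining?
- Lots of data being collected in the commercial and scientific world: massive data sets
- Strong competitive pressure to extract and use the information from the data, e.g.
  - Climate simulation
  - Astrophysics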
  - Molecular biology

Why High Performance Data Mining?
- Data and/or computational resources needed for analysis are often distributed
- Sometimes the choice is distributed data mining or no data mining
  - Ownership, privacy, security issues
[Figure: several networks connected through the Internet]
- Accelerate the computation
- Use more memory from multiple machines
- Solution: parallel computing!

Progress in HPC: past 6 decades
- ENIAC, 1945
  - 100 kHz
  - 5 K additions/second
  - 357 multiplications/second
- IBM Blue Gene/L
- CPU power increasing by a factor of 30-100 every decade
- Multi-gigahertz, multi-gigabyte, multi-core CPUs are commodity
- Teraflops computers are common
- Petaflops-scale computing within reach
- Jaguar, Cray XT4/XT3, Oak Ridge National Laboratory
- EKA (HP Cluster Platform 3000BL), Computational Research Laboratories

TOP 10 Machines, 7/2008
Rank  Site                                                  Computer  #proc   TF/s     Country
1     DOE/NNSA/LANL                                         IBM       122400  1375.78  USA
2     DOE/NNSA/LLNL                                         IBM       212992  596.38   USA
3     Argonne National Laboratory                           IBM       163840  557.06   USA
4     Texas Advanced Computing Center/Univ. of Texas        Sun       62976   503.81   USA
5     DOE/Oak Ridge National Laboratory                     Cray      30976   260.20   USA
6     Forschungszentrum Juelich (FZJ)                       IBM       65536   222.82   Germany
7     New Mexico Computing Applications Center (NMCAC)      SGI       14336   172.03   USA
8     Computational Research Laboratories, TATA SONS        HP        14384   172.61   India
9     IDRIS                                                 IBM       40960   139.26   France
10    Total Exploration Production                          SGI       10240   122.88   France

Supercomputers in China
- June 2004: Dawning (曙光) super server, peak speed in the trillions of operations per second, ranked 10th in the world, located at the Shanghai Supercomputer Center
- June 2008: Dawning 5 super server, peak speed of 160 trillion operations per second, located at the Shanghai Supercomputer Center
- Lenovo DeepComp (深腾) 6800 grid supercomputer: 265 four-way nodes, 1,060 processor chips, peak speed of 5 trillion operations per second, ranked 14th on the world TOP500 list of November 2003, located at the Computer Network Information Center of the Chinese Academy of Sciences

Architectures
- Shared address space
  - All processors share a single global address space
  - A single address space facilitates a simple programming model
  - Examples: SGI Origin 3000, IBM SP2
[Figure: (a) UMA and (b) NUMA, each showing processors (P), caches (C), the processor interconnect (I), and memory (M)]

Architectures
- Message passing platform
  - Each processor has local memory with a local address space
  - The only way to exchange data is explicit message passing
  - The time taken for a message depends on the relative locations of the source and destination processors
  - The performance of a parallel program is determined by how well the location of data matches its use
  - Examples: clusters; IBM SP and SGI Origin 3000 support it
[Figure: processing units, each with its own object memory, exchanging messages]

Architectures
- Hybrid
  - Clusters of 4-way SMPs
  - Most popular

Parallel Programming
- Construct or modify a serial program for solving a given problem on a parallel machine
- It is the programmer's responsibility to identify ways to decompose the computation and extract concurrency
- An exact copy of the program runs on each processor
- Complex programming

Parallel Programming
- Data parallelism
  - Partition the data across processors
  - Each processor performs the same operations on its local data partition
- Task parallelism
  - Assign independent modules to different processors
  - Each processor performs different operations

Outline
- Introduction to data mining
- High-performance (parallel/distributed) data mining
- Application examples
  - Cosmological simulation
  - Space operation
  - Ecosystem
  - Bioinformatics
  - Astronomy
- Summary

Cosmological Simulation
- N-body simulation numerically approximates the evolution of the universe
- Each body represents a galaxy or a star, and bodies attract each other through the gravitational force
- Similar applications
  - Protein folding
  - Turbulent fluid flow simulation

HOP Clustering Algorithm
- It is difficult to discern which particles belong to the same group or cluster, and doing so is computationally intensive
- HOP: a density-based clustering algorithm by Daniel J. Eisenstein and Piet Hut, 1998
- Automatically identifies groups of particles in an N-body simulation
- Particle attributes: mass, three-dimensional coordinates
- Four processing stages
  - Constructing a KD tree
  - Generating density
  - Hopping
  - Grouping

HOP Clustering Algorithm
- Two-dimensional KD tree
  - Find the median particle on the longest axis
  - Recursively bisect the particles along the longest axis
  - Nearby particles are in the same sub-domain
  - Each internal node contains the boundary

HOP Clustering Algorithm
- Generating density
  - Traverse the tree to find Ndens neighbors for every particle
  - Assign an estimated density to every particle
- Hopping
  - Associate each particle with its densest neighbor
  - Each particle "hops" to its densest neighbor until it reaches a particle that is its own densest neighbor
- Grouping
  - Define particles associated with the same densest neighbor as a group
  - Refine and merge groups
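The four stages above can be sketched in a few lines of serial Python. This is a minimal illustration under stated assumptions, not the HOP implementation: the function names (`build_kd`, `leaves`, `neighbors`, `hop`) are hypothetical, the neighbor search is brute force rather than a tree traversal, the density estimate is a crude mass-over-area ratio for 2-D points, and the final "refine and merge groups" step is omitted.

```python
import math
from collections import defaultdict

def build_kd(points, idx=None, leaf_size=2):
    """Recursively bisect particle indices at the median of the longest axis."""
    if idx is None:
        idx = list(range(len(points)))
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    dims = len(points[0])
    spans = [max(points[i][d] for i in idx) - min(points[i][d] for i in idx)
             for d in range(dims)]
    axis = spans.index(max(spans))              # longest axis of this sub-domain
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2                         # median particle
    return {"axis": axis, "split": points[idx[mid]][axis],
            "left": build_kd(points, idx[:mid], leaf_size),
            "right": build_kd(points, idx[mid:], leaf_size)}

def leaves(node):
    """Collect the particle indices stored in each leaf (sub-domain)."""
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])

def neighbors(points, i, k):
    """Brute-force k nearest neighbors of particle i (stand-in for the tree search)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def hop(points, masses, k=3):
    """Return {densest particle index: [member particle indices]}."""
    nbrs = [neighbors(points, i, k) for i in range(len(points))]
    density = []
    for i, ns in enumerate(nbrs):
        r = max(math.dist(points[i], points[ns[-1]]), 1e-12)  # radius enclosing k neighbors
        density.append((masses[i] + sum(masses[j] for j in ns)) / (r * r))
    # Each particle points at the densest of itself and its neighbors;
    # a particle that points at itself is its own densest neighbor.
    nxt = [max([i] + ns, key=lambda j: density[j]) for i, ns in enumerate(nbrs)]
    groups = defaultdict(list)
    for i in range(len(points)):
        j = i
        while nxt[j] != j:                      # hop until a density peak is reached
            j = nxt[j]
        groups[j].append(i)
    return dict(groups)
```

Each hop moves to a strictly denser particle, so every chain ends at a local density peak, and the peak's index serves as the group label.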

HOP Clustering Parallelization
Ying Liu, Wei-keng Liao, Alok Choudhary, Northwestern University, USA
- Key idea
  - Load balance: assign an approximately equal number of particles to each processor
  - Minimize communication overheads: requests for potentially required remote particles are packed into a single message, and the required particles are transferred to the requesting processors

HOP Parallelization
- Construct parallel KD tree
  - Assign an approximately equal number of particles to each processor
  - Find the median particle in parallel on the longest axis
  - Bisect particles along the longest axis
  - Exchange particles between bisected processors
  - Build the local KD tree
  - Maintain a global tree on each processor with no real particle transfer

HOP Parallelization
- Generate density
  - Intersection test
  - Send out a single message to request the required remote particles
  - Transfer the required particles
  - Search for neighbors
  - Calculate density

HOP Parallelization
- Hopping
  - Hop to the highest-density neighbor
  - Book the required remote particles and send out requests
  - Transfer the required particles to the requesting processors
- Grouping
  - Particles linked to the same densest particle are defined as a group
  - Refine groups

Experiment
- ENZO
  - An adaptive mesh refinement (AMR), grid-based hybrid code (hydro + N-body) that simulates cosmological structure formation
  - Uses the algorithms of Berger & Colella to improve spatial and temporal resolution in regions of large gradients, such as gravitationally collapsing objects
  - The software is flexible and can simulate a wide range of cosmological situations
  - Parallelized using MPI; can run on any shared or distributed memory parallel supercomputer or cluster
  - Simulations on 1024 processors have been carried out on the San Diego Supercomputer Center's Blue Horizon, an IBM SP

Data Source
- Each data set contains 491,520 particles
[Figure: Data set 1, Data set 2]

Performance Evaluation
- Total execution time
  - Density generation is the most time-consuming stage
  - Data set 2 takes longer to execute
[Figure: Data set 1, Data set 2]

Performance Evaluation
- Speedups for total execution time
  - The overall performance scales up on IBM SP2 and SGI Origin2000 as the number of processors increases
  - It scales up to 32 processors on the Linux cluster
[Figure: speedup plots for the two data sets]
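For reference, speedup on p processors is conventionally the single-processor time divided by the p-processor time, and parallel efficiency is the speedup divided by p. The sketch below applies these definitions to hypothetical timings. The numbers are invented for illustration and are not measurements from the slides, and the names `speedup` and `efficiency` are assumptions.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E(p) = S(p) / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 640.0, 8: 100.0, 32: 40.0, 64: 45.0}

for p, t in sorted(timings.items()):
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"p={p:3d}  speedup={s:6.2f}  efficiency={e:5.3f}")
```

In this made-up series the run on 64 processors is slower than the run on 32, which is the shape a curve takes when it "scales up to 32 processors" and communication cost then dominates.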

Performance Evaluation
- Speedups for generating density
  - The generating density stage scales up on IBM SP2 and SGI Origin2000
  - It scales up to 32 processors on the Linux cluster
[Figure: Data set 1, Data set 2]

Performance Evaluation
- Communication costs
  - Communication time does not scale well
  - Communication time increases when the number of processors goes beyond 32
[Figure: Data set 1, Data set 2]

Outline
- Introduction to data mining
- Application examples
  - Cosmological simulation
  - Astronomy
  - Space operation
  - Ecosystem
  - Bioinformatics
- High-performance (parallel/distributed) data mining
- Summary

Astronomy
University of Baltimore, USA
- Predictive Mining of Time Series Data in Astronomy
  - Discover interesting periodic behavior or coincidences within one celestial object or across different objects, and use this periodic behavior to predict or analyze the behavior of celestial objects
- Algorithm
  - Treat the data collected by each telescope as a time series
  - Process the time series with a sliding window to obtain subsequences
  - Apply a clustering algorithm to these subsequences to obtain the patterns that occur in them
  - Represent the time series using these patterns
- Significance
  - If pattern A appears in time series 1, then within time T afterwards pattern B appears in time series 1 with probability c%; this yields meaningful association rules
  - Compare the patterns of different time series

Astronomy
[Figure: framework for analyzing the periodic behavior of celestial objects]

Astronomy
Lawrence Livermore National Laboratory, USA
- Mining the FIRST survey for galaxies with a bent-double morphology
  - FIRST: Faint Images of the Radio Sky at Twenty Centimeters
  - Radio equivalent of the Palomar Observatory Sky Survey (POSS)
  - 10,000 square degree survey of the North Galactic Cap
  - Using the NRAO Very Large Array (VLA), B configuration

Astronomy
- The FIRST data
  - 1.8" pixels, resolution 5", rms 0.15 mJy
  - 90 radio sources per square degree at a 1 mJy threshold
  - The morphological type of a radio source provides clues to its emission mechanism, source class, and the properties of the surrounding medium
  - The raw data from the telescopes is extensively processed
  - Image maps and catalog available (sundog.stsci.edu)

Astronomy
- Use data mining to find "bent-doubles" in FIRST
- FIRST astronomers are interested in "bent-doubles"
  - They indicate the presence of clusters of galaxies
  - They are first "identified" using a visual technique, followed by optical observations and checks with other surveys
- Visual identification is no longer feasible
  - Subjective, tedious, likely to miss cases
  - 900,000 galaxies in the full survey
- Goal: replace the visual identification of bent-doubles with a semi-automated one

Astronomy
- Detecting bent-double galaxies in 250 GB of image data and 78 MB of catalog data (as of 7/2000)

Astronomy
- Calculate features for a galaxy (103 features)
- Use the features to train a decision tree
- Use the tree to classify the unlabeled galaxies and validate the results
- Use validated results to enhance the training set
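The train, classify, validate, and enlarge loop on this slide can be sketched as follows. Everything here is a hypothetical stand-in: a one-level decision tree (a stump) replaces the real decision tree software, each galaxy has a single made-up feature, and the `validate` callback plays the role of the astronomer who confirms or corrects each prediction.

```python
def train_stump(data):
    """Fit a one-level decision tree: the (feature, threshold, label) rule
    with the fewest training errors. data is a list of (features, label)."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            thr = x[f]
            for above in (1, 0):  # label predicted when feature value > thr
                errors = sum(1 for xs, y in data
                             if (above if xs[f] > thr else 1 - above) != y)
                if best is None or errors < best[0]:
                    best = (errors, f, thr, above)
    _, f, thr, above = best
    return lambda x: above if x[f] > thr else 1 - above

def refine(labeled, unlabeled, validate, rounds=3, batch=2):
    """Repeat: train, classify a batch, validate it, add it to the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        classify = train_stump(labeled)
        current, pool = pool[:batch], pool[batch:]
        for x in current:
            predicted = classify(x)
            confirmed = validate(x, predicted)  # expert check of the prediction
            labeled.append((x, confirmed))
    return train_stump(labeled), labeled

# Hypothetical single feature (say, a bend angle in degrees); label 1 = bent-double.
seed = [([10.0], 0), ([50.0], 1)]
candidates = [[20.0], [40.0], [35.0], [25.0]]
expert = lambda x, predicted: int(x[0] > 30.0)   # stand-in for visual validation
classifier, training_set = refine(seed, candidates, expert)
```

The point of the loop is that each validated batch sharpens the next tree, so the expert's effort shifts from labeling everything to checking the classifier's output.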

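Astronomy
- Results using a single tree for 3-entry sources were satisfactory
  - Labeled training set: 167 bents, 28 non-bents
  - Performed several inner iterations using pruned trees (C5.0 decision tree software)
  - Ten 10-fold cross-validation errors, mean (SE)
    - Using all the features: 9.7% (0.3%)
    - Using triple features only: 10.7% (0.3%)
  - Discriminating features include geometrically calculated angles, relative distances, ellipticity, and symmetry measures

New Trends: GPUs + CUDA
- GPU (Graphics Processing Unit): a special-purpose processor
[Figure: floating-point operations per second, CPU versus GPU]

New Trends: GPUs + CUDA
- How GPU and CPU architectures differ
  - More transistors
  - Many cores driven by high memory bandwidth
- GPU advantages
  - Low cost (a few hundred US dollars)
  - Multithreading (hundreds of threads)
  - Far more efficient than a CPU at processing compute-intensive data
- GPU drawback
  - Hard to program

New Trends: GPUs + CUDA
- CUDA
  - A programming environment based on the industry-standard C language, used to develop GPU computing applications
  - The GPU executes a very large number of threads in parallel
  - The CPU offloads the compute-intensive, highly parallel parts to the GPU
  - Easy to program
[Figure: software stack]

New Trends: GPUs + CUDA
- Work that once required a workstation can now be done on a PC
- A revolutionary advance in supercomputing
- Success stories
  - Stanford University used CUDA to develop a version of Folding@home that runs on GPUs, up to 140 times faster than on a CPU. Folding@home simulates protein folding to find the consequences of protein misfolding.
  - Elemental Technologies used CUDA to develop the GPU-based Badaboom software, with which video transcoding is up to 18 times faster than with conventional methods.
  - With CUDA, geographic information system computations that used to take 20 minutes now finish in 30 seconds, and computations that used to take 30 to 40 seconds now run in real time. "CUDA is the most revolutionary technology to emerge in the computing industry since the invention of the microprocessor." (attributed in the source only as "M")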

New Trends: GPUs + CUDA
- The University of Illinois (UIUC) uses GPUs for parallel molecular dynamics research, to analyze large biomolecular systems. "Future gains in computing performance will come directly from massively parallel hardware such as many-core GPUs. The biggest challenge today is parallelizing code to make better use of that hardware, and CUDA has made a breakthrough that advances this field." Professor Wen-mei Hwu (胡文美)
- Developers in financial analysis, astrophysics, seismic imaging, and many other fields are benefiting from CUDA's development tools. "With CUDA, we can easily use the processing power of the GPU and reduce the time and money invested. A host system with two Tesla D870 units costs far less than building a 16-core cluster." Technician
- "Volera needs only 12 GPUs to analyze the entire U.S. options market in real time, with a latency of no more than 10 microseconds. Reaching that speed would normally require at least 60 conventional 1U servers. By using GPUs, our customers get better returns with lower maintenance costs, lower power consumption, and a smaller footprint." Hanweck Associates

High Performance Scientific Data Mining Projects
- Hillol Kargupta, University of Maryland, USA
  - Distributed Data Mining for Scalable Analysis of Data from Virtual Observatories, NASA, 2007-2010
    - Astronomers are unable to tap the riches of this collection of gigabyte, terabyte, and (eventually) petabyte catalogs without a computational backbone that includes support for queries and data mining across distributed virtual tables of decentralized, joined, and integrated sky survey catalogs.
    - (1) Design and implement distributed algorithms for computing statistical primitives, principal component analysis, and outlier detection from distributed astronomy catalogs and their partial images stored in users' local data management systems.
    - (2) Develop a prototype system that will offer a rich collection of web services based on various DDM algorithms.
    - (3) Search for unusual correlations, outliers, sub-clusters, and fundamental planes within the multi-dimensional parameter space presented by several large surveys.
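As a small illustration of what "computing statistical primitives from distributed catalogs" can mean, the sketch below has each site reduce its local values to a (count, mean, sum of squared deviations) summary, and a coordinator merge the summaries without moving the raw data. This is a generic pairwise-merge technique for mean and variance, shown under assumptions; it is not taken from the project described above, and the names `Summary` and `summarize` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    n: int = 0          # number of values seen
    mean: float = 0.0   # running mean
    m2: float = 0.0     # sum of squared deviations from the mean

    def add(self, x):
        """Fold one local value into the summary (one-pass update)."""
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def merge(self, other):
        """Combine two site summaries into one, without the raw values."""
        if self.n == 0:
            return Summary(other.n, other.mean, other.m2)
        if other.n == 0:
            return Summary(self.n, self.mean, self.m2)
        n = self.n + other.n
        d = other.mean - self.mean
        return Summary(n,
                       self.mean + d * other.n / n,
                       self.m2 + other.m2 + d * d * self.n * other.n / n)

    @property
    def variance(self):
        """Population variance of everything merged so far."""
        return self.m2 / self.n if self.n else 0.0

def summarize(values):
    s = Summary()
    for v in values:
        s.add(v)
    return s

# Hypothetical per-site measurements; only the three-number summaries travel.
sites = [[1.0, 2.0, 3.0], [10.0, 12.0], [4.0]]
total = Summary()
for local in sites:
    total = total.merge(summarize(local))
```

Only three numbers per site cross the network, which is the property that makes this kind of primitive attractive when the catalogs themselves are too large or too restricted to centralize.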

High Performance Scientific Data Mining Projects
- Vipin Kumar, University of Minnesota, USA
  - Discovery of Patterns in the Global Climate System using Data Mining (NASA, NOAA, and NSF)
  - Data Mining for Bio-medical Informatics

High Performance Scientific Data Mining Projects
- David Skillicorn, Queen's University, Canada
  - Treatment Strategies for Childhood Cancers
    - Treatment strategies for most diseases are determined based only on the disease diagnosis.
    - Allow treatment decisions to be made based on the subtype of the disease (determined from microarray data), the patient's genetic makeup (determined, for example, from SNPs), other properties of the patient (for example, age), and constraints of the health system.
    - Build a system that provides guidance to physicians so as to improve patient recovery rates and reduce costs.
  - Mineral Exploration
    - Explore data mining of geochemical data from surface or near-surface samples to predict the presence of underlying mineralization.

High Performance Scientific Data Mining Projects
- Mohammed Zaki, Rensselaer Polytechnic Institute, USA
  - Mining Complex Patterns
    - Develop an extensible, high-performance generic pattern mining toolkit (GPMT)
      - (1) Extensible and modular for ease of use, and customizable to the needs of analysts
      - (2) Scalable and high-performance, for rapid response on massive datasets
      - (3) Seamless access to [...], databases, or data archives
    - Provide a systematic solution to the whole class of common pattern mining tasks in massive, diverse, and complex datasets, rather than focusing on a specific problem

Outline
- Introduction to data mining
- High-performance (parallel/distributed) data mining
- Application examples
  - Cosmological simulation
  - Astronomy
  - Space operation
  - Ecosystem
  - Bioinformatics
- Summary

Summary
- Scientific computing is the direction of future development
- Data mining automatically discovers valuable knowledge in massive data and can make scientific exploration more efficient
- Parallel/distributed computing is a necessity
- Interdisciplinary research
  - Researchers in the basic science disciplines
  - Data mining researchers
  - System platform researchers

Summary
- Data mining team at the Graduate University of the Chinese Academy of Sciences
  - 3 professors, 4 associate professors, 2 lecturers, 4 postdoctoral researchers, more than 40 students
  - Research directions: data mining algorithms and their commercial applications; scientific computing applications
  - Projects: 973 Program, Natural Science Foundation Innovative Research Group, Natural Science Foundation Key Program, General Program, BHP Billiton Petroleum (Australia), Ministry of Public Security, People's Bank of China, Industrial and Commercial Bank of China, China Reinsurance Group, NetEase, and others
- We look forward to sincere collaboration with all organizations!

Q & A?
Thank you!
