HADOOP - Data Mining Research Group

Uploaded by: jiups****uk12 | Document ID: 45255868 | Uploaded: 2018-06-15 | Format: PPTX | Pages: 38 | Size: 731.45 KB
Hadoop: Introduction, Installation and Configuration
Data Mining Group, Xiamen University

A Distributed Data-Intensive Programming Framework
  Hadoop = HDFS (distributed storage) + MapReduce (parallel computing)

Introduction to HDFS
  Hadoop Distributed File System (HDFS)
  - HDFS is an open-source implementation of GFS. It has many similarities with existing distributed file systems; however, it also differs from them.
  - HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.
  - HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.

How does it work?

Features
  - An important feature of the design: data is never moved through the NameNode.
  - Instead, all data transfer occurs directly between clients and DataNodes.

MapReduce?
  Let's talk about it next time.

"Running Hadoop": what does it mean?
  "Running Hadoop" means running a set of daemons:
  - NameNode
  - DataNode
  - Secondary NameNode
  - JobTracker
  - TaskTracker

Who works for whom?
  [Diagram: the Hadoop daemons grouped under HDFS (NameNode, Secondary NameNode, DataNode) and MapReduce (JobTracker, TaskTracker)]
  Hadoop employs a master/slave architecture for both distributed storage and distributed computation.

  NameNode
  - The master of HDFS; it directs the slave DataNode daemons to perform the low-level I/O tasks.
  - The bookkeeper of HDFS:
    - keeps track of how your files are broken down into file blocks
    - keeps track of the overall health of the distributed filesystem

  DataNode
  - Reads and writes HDFS blocks for clients.
  - Communicates with other DataNodes to replicate its data blocks for redundancy.
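The bookkeeping described above (files broken into blocks, blocks replicated across DataNodes) can be illustrated with a little arithmetic. This is only a sketch: the 64 MB block size and the replication factor of 3 are assumed here as the usual Hadoop 0.20 defaults (`dfs.block.size`, `dfs.replication`), and `hdfs_block_count` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): estimate how many blocks a file
# occupies and how many block replicas the DataNodes store in total.
# Usage: hdfs_block_count FILE_SIZE_BYTES [BLOCK_SIZE_BYTES] [REPLICATION]
hdfs_block_count() {
    size=$1
    block=${2:-67108864}   # assumed default block size: 64 MB
    repl=${3:-3}           # assumed default replication factor
    # Ceiling division: a partially filled last block still counts as a block.
    blocks=$(( (size + block - 1) / block ))
    echo "blocks=$blocks replicas=$(( blocks * repl ))"
}

# A 200 MB file (209715200 bytes) with the assumed defaults.
hdfs_block_count 209715200   # prints: blocks=4 replicas=12
```

The NameNode only records which blocks make up the file and where their replicas live; the block contents themselves stay on the DataNodes.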

NameNode and DataNode

Secondary NameNode
  - The SNN is an assistant daemon for monitoring the state of the cluster's HDFS.
  - It differs from the NameNode in that this process doesn't receive or record any real-time changes to HDFS.
  - It communicates with the NameNode to take snapshots of the HDFS metadata.
  - Recovery: what if the NameNode fails? We reconfigure the cluster to use the SNN as the primary NameNode.

JobTracker
  - The liaison between your application and Hadoop.
  - Once you submit your code to your cluster, the JobTracker determines the execution plan:
    - determines which files to process
    - assigns nodes to different tasks
    - monitors all tasks as they're running
  - What if a task fails? The JobTracker will relaunch the task on a different node.

TaskTracker
  - Each TaskTracker is responsible for executing the individual tasks that the JobTracker assigns.

JobTracker and TaskTracker

Installation and Configuration
  - Pseudo-distributed mode: all daemons run on one machine.
  - Fully distributed mode
  - What's different?

Installation for Pseudo-distributed Mode
  Prerequisites
  - Ubuntu Linux
  - Hadoop 0.20.2
  - Sun Java 6

    $ sudo add-apt-repository "deb http:/ lucid partner"
    $ sudo apt-get update
    $ sudo apt-get install sun-java6-jdk

Configuring SSH
  Hadoop requires SSH access to manage its nodes, remote machines plus your local machine if yo…
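Because Hadoop's control scripts log in to each node over SSH, a common single-machine setup is a key without a passphrase that is authorized for the local account. The following is a minimal sketch, assuming OpenSSH's `ssh-keygen` is installed; the function name `setup_passphraseless_key` and its directory argument are illustrative, not part of Hadoop or OpenSSH.

```shell
#!/bin/sh
# Sketch: create a passphrase-less RSA key in the given directory and
# authorize it for logins to this account. The function name and argument
# are hypothetical; only ssh-keygen and standard utilities are used.
setup_passphraseless_key() {
    ssh_dir=$1
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # Generate a key only if none exists yet; -P "" sets an empty passphrase.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Append the public key to authorized_keys unless it is already listed.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Typical use on the Hadoop machine (left commented out so that sourcing
# this sketch does not touch your real ~/.ssh):
# setup_passphraseless_key "$HOME/.ssh" && ssh localhost
```

If the final `ssh localhost` logs in without asking for a password, the account is ready for scripts that need non-interactive SSH. Running the function twice is harmless: an existing key is reused and the public key is not appended again.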
