
Large-Scale Data Processing / Cloud Computing (大规模数据处理/云计算)
Lecture 3: MapReduce Basics
Yan Hongfei, School of Information Science and Technology, Peking University, 7/12/2011
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Slides by Jimmy Lin, University of Maryland. Course development: SEWM Group.

How do we scale up?
Source: Wikipedia (IBM Roadrunner)

Divide and Conquer
[Diagram: the "Work" is partitioned into units w1, w2, w3; each "worker" turns its unit into a partial result r1, r2, r3; the partial results are combined into the final "Result". Labels: Partition, Combine.]

Parallelization Challenges
- How do we assign work units to workers?
- What if we have more work units than workers?
- What if workers need to share partial results?
- How do we aggregate partial results?
- How do we know all the workers have finished?
- What if workers die?
What is the common theme of all of these problems?
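To make the partition/compute/combine pattern concrete, here is a minimal Python sketch (added for illustration, not from the slides): the input is partitioned, each chunk goes to a worker process, and the partial results are combined. The worker function and the chunk count are illustrative assumptions.

from multiprocessing import Pool

def worker(chunk):
    # Hypothetical work unit: sum a chunk of numbers.
    return sum(chunk)

def partition(data, n):
    # Split data into n roughly equal chunks (w1, w2, w3, ...).
    k = (len(data) + n - 1) // n
    return [data[i:i + k] for i in range(0, len(data), k)]

if __name__ == "__main__":
    work = list(range(1_000_000))
    with Pool(3) as pool:                               # three "workers"
        partials = pool.map(worker, partition(work, 3)) # r1, r2, r3
    result = sum(partials)                              # combine
    print(result)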

Common Theme?
Parallelization problems arise from:
- Communication between workers (e.g., to exchange state)
- Access to shared resources (e.g., data)
Thus, we need a synchronization mechanism.
Source: Ricardo Guimarães Herrmann

Managing Multiple Workers
Difficult because:
- We don't know the order in which workers run
- We don't know when workers interrupt each other
- We don't know the order in which workers access shared data
Thus, we need:
- Semaphores (lock, unlock)
- Condition variables (wait, notify, broadcast)
- Barriers
Still, lots of problems: deadlock, livelock, race conditions... dining philosophers, sleeping barbers, cigarette smokers...
Moral of the story: be careful!
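As one concrete illustration of these primitives (an added sketch, not from the slides), Python's threading module provides all three:

import threading

lock = threading.Lock()          # mutual exclusion (lock, unlock)
cond = threading.Condition()     # condition variable (wait, notify, broadcast)
barrier = threading.Barrier(3)   # all 3 workers must arrive before any proceeds

counter = 0

def worker(name):
    global counter
    with lock:                   # only one worker updates shared state at a time
        counter += 1
    with cond:
        cond.notify_all()        # broadcast: wake any workers waiting on cond
    barrier.wait()               # synchronization point across all workers
    print(name, "passed the barrier")

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()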

Current Tools
Programming models:
- Shared memory (pthreads)
- Message passing (MPI)
Design patterns:
- Master-slaves
- Producer-consumer flows
- Shared work queues
[Diagram: message passing among processes P1-P5; shared memory, with P1-P5 attached to a common memory; a master coordinating slaves; producers and consumers around a shared work queue.]
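A minimal sketch of the producer-consumer / shared-work-queue pattern (illustrative; the thread count and the "processing" step are made up):

import queue
import threading

work_queue = queue.Queue()              # shared work queue (thread-safe)
results = []
results_lock = threading.Lock()
NUM_CONSUMERS = 4

def producer(n):
    for i in range(n):
        work_queue.put(i)               # enqueue work units
    for _ in range(NUM_CONSUMERS):
        work_queue.put(None)            # one sentinel per consumer: no more work

def consumer():
    while True:
        item = work_queue.get()         # blocks until a work unit is available
        if item is None:
            break
        with results_lock:
            results.append(item * item) # hypothetical processing step

threads = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for t in threads: t.start()
producer(100)
for t in threads: t.join()
print(len(results))                     # 100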

Where the rubber meets the road
Concurrency is difficult to reason about.
Concurrency is even more difficult to reason about:
- at the scale of datacenters (even across datacenters)
- in the presence of failures
- in terms of multiple interacting services
Not to mention debugging...
The reality:
- Lots of one-off solutions, custom code
- Write your own dedicated library, then program with it
- Burden on the programmer to explicitly manage everything
Source: Wikipedia (Flat Tire)

[Image slides. Sources: MIT Open Courseware; MIT Open Courseware; Harper's (Feb, 2008)]

What's the point?
It's all about the right level of abstraction:
- The von Neumann architecture has served us well, but is no longer appropriate for the multi-core/cluster environment
Hide system-level details from the developers:
- No more race conditions, lock contention, etc.
Separate the what from the how:
- The developer specifies the computation that needs to be performed
- The execution framework ("runtime") handles actual execution
The datacenter is the computer!

"Big Ideas"
- Scale "out", not "up": limits of SMP and large shared-memory machines
- Move processing to the data: clusters have limited bandwidth
- Process data sequentially, avoid random access: seeks are expensive, disk throughput is reasonable
- Seamless scalability: from the mythical man-month to the tradable machine-hour

MapReduce: Roots in Functional Programming
[Diagram: Map applies a function f to each input element independently; Fold aggregates the mapped values with a function g.]
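A small Python rendering of that diagram (added for illustration): map applies f to every element independently; fold (reduce) aggregates the mapped values with g.

from functools import reduce

records = [1, 2, 3, 4, 5]

f = lambda x: x * x                  # map: apply f to each element independently
mapped = list(map(f, records))       # [1, 4, 9, 16, 25]

g = lambda acc, x: acc + x           # fold: aggregate with g
result = reduce(g, mapped, 0)        # 55

print(mapped, result)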

Typical Large-Data Problem
- Iterate over a large number of records
- Extract something of interest from each
- Shuffle and sort intermediate results
- Aggregate intermediate results
- Generate final output
Key idea: provide a functional abstraction for these two operations.
MapReduce (Dean and Ghemawat, OSDI 2004)

MapReduce
Programmers specify two functions:
  map (k, v) → <k', v'>*
  reduce (k', v') → <k'', v''>*
All values with the same key are sent to the same reducer.
The execution framework handles everything else.
[Diagram: four mappers emit intermediate pairs, e.g., (a,1) (b,2); (c,3) (c,6); (a,5) (c,2); (b,7) (c,8). "Shuffle and Sort: aggregate values by keys" groups them into a→[1,5], b→[2,7], c→[2,3,6,8]; three reducers then produce (r1,s1), (r2,s2), (r3,s3).]
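This contract can be sketched as a single-process simulation (an illustration only; real frameworks distribute every phase): run map over all input pairs, group intermediate values by key (shuffle and sort), then run reduce once per key in sorted key order.

from collections import defaultdict

def run_mapreduce(inputs, mapper, reducer):
    # Map phase: mapper(k, v) yields intermediate (k', v') pairs.
    intermediate = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in mapper(k, v):
            intermediate[k2].append(v2)
    # Shuffle and sort: aggregate values by key; keys visited in sorted order.
    output = []
    for k2 in sorted(intermediate):
        # Reduce phase: reducer(k', [v', ...]) yields output (k'', v'') pairs.
        output.extend(reducer(k2, intermediate[k2]))
    return output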

MapReduce
Programmers specify two functions:
  map (k, v) → <k', v'>*
  reduce (k', v') → <k'', v''>*
All values with the same key are sent to the same reducer.
The execution framework handles everything else.
What's "everything else"?

MapReduce "Runtime"
- Handles scheduling: assigns workers to map and reduce tasks
- Handles "data distribution": moves processes to data
- Handles synchronization: gathers, sorts, and shuffles intermediate data
- Handles errors and faults: detects worker failures and restarts
Everything happens on top of a distributed FS (later).

MapReduce
Programmers specify two functions:
  map (k, v) → <k', v'>*
  reduce (k', v') → <k'', v''>*
All values with the same key are reduced together.
The execution framework handles everything else.
Not quite... usually, programmers also specify:
  partition (k', number of partitions) → partition for k'
- Often a simple hash of the key, e.g., hash(k') mod n
- Divides up the key space for parallel reduce operations
  combine (k', v') → <k', v'>*
- Mini-reducers that run in memory after the map phase
- Used as an optimization to reduce network traffic
[Diagram: the same flow as before, now with a combine step after each mapper (e.g., one mapper's local pairs (c,3) (c,6) combine into (c,9)) and a partition step that routes each key to a reducer before shuffle and sort.]
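A sketch of where the two extra hooks fit (illustrative; a framework like run_mapreduce above would call these between its map and reduce phases): the partitioner routes each key to one of n reducers, and the combiner pre-aggregates one mapper's in-memory output before it crosses the network.

from collections import defaultdict

def partitioner(key, num_partitions):
    # Simple hash partitioner: hash(k) mod n divides up the key space.
    # (A real system would use a stable hash across machines.)
    return hash(key) % num_partitions

def combine(pairs, reducer):
    # Mini-reduce over one mapper's in-memory output to cut network traffic.
    buffer = defaultdict(list)
    for k, v in pairs:
        buffer[k].append(v)
    combined = []
    for k, vs in buffer.items():
        combined.extend(reducer(k, vs))
    return combined

Note that combining is only safe when the reduce function is associative and commutative, as in word count.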

Two more details
- Barrier between the map and reduce phases: but we can begin copying intermediate data earlier
- Keys arrive at each reducer in sorted order: no enforced ordering across reducers

"Hello World": Word Count

Map(String docid, String text):
    for each word w in text:
        Emit(w, 1);

Reduce(String term, Iterator<Int> values):
    int sum = 0;
    for each v in values:
        sum += v;
    Emit(term, sum);
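A runnable Python version of this pseudocode (using the run_mapreduce sketch from earlier; docid is ignored, as in the slide):

def wc_map(docid, text):
    for word in text.split():
        yield (word, 1)            # Emit(w, 1)

def wc_reduce(term, values):
    yield (term, sum(values))      # Emit(term, sum)

docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
print(run_mapreduce(docs, wc_map, wc_reduce))
# [('brown', 1), ('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1),
#  ('quick', 1), ('the', 3)]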

MapReduce can refer to...
- The programming model
- The execution framework (aka "runtime")
- The specific implementation
Usage is usually clear from context!

MapReduce Implementations
- Google has a proprietary implementation in C++, with bindings in Java and Python
- Hadoop is an open-source implementation in Java: an Apache project, with a large contribution of development led by Yahoo, used in production, and a rapidly expanding software ecosystem
- Lots of custom research implementations: for GPUs, cell processors, etc.

[Diagram: MapReduce execution overview, adapted from (Dean and Ghemawat, OSDI 2004). The user program (1) submits the job to the master, which (2) schedules map and reduce tasks onto workers. Map workers (3) read input splits 0-4 and (4) write intermediate files to local disk; reduce workers (5) remotely read the intermediate data and (6) write output files 0 and 1. Phases: input files → map phase → intermediate files (on local disk) → reduce phase → output files.]

How do we get data to the workers?
[Diagram: compute nodes fetching data from shared NAS/SAN storage.]
What's the problem here?

Distributed File System
Don't move data to workers... move workers to the data!
- Store data on the local disks of nodes in the cluster
- Start up the workers on the node that has the data local
Why?
- Not enough RAM to hold all the data in memory
- Disk access is slow, but disk throughput is reasonable
A distributed file system is the answer:
- GFS (Google File System) for Google's MapReduce
- HDFS (Hadoop Distributed File System) for Hadoop
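"Moving workers to the data" can be sketched as a scheduling rule (the block-location table and node names below are made up for illustration): prefer to run a task on a node that already holds a replica of its input block.

# Hypothetical block -> replica locations table, as a master node might report.
block_locations = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-d", "node-e"],
}

def schedule(block_id, idle_nodes):
    # Prefer a node holding a local replica; fall back to any idle node.
    for node in block_locations.get(block_id, []):
        if node in idle_nodes:
            return node, "data-local"
    return next(iter(idle_nodes)), "remote read"

print(schedule("block-1", {"node-b", "node-d"}))  # ('node-b', 'data-local')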

GFS: Assumptions
- Commodity hardware over "exotic" hardware: scale "out", not "up"
- High component failure rates: inexpensive commodity components fail all the time
- "Modest" number of huge files: multi-gigabyte files are common, if not encouraged
- Files are write-once, mostly appended to (perhaps concurrently)
- Large streaming reads over random access: high sustained throughput over low latency
(GFS slides adapted from material by Ghemawat et al., SOSP 2003)

GFS: Design Decisions
- Files stored as chunks: fixed size (64MB)
- Reliability through replication: each chunk replicated across 3+ chunkservers
- Single master to coordinate access and keep metadata: simple centralized management
- No data caching: little benefit due to large datasets, streaming reads
- Simplify the API: push some of the issues onto the client (e.g., data layout)
HDFS = GFS clone (same basic ideas)
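The first two decisions can be sketched in a few lines of Python (a toy model, not GFS code; the round-robin placement is an assumption, since real systems also weigh load and rack placement): split a file into fixed-size 64 MB chunks and assign each chunk to three chunkservers.

CHUNK_SIZE = 64 * 1024 * 1024   # fixed 64 MB chunks
REPLICAS = 3                    # each chunk replicated across 3+ chunkservers

def place_chunks(file_size, chunkservers):
    num_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
    placement = {}
    for i in range(num_chunks):
        placement[i] = [chunkservers[(i + r) % len(chunkservers)]
                        for r in range(REPLICAS)]
    return placement

servers = ["cs0", "cs1", "cs2", "cs3", "cs4"]
print(place_chunks(200 * 1024 * 1024, servers))
# {0: ['cs0', 'cs1', 'cs2'], 1: ['cs1', 'cs2', 'cs3'],
#  2: ['cs2', 'cs3', 'cs4'], 3: ['cs3', 'cs4', 'cs0']}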

From GFS to HDFS
Terminology differences:
- GFS master = Hadoop namenode
- GFS chunkservers = Hadoop datanodes
Functional differences:
- No file appends in HDFS (planned feature)
- HDFS performance is (likely) slower
For the most part, we'll use the Hadoop terminology.

HDFS Architecture
[Diagram, adapted from (Ghemawat et al., SOSP 2003): an application talks to an HDFS client; the client sends (file name, block id) to the HDFS namenode and gets back (block id, block location). The namenode holds the file namespace (e.g., /foo/bar → block 3df2) and exchanges instructions and state reports with the datanodes. The client then requests (block id, byte range) from an HDFS datanode, which serves block data from its local Linux file system.]

Namenode Responsibilities
- Managing the file system namespace: holds the file/directory structure, metadata, file-to-block mapping, access permissions, etc.
- Coordinating file operations: directs clients to datanodes for reads and writes; no data is moved through the namenode
- Maintaining overall health: periodic communication with the datanodes; block re-replication and rebalancing; garbage collection
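A toy model of the namenode's metadata (illustrative structures only; all names made up): the namespace maps each file to its block ids, a separate table maps each block to datanode locations, and a read consults both before fetching block data directly from a datanode.

# Hypothetical namenode-side metadata, for illustration.
namespace = {
    "/foo/bar": ["blk_1", "blk_2"],          # file -> ordered block ids
}
block_map = {
    "blk_1": ["datanode-1", "datanode-3"],   # block id -> datanode locations
    "blk_2": ["datanode-2", "datanode-3"],
}

def open_for_read(path):
    # The namenode only returns locations; block data never flows through it.
    return [(blk, block_map[blk]) for blk in namespace[path]]

print(open_for_read("/foo/bar"))
# [('blk_1', ['datanode-1', 'datanode-3']), ('blk_2', ['datanode-2', 'datanode-3'])]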

Putting everything together
[Diagram: the namenode (running the namenode daemon) and the job submission node (running the jobtracker) coordinate a set of slave nodes; each slave node runs a tasktracker and a datanode daemon on top of a Linux file system.]

References
- Lin, Ch. 2: MapReduce Basics
- Tom White, Ch. 6: How MapReduce Works
- Ghemawat, S., Gobioff, H., and Leung, S.-T. "The Google File System." In SOSP. Bolton Landing, NY, USA: ACM Press, 2003.
- Dean, J. and Ghemawat, S. "MapReduce: Simplified Data Processing on Large Clusters." In OSDI, 2004.
