Key Technologies of Cloud Computing


Key Technologies of Cloud Computing

Virtualization Technology: Outline
1 Definition of virtualization
2 Classification of virtualization
3 Full virtualization vs. paravirtualization
4 Implementing virtualization
5 Comparing and selecting virtualization technologies
6 Benefits of virtualization
7 Problems introduced by virtualization
8 Where virtualization applies
9 The server virtualization process

MapReduce
MapReduce is a simple, easy-to-use parallel programming model that greatly simplifies the implementation of large-scale data-processing problems.

Divide and Conquer
[Diagram: the "Work" is partitioned into units w1, w2, w3; each unit is handled by a "worker", producing partial results r1, r2, r3; these are combined into the final "Result".]

Parallelization Challenges
- How do we assign work units to workers?
- What if we have more work units than workers?
- What if workers need to share partial results?
- How do we aggregate partial results?
- How do we know all the workers have finished?
- What if workers die?
What is the common theme of all of these problems?

Common Theme?
Parallelization problems arise from:
- Communication between workers (e.g., to exchange state)
- Access to shared resources (e.g., data)
Thus, we need a synchronization mechanism.

Managing Multiple Workers
Difficult because:
- We don't know the order in which workers run
- We don't know when workers interrupt each other
- We don't know the order in which workers access shared data
Thus, we need:
- Semaphores (lock, unlock)
- Condition variables (wait, notify, broadcast)
- Barriers
Still, lots of problems: deadlock, livelock, race conditions... dining philosophers, sleeping barbers, cigarette smokers...
Moral of the story: be careful!

Current Tools
- Programming models: shared memory (pthreads), message passing (MPI)
- Design patterns: master-slaves, producer-consumer flows, shared work queues
But now: MapReduce!

MapReduce: Parallel/Distributed Computing Programming Model
[Pipeline diagram: input split, shuffle, output]

Typical Problem Solved by MapReduce
- Read the input: records in key/value format
- Map: extract something from each record
  map(in_key, in_value) -> list(out_key, intermediate_value)
  processes one input key/value pair and emits intermediate key/value pairs
- Shuffle: exchange and redistribute the data so that all intermediate results with the same key are gathered on the same node
- Reduce: aggregate, summarize, filter, etc.
  reduce(out_key, list(intermediate_value)) -> list(out_value)
  merges all values of a given key, computes over them, and emits the combined result (usually just one)
- Write the output
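As a concrete instance of the two signatures above, here is a minimal word-count example in Python. The names (word_count_map, word_count_reduce) and the in-memory dictionary standing in for the distributed shuffle are illustrative assumptions, not part of the original slides.

from collections import defaultdict

def word_count_map(in_key, in_value):
    # map(in_key, in_value) -> list(out_key, intermediate_value)
    # in_key: document name, in_value: document contents.
    return [(word, 1) for word in in_value.split()]

def word_count_reduce(out_key, intermediate_values):
    # reduce(out_key, list(intermediate_value)) -> list(out_value)
    # Merges all counts recorded for one word.
    return [sum(intermediate_values)]

# Toy in-memory "shuffle": gather intermediate pairs by key on one machine.
documents = {"doc1": "the quick brown fox", "doc2": "the lazy dog"}
groups = defaultdict(list)
for name, text in documents.items():
    for key, value in word_count_map(name, text):
        groups[key].append(value)

results = {key: word_count_reduce(key, values) for key, values in sorted(groups.items())}
print(results)  # {'brown': [1], 'dog': [1], 'fox': [1], 'lazy': [1], 'quick': [1], 'the': [2]}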

MapReduce Framework
[Framework diagram]

Shuffle Implementation
Partition, sort, and group:
- Partition function: hash(key) % number of reducers
- Group function: sort by key

Example Uses: the Model is Widely Applicable
[Chart: MapReduce programs in the Google source tree]
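A short Python sketch of the partition and group functions named above; the helper names and the use of Python's built-in hash() are illustrative assumptions (a production framework would use a stable hash).

from itertools import groupby

def partition(key, num_reducers):
    # Partition function from the slide: hash(key) % reducer number.
    return hash(key) % num_reducers

def group(intermediate_pairs):
    # Group function from the slide: sort by key, then collect each key's values.
    pairs = sorted(intermediate_pairs, key=lambda kv: kv[0])
    return [(key, [v for _, v in grp]) for key, grp in groupby(pairs, key=lambda kv: kv[0])]

pairs = [("fox", 1), ("the", 1), ("the", 1), ("dog", 1)]
print(partition("the", 4))  # index of the reducer that receives key "the"
print(group(pairs))         # [('dog', [1]), ('fox', [1]), ('the', [1, 1])]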

Google MapReduce Architecture
[Architecture diagram]

MapReduce Operation
[Diagram: the initial data is split into 64MB blocks; map results are computed and stored locally; the master sends the data locations to the R reduce workers; the master is informed of the result locations; the final output is written.]

Execution Overview
1. Input files are split into M pieces (16 to 64 MB) and many worker copies of the program are forked.
2. One special copy, the master, assigns map and reduce tasks to idle slave workers.
3. Map workers read input splits, parse (key, value) pairs, apply the map function, and create buffered output pairs.
4. Buffered output pairs are periodically written to local disk, partitioned into R regions; the locations of the regions are passed back to the master.
5. The master notifies reduce workers about the locations. Each reduce worker uses remote procedure calls to read the data from the local disks of the map workers and sorts it by intermediate key to group records with the same key together.

Execution Overview (cont.)
6. The reduce worker passes each key plus the corresponding set of all intermediate data to the reduce function. The output of the reduce function is appended to the final output file.
7. When all map and reduce tasks are completed, the master wakes up the user program, which resumes the user code.
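To make steps 1-7 concrete, here is a single-process Python sketch of the same flow (split into M pieces, map, partition into R regions, shuffle and sort, reduce). All names are illustrative assumptions; there is no real master, forking, RPC, or local-disk spill here.

from itertools import groupby

def run_mapreduce(inputs, map_fn, reduce_fn, M=4, R=2):
    # Step 1: split the list of (key, value) input records into M pieces.
    splits = [inputs[i::M] for i in range(M)]

    # Steps 3-4: each "map worker" applies the map function to its split and
    # partitions its intermediate output into R regions by hashing the key.
    regions = [[[] for _ in range(R)] for _ in range(M)]
    for m, split in enumerate(splits):
        for in_key, in_value in split:
            for out_key, inter_value in map_fn(in_key, in_value):
                regions[m][hash(out_key) % R].append((out_key, inter_value))

    # Steps 5-6: each "reduce worker" gathers its region from every map worker,
    # sorts by intermediate key to group equal keys, and applies reduce.
    output = {}
    for r in range(R):
        pairs = sorted(p for m in range(M) for p in regions[m][r])
        for key, grouped in groupby(pairs, key=lambda kv: kv[0]):
            output[key] = reduce_fn(key, [v for _, v in grouped])
    return output  # Step 7: control returns to the user program with the results.

# Usage with the word-count functions sketched earlier:
# run_mapreduce([("doc1", "the quick brown fox"), ("doc2", "the lazy dog")],
#               word_count_map, word_count_reduce)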

Fault Tolerance: Workers
The master keeps several data structures: for every map and reduce task it stores the task's state (idle, in progress, completed) and, for non-idle tasks, the identity of the worker machine.
- The master pings workers periodically; no response means the worker is marked as failed.
- Completed map tasks are reset to the idle state so they can be restarted, because their results (local to the failed worker) are lost.
- Completed reduce tasks do not need to be restarted (their output is stored in the global file system).
- Reduce tasks are notified of the re-executed map tasks, so they can read the data they have not yet read from the new locations.

Fault Tolerance: Master
- The master writes checkpoints.
- There is only one master, so there is less chance of failure.
- If the master fails, the MapReduce task aborts.
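A minimal Python sketch of the master-side bookkeeping just described (per-task state plus the owning worker, and the reaction to a missed ping). The class and field names are illustrative assumptions; a real master also tracks intermediate-data locations and task leases.

IDLE, IN_PROGRESS, COMPLETED = "idle", "in progress", "completed"

class Master:
    def __init__(self, map_tasks, reduce_tasks):
        # For every map/reduce task: its state and, when non-idle, its worker id.
        self.tasks = {t: {"kind": "map", "state": IDLE, "worker": None} for t in map_tasks}
        self.tasks.update({t: {"kind": "reduce", "state": IDLE, "worker": None} for t in reduce_tasks})

    def on_worker_failed(self, worker_id):
        # Called when a periodic ping goes unanswered.
        for info in self.tasks.values():
            if info["worker"] != worker_id:
                continue
            if info["kind"] == "map":
                # Map output lives on the failed worker's local disk, so map
                # tasks are reset to idle even if they had already completed.
                info["state"], info["worker"] = IDLE, None
            elif info["state"] != COMPLETED:
                # Completed reduce output is in the global file system; only
                # unfinished reduce tasks on this worker are rescheduled.
                info["state"], info["worker"] = IDLE, None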

Refinement: Redundant Execution
Slow workers significantly delay completion time:
- Other jobs consuming resources on the machine
- Bad disks with soft errors transfer data slowly
Solution: near the end of a phase, spawn backup tasks; whichever copy finishes first "wins". This dramatically shortens job completion time.
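A sketch of the backup-task idea using Python's standard concurrent.futures: the same straggling task is submitted twice and the first result to arrive wins. The function and its name are illustrative assumptions; a real master would also discard or cancel the losing copy's output.

from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_with_backup(task_fn, *args):
    # Run the same task on two workers and keep whichever result arrives first.
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(task_fn, *args)
        backup = pool.submit(task_fn, *args)  # the redundant backup task
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        return next(iter(done)).result()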

Refinement: Locality Optimization
Master scheduling policy:
- Ask GFS for the locations of the replicas of the input file blocks
- Map tasks are typically split into 64MB pieces (the GFS block size)
- Map tasks are scheduled so that a replica of the GFS input block is on the same machine or the same rack
Effect: thousands of machines read input at local disk speed. Without this, the rack switches would limit the read rate.
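A sketch of such a locality-aware assignment in Python: given the replica locations of a task's input block, prefer a worker on the same machine, then one in the same rack, then any idle worker. The data structures and names are illustrative assumptions.

def pick_worker(replica_hosts, idle_workers, rack_of):
    # replica_hosts: machines holding a GFS replica of the task's input block
    # idle_workers:  list of machines with a free map slot
    # rack_of:       dict mapping machine -> rack id
    # 1. Prefer a worker that already stores a replica (node-local read).
    for worker in idle_workers:
        if worker in replica_hosts:
            return worker
    # 2. Otherwise prefer a worker in the same rack as some replica (rack-local read).
    replica_racks = {rack_of[host] for host in replica_hosts}
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker
    # 3. Fall back to any idle worker (remote read through the rack switches).
    return idle_workers[0] if idle_workers else None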

Refinement: Skipping Bad Records
- Map/reduce functions sometimes fail for particular inputs.
- The best solution is to debug and fix, but that is not always possible (e.g., third-party source libraries).
- On a segmentation fault, the worker sends a UDP packet to the master from a signal handler, including the sequence number of the record being processed.
- If the master sees two failures for the same record, the next worker is told to skip that record.
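A much simplified Python sketch of this policy: instead of a signal handler and a UDP packet, the worker reports exceptions to a tracker object that tells later attempts to skip a record after two failures. All names here are illustrative assumptions.

from collections import Counter

class SkipTracker:
    # Master-side count of how often each record sequence number has failed.
    def __init__(self, max_failures=2):
        self.failures = Counter()
        self.max_failures = max_failures

    def report_failure(self, seqno):
        self.failures[seqno] += 1

    def should_skip(self, seqno):
        return self.failures[seqno] >= self.max_failures

def run_map_task(records, map_fn, tracker):
    out = []
    for seqno, (key, value) in enumerate(records):
        if tracker.should_skip(seqno):
            continue                       # the master said: skip this bad record
        try:
            out.extend(map_fn(key, value))
        except Exception:
            tracker.report_failure(seqno)  # stands in for the UDP report
            raise                          # the task fails and will be re-run
    return out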

Further Refinements
- Compression of intermediate data
- Combiner: "combiner" functions can run on the same machine as a mapper, causing a mini-reduce phase to occur before the real reduce phase, to save bandwidth
- Local execution for debugging/testing
- User-defined counters
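To illustrate the combiner, a short Python sketch that pre-aggregates word counts on the map side before anything crosses the network; the function name and the word-count setting are illustrative assumptions.

from collections import defaultdict

def combine(intermediate_pairs):
    # Mini-reduce on the mapper's machine: merge values per key before the shuffle.
    # For word count this collapses many ("the", 1) pairs into a single ("the", n),
    # so far fewer pairs are sent to the reduce workers.
    partial = defaultdict(int)
    for key, value in intermediate_pairs:
        partial[key] += value
    return list(partial.items())

print(combine([("the", 1), ("fox", 1), ("the", 1), ("the", 1)]))
# [('the', 3), ('fox', 1)]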
