CSDN Big Data Application Conference slides, 01. Yang Dong: HCE, a MapReduce framework that improves resource utilization (30 slides)
HCE: A MapReduce Framework towards Improving Resource Utilization
Yang Dong
yangdonglee

Slide 2: About Me
  Research Area
    Distributed Storage System: HDFS, Hypertable
    Distributed Computing System: MapReduce, DataStream

Slide 3: Agenda
  Background and Motivation
  Framework Model
  Evaluation
  Conclusion
  Q&A

Slide 4: Agenda
  Background and Motivation
    State of the Art
    Challenge
    Solution
  Framework Model
  Evaluation
  Conclusion
  Q&A

Slide 5: State of the Art
  50000+ jobs
  10000+ nodes
  10 PB+ data processed per day

Slide 6: Challenge
  How to improve the efficiency of clusters?
  How to improve development efficiency?
  How to satisfy customer requirements?
  How to control and maintain?
  Resource Utilization
    Job optimization
      Resource Scheduling
      Dynamic Configuration
    Task optimization
      Framework optimization for small tasks
      User program optimization for big tasks

Slide 7: Challenge
  Cluster Status
    Most tasks are small
    80% map tasks time 10% machines at least

Slide 28: Contribution
  Facebook Hive over HCE
    Implementation
      HiveMapper and HiveReducer
      RC-File RecordReader and RecordWriter
    Performance
      CPU utilization: 20%-50% improvement
  Patches to Apache Jira
    http://issues.apache.org/jira/browse/MAPREDUCE-1270
    https://issues.apache.org/jira/browse/MAPREDUCE-2446

Slide 29: Thanks for Your Attention

Slide 30: Questions