Big Data Technology Annual Conference

Uploaded by: F****n | Document ID: 96401690 | Upload date: 2019-08-26 | Format: PPT | Pages: 24 | Size: 736.56 KB

Apsara Cloud Platform (Open Platform)

About Aliyun
- China's largest cloud service provider
- Hundreds of thousands of customers
- Billions of accesses every day

Providing Foundation Services of the Cloud Eco-system
- Pay by usage
- Elasticity
- Safety (like "tap water")

The Nature of Cloud Computing
- Scale (large scale)
- Economy (low cost)
- Public Utility (service operation)

Internet-scale computing
- 2.5 EB generated per day, doubling every 40 months
- Billions of transactions on Taobao every day, must be processed in 6 hours

Economy means more than low prices
- Leading to behavior changes (like "telephone")
- Key is scheduling (like "power grid")

Two Design Principles
1. Large-scale general computing platform as the base
   - One system supporting both offline and online services
   - Multi-tenancy, resource sharing, load shifting
2. Web-based API as the delivery mechanism
   - Online activation, pay-by-usage
   - Location transparency

[Architecture diagram; labels as extracted]
- Linux Cluster, IDC
- Resource Management (伏羲 Fuxi), Security (钟馗 Zhongkui), RPC (夸父 Kuafu), Naming/Coordination (女娲 Nuwa), Cluster Deployment (大禹 Dayu), Cluster Monitor (神农 Shennong), Distributed File System (盘古 Pangu), Job Scheduling (伏羲 Fuxi)
- ACE, OSS, OTS, ODPS, OSPS, ECS/SLB, RDS
- Map, Mail, Search, etc.; Cloud Mart; Other Cloud Services

Cloud Computing Services

Elastic Computing
- ECS: virtualized instances of servers that can be created and tailored to meet application requirements
- SLB: software load balancing technology that can elastically expand service capacity on demand
- ACE: convenient and efficient execution environment for Web services, supporting Java, PHP, Node.js

Storage and Databases
- OSS: large-scale object storage service for unstructured data such as photos, music, or video
- OTS: large-scale storage service for structured or semi-structured data, with real-time query
- RDS: managed instances for relational databases with automatic backup and failover

[Slide: A Comparison of Storage and Database Services]

Large-scale Computing
- ODPS: large-scale data batch processing and computation, supporting SQL and MapReduce-style programming languages
- OSPS: stream data processing service, supporting a SQL-like query language and automatic failure recovery

Apsara Technical Highlights
- A common platform supporting both offline and online services
  - Search: 24B pages processed, 13B online index
  - Mail: 100M mails received, 10M mails sent, 10 ms latency
- Capability-based security management framework, enforcing the Principle of Least Privilege
- Distributed deployment, monitoring and diagnostics
- Zero SPOF (single point of failure): availability 99.9%
- All data has 3 replicas: data reliability 99.99999999%

5K
- 2013/08/15: First-ever 5000-node Apsara cluster (ODPS) went into production
  - 100K CPU cores, 100 PB raw storage
  - Processing petabytes per day
- 2013/09/24: Opened access to ODPS for 4 universities and research institutions
- Sorting 100 TB in 30 minutes
  - Current known record: 72 minutes (Yahoo!, 2013/07/03)

Pangu: Large-scale Distributed File System
- Master-Slave architecture
  - Master for metadata management, Slave (Chunk Server) for IO management
- Paxos-based multi-master architecture, failure recovery time 1 minute
- End-to-end inline checksum
- Scales to 1 billion files
[Diagram: five chunk servers, each labeled CS]

Separated IO Pipeline and Storage Management

Adaptive IO Pipeline
- Replication master: chunk server vs client
- Replication policy: chaining vs star-replication
- Chunking policy: fixed, variable, or RAID
- Durability guarantee: transaction logging vs sequential write

Common Storage Management
- Physical IO management
- Priority and QoS
- Background re-replication
- Chunk placement

Staged Event-driven Physical IO Management
- The Chunk Server rearranges IO requests to support priority and QoS and to reduce IO seek overhead

Distributed Re-replication
- Typical: Mirroring (10 hours)
- Pangu: Distributed re-replication (20 min, 50 nodes)
[Diagram: blocks labeled 1 TB]
- Intelligent scheduling
- Balanced storage
- Bandwidth throttling
- Minimizing data loss

RAID
- Built into the core system instead of an add-on layer (as in HDFS RAID)
- Better management of data integrity, recovery, and chunk placement
- Synchronous redundancy block generation
- Low-latency failure recovery
- Small file support

[Architecture diagram; components: Client, Fuxi Master (two instances), App Master, App Worker, Tubo; labeled flows: Job submission, Job control, Resource requests, Node control]

Fuxi Resource Scheduling
- Multi-dimension resources
- Elastic quota
- CGroup-based isolation
- Fuxi Master HA
- App Master failover
- Incremental scheduling

Fuxi Job Programming Model
- Job: a DAG
- Vertex: a task
  - Each task may have multiple instances based on input data chunks
- Edge: a data flow; each task may have multiple input/output flows
  - A data flow connecting two tasks represents data shuffling

Example: Find Best-Sellers
    SELECT prod_id, SUM(count) AS quantity
    FROM orders
    GROUP BY prod_id
    ORDER BY quantity DESC;

[Diagram: orders]
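
The job model and the best-sellers query above can be illustrated with a small Python sketch. It is not Fuxi's or ODPS's actual API: every function name (`partial_sum_task`, `shuffle`, `merge_task`, `sort_task`, `run_job`), the sample rows, and the choice of two shuffle partitions are hypothetical, chosen only to show one plausible way the query could map onto a three-task DAG. Each input chunk gets its own instance of the first task, and the edge between the first and second task is a shuffle keyed on prod_id.

```python
from collections import defaultdict


def partial_sum_task(chunk):
    """Task 1 instance: partial SUM(count) per prod_id for one input chunk."""
    sums = defaultdict(int)
    for prod_id, count in chunk:
        sums[prod_id] += count
    return dict(sums)


def shuffle(partials, num_partitions):
    """Edge between task 1 and task 2: route every key to exactly one partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for prod_id, subtotal in partial.items():
            partitions[hash(prod_id) % num_partitions][prod_id].append(subtotal)
    return partitions


def merge_task(partition):
    """Task 2 instance: final SUM(count) for the keys this partition owns."""
    return [(prod_id, sum(subtotals)) for prod_id, subtotals in partition.items()]


def sort_task(groups):
    """Task 3: ORDER BY quantity DESC over all merged groups."""
    rows = [row for group in groups for row in group]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def run_job(chunks, num_partitions=2):
    """Run the three tasks in DAG order; a real scheduler would run instances in parallel."""
    partials = [partial_sum_task(chunk) for chunk in chunks]
    partitions = shuffle(partials, num_partitions)
    merged = [merge_task(partition) for partition in partitions]
    return sort_task(merged)


# Hypothetical "orders" rows as (prod_id, count), already split into three input chunks.
ORDER_CHUNKS = [
    [("p1", 2), ("p2", 5), ("p1", 1)],
    [("p2", 4), ("p3", 7)],
    [("p1", 3), ("p3", 1)],
]

if __name__ == "__main__":
    for prod_id, quantity in run_job(ORDER_CHUNKS):
        print(prod_id, quantity)
```

In this sketch the number of task 1 instances follows the number of input chunks, while the number of task 2 instances follows the number of shuffle partitions; the final result does not depend on either choice.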
