Introduction to Big Data (English Presentation)

Uploaded by: 最**** | Document ID: 115987875 | Uploaded: 2019-11-15 | Format: PPT | Pages: 33 | Size: 1.50MB

BIG DATA

EVERY MINUTE
- Didi rides hailed: 1,388 cabs and 2,777 private cars
- WeChat: 395,833 people log in; 194,444 people are video or audio chatting
- 625,000 Youku Tudou videos are being watched
- 64,814 posts and reposts on Weibo
- 4,166,667 search queries
- 774 people buy something on Alibaba's marketplaces
- US$1,133,942 is spent on Alibaba

CONTENTS
1 Definition
2 Characteristics
3 NoSQL
4 RDBMS
5 MapReduce
6 Applications

1 Definition

BIG DATA
- volume of data
- important data
- on a day-to-day basis
- for better decisions

2 Characteristics

- Volume: the quantity of generated and stored data.
- Variety: the type and nature of the data.
- Velocity: in this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
- Variability: inconsistency of the data set can hamper processes to handle and manage it.
- Veracity: the quality of captured data can vary greatly, affecting accurate analysis.

3 NoSQL

- NoSQL here refers to document-oriented databases.
- SQL doesn't scale well horizontally.
- It is schemaless, but not formless (JSON format).
- JSON: a data interchange format.
- Examples: MongoDB, CouchDB.

BASE model
- Basic Availability: spread data across many storage systems with a high degree of replication.
- Soft State: data consistency is the developer's problem and should not be handled by the database.
- Eventual Consistency: at some point in the future, data will converge to a consistent state. No guarantees are made about when.

JSON structure

    {
        field1: value1,
        field2: value2,
        ...
        fieldN: valueN
    }

    var mydoc = {
        _id: ObjectId("5099803df3f4948bd2f98391"),
        name: { first: "Alan", last: "Turing" },
        birth: new Date('Jun 23, 1912'),
        death: new Date('Jun 07, 1954'),
        contribs: [ "Turing machine", "Turing test" ],
        views: NumberLong(1250000)
    }

RDBMS vs NoSQL

Row DB:
    001:10,Smith,Joe,40000;
    002:12,Jones,Mary,50000;
    003:11,Johnson,Cathy,44000;
    004:22,Jones,Bob,55000;

Index:
    001:40000;002:50000;003:44000;004:55000;

Column DB:
    10:001,12:002,11:003,22:004;
    Smith:001,Jones:002,Johnson:003,Jones:004;
    Joe:001,Mary:002,Cathy:003,Bob:004;
    40000:001,50000[...];

Column DB, with repeated values stored once:
    Smith:001,Jones:002,004,Johnson:003;

Benefits
- Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
- Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once, because that column data can be written efficiently and replace old column data without touching any other columns for the rows.
- Row-oriented organizations are more efficient when many columns of a single row are required at the same time, and when row size is relatively small, as the entire row can be retrieved with a single disk seek.
- Row-oriented organizations are more efficient when writing a new row if all of the column data is supplied at the same time, as the entire row can be written with a single disk seek.

SQL vs NoSQL

A good compromise is to design your system with 3 logical DBs:
1. A normal SQL DB used by your admin application to create content.
2. A NoSQL DB for the front-end/public/high-volume application used by the public internet.
3. A DB for the analytical reporting system, using cubes and all that good stuff.

Data flows from the admin DB to the client NoSQL DB when someone "publishes" a piece of content. The client (NoSQL) DB provides very fast read access and records user interactions with the content. A scheduled job then pulls the data from the client DB into the reporting system. Since admin, client, and reporting are often separate apps, each application team can work with data in the format that best serves its application, and the transition from one system to the other is handled in the service layers.

4 RDBMS

- Fixed-schema, row-oriented databases with ACID properties and a sophisticated SQL query engine.
- The emphasis is on strong consistency, referential integrity, abstraction from the physical layer, and complex queries through the SQL language.
- You can easily create secondary indexes, perform complex inner and outer joins, count, sum, sort, group, and page your data across a number of tables, rows, and columns.

5 MapReduce

- Dividing and conquering
- Highly fault tolerant
- Every data block replicated on 3 nodes
- Difficult to implement

Comparison

                RDBMS                       MapReduce
    Data size   GB                          PB
    Access      Interactive and batch       Batch
    Updates     Read/write many times       Write once, read many times
    Structure   Static schema               Dynamic schema
    Integrity   High (ACID)                 Low
    Scaling     Nonlinear                   Linear
    DBA ratio   1:40                        1:3000

How does MapReduce work

MapReduce uses key/value pairs (where a traditional database uses rows and columns). After the map step, all the intermediate values for a given output key are combined together into a list.
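The key/value flow described under "How does MapReduce work" can be illustrated with a small single-process sketch. It is not Hadoop code and does nothing in parallel; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration. It only shows the shape of the model: map emits key/value pairs, the intermediate values for each key are combined into a list, and reduce collapses each list.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit one (word, 1) key/value pair for every word seen."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: combine all intermediate values for a given key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of strings."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


docs = ["big data is big", "data is everywhere"]
print(shuffle_phase(map_phase(docs)))  # per-key lists of intermediate values
print(word_count(docs))                # final counts per word
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data over the network; here all three are plain function calls so the data flow is easy to follow.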
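The row-store versus column-store trade-off from section 3 (NoSQL, "Benefits") can also be sketched in a few lines. The records are the four employees from the Row DB slide; the field names (`rowid`, `empid`, `last`, `first`, `salary`) and all function names are assumptions, since the slide shows only raw values. The sketch models layout only, not disk seeks.

```python
# Employee records from the Row DB slide: rowid, empid, last, first, salary.
ROWS = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]
FIELDS = ("rowid", "empid", "last", "first", "salary")


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: list(values) for name, values in zip(FIELDS, zip(*rows))}


def total_salary_row_store(rows):
    """Aggregate in a row layout: every full record is visited for one field."""
    return sum(row[FIELDS.index("salary")] for row in rows)


def total_salary_column_store(columns):
    """Aggregate in a column layout: only the salary column is read."""
    return sum(columns["salary"])


def fetch_row_store(rows, rowid):
    """Whole-record lookup in a row layout: the record is already contiguous."""
    return next(row for row in rows if row[0] == rowid)


def fetch_column_store(columns, rowid):
    """Whole-record lookup in a column layout: one read per column."""
    i = columns["rowid"].index(rowid)
    return tuple(columns[name][i] for name in FIELDS)


COLUMNS = to_columns(ROWS)
print(total_salary_row_store(ROWS), total_salary_column_store(COLUMNS))
print(fetch_row_store(ROWS, "003"), fetch_column_store(COLUMNS, "003"))
```

Both layouts return the same answers. What differs is how much data each operation has to touch: the aggregate touches one list in the column layout, while the single-record fetch touches every column list.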
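The RDBMS properties listed in section 4 (fixed schema, referential integrity, secondary indexes, joins, count, sum, group) can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `department` table, the department names, and the `dept_id` assignments are hypothetical additions that exist only to give the join something to join against. The employee names and salaries come from the Row DB slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE department (          -- hypothetical table, not in the slides
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employee (            -- fixed schema: every row has these columns
        empid      INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        salary     INTEGER NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department(id)
    );
    CREATE INDEX idx_employee_last_name ON employee(last_name);  -- secondary index
""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Engineering"), (2, "Sales")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    (10, "Smith", "Joe", 40000, 1),
    (12, "Jones", "Mary", 50000, 2),
    (11, "Johnson", "Cathy", 44000, 1),
    (22, "Jones", "Bob", 55000, 2),
])
conn.commit()

# Inner join plus count, sum, group, and sort in one declarative query.
summary = conn.execute("""
    SELECT d.name, COUNT(*), SUM(e.salary)
    FROM employee AS e
    JOIN department AS d ON d.id = e.dept_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(summary)


def rejects_unknown_department(connection):
    """Referential integrity: a row pointing at a missing department is refused."""
    try:
        connection.execute(
            "INSERT INTO employee VALUES (99, 'Doe', 'Pat', 1, 999)")
    except sqlite3.IntegrityError:
        return True
    return False


print(rejects_unknown_department(conn))
```

The database, not the application, rejects the inconsistent row. This is the opposite of the BASE stance in section 3, where consistency is left to the developer.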
