Google and Cloud Computing

Wang Yonggang (王咏刚)
Senior Engineer, Google

Agenda
- The Internet: From Hardware to Community
- The Innovation: A Computing Cloud
- Breakthroughs for Cloud Computing
- Google Apps for Cloud Computing
- Google Infrastructure for Cloud Computing

The Internet: From Hardware to Community
- MySpace
- Facebook
- Kaixin001 (开心网)
- Xiaonei (校内网)

What Do Today's Users Want?
- Accessibility: access from anywhere and from multiple devices
- Shareability: make sharing as easy as creating and saving
- Freedom: users don't want their data held hostage
- Simplicity: easy to learn, easy to use
- Security: trust that data will not be lost or seen by unwanted parties

The Innovation: A Computing Cloud

Cloud Computing

Attributes of Cloud Computing
- Data stored on the cloud
- Software and services on the cloud, accessed via a web browser
- Based on standards and protocols (Linux, AJAX, LAMP, etc.)
- Accessible from any device

Hardware Centric -> Software Centric -> Service Centric
Personal PC      -> Client Server    -> Cloud Computing

Breakthroughs for Cloud Computing
1. User-Centric
2. Task-Centric
3. Powerful
4. Intelligent
5. Affordable
6. Programmable

User Centric
- Data stored in the "Cloud"
- Data follows you and your devices
- Data accessible anywhere
- Data can be shared with others
[Diagram labels: music, preferences, maps, news, contacts, messages, mailing lists, photos, e-mails, calendar, phone numbers, investments]

Example: Gmail
- Just a web browser and your account with password!
- Once you log in, the device is "yours".
- Data stored on remote servers in the "cloud" (with large capacity)
[Diagram labels: Beijing, on travel; San Francisco, Monday; Home, Wednesday]

Use Google Docs to Solve a Task
- Access your docs from anywhere
- Chat with others in real time
- Changes instantly appear to other collaborators
- Task = "Teachers creating a departmental curriculum"

Communication Task
- Email, Chat, Contacts, Chat History

Task: Collaborate on Spreadsheet
- Communicate: chat with others editing the spreadsheet
- Collaborate: invite others to collaborate on the spreadsheet
- Publish: invite others to view the spreadsheet
You can also easily organize all your common tasks.

Cloud Computing is Powerful: It can do what no PC can do
- Is Google Search faster than search in Windows/Outlook/Word?
- And Google Search must be much harder.
- How much storage does it take to store all of the web pages?
  100 billion pages x 10 KB per page = 1,000 TB of disk!
- Cloud computing has at its disposal:
  - Essentially infinite amount of disk
  - Essentially infinite amount of computation (assuming it can be parallelized)

Example: Google Search
Web Page Search -> Universal Search
- 1st generation: era of single search (not diverse)
- 2nd generation: era of vertical search (too complex)
- 3rd generation: an era of Universal Search

From vertical search to universal search
- Integration of user experience
[Diagram labels: A, B, C, D, E]

Universal Search Example
[Screenshots]

Cloud Computing Infrastructure

GFS Architecture
- Files broken into chunks (typically 64 MB)
- Master manages metadata
- Data transfers happen directly between clients and chunkservers
[Diagram: clients; GFS master with replica masters; Chunkserver 1 holding C0, C1, C2, C5; Chunkserver 2 holding C1, C3, C5; Chunkserver N holding C0, C2, C5]
[Chart labels: Google 48%, MSN 19%, Yahoo 33%]

Typical Cluster
[Diagram: cluster-wide services (scheduling masters, GFS master, lock service); Machine 1, Machine 2, ..., Machine N each run Linux, a GFS chunkserver, a scheduler slave, and one or more user apps (User app1, User app2, User app3)]

MapReduce

More specifically
- Programmer specifies two primary methods:
  - map(k, v) -> <k', v'>*
  - reduce(k', <v'>*) -> <k', v'>*
- All v' with the same k' are reduced together, in order.
- Usually also specify:
  - partition(k', total partitions) -> partition for k'
    - often a simple hash of the key
    - allows reduce operations for different k' to be parallelized

BigTable
- Distributed multi-level map
  - With an interesting data model
- Fault-tolerant, persistent
- Scalable
  - Thousands of servers
  - Terabytes of in-memory data
  - Petabyte of disk-based data
  - Millions of reads/writes per second, efficient scans
- Self-managing
  - Servers can be added/removed dynamically
  - Servers adjust to load imbalance

BigTable: Basic Data Model
- Distributed multi-dimensional sparse map
  (row, column, timestamp) -> cell contents
- Good match for most of our applications
[Diagram: a table with ROWS, COLUMNS, and TIMESTAMPS axes and a "contents" column]

BigTable: System Architecture
- Cluster scheduling master: handles failover, monitoring
- GFS: holds tablet data, logs
- Lock service: holds metadata, handles master election
- Bigtable tablet servers (three shown): serve data
- Bigtable master: performs metadata ops, load balancing
- Bigtable client, using the Bigtable client library: Open()
[Diagram: the components above grouped into a Bigtable cell]

Thanks

Q&A
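Supplementary sketch for the MapReduce slides: the three programmer-supplied functions (map, reduce, partition) can be illustrated with a tiny single-process word count. This is only a minimal illustration, not Google's implementation. The names `map_fn`, `reduce_fn`, `partition_fn`, and `run_mapreduce`, the sample documents, and the use of `zlib.crc32` as the "simple hash of the key" are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def map_fn(key, value):
    """map(k, v) -> <k', v'>*: emit (word, 1) for every word in a document."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """reduce(k', <v'>*) -> <k', v'>*: sum the counts collected for one word."""
    yield key, sum(values)


def partition_fn(key, total_partitions):
    """partition(k', total partitions): a simple, stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % total_partitions


def run_mapreduce(inputs, total_partitions=3):
    # Map phase: route every intermediate pair to the partition of its key.
    partitions = [defaultdict(list) for _ in range(total_partitions)]
    for k, v in inputs:
        for k2, v2 in map_fn(k, v):
            partitions[partition_fn(k2, total_partitions)][k2].append(v2)

    # Reduce phase: partitions are independent of each other, so a real
    # system could reduce them on different machines. Here they simply
    # run one after another in a single process.
    output = {}
    for part in partitions:
        for k2 in sorted(part):  # keys within a partition are visited in order
            for out_k, out_v in reduce_fn(k2, part[k2]):
                output[out_k] = out_v
    return output


# Hypothetical input: (document name, document text) pairs.
docs = [
    ("doc1", "the cloud stores the data"),
    ("doc2", "The data follows you"),
]
counts = run_mapreduce(docs)
print(counts["the"], counts["data"], counts["cloud"])  # prints: 3 2 1
```

Because all values for a given key land in the same partition, changing `total_partitions` changes how the work is split but not the final counts.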
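Supplementary sketch for the BigTable data model slide: the mapping from (row, column, timestamp) to cell contents can be pictured as a nested, sparse dictionary. The toy class below runs in memory on one machine and says nothing about how Bigtable actually stores or distributes data. The class name `SparseTable`, its methods, and the row and column names are hypothetical.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory map keyed by (row, column, timestamp)."""

    def __init__(self):
        # row -> column -> {timestamp: contents}; absent cells take no space.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, contents):
        self._rows[row][column][timestamp] = contents

    def get(self, row, column, timestamp=None):
        """Return the cell at `timestamp`, or the newest version if omitted."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self, start_row, end_row):
        """Yield cells with start_row <= row < end_row, in row order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                for column, versions in sorted(self._rows[row].items()):
                    for ts in sorted(versions, reverse=True):  # newest first
                        yield row, column, ts, versions[ts]


# Hypothetical rows and columns, for illustration only.
table = SparseTable()
table.put("com.example/index", "contents", 1, "<html>v1</html>")
table.put("com.example/index", "contents", 2, "<html>v2</html>")
table.put("org.example/home", "language", 1, "en")

print(table.get("com.example/index", "contents"))     # newest version
print(table.get("com.example/index", "contents", 1))  # a specific version
print(table.get("org.example/home", "contents"))      # None: cell was never written
```

Keeping rows sorted is what makes the range `scan` cheap to express, which is one way to read the slide's "efficient scans" bullet.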
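Supplementary sketch for the GFS slide: with files broken into fixed-size chunks, a byte offset maps to a chunk index by integer division. The snippet below shows only that arithmetic, assuming a 64 MB chunk means 64 x 2^20 bytes. The function names are hypothetical, and feeding in the 1,000 TB figure from the storage estimate (100 billion pages x 10 KB) is purely a sense-of-scale exercise, not a claim about how web pages are stored.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB per chunk, assuming binary megabytes


def chunk_count(file_size):
    """Number of chunks needed to hold `file_size` bytes (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


def locate(offset):
    """Map a byte offset to (chunk index, offset inside that chunk)."""
    return divmod(offset, CHUNK_SIZE)


# The slide's estimate: 100 billion pages x 10 KB per page (decimal units assumed).
web_bytes = 100 * 10**9 * 10 * 10**3
print(web_bytes // 10**12, "TB")           # prints: 1000 TB
print(chunk_count(web_bytes), "chunks")    # roughly 15 million chunks

# Byte 200 MB of a file falls in the fourth chunk (index 3), 8 MB into it.
print(locate(200 * 2**20))
```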