Research on Software Fault Tolerance Mechanisms for Cloud Platforms
2009-12-25

Outline

Research Background
Computing demands are growing rapidly, and users increasingly require software that is reliable, accurate, and scalable.
- According to a survey, 59% of Fortune 500 companies experience at least 1.6 hours of downtime per week.
- Surveys also estimate that business losses caused by system downtime amount to $84,000 to $108,000 per hour.
- In August 2008, Google's cloud computing services suffered serious problems: Blogger, Spreadsheet, and other services were down for extended periods, and Gmail went down three times within two weeks.
Virtualization technology makes full use of the processing power of the underlying hardware and allows multiple operating systems (OSes) to run at the same time.
Software fault tolerance techniques keep software providing service, with acceptable performance and security, after the software has encountered an error.

Research Content
Beyond studying software fault tolerance mechanisms under traditional architectures, those mechanisms should also be adapted to virtualized architectures. We aim to apply virtualization technology to re-examine software fault tolerance mechanisms:
- Software fault tolerance mechanisms under a single-machine virtualization architecture
- Software fault tolerance mechanisms in virtual clusters
Project support: 973 Program project "Research on Basic Theory and Methods of Computing System Virtualization"

Overall Framework

Fault Detection

Research Work

State of the Art
Traditional software fault tolerance mechanisms:
- Microreboot - A Technique for Cheap Recovery, OSDI 2004
- Enhancing Server Availability and Security Through Failure-Oblivious Computing, OSDI 2004
- Rx: Treating Bugs As Allergies - A Safe Method to Survive Software Failures, SOSP 2005, Best Paper
- ASSURE: Automatic Software Self-healing Using REscue points, ASPLOS 2009
- Automatically Patching Errors in Deployed Software, SOSP 2009
Byzantine fault tolerance mechanisms:
- Practical Byzantine Fault Tolerance, OSDI 1999
- Zyzzyva: Speculative Byzantine Fault Tolerance, SOSP 2007, Best Paper
- Diverse Replication for Single-Machine Byzantine-Fault Tolerance, USENIX 2008

[Figure: Microreboot. Labels: Application, Error, Reboot]

[Figure: Failure-oblivious computing. An invalid Read returns a manufactured value; an invalid Write is discarded]

Automatically Patching Errors in Deployed Software (SOSP 2009) workflow:
1. Observe normal behavior on the community machines to learn invariants, for example copy_len ≤ buff_size.
2. Detection tools catch an attack and collect information.
3. Compare behavior under attack: the invariant is violated (copy_len > buff_size).
4. Identify copy_len ≤ buff_size as a predictive invariant and generate candidate patches to repair the error:
   - Set copy_len = buff_size
   - Set copy_len = 0
   - Set buff_size = copy_len
   - Return from procedure
5. The Server distributes the patches (Patch 1, Patch 2, Patch 3) to the community machines. Initial ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0.
6. Evaluate the patches, where success means the detection tools detect no error. Patch 1 fails and Patch 3 succeeds, giving the ranking Patch 3: +5, Patch 2: 0, Patch 1: -5.
7. Distribute the best patch, Patch 3.
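The invariant, violation, candidate-patch, and ranking steps of the SOSP 2009 workflow can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's implementation: program state is reduced to the two integers copy_len and buff_size, `detector_fires` is a hypothetical detector that flags a copy writing past the real allocation, and the +1/-1 per-trial scoring and all function names are illustrative choices.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Patch = Callable[[State], Optional[State]]


def learn_invariant(normal_runs: List[State]) -> bool:
    """Return True if copy_len <= buff_size held in every observed normal run."""
    return all(run["copy_len"] <= run["buff_size"] for run in normal_runs)


def violates_invariant(state: State) -> bool:
    """Check the learned invariant against the state observed under attack."""
    return state["copy_len"] > state["buff_size"]


# The four candidate patches from the slide. Each maps the attacked state to a
# patched state; None models "return from procedure" (the copy is skipped).
CANDIDATES: Dict[str, Patch] = {
    "set copy_len = buff_size": lambda s: {**s, "copy_len": s["buff_size"]},
    "set copy_len = 0": lambda s: {**s, "copy_len": 0},
    "set buff_size = copy_len": lambda s: {**s, "buff_size": s["copy_len"]},
    "return from procedure": lambda s: None,
}


def detector_fires(state: Optional[State], allocated: int) -> bool:
    """Hypothetical detector: flags a copy that writes past the real allocation."""
    if state is None:
        return False  # the copy never happens, so nothing overflows
    return state["copy_len"] > allocated


def rank_patches(attack: State, allocated: int, trials: int = 5) -> Dict[str, int]:
    """Hypothetical scoring: +1 per trial with no detected error, -1 otherwise."""
    scores = {name: 0 for name in CANDIDATES}
    for name, patch in CANDIDATES.items():
        for _ in range(trials):  # each trial stands in for one community machine run
            patched = patch(dict(attack))
            scores[name] += -1 if detector_fires(patched, allocated) else 1
    return scores


if __name__ == "__main__":
    normal = [{"copy_len": 10, "buff_size": 64}, {"copy_len": 64, "buff_size": 64}]
    attack = {"copy_len": 100, "buff_size": 64}
    assert learn_invariant(normal) and violates_invariant(attack)
    scores = rank_patches(attack, allocated=64)
    print(scores)
    print("best:", max(scores, key=lambda name: scores[name]))
```

With these toy inputs, "set buff_size = copy_len" scores -5 because it only changes the recorded size, not the real allocation, so the detector still fires; the other three candidates score +5, and `max` picks the first of them.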
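The failure-oblivious computing figure (an invalid Read returns a manufactured value, an invalid Write is discarded) can be sketched as a bounds-checked buffer. This is a hedged illustration in Python rather than the compiler-inserted checks on C code that the OSDI 2004 work targets; the class name, the counters, and the choice to manufacture values by cycling through 0, 1, 2 are assumptions made for the example.

```python
from itertools import cycle
from typing import Iterator, List


class FailureObliviousBuffer:
    """Hypothetical bounds-checked buffer that keeps executing on memory errors."""

    def __init__(self, size: int) -> None:
        self._data: List[int] = [0] * size
        # Assumption: manufactured values cycle through a small fixed sequence.
        self._manufactured: Iterator[int] = cycle([0, 1, 2])
        self.discarded_writes = 0
        self.manufactured_reads = 0

    def _in_bounds(self, index: int) -> bool:
        return 0 <= index < len(self._data)

    def write(self, index: int, value: int) -> None:
        if self._in_bounds(index):
            self._data[index] = value
        else:
            # Drop the out-of-bounds write instead of corrupting adjacent memory.
            self.discarded_writes += 1

    def read(self, index: int) -> int:
        if self._in_bounds(index):
            return self._data[index]
        # Out-of-bounds read: return a manufactured value and keep running.
        self.manufactured_reads += 1
        return next(self._manufactured)


if __name__ == "__main__":
    buf = FailureObliviousBuffer(4)
    for i in range(6):  # overflows the 4-element buffer by two elements
        buf.write(i, i * 10)
    print([buf.read(i) for i in range(6)])
    print(buf.discarded_writes, buf.manufactured_reads)
```

Here the two overflowing writes are dropped and the two overflowing reads return 0 and 1, so the loop completes instead of crashing; whether such made-up values keep a real server's output acceptable is exactly the trade-off this class of mechanism accepts.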