Research on Data Distribution Evolution and Its Key Technologies in Mass Storage Systems

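Doctoral dissertation, Huazhong University of Science and Technology
Candidate: Wang Yude (王宇德)
Degree applied for: Doctor
Major: Computer System Architecture
Supervisors: Xie Changsheng (谢长生); Cao Qiang (曹强)
Date: 2010-05-27

Abstract (translated from the Chinese 摘要)

The rapid development of informatization has driven new changes in the scale and architecture of storage systems, which are evolving toward larger scale and greater complexity. At the same time, the I/O workloads these systems serve have become diverse, unbalanced, and dynamic. Current mass storage systems, however, often directly inherit the structure and operating mechanisms of traditional small-scale storage systems and struggle to meet the demands of large-scale I/O workloads that are dynamic, concurrent, and diverse. The physical and logical organization of existing storage systems is based on a static structure that can hardly perceive the characteristics of external workload requests or dynamic changes in the system's running state. Such a system therefore cannot adjust its own storage organization to follow the temporal and spatial variation of different I/O workloads, and cannot effectively and automatically improve its overall storage efficiency.

To address these problems, a data distribution evolution mechanism for mass storage systems is designed. It analyzes the characteristics of the dynamic data access workload, predicts future access trends from historical access information through a heat model, matches data of different heat to storage resource groups of different performance (and other characteristic) levels, and dynamically migrates and redistributes data in order to raise overall storage efficiency. The data distribution evolution process is fully automated: it is controlled by evolution rules and scheduled by an evolution rule management system.

The dissertation describes data distribution evolution technology for mass storage systems that can automatically adjust their storage organization mode according to the current running environment. At run time, such a system follows changes in the I/O workload and in its own state and automatically selects the data distribution mode best suited to the current access workload characteristics, meeting the performance and reliability requirements of workloads in a multi-user environment.

A heat computing model for data access is built to quantify and predict the hot spots of the data access workload. Unlike general heat studies, whose results involve only the number and frequency of accesses to a data set, the improved heat also takes the time series of access requests into account, so that it reflects the workload's history more effectively and therefore reflects the future trend of the access workload more accurately. Heat is analyzed and defined separately for files and for LUNs, the model is tested with real trace data, and the behavior of the heat formula on actual data is analyzed in depth. The tests show that data heat is positively correlated with the number and frequency of accesses and negatively correlated with the interval between accesses. The experiments show that the heat formula predicts the trend of future access behavior well.

A data migration mechanism that carries out data distribution evolution according to data heat is designed. Data distribution evolution requires dynamically adjusting the distribution of data to adapt to changes in the system workload and thereby improve overall system efficiency. Typical designs tier storage resources by RAID level or RAID group, whereas the evolutionary storage system tiers all storage resources in the system by characteristics such as performance and reliability. Following the principle of locality of program access, hot data with different behavior characteristics and requirements is matched to storage resources of the corresponding level, so that the different resources in the storage pool are used effectively and the overall efficiency of the evolutionary storage system improves significantly. The data migration strategy also defines the trigger conditions and overhead of migration, and a data replacement strategy for the evolutionary storage system is designed. In the experiments, a prototype system verifies the performance improvement brought by tiered-storage data migration.

An independent evolution rule management system is designed to automate the management of mass storage systems. In a large-scale storage system, both the physical management of the system and the logical organization and distribution of massive data are extremely complex and dynamic, so relying on manual management alone is infeasible. A system based on a set of storage rules is needed to manage and schedule the system's running state. In typical systems, all rule parameters are hard-coded, which makes rules very difficult to define, change, and query. In the rule management system, defining a rule vocabulary and introducing decision tables and decision trees allows rules to be defined, queried, and changed flexibly, clearly, and quickly, and rule reference records are used to collect statistics on and analyze rule usage.

This work designs and constructs an evolutionary storage system that adapts to its own running environment, and makes new attempts at access workload characterization, the data migration mechanism, and evolution rule management for data distribution evolution. Experiments show good running results.

Keywords: evolutionary storage system, data distribution evolution, access hot spot, data migration, evolution rules

Abstract (author's English version)

Driven by the rapid development of information technology, storage systems have changed in size and architecture, evolving toward large scale and complexity. At the same time, the I/O workloads they serve have become varied, unbalanced, and dynamic. However, current mass storage systems often directly inherit the structure and operating mechanisms of traditional small-scale storage systems, so it is difficult for them to meet the requirements of large-scale I/O workloads, which are dynamic, parallel, and diverse. The physical and logical organization of existing storage systems is based on a static structure that can hardly perceive the characteristics of external workload requests or dynamic changes in the system. As a result, the system cannot adjust its storage organization to follow temporal and spatial changes in different I/O workloads, and cannot effectively and automatically increase its overall efficiency.

To address these problems, a data distribution evolution mechanism for mass storage systems is designed. The characteristics of the dynamic data access workload are analyzed, and future access trends are predicted from the history of data access records using a heat model. Data of different heat is matched with resource groups of different performance levels, and data is dynamically migrated and redistributed in order to improve overall storage efficiency. The data distribution evolution process is completely automated: it is controlled by evolution rules and scheduled through a rule management system.

Evolution techniques for large-scale storage systems that can automatically adjust the storage organization model according to the current running environment are described. At run time, the system follows the I/O workload and its own status and automatically selects the most suitable system organization model for the current workload characteristics, so as to satisfy the performance and reliability requirements of workloads in a multi-user environment. The study includes a physical evolution method for the system, an evolution method for its logical structure, and data distribution evolution methods. A data distribution mechanism designed specifically for system evolution is proposed.

A heat computing model of data access is built to quantify and predict hot spots in the data workload. Unlike general data heat research, which considers only the frequency of data access, the improved heat computation also takes the time series of accesses into account, so that it reflects the history of the workload more effectively and the future trend of the I/O workload more accurately. Heat is analyzed and defined separately for files and LUNs. The heat computing model is tested with real trace data, and an in-depth analysis of how the heat formula behaves on actual data is given. The results show that data heat is positively correlated with the number and frequency of accesses and negatively correlated with the access interval. And experimental resul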
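The abstract states only the qualitative behavior of the heat model: heat rises with the number and frequency of accesses and falls as the interval between accesses grows. It does not give the formula. As an illustration only, the sketch below uses an exponentially decayed access count. This is an assumed stand-in, not the dissertation's heat formula, and the names `heat`, `access_times`, `now`, and `half_life` are hypothetical.

```python
import math


def heat(access_times, now, half_life=3600.0):
    """Illustrative heat score (an assumption, not the thesis formula).

    Each past access contributes a weight that halves every `half_life`
    seconds, so the score grows with the number of accesses and shrinks
    as the accesses lie further back in time.
    """
    decay = math.log(2) / half_life  # per-second decay rate
    # Accesses stamped later than `now` are ignored.
    return sum(math.exp(-decay * (now - t)) for t in access_times if t <= now)


if __name__ == "__main__":
    # Same number of accesses, different spacing: the tightly clustered,
    # recent accesses score higher than the widely spaced ones.
    clustered = heat([18.0, 19.0, 20.0], now=20.0, half_life=10.0)
    spread = heat([0.0, 10.0, 20.0], now=20.0, half_life=10.0)
    print(clustered, spread)
```

Under this assumed model, a file or LUN would be ranked by its score at each evaluation point, and `half_life` controls how quickly old accesses stop counting. The actual weighting used in the dissertation may differ.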
