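Shanghai Jiao Tong University Master's Thesis

Research and Design of a Grid-Based Data Fusion Platform

Name: Xu Zhilin (徐志麟)
Degree applied for: Master
Major: Computer Software and Theory
Advisor: Lu Chaojun (陆朝俊)
Date: 2008-01-01

Abstract

The concept of grid computing grew out of the application demands of high-performance computing. It approaches the sharing of computer resources across wide-area networks mainly from an academic perspective, with the aim of maximizing resource utilization. With the rapid growth of the Internet in recent years, the information resources in grids have diversified as well, extending from traditional structured resources to highly heterogeneous semi-structured and unstructured ones. How to build a grid-based data fusion platform that fuses the massive, dynamic, heterogeneous data sources in a grid and offers users a unified, transparent, semantics-based data access service has become a current research focus.

Drawing on traditional information integration techniques, this thesis addresses the problem by defining a new object-oriented metadata description specification for data sources. On that basis it presents a fairly complete design for a data fusion platform, covering the global data schema, the data fusion strategy, and the data access mechanism. The main research contents and contributions are as follows:

1. We study grid data fusion techniques and design and implement SGDAI, a grid data fusion platform. We describe its main framework, features, and design goals, and analyze the functions of, and the call relationships among, the main modules of this information integration system.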
2. We propose SGMDP, an object-oriented metadata description specification. It organizes data across the platform with the global class as the smallest unit of data access, and it unifies the form in which data source providers describe the semantics of their data. By integrating the class access methods supplied by the data sources, the platform can readily offer users heterogeneous data access that spans classes and data sources.
3. We propose C-ECM, a clustering-based effective class mining algorithm, to improve the platform's data access efficiency. The algorithm optimizes the global class view by analyzing user behavior, and thereby offsets, to some extent, the efficiency that SGMDP sacrifices to achieve heterogeneous access.
4. We use a multi-attribute decision algorithm to implement SGDataFilter, a confidence-based data-level fusion tool, so as to provide users with the most valuable data.

Keywords: information grid, information integration, data fusion, grid computing

RESEARCH AND DESIGN ON GRID-BASED DATA FUSION PLATFORM

ABSTRACT

Grid computing developed in response to the requirements of high-performance computing, with the aim of sharing computer resources and maximizing their utilization. With the development of the Internet, information resources have diversified from traditional structured resources to semi-structured and even unstructured ones. Many researchers are therefore concerned with how to build a grid-based data fusion platform that integrates huge volumes of dynamic, heterogeneous data resources and provides users with a uniform, transparent, and semantic data access service.

Drawing on traditional information integration technology, this thesis proposes a novel design for a grid-based data fusion system. It includes an object-oriented metadata description specification, a data query mechanism, and a schema integration strategy. The main research and innovative work is as follows:

1. Research grid-based data fusion techniques, and design and implement a grid-based data fusion platform, SGDAI. Describe the architecture, features, and design goals of SGDAI, especially the functions of its main modules and the key relationships among them.
2. Propose an object-oriented metadata description protocol, SGMDP, which organizes data by global class and unifies the description method used by data resource providers. SGDAI integrates the class read methods provided by data resources so as to offer a multi-resource data access service.
3. Propose an effective class mining algorithm based on clustering, C-ECM, to optimize the data access of SGDAI. It optimizes the global class view through user behavior analysis.
4. Use a multi-attribute decision algorithm to implement confidence-based data-level fusion in a tool called SGDataFilter, which provides the most valuable data to users.

KEY WORDS: Information Grid, Information Integration, Data Fusion, Grid Computing

List of Figures

Figure 1 Multi-element level system architecture
Figure 2 Automatic level system architecture
Figure 3 SGDAI portal interface
Figure 4 SGDAI framework diagram
Figure 5 A class in SGMDP
Figure 6 A subclass in SGMDP
Figure 7 Access method for relational resources
Figure 8 Read Method interface
Figure 9 Local class fusion
Figure 10 Clustering-based effective class mining
Figure 11 Optimization flow for the local class view
Figure 12 Adding a new resource
Figure 13 Editing metadata
Figure 14 Generated metadata

List of Tables

Table 1 Wide table for effective class mining
Table 2 Experimental clustering results
Table 3 Candidate record set

Shanghai Jiao Tong University
Declaration of Originality of the Thesis

I solemnly declare that the thesis submitted is the result of research that I carried out independently under the guidance of my advisor.
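Except for content already cited and acknowledged in the text, this thesis contains no work that has been published or written by any other individual or group. Individuals and groups who made important contributions to the research in this thesis have been clearly identified in the text. I am fully aware that I bear the legal consequences of this declaration.

Signature of thesis author: Xu Zhilin (徐志麟)
Date: January 17, 2008

Shanghai Jiao Tong University
Authorization for Use of Thesis Copyright

The author of this thesis fully understands the university's regulations on retaining and using theses. The author agrees that the university may retain copies and electronic versions of the thesis and submit them to the relevant national departments or institutions, and that the thesis may be consulted and borrowed. I authorize Shanghai Jiao Tong University to include all or part of this thesis in relevant databases for retrieval, and to preserve and compile this thesis by photocopying, reduced-size printing, scanning, or other means of reproduction.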