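Anhui Agricultural University
Master's Thesis
Research on Cultivated Land Grade Evaluation Based on Semi-Supervised Machine Learning
Name: Gao Qijuan (高琪娟)
Degree applied for: Master's
Major: Computer Application Technology
Advisor: Yang Yang (杨旸)
2011

Abstract (摘要)

In recent years, with rapid economic and social development and accelerating urbanization, residents' incomes have risen steadily and the demand for land has grown accordingly, taking up large amounts of high-quality farmland. Faced with the increasingly serious loss of cultivated land, effective methods must be developed to guide the land-use strategies of the agricultural and land departments.

From the perspective of protecting cultivated land, this study evaluates cultivated land grades in Anhui Province in order to reveal the suitability and limitations of the land. Building on the traditional analytic hierarchy process (AHP), it uses a constraint-based semi-supervised clustering algorithm to evaluate land grades, which overcomes the many human-factor influences in the traditional AHP model and improves the accuracy of farmland grade evaluation.

Cultivated land fertility evaluation is in essence an assessment of how strongly natural factors such as topography, landform, and soil physical and chemical properties limit crop growth. The thesis first uses geographic information system (GIS) technology to build a graphics database and an attribute database separately, then links the two seamlessly through a unified coding scheme to form a complete spatial database. This overcomes many of the shortcomings of traditional relational databases in representing, storing, managing, and retrieving spatial data. On top of the spatial database, the traditional AHP method is used to quantify experiential judgment and to check the consistency of the decision-makers' judgments, and a cumulative (additive) method is used to compute a comprehensive fertility index for each evaluation unit, from which the land grade is determined. In practice, however, labeling samples is computationally expensive and expert-labeled samples are scarce, so the unlabeled samples of land evaluation factors usually far outnumber the labeled ones; what is missing includes class labels and some feature values. Using only the labeled data wastes large amounts of unlabeled data and yields a model that generalizes poorly. Conversely, using only the unlabeled samples fails to exploit the value of the labeled ones, and accuracy is usually low. To overcome the heavy human influence and low efficiency of the traditional AHP model for land evaluation, AHP needs to be combined with other methods, such as semi-supervised clustering, to make land evaluation simpler and more rigorous.

In this study, taking the actual soil and agricultural production conditions of She County (歙县) as an example, the traditional AHP method is used as the basis for obtaining a similarity threshold and a dissimilarity threshold for the labeled data. The data objects from these two clustering processes are then combined as constraints and used for labeling. An association rule mining algorithm extracts classification association rules from samples of known class as supervision information for the clustering partition, which guides the semi-supervised algorithm. Models are built around the traditional AHP method and around the constraint-based semi-supervised clustering algorithm, and both are tested on the measured evaluation factors to produce two sets of results. The results show that the constraint-based semi-supervised clustering algorithm can still evaluate the land of a given area and determine its grade even when data are severely missing, while also displaying the attribute and graphic spatial data of the area. This saves considerable manpower and material resources and improves efficiency and accuracy.

Keywords: spatial database, analytic hierarchy process (AHP), semi-supervised clustering, cultivated land grade evaluation

Abstract

In recent years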
, with rapid development and accelerating urbanization, people's incomes have risen steadily and the demand for land has increased accordingly. As a result, a large amount of high-quality arable land has been taken up. Given the growing loss of arable land, effective methods are needed to guide the use of agricultural land.

From the perspective of protecting cultivated land, this study evaluates farmland grades in Anhui Province in order to reveal the suitability and limitations of the land. Building on the traditional analytic hierarchy process (AHP), it uses a constraint-based semi-supervised clustering algorithm to evaluate land grades, which reduces subjectivity and so helps ensure the validity of the results.

The thesis first describes the construction of the spatial database. During fertility grade evaluation, traditional relational databases proved deficient in representing, storing, managing, and retrieving the spatial data behind the evaluation factors, such as points, lines, and other related graphic information. A spatial database is therefore used to hold the relevant information. It consists of two parts, an attribute database and a graphics database. Both are built with geographic information system (GIS) technology and linked seamlessly through a uniform coding scheme, ultimately forming a complete spatial database.

Second, on the basis of the spatial database, the traditional AHP method is used to evaluate land grade. The essence of land grade evaluation is to assess the degree to which topography, soil properties, and other natural factors limit crop growth. AHP is a systems analysis method that starts from qualitative judgments and quantifies them in order to determine the weights of the evaluation factors. It expresses experiential judgment in numbers and tests the consistency of the decision-makers' judgments, which supports quantitative evaluation. A cumulative (additive) method is used to compute a comprehensive soil fertility index for each evaluation unit, from which the fertility grade of the cultivated land is determined.

AHP has its own deficiencies in land evaluation. It is mainly suited to evaluating qualitative indicators, it does not adequately combine qualitative diagnosis with quantitative analysis, it does not make full use of existing quantitative information, and it can only select the best of the given options rather than provide new solutions for decision-making.

Therefore, AHP should be combined with other methods, such as semi-supervised clustering. A semi-supervised clustering algorithm uses the label information of the initially labeled data, together with similarity and dissimilarity criteria, to adjust the clustering process automatically, thereby constraining and guiding it. Such algorithms differ in how the data are marked. One of them, the constraint-based semi-supervised clustering algorithm, labels data by learning a similarity value and a dissimilarity value from the labeled data. These two values serve as the two conditions (constraints) that a pair of data objects must satisfy during clustering.

In this study, the traditional AHP method is used to obtain the similarity and dissimilarity values of the labeled data. These values serve as constraints that guide the clustering process and thereby improve its accuracy and efficiency. This research has constructed a core model using the traditional AHP method and the semi-supervised clustering algorithm meth
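
The AHP step summarized in the abstract (factor weights, a consistency check on the judgments, and a cumulative fertility index per evaluation unit) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the three-factor comparison matrix, and the membership scores are all hypothetical, and power iteration is only one common way to approximate the AHP weight vector. The random index values are Saaty's commonly tabulated ones, and a consistency ratio below 0.1 is the conventional acceptance rule.

```python
# Commonly tabulated Saaty random consistency index (RI) for matrix orders 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison
    matrix by power iteration. Returns (weights summing to 1, lambda_max)."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)              # w sums to 1, so sum(A w) estimates lambda_max
        w = [x / lam for x in v]  # renormalize so the weights sum to 1 again
    return w, lam


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1).
    By convention the judgments are accepted when CR < 0.1."""
    if n <= 2:
        return 0.0                # 1x1 and 2x2 matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def fertility_index(weights, scores):
    """Cumulative (weighted-sum) index for one evaluation unit.
    `scores` are per-factor membership values, assumed to lie in [0, 1]."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical example: three evaluation factors compared pairwise.
comparison = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, lambda_max = ahp_weights(comparison)
cr = consistency_ratio(lambda_max, len(comparison))
unit_index = fertility_index(weights, [1.0, 0.5, 0.0])
```

Because the hypothetical matrix above is perfectly consistent, the weights come out close to 4/7, 2/7, and 1/7 and the consistency ratio is approximately zero. With real expert judgments the ratio would normally be positive, and a value of 0.1 or more would suggest revisiting the pairwise comparisons before computing any index.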



