Undergraduate Thesis: Foreign-Language Literature and Translation

Title of literature/material: Cluster Analysis: Basic Concepts and Algorithms
Source of literature/material:
Publication date of literature/material:
School (Department): School of Civil Engineering
Major: Civil Engineering
Class:
Name:
Student ID:
Advisor:
Translation date:

Foreign-language text:

Cluster Analysis: Basic Concepts and Algorithms

Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. In some cases, however, cluster analysis is only a useful starting point for other purposes, such as data summarization. Whether for understanding or utility, cluster analysis has long played an important role in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining.

There have been many applications of cluster analysis to practical problems. We provide some specific examples, organized by whether the purpose of the clustering
is understanding or utility.

Clustering for Understanding

Classes, or conceptually meaningful groups of objects that share common characteristics, play an important role in how people analyze and describe the world. Indeed, human beings are skilled at dividing objects into groups (clustering) and assigning particular objects to these groups (classification). For example, even relatively young children can quickly label the objects in a photograph as buildings, vehicles, people, animals, plants, etc. In the context of understanding data, clusters are potential classes and cluster analysis is the study of
techniques for automatically finding classes. The following are some examples:

Biology. Biologists have spent many years creating a taxonomy (hierarchical classification) of all living things: kingdom, phylum, class, order, family, genus, and species. Thus, it is perhaps not surprising that much of the early work in cluster analysis sought to create a discipline of mathematical taxonomy that could automatically find such classification structures. More recently, biologists have applied clustering to analyze the large amounts of genetic information that are now available. For example, clustering has been used to find groups of genes that have similar functions.

Information Retrieval. The World Wide Web consists of billions of Web pages, and a query to a search engine can return thousands of pages. Clustering can be used to group these search results into a small number of clusters, each of which captures a particular aspect of the query. For instance, a query of “movie” might return Web pages grouped into categories such as reviews, trailers, stars, and theaters. Each category (cluster) can be broken into subcategories (subclusters), producing a hierarchical structure that further assists
a user's exploration of the query results.

Climate. Understanding the Earth's climate requires finding patterns in the atmosphere and ocean. To that end, cluster analysis has been applied to find patterns in the atmospheric pressure of polar regions and areas of the ocean that have a significant impact on land climate.

Psychology and Medicine. An illness or condition frequently has a number of variations, and cluster analysis can be used to identify these different subcategories. For example, clustering has been used to identify different types of depression. Cluster analysis can also be used to detect patterns in the spatial or temporal distribution of a disease.

Business. Businesses collect large amounts of information on current and potential customers. Clustering can be used to segment customers into a small number of groups for additional analysis and marketing activities.

Clustering for Utility

Cluster analysis provides an abstraction from individual data objects to the clusters in which those data objects reside. Additionally, some clustering techniques characterize each cluster in terms of a cluster prototype; i.e., a data object that is representative of the other objects in the cluster.
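
The idea of a cluster prototype can be illustrated with a minimal sketch. It assumes that the objects are numeric points, that Euclidean distance is an appropriate measure, and that cluster labels have already been produced by some clustering algorithm. The prototype chosen here is the medoid, i.e., the member with the smallest total distance to the other members of its cluster; this is only one possible choice (a centroid is another), and the function names `medoid` and `cluster_prototypes` are illustrative, not taken from the text.

```python
import math


def medoid(points):
    """Return the member of `points` with the smallest total
    Euclidean distance to all members (assumed prototype choice)."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def cluster_prototypes(points, labels):
    """Group points by their cluster label and return a dict
    mapping each label to the medoid of that cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return {label: medoid(members) for label, members in clusters.items()}


# Hypothetical data: two well-separated groups with known labels.
points = [(0, 0), (1, 0), (2, 0), (10, 10), (10, 11), (10, 12)]
labels = [0, 0, 0, 1, 1, 1]
prototypes = cluster_prototypes(points, labels)
print(prototypes)
```

Because a medoid is always an actual data object, it matches the description of a prototype as an object that is representative of the others in its cluster.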
These cluster prototypes can be used as the basis for a number of data analysis or data processing techniques. Therefore, in the context of utility, cluster analysis is the study of techniques for finding the most representative cluster prototypes.

Summarization. Many data analysis techniques, such as regression or PCA, have a time or space complexity of O(m²) or higher (where m is the number of objects), and thus are not practical for large data sets. However, instead of applying the algorithm to the entire data set, it can be applied to a reduced data set consisting only of cluster prototypes. Depending on the type of analysis, the number of prototypes, and the accuracy with which the prototypes represent the data, the results can be comparable to those that would have been obtained if all the data could have been used.

Compression. Cluster prototypes can also be used for data compression. In particular, a table is created that consists of the prototypes for each cluster; i.e., each prototype is