MACHINE LEARNING ON SPARK - UC BERKELEY AMP CAMP

Uploaded by: 飞*** | Document ID: 48802159 | Uploaded: 2018-07-20 | Format: PPTX | Pages: 34 | Size: 933.74 KB

Machine Learning on Spark
Shivaram Venkataraman, UC Berkeley

[Diagram labels: Computer Science, Machine learning, Statistics]

Machine learning
- Spam filters
- Recommendations
- Click prediction
- Search ranking

Machine learning techniques
- Classification
- Regression
- Clustering
- Active learning
- Collaborative filtering

Implementing Machine Learning
- Machine learning algorithms are
  - Complex, multi-stage
  - Iterative
- MapReduce/Hadoop unsuitable
- Need efficient primitives for data sharing
- Spark RDDs: efficient data sharing
- In-memory caching accelerates performance
  - Up to 20x faster than Hadoop
- Easy to use high-level programming interface
  - Express complex algorithms in about 100 lines

Machine Learning using Spark
- Classification, Regression, Clustering, Active learning, Collaborative filtering

K-Means Clustering using Spark
- Focus: Implementation and Performance

Clustering
- Grouping data according to similarity
- E.g. archaeological dig
[Figure: scatter plot, axes Distance East and Distance North]

K-Means Algorithm
- Benefits
  - Popular
  - Fast
  - Conceptually straightforward

K-Means: preliminaries
[Figure: scatter plot, axes Feature 1 and Feature 2]
- Data: collection of values
    data = lines.map(line => parseVector(line))
- Dissimilarity: squared Euclidean distance
    dist = p.squaredDist(q)
- K = number of clusters
- Data assignments to clusters: S1, S2, ..., SK

K-Means Algorithm
[Figure: scatter plot, axes Feature 1 and Feature 2]
- Initialize K cluster centers
    centers = data.takeSample(false, K, seed)
- Repeat until convergence:
    while (dist(centers, newCenters) > ...)
  - Assign each data point to the cluster with the closest center.
      closest = data.map(p => (closestPoint(p, centers), p))
  - Assign each cluster center to be the mean of its cluster's data points.
      pointsGroup = closest.groupByKey()
      newCenters = pointsGroup.mapValues(ps => average(ps))
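
The K-Means steps on the slides can be tried without a cluster. What follows is a minimal sketch, not the deck's implementation: it uses plain Scala collections in place of RDDs, so `groupBy` stands in for `map` plus `groupByKey`, and a seeded shuffle stands in for `takeSample`. The helper bodies (`parseVector`, `squaredDist`, `closestPoint`, `average`) are assumptions, since the slides only name them. The `epsilon` threshold and the whitespace-separated input format are likewise hypothetical.

```scala
import scala.util.Random

object KMeansSketch {
  type Point = Vector[Double]

  // Assumed input format: whitespace-separated numbers, one point per line.
  def parseVector(line: String): Point =
    line.trim.split("\\s+").map(_.toDouble).toVector

  // Squared Euclidean distance, the dissimilarity named on the slides.
  def squaredDist(p: Point, q: Point): Double =
    p.zip(q).map { case (a, b) => (a - b) * (a - b) }.sum

  // Index of the center nearest to p (ties go to the lower index).
  def closestPoint(p: Point, centers: IndexedSeq[Point]): Int =
    centers.indices.minBy(i => squaredDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def average(ps: Seq[Point]): Point =
    ps.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / ps.size)

  // epsilon is a hypothetical convergence threshold on total center movement.
  def kMeans(data: Seq[Point], k: Int, epsilon: Double, seed: Long): IndexedSeq[Point] = {
    // Local analogue of data.takeSample(false, K, seed): sample k points
    // without replacement.
    var centers: IndexedSeq[Point] =
      new Random(seed).shuffle(data).take(k).toIndexedSeq
    var moved = Double.MaxValue
    while (moved > epsilon) {
      // Assignment step: key each point by its closest center and group.
      val pointsGroup: Map[Int, Seq[Point]] =
        data.groupBy(p => closestPoint(p, centers))
      // Update step: each center becomes the mean of its group. A center
      // that attracted no points keeps its previous position.
      val newCenters: IndexedSeq[Point] =
        centers.indices.map(i => pointsGroup.get(i).map(average).getOrElse(centers(i)))
      // Total squared movement of the centers in this iteration.
      moved = centers.zip(newCenters).map { case (c, n) => squaredDist(c, n) }.sum
      centers = newCenters
    }
    centers
  }
}
```

A call such as `KMeansSketch.kMeans(lines.map(KMeansSketch.parseVector), 2, 1e-9, 42L)` returns the final centers, where `lines` is any `Seq[String]` in the assumed format. Keeping a center in place when its cluster is empty is one design choice among several; the slides do not say how that case is handled.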
