Understanding Social Media with Machine Learning
Xiaojin Zhu, jerryzhu@cs.wisc.edu
Department of Computer Sciences, University of Wisconsin-Madison, USA
CCF/ADL Beijing 2013

Zhu (U Wisconsin), Understanding Social Media, CCF/ADL Beijing 2013

Outline
1. Spatio-Temporal Signal Recovery from Social Media
2. Machine Learning Basics
   - Probability
   - Statistical Estimation
   - Decision Theory
   - Graphical Models
   - Regularization
   - Stochastic Processes
3. Socioscope: A Probabilistic Model for Social Media
4. Case Study: Roadkill

Spatio-Temporal Signal Recovery from Social Media

Spatio-temporal Signal: When, Where, How Much
- Direct instrumental sensing is difficult and expensive

Humans as Sensors
- Not “hot trend” discovery: we know what event we want to monitor
- Not natural language processing
for social media: we are given a reliable text classifier for “hit”
- Our task: precisely estimating a spatiotemporal intensity function $f_{st}$ of a pre-defined target phenomenon.

Challenges of Using Humans as Sensors
- Keyword doesn't always mean event
  - I was just told I look like dead crow.
  - Don't blame me if one day I treat you like a dead crow.
- Human sensors aren't under our control
- Location stamps may be erroneous or missing
  - 3% have GPS coordinates: (-98.24, 23.22)
  - 47% have valid user profile location: Bristol, UK, New York
  - 50% don't have valid location information: Hogwarts, In the trafficblah, Sitting On A Taco

Problem Definition
- Input: a list of time
and location stamps of the target posts.
- Output: $f_{st}$, the intensity of the target phenomenon at location $s$ (e.g., New York) and time $t$ (e.g., 0-1am)

Why Simple Estimation is Bad
- Simple estimate: $f_{st} = x_{st}$, the count of target posts in bin $(s,t)$
- Justification: MLE of the model $x \sim \mathrm{Poisson}(f)$
- However,
  - Population bias: assume $f_{st} = f_{s't'}$; if there are more users in $(s,t)$, then $x_{st} > x_{s't'}$
  - Imprecise location: posts without location stamp, noisy user profile location
  - Zero/low counts: if we don't see tweets from Antarctica, no penguins there?

Machine Learning Basics

Machine Learning Basics: Probability

Probability
- The probability of a discrete random variable $A$ taking the value $a$ is $P(A = a) \in [0, 1]$.
- Sometimes written as $P(a)$ when there is no danger of confusion.
- Normalization: $\sum_a P(A = a) = 1$.
- Joint probability $P(A = a, B = b) = P(a, b)$: the two events both happen at the same time.
- Marginalization: $P(A = a) = \sum_b P(a, b)$.
- The product rule: $P(a, b) = P(a) P(b \mid a) = P(b) P(a \mid b)$.

Bayes Rule
- Bayes rule: $P(a \mid b) = \frac{P(b \mid a) P(a)}{P(b)}$.
- $\int p(D \mid \theta)\, p(\theta)\, d\theta$: the evidence
- In general, $P(a \mid b, C) = \frac{P(b \mid a, C) P(a \mid C)}{P(b \mid C)}$, where C
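
The marginalization, product rule, and Bayes rule stated on the slides above can be checked numerically on a small joint table. The sketch below is only an illustration: the joint distribution, its outcome labels, and the helper names (`marginal_a`, `marginal_b`, `cond_b_given_a`, `bayes_a_given_b`) are assumptions made up for this example and do not come from the slides. Exact fractions are used so the identities hold without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(A=a, B=b); the numbers are invented
# for illustration and are not taken from the slides.
joint = {
    ("rain", "late"): F(3, 10),
    ("rain", "on_time"): F(1, 10),
    ("dry", "late"): F(1, 10),
    ("dry", "on_time"): F(5, 10),
}


def marginal_a(a):
    """Marginalization: P(A=a) = sum over b of P(a, b)."""
    return sum(p for (ai, _), p in joint.items() if ai == a)


def marginal_b(b):
    """Marginalization: P(B=b) = sum over a of P(a, b)."""
    return sum(p for (_, bi), p in joint.items() if bi == b)


def cond_b_given_a(b, a):
    """Product rule rearranged: P(b | a) = P(a, b) / P(a)."""
    return joint[(a, b)] / marginal_a(a)


def bayes_a_given_b(a, b):
    """Bayes rule: P(a | b) = P(b | a) P(a) / P(b)."""
    return cond_b_given_a(b, a) * marginal_a(a) / marginal_b(b)


# Normalization: the joint table sums to one.
assert sum(joint.values()) == 1

# Product rule: P(a, b) = P(a) P(b | a).
assert joint[("rain", "late")] == marginal_a("rain") * cond_b_given_a("late", "rain")

print(bayes_a_given_b("rain", "late"))  # prints 3/4
```

The Bayes-rule result agrees with dividing the joint entry directly by the marginal, `joint[("rain", "late")] / marginal_b("late")`, which is the product rule read in the other direction.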