Lecture 5: Strategies for Handling Missing Values (lecture slides)
Outline of the problem

Missing values in longitudinal trials are a big issue.
The first aim should be to reduce the proportion of missing values.
Ethics dictate that missing values can't be avoided.
There is no magic method to fix it.
The magnitude of the problem varies across areas:
8-week depression trial: 25% to 50% may drop out by the final visit
12-week asthma trial: maybe only 5% to 10%

Outline of the lecture

Part I: Missing data
Part II: Multiple imputation
Example: The analgesic trial

Part I: Missing data

In real datasets, such as surveys and clinical trials, it is quite common to have observations with missing values for one or more input features. The first issue in dealing with the problem is determining whether the missing data mechanism has distorted the observed data.

Little and Rubin (1987) and Rubin (1987) distinguish between basically three missing data mechanisms. Data are said to be missing at random (MAR) if the mechanism resulting in their omission is independent of their (unobserved) value. If their omission is also independent of the observed
values, then the missingness process is said to be missing completely at random (MCAR). In any other case the process is missing not at random (MNAR, also written NMAR), i.e., the missingness process depends on the unobserved values.

http://www.emea.europa.eu/pdfs/human/ewp/177699EN.pdf

1. Introduction to missing data

[Figure: data matrix of cases by variables; "?" = missing]

What is missing data?

The missingness hides a real value that is useful for analysis purposes.
Survey questions:
1. What is your total annual income for FY 2008?
2. Who are you voting for in the 2009 election for the European Parliament?

What is missing data?

Clinical trials:
[Figure: timeline from Start to Finish along a time axis, with a marker labelled "censored at this point in time"]

Missingness

It matters why data are missing. Suppose you are modelling weight (Y) as a function of sex (X). Some respondents wouldn't disclose their weight, so you are missing some values for Y. There are three possible mechanisms for the nondisclosure:
1. There may be no particular reason why some respondents told you their weights and others didn't. That is, the probability that Y is missing has no relationship to X or Y. In this case the data are missing completely at random.
2. One sex may be less likely to disclose its weight. That is, the probability that Y is missing depends only on the value of X. Such data are missing at random.
3. Heavy (or light) people may be less likely to disclose their weight. That is, the probability that Y is missing depends on the unobserved value of Y itself. Such data are not missing at random.

Missing data patterns & mechanisms

Pattern: Which values are missing?
Mechanism: Is missingness related to the response?

$(Y_i, R_i)$ = data matrix, with complete data
$R_{ij}$ = missing data indicator matrix: $R_{ij} = 1$ if $Y_{ij}$ is missing, $R_{ij} = 0$ if $Y_{ij}$ is observed
$Y_{obs}$ = observed part of $Y$
$Y_{mis}$ = missing part of $Y$

Missing data patterns & mechanisms

“Pattern” concerns the distribution of $R$.
“Mechanism” concerns the distribution of $R$ given $Y$.
Rubin (Biometrika 1976) distinguishes between:
Missing Completely at Random (MCAR): $P(R \mid Y) = P(R)$ for all $Y$
Missing at Random (MAR): $P(R \mid Y) = P(R \mid Y_{obs})$ for all $Y_{mis}$
Not Missing at Random (NMAR): $P(R \mid Y)$ depends on $Y_{mis}$

Missing At Random (MAR)

What are the most general conditions under which a valid analysis can be done using only the observed data, and no information about the missing value mechanism? The answer is: when, given the observed data, the missingness mechanism does not depend on
the unobserved data. Mathematically, $P(R \mid Y_{obs}, Y_{mis}) = P(R \mid Y_{obs})$, where $Y_{obs}$ and $Y_{mis}$ are the observed and missing parts of $Y$. This is termed Missing At Random, and is equivalent to saying that two units that share observed values have the same statistical behaviour on the other observations, whether observed or not.

Example

As units 1 and 2 have the same values where both are observed, then given these observed values, under MAR, variables 3, 5 and 6 from unit 2 have the same distribution (NB not the same value!) as variables 3, 5 and 6 from unit 1.

Note that under MAR the probability of a value being missing will generally depend on observed values, so it does not correspond to the intuitive notion of "random". The important idea is that the missing value mechanism can be expressed solely in terms of values that are observed. Unfortunately, this can rarely be definitively determined from the data at hand!

If data are MCAR or MAR, you can ignore the missing data mechanism and use multiple imputation or maximum likelihood.
If data are NMAR, you can't ignore the missing data mechanism; two approaches to NMAR data are selection models and pattern mixture.

Suppose Y is weight in pounds; if someone is heavy, they may be less inclined to report their weight. So the value of Y affects whether Y is missing; the data are NMAR. Two possible approaches for such data are selection models and pattern mixture.

Selection models. In a selection model, you simultaneously model Y and
the probability that Y is missing. Unfortunately, a number of practical difficulties are often encountered in estimating selection models.

Pattern mixture (Rubin 1987). When data are NMAR, an alternative to selection models is multiple imputation with pattern mixture. In this approach, you perform multiple imputations under a variety of assumptions about the missing data mechanism. In ordinary multiple imputation, you assume that those people who report their weights are similar to those who don't. In a pattern-mixture model, you may assume that people who don't report their weights are on average 20 pounds heavier. This is of course an arbitrary assumption; the idea of pattern mixture is to try out a variety of plausible assumptions and see how much they affect your results. Compared with selection models, pattern mixture is a more natural, flexible, and interpretable approach.

Simple analysis strategies

(1) Complete Case (CC) analysis
Advantages: Complete