Lecture 5: Strategies for Handling Missing Values (Course Slides)

Uploaded by: des****85 | Document ID: 331545544 | Uploaded: 2022-08-23 | Format: PPTX | Pages: 54 | Size: 2.02 MB

Outline of the problem

Missing values in longitudinal trials are a big issue.
The first aim should be to reduce the proportion of missing values.
Ethics dictate that it can't be avoided.
There is no magic method to fix it.
The magnitude of the problem varies across areas:
8-week depression trial: 25% to 50% may drop out by the final visit.
12-week asthma trial: maybe only 5% to 10%.

Outline of the lecture

Part I: Missing data
Part II: Multiple imputation
Example: The analgesic trial

Part I: Missing data

In real data sets, e.g. surveys and clinical trials, it is quite common to have observations with missing values for one or more input features. The first issue in dealing with the problem is determining whether the missing data mechanism has distorted the observed data. Little and Rubin (1987) and Rubin (1987) distinguish between three basic missing data mechanisms. Data are said to be missing at random (MAR) if, once the observed values are taken into account, the mechanism resulting in their omission is independent of their (unobserved) values. If the omission is also independent of the observed values, then the missingness process is said to be missing completely at random (MCAR). In any other case the process is missing not at random (MNAR; written NMAR on the later slides), i.e. the missingness process depends on the unobserved values.

http://www.emea.europa.eu/pdfs/human/ewp/177699EN.pdf

1. Introduction to missing data

[Figure: data matrix of cases by variables; "?" = missing]

What is missing data?

The missingness hides a real value that is useful for analysis purposes.

Survey questions:
1. What is your total annual income for FY 2008?
2. Who are you voting for in the 2009 election for the European Parliament?

What is missing data?

Clinical trials:
[Figure: time axis from Start to Finish, with a subject censored at a point in time]

Missingness

It matters why data are missing. Suppose you are modelling weight (Y) as a function of sex (X). Some respondents wouldn't disclose their weight, so you are missing some values for Y. There are three possible mechanisms for the nondisclosure:
1. There may be no particular reason why some respondents told you their weights and others didn't. That is, the probability that Y is missing has no relationship to X or Y. In this case the data are missing completely at random.
2. One sex may be less likely to disclose its weight. That is, the probability that Y is missing depends only on the value of X. Such data are missing at random.
3. Heavy (or light) people may be less likely to disclose their weight. That is, the probability that Y is missing depends on the unobserved value of Y itself. Such data are not missing at random.

Missing data patterns & mechanisms

Pattern: Which values are missing?
Mechanism: Is missingness related to the response?

$(Y_i, R_i)$, with $Y$ = data matrix (complete data)
$R_{ij} = 1$ if $Y_{ij}$ is missing, $0$ if $Y_{ij}$ is observed
$R = (R_{ij})$ = missing data indicator matrix
$Y_{obs}$ = observed part of $Y$
$Y_{mis}$ = missing part of $Y$

Missing data patterns & mechanisms

"Pattern" concerns the distribution of $R$.
"Mechanism" concerns the distribution of $R$ given $Y$.

Rubin (Biometrika 1976) distinguishes between:
Missing Completely at Random (MCAR): $P(R \mid Y) = P(R)$ for all $Y$
Missing at Random (MAR): $P(R \mid Y) = P(R \mid Y_{obs})$ for all $Y_{mis}$
Not Missing at Random (NMAR): $P(R \mid Y)$ depends on $Y_{mis}$

Missing At Random (MAR)

What are the most general conditions under which a valid analysis can be done using only the observed data, and no information about the missing value mechanism? The answer is: when, given the observed data, the missingness mechanism does not depend on the unobserved data. Mathematically, $P(R \mid Y) = P(R \mid Y_{obs})$. This is termed Missing At Random, and is equivalent to saying that two units that share the same observed values have the same statistical behaviour on the other observations, whether observed or not.

[Figure: Example]

As units 1 and 2 have the same values where both are observed, given these observed values, under MAR, variables 3, 5 and 6 from unit 2 have the same distribution (NB not the same value!) as variables 3, 5 and 6 from unit 1.

Note that under MAR the probability of a value being missing will generally depend on observed values, so it does not correspond to the intuitive notion of "random". The important idea is that the missing value mechanism can be expressed solely in terms of observations that are observed. Unfortunately, this can rarely be definitively determined from the data at hand!

If data are MCAR or MAR, you can ignore the missing data mechanism and use multiple imputation and maximum likelihood.
If data are NMAR, you can't ignore the missing data mechanism; two approaches to NMAR data are selection models and pattern mixture.

Suppose Y is weight in pounds; if someone is heavy, they may be less inclined to report their weight. So the value of Y affects whether Y is missing; the data are NMAR. Two possible approaches for such data are selection models and pattern mixture.

Selection models. In a selection model, you simultaneously model Y and the probability that Y is missing. Unfortunately, a number of practical difficulties are often encountered in estimating selection models.

Pattern mixture (Rubin 1987). When data are NMAR, an alternative to selection models is multiple imputation with pattern mixture. In this approach, you perform multiple imputations under a variety of assumptions about the missing data mechanism. In ordinary multiple imputation, you assume that those people who report their weights are similar to those who don't. In a pattern-mixture model, you may assume that people who don't report their weights are on average 20 pounds heavier. This is of course an arbitrary assumption; the idea of pattern mixture is to try out a variety of plausible assumptions and see how much they affect your results. Pattern mixture is a more natural, flexible, and interpretable approach.

Simple analysis strategies (1)

Complete Case (CC) analysis
Advantages: Complete
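The weight and sex example, the complete-case strategy, and the "20 pounds heavier" pattern-mixture assumption can be tried out in a small simulation. The following is only a sketch under stated assumptions: every number in it (mean weights, dropout probabilities, the 180 lb threshold) and every function name is invented for illustration and does not come from the lecture. The delta adjustment is a deliberately simplified stand-in for multiple imputation with pattern mixture: it resamples observed weights, ignores X, and does not propagate parameter uncertainty as proper multiple imputation would.

```python
import random
import statistics

def simulate(n=5000, seed=0):
    """Hypothetical complete data: sex X in {0, 1}, weight Y in pounds."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        rows.append((x, rng.gauss(150 + 30 * x, 20)))
    return rows

def make_missing(rows, mechanism, seed=1):
    """Replace some Y values by None under MCAR, MAR or NMAR."""
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        if mechanism == "MCAR":    # unrelated to X or Y
            p = 0.3
        elif mechanism == "MAR":   # depends only on the observed X
            p = 0.5 if x == 1 else 0.1
        elif mechanism == "NMAR":  # depends on the unobserved Y itself
            p = 0.6 if y > 180 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out

def complete_case_mean(rows):
    """Mean of Y using only the cases where Y was observed."""
    return statistics.fmean(y for _, y in rows if y is not None)

def stratified_mean(rows):
    """Mean of Y conditioning on X: per-sex observed means, weighted by sex counts."""
    total = 0.0
    for sex in (0, 1):
        group = [y for x, y in rows if x == sex]
        observed = [y for y in group if y is not None]
        total += len(group) * statistics.fmean(observed)
    return total / len(rows)

def delta_adjusted_mean(rows, delta, m=20, seed=2):
    """Simplified pattern-mixture sensitivity analysis (an assumption, see lead-in).

    Each missing Y is imputed as a randomly drawn observed Y plus delta,
    m times over; the m completed-data means are then averaged.
    """
    rng = random.Random(seed)
    observed = [y for _, y in rows if y is not None]
    n_missing = len(rows) - len(observed)
    estimates = []
    for _ in range(m):
        imputed = [rng.choice(observed) + delta for _ in range(n_missing)]
        estimates.append((sum(observed) + sum(imputed)) / len(rows))
    return statistics.fmean(estimates)

complete = simulate()
true_mean = statistics.fmean(y for _, y in complete)

# Bias of the complete-case mean under each mechanism.
for mech in ("MCAR", "MAR", "NMAR"):
    data = make_missing(complete, mech)
    print(mech, "complete-case bias:", round(complete_case_mean(data) - true_mean, 2))

# Under MAR, conditioning on the observed X removes most of the bias.
mar = make_missing(complete, "MAR")
print("MAR stratified bias:", round(stratified_mean(mar) - true_mean, 2))

# Under NMAR, try several plausible shifts and see how the estimate moves.
nmar = make_missing(complete, "NMAR")
for delta in (0, 10, 20):
    print("delta", delta, "estimate:", round(delta_adjusted_mean(nmar, delta), 1))
```

In this setup the complete-case mean should stay close to the truth only under MCAR. Under MAR it is biased unless the analysis conditions on X, and under NMAR no choice of delta can be verified from the observed data, which is why the estimate is reported across a range of delta values rather than for a single one.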
