Mathematical Modeling: High-Dimensional Statistical Analysis


Mathematical Modeling
Statistical Analysis of High-Dimensional Data
Lai Peng (来鹏)

Statistical analysis of high-dimensional data
Opportunities and challenges
Statistical dimension-reduction models
Dimension reduction for high-dimensional data
Variable selection pn.

When the number of variables p grows exponentially with the sample size, most statistical analysis methods and variable selection methods run into the following problems:
The sheer number of variables makes dimension reduction computationally expensive and inefficient.
Because the number of variables grows so quickly with the sample size, spurious high correlations appear among the variables, so variables cannot be discarded and the dimension cannot be reduced accurately.
The conditions under which the various variable selection methods apply are called into question and can no longer be satisfied.
The Oracle property that these methods originally enjoyed can no longer be guaranteed.

Opportunities and challenges
Survey questionnaire analysis
Genomic analysis
Financial investment analysis
Social network analysis
Text classification and recognition

Statistical dimension-reduction models
When handling high-dimensional data, many parametric and semiparametric models have been proposed to avoid the "curse of dimensionality".
Examples: parametric models, additive models, partially linear models, single-index models, partially linear single-index models, varying-coefficient models, and varying-coefficient partially linear models.

Traditional dimension-reduction methods
Selection by domain experts
Statistical tests and significance
Subset selection using optimality criteria such as AIC and BIC
Forward regression, backward regression, stepwise regression
Cluster analysis
Principal component analysis
Factor analysis

Dimension reduction for high-dimensional data
SIR (Sliced Inverse Regression), Ker-Chau Li, JASA, 1991.
SIR borrows the idea of principal component analysis: it constructs a dimension reduction of X through an analysis of the response variable Y.

Variable selection p=n
LASSO (Tibshirani, R.J., JRSSB, 1996)
SCAD (Fan, J.Q., JASA, 2001)
Adaptive LASSO (Zou, H., JASA, 2006)

Dimension reduction for ultrahigh-dimensional data
SIS (Fan, J.Q., 2008)
SIRS (Zhu, L.P. et al., JASA, 2011)
DC (Li, R.Z. et al., JASA, 2012)
Kolmogorov filter (Mai, Q. and Zou, H., Biometrika, 2013)
Chi-squared based method (Huang, D.Y. et al., 2015)

SIR: the sliced inverse regression method

A Family of Solutions: Penalization
Denote by L the log-likelihood function.
MLE: argmax L
Penalized MLE: argmax (L - penalty)
Old and well-known penalties: AIC/BIC
Penalization can easily be extended to other M-estimates.

Ridge Penalization
Note: a ridge penalty can be added to (almost) all M-estimation problems.
It has a long history for ill-posed regression problems (it can be traced back to the 1970s).
The ridge penalty is smooth, so computation can be carried out with gradient-based methods, for example Newton-Raphson.

Why Ridge?
Computationally easy.
Under the compactness assumption (on the covariates and regression coefficients), if the tuning parameter is small enough, the ridge estimate is consistent.
Consider a linear model with orthogonal covariates.

Why not ridge?
All estimates are nonzero.
Remember that not all genes are cancer-associated.
Possible external selection.

Lasso
The L1 penalty also has a history in statistics.
Credit for the Lasso goes to R. Tibshirani (1996; Stanford Statistics).
The tuning parameter can be selected via cross-validation.
The L1 penalty is equivalent to an L1 constraint (we know there is a one-to-one correspondence between the two, although we cannot write it down explicitly).

Lasso
Consider the simplest case: a linear regression model whose covariates have zero correlations (orthogonal covariates). Then the Lasso is equivalent to thresholding.

Lasso
A direct consequence: when the penalty is moderate to large, some estimates will be exactly zero.
Note the difference in the contours of the two penalties. This difference is what causes variable selection.

Variable Selection Aspect
Note that the variable selection mechanism here is different from some other approaches.
Embedded methods achieve simultaneous variable selection and model construction.
Extensions from linear regression to survival and classification are "almost trivial".

Parameter path: a useful way to understand penalized methods

Computing the Lasso I: LARS
The LARS algorithm proceeds as follows:
1. Standardize the predictors (to remove the effect of different scales) and center the target variable (to remove the effect of the intercept). Set all coefficients to 0 initially; the residual r then equals the centered target variable.
2. Find the variable X_j that is most correlated with the residual r.
3. Move the coefficient Beta_j of X_j away from 0 along the direction of the LSE (the least squares estimate using only the variable X_j) until some new variable X_k has a correlation with the residual r as large as that of X_j.
4. Move the coefficients Beta_j and Beta_k of X_j and X_k together along the new LSE direction (the least squares estimate with the new variable X_k included) until another new variable is selected.
5. Repeat steps 2, 3, and 4 until all variables have been selected. The final estimate is the OLS estimate of ordinary linear regression.

LARS
Credit goes to Stanford statistics!
It was first designed for linear models.
With nonlinear models, consider "transforming" into an iterative, weighted estimate
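The first two LARS steps listed above (standardize, center, start from zero coefficients, pick the predictor most correlated with the residual) can be sketched in plain Python. This is only an illustrative sketch, not the reference LARS implementation: the helper names `standardize_columns`, `center`, and `most_correlated` and the toy data are assumptions made for this example, and the path-following steps 3 to 5 are not implemented.

```python
import math

def standardize_columns(X):
    """Center each column of X and scale it to unit Euclidean norm.

    Returns a list of p columns, each a list of length n.
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        centered = [v - mean for v in col]
        norm = math.sqrt(sum(v * v for v in centered))
        cols.append([v / norm for v in centered])
    return cols

def center(y):
    """Subtract the mean from the target variable (removes the intercept)."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]

def most_correlated(cols, r):
    """Return the index of the column with the largest |x_j . r|.

    For centered, unit-norm columns this ranks predictors exactly as the
    absolute correlation with the residual r does.
    """
    scores = [abs(sum(a * b for a, b in zip(col, r))) for col in cols]
    best = max(range(len(cols)), key=lambda j: scores[j])
    return best, scores

# Hypothetical toy data: y is close to the first column of X.
X = [[1.0, 2.0, 0.5],
     [2.0, 1.0, 1.5],
     [3.0, 4.0, 0.0],
     [4.0, 3.0, 2.5],
     [5.0, 5.0, 1.0]]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

cols = standardize_columns(X)        # step 1: standardize the predictors
beta = [0.0] * len(cols)             # step 1: all coefficients start at 0
r = center(y)                        # step 1: residual = centered target
j, scores = most_correlated(cols, r) # step 2: first variable to enter
print("first variable to enter:", j)
```

On this toy data the first column should enter first, since the response was constructed to track it closely.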
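Returning to the orthogonal-covariate case mentioned on the ridge and Lasso slides, the contrast between the two penalties can be checked numerically. The sketch below assumes a particular scaling that the slides do not state: with orthonormal covariates, the Lasso objective is taken as (1/2)||y - Xb||^2 + lam * ||b||_1 and the ridge objective as (1/2)||y - Xb||^2 + (lam/2) * ||b||^2. Under that assumption the Lasso soft-thresholds each OLS coefficient, while ridge shrinks it proportionally. The function names and the example coefficients are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design:
    shrink the OLS coefficient z toward 0 by lam, and set it to exactly 0
    when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ridge_shrink(z, lam):
    """Ridge solution for one coefficient under an orthonormal design:
    proportional shrinkage, which never produces an exact zero unless
    z is already zero."""
    return z / (1.0 + lam)

def lasso_objective(b, z, lam):
    """Per-coefficient Lasso objective in the orthonormal case (up to a
    constant that does not depend on b)."""
    return 0.5 * (z - b) ** 2 + lam * abs(b)

# Hypothetical OLS coefficients and tuning parameter.
ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

lasso = [soft_threshold(z, lam) for z in ols]
ridge = [ridge_shrink(z, lam) for z in ols]

print("OLS  :", ols)
print("Lasso:", lasso)
print("Ridge:", ridge)
```

The two small coefficients are set exactly to zero by the Lasso but remain nonzero under ridge, which is the behaviour the "Why not ridge?" and "A direct consequence" slides describe.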
