Basic Statistical Analysis of Data in MATLAB


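Financial Computing and Programming (January 2007), Cao Zhiguang, School of Finance, Shanghai University of Finance and Economics

Lecture 4: Basic Statistical Analysis of Data

1 Descriptive statistical analysis of data

After the data have been obtained and preprocessed to remove errors, the next step is usually a descriptive statistical analysis: for each variable we look at the minimum, maximum, median, mean, standard deviation, skewness and kurtosis, and we test for normality. Because these steps come up again and again, we can write our own function, save it in a folder on the MATLAB search path, and then call it directly like any built-in function. For the descriptive analysis above, type edit discription in the MATLAB command window and choose Yes in the dialog that appears; this creates an M-file named discription. Then write the following M-function in the blank file:

    function D=discription(x)
    %descriptive statistical analysis
    %input:
    %x is a matrix, and each column stands for a variable
    %output:
    %D: structure variable, denotes Minimium, Maximium, Mean, Median,
    %Standard_deviation, Skewness, Kurtosis, and normal distribution test, respectively.
    %notes: when the number of observations of the column variables is less than 30,
    %the Lilliefors test is used for the normality test, and output D.LSTA denotes
    %the test statistic and D.LCV the critical value at the 5% significance level;
    %otherwise, the Jarque-Bera test is used, and output D.JBSTA denotes the test
    %statistic and D.JBCV the critical value at the 5% significance level. If the test
    %statistic is less than the critical value, the null hypothesis (normal
    %distribution) cannot be rejected at the 5% significance level.
    %(skewness, kurtosis, lillietest and jbtest require the Statistics Toolbox)
    D.Minimium=min(x);
    D.Maximium=max(x);
    D.Mean=mean(x);
    D.Median=median(x);
    D.Standard_deviation=std(x);
    D.Skewness=skewness(x);
    D.Kurtosis=kurtosis(x);
    if size(x,1)<30
        disp('small observations, turn to Lilliefors test for normal distribution')
        for i=1:size(x,2)
            [h(i),p(i),Lilliefors(i),LCV(i)]=lillietest(x(:,i),0.05);
        end
        D.LSTA=Lilliefors;
        D.LCV=LCV;
    else
        for i=1:size(x,2)
            [h(i),p(i),Jarque_Bera(i),JBCV(i)]=jbtest(x(:,i),0.05);
        end
        D.JBSTA=Jarque_Bera;
        D.JBCV=JBCV;
    end

Note that in this example the file is named discription, matching the name discription in the first line of the function file, so the function is called as D=discription(x). If the file were saved under a different name, say statistic, the function would have to be called as D=statistic(x). To avoid confusion, save each function under its own name.

The function discription reports the normality test statistic together with its critical value at the 5% significance level: the Lilliefors test is used when the sample size is below 30, and the Jarque-Bera test when it is 30 or more. As an example, we now call discription on the Shanghai Composite Index. Suppose we are interested only in the daily returns based on the opening, high, low and closing prices. After reading in the data and preprocessing it (sorting the observations in ascending order of date), we obtain variables b, c, d and e, holding the opening, high, low and closing prices from December 19, 1990 to September 27, 2006. Then type in the MATLAB command window:

    x=price2ret([b,c,d,e]); % convert prices to log returns
    D=discription(x)        % call the user-defined function discription

This gives the following results (the four columns correspond to the returns based on the opening, high, low and closing prices):

    D =
                  Minimium: -0.3170       -0.1565       -0.4498       -0.1791
                  Maximium:  0.7138        0.7607        0.7372        0.7192
                      Mean:  7.4406e-004   7.3581e-004   7.4450e-004   7.3574e-004
                    Median:  7.0916e-004   8.0367e-004   3.6515e-004   4.3624e-004
        Standard_deviation:  0.0291        0.0253        0.0278        0.0265
                  Skewness:  4.5113        8.2876        4.2696        6.1913
                  Kurtosis:  111.7483      229.2601      162.1498      156.0935
                     JBSTA:  1.9186e+006   8.2927e+006   4.0928e+006   3.8010e+006
                      JBCV:  5.9915        5.9915        5.9915        5.9915

2 Sample distribution function and probability density function

After the basic descriptive analysis, we sometimes also need to examine a variable's sample (empirical) distribution function and sample probability density function. Some research even calls for generating random samples based on historical data. Using the closing prices of the Shanghai Composite Index from December 19, 1990 to September 27, 2006, we show below how to obtain in MATLAB the empirical distribution function and the sample probability density function of the index's daily log returns, and how to generate random numbers from the empirical distribution of historical returns.

(1) Sample distribution function

Suppose we have read into MATLAB the dates and closing prices of the Shanghai Composite Index from January 1, 2000 to June 1, 2006 and, after preprocessing, obtained column vectors a and e holding the dates and the closing prices, respectively. Type in the MATLAB command window:

    log_ret=price2ret(e);
    h=figure;
    set(h,'color','w')
    plot(a(2:end),log_ret)   % returns start at the second date
    datetick('x',23)         % date format 23 is 'mm/dd/yyyy'
    xlabel('date')
    ylabel('return')
    title('daily return of Shanghai Composite')

The resulting plot is shown below.

[Figure: Daily log returns of the Shanghai Composite Index]

To obtain the sample distribution function, we can write the following M-function and save it under the file name empirical_dist in a folder on the MATLAB search path.

    function [x,cumpr]=empirical_dist(data)
    % generate empirical distribution function
    % input:
    % data is a vector
    % output:
    % x is sample observation vector
    % cumpr is cumulative probability vector
    if min(size(data))~=1
        error('data must be a vector')
    end
    n=length(data);
    data=reshape(data,n,1);
    data=sort(data);
    % 'last': a(k) is the position of the last occurrence of x(k) in the sorted data
    [x,a]=unique(data,'last');
    frequency=[a(1);diff(a)];   % number of observations equal to each x(k)
    cumpr=cumsum(frequency)/n;

Then type in the MATLAB command window:

    [x,cumpr]=empirical_dist(log_ret);
    h=figure;
    set(h,'color','w')
    plot(x,cumpr)
    ylabel('cumulative probability')
    title('empirical distribution of daily returns on Shanghai Composite')

The resulting plot is shown below.

[Figure: Empirical distribution of the daily log returns of the Shanghai Composite Index]

(2) Sample probability density function

To obtain the sample probability density function, we can write the following M-function and save it under the file name empirical_density in a folder on the MATLAB search path.

    function [x,density]=empirical_density(data,m)
    %generate relative frequency and probability density
    %input:
    %data is a vector
    %m determines the number of bins (hist uses the m+1 points in x as bin centres)
    %output:
    %x is a vector of bin centre points
    %density is probability density
    if min(size(data))~=1
        error('data must be a vector')
    end
    n=length(data);
    data=reshape(data,n,1);
    zeta=min(abs(data))/10;
    min1=min(data)-zeta;%locate low ending point
    max1=max(data)+zeta;%locate high ending point
    x=linspace(min1,max1,m+1);%generate intervals
    density=hist(data,x)./(n*(x(2)-x(1)));  % count / (n * bin width)

In the program above, the number of intervals is determined by m. Using the daily log returns log_ret of the Shanghai Composite Index obtained above, type in the MATLAB command window: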
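The normalization in empirical_density divides each bin count by n times the bin width, so the bar areas of the estimated density should add up to one. The minimal sketch below repeats the same steps inline and checks this; the simulated normal sample is an assumption standing in for log_ret, and m = 40 is an arbitrary choice.

```matlab
% Sketch: check that the normalization used in empirical_density gives a
% density whose bar areas sum to one. Simulated N(0,1) data stand in for
% log_ret (assumption for illustration only).
data = randn(5000, 1);
m = 40;                                   % controls the number of bins
n = length(data);
zeta = min(abs(data)) / 10;               % same padding rule as empirical_density
x = linspace(min(data) - zeta, max(data) + zeta, m + 1);  % bin centres for hist
counts = hist(data, x);                   % hist treats x as bin centres
width = x(2) - x(1);                      % common bin width
density = counts ./ (n * width);          % relative frequency per unit length

total_area = sum(density) * width;        % should equal 1 up to rounding
fprintf('total area under the histogram density: %.6f\n', total_area);
```

Because hist assigns every observation to one of the m+1 bins, the total area is 1 up to rounding error for any choice of m.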

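Part 2 also mentions generating random numbers from the empirical distribution of historical returns. One common way to do this (not necessarily the author's) is inverse transform sampling: draw u uniformly on (0,1) and return the smallest x(k) whose cumulative probability cumpr(k) is at least u. The sketch below builds (x, cumpr) the same way empirical_dist does, but from a small hypothetical return vector so that it runs on its own; N = 10000 is an arbitrary number of draws.

```matlab
% Sketch: inverse transform sampling from an empirical distribution.
% The return vector below is hypothetical; (x, cumpr) mirror empirical_dist.
data = [0.02; -0.01; 0.00; 0.02; -0.03; 0.01; 0.00; 0.02];
n = length(data);
[x, last_idx] = unique(sort(data), 'last');  % distinct values, last position of each
x = x(:);
cumpr = last_idx(:) / n;                     % P(X <= x(k)); cumpr(end) is exactly 1

N = 10000;                                   % number of draws
u = rand(N, 1);                              % uniform(0,1) draws
% For each u, take the first x(k) whose cumulative probability reaches u.
k = arrayfun(@(ui) find(cumpr >= ui, 1, 'first'), u);
sim = x(k);

% Simulated relative frequencies should be close to the empirical ones.
sim_freq = arrayfun(@(v) mean(sim == v), x);
emp_freq = [cumpr(1); diff(cumpr)];
disp([x, emp_freq, sim_freq])
```

Each distinct value x(k) is drawn with probability equal to its relative frequency in the sample, so this is equivalent in distribution to resampling the historical returns with replacement, for example data(randi(n, N, 1)).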
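empirical_dist relies on the index returned by unique pointing at the last occurrence of each distinct value in the sorted data: that index is then exactly the number of observations less than or equal to x(k), and differencing it gives the tie counts. Older MATLAB releases returned the last occurrence by default, while newer releases return the first, so asking for 'last' explicitly keeps the frequencies correct when the data contain ties. The sketch below uses a small hypothetical sample with ties and compares the result with a brute-force count.

```matlab
% Sketch: empirical distribution function via unique(...,'last'),
% checked against a brute-force count. The sample is hypothetical.
data = [3; 1; 2; 2; 5; 3; 3; 4];          % sample with ties
n = length(data);
sorted = sort(data);
[x, a] = unique(sorted, 'last');          % a(k): last position of x(k) in sorted
x = x(:);
a = a(:);
frequency = [a(1); diff(a)];              % number of observations equal to x(k)
cumpr = cumsum(frequency) / n;            % F(x(k)) = P(X <= x(k))

% Brute force: fraction of observations less than or equal to each x(k).
brute = arrayfun(@(v) mean(data <= v), x);
disp([x, frequency, cumpr, brute])
```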
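The Jarque-Bera statistic reported as D.JBSTA can be reproduced from the sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K-3)^2/4), and the value 5.9915 reported for D.JBCV is the 95% quantile of a chi-square distribution with 2 degrees of freedom, which has the closed form -2 ln(0.05). The sketch below computes log returns with diff(log(p)), the continuously compounded returns that price2ret produces by default, then S, K and JB from central moments with divisor n (the convention skewness and kurtosis use with their default settings). The price vector p is made up for illustration. It uses the asymptotic critical value only; for small samples, recent releases of jbtest may use simulated critical values instead.

```matlab
% Sketch: Jarque-Bera normality test without toolbox functions.
% The price series is hypothetical.
p = [100; 101.5; 100.8; 102.3; 104.0; 103.1; 105.6; 104.9; 106.2; 108.0];
r = diff(log(p));                 % continuously compounded (log) returns

n  = length(r);
mu = mean(r);
m2 = mean((r - mu).^2);           % second central moment (divisor n)
m3 = mean((r - mu).^3);           % third central moment
m4 = mean((r - mu).^4);           % fourth central moment
S  = m3 / m2^1.5;                 % sample skewness
K  = m4 / m2^2;                   % sample kurtosis (3 for a normal distribution)

JB   = n / 6 * (S^2 + (K - 3)^2 / 4);  % Jarque-Bera statistic
crit = -2 * log(0.05);                 % chi-square(2) 95% quantile, about 5.9915

fprintf('JB = %.4f, 5%% critical value = %.4f\n', JB, crit);
if JB < crit
    disp('normality is not rejected at the 5% level');
else
    disp('normality is rejected at the 5% level');
end
```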