R: Statistics? Programme? And Who Are You?
An ABC introduction to R
Presented by Guohui Ding, R&D, SIBS, CAS
For Fudan University

Main Topics Today
    What is R?
    How to administer R?
    How does R work?
    How to apply R to statistical problems?
    How to program your own R functions?

What is R?
A brief history of R

The legend of R
    R started in the early 1990s as a project by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, intended to provide a statistical environment in their teaching lab. The lab had Macintosh computers, for which no suitable commercial environment was available.
    (Photos: Robert Gentleman, Ross Ihaka)

R's Parents (1)
    The S language
        S: an interactive environment for data analysis developed at Bell Laboratories since 1976.
        Exclusively licensed by AT&T/Lucent to Insightful Corporation, Seattle WA. Product name: "S-plus".
        You can learn more from: http://cm.bell-…
    My father is S, my mother is Scheme, so why is my name "R"?
    The Scheme language
        Scheme is a statically scoped and properly tail-recursive dialect of the Lisp programming language invented by Guy Lewis Steele Jr. and Gerald Jay Sussman.
        Learn more: http://swiss.csail.mit.edu/projects/scheme/
    Scheme's underlying semantics + S's syntax = R

R's Parents (2)
    "We have named our language R in part to acknowledge the influence of S and in part to celebrate our own efforts." (Ihaka R. & Gentleman R., 1996)

R Now
    Since mid-1997 there has been a core group who can modify the R source code CVS archive.
    The R package system: CRAN (the Comprehensive R Archive Network)
    http://www.r-project.org

The characteristics of R
    R is "GNU S": a language and environment for data manipulation, calculation and graphical display.
    R is Free Software (or open source software). (Here, "free" refers to freedom, not price, although R is free in that sense as well.)
    The core of R is an interpreted computer language.
    A mix of procedural and object-oriented programming.
    Good interfaces to procedures written in C, C++, Fortran and other languages.
    A flexible data exchange mechanism for accessing relational databases: ODBC, PostgreSQL, MySQL and so on.

R and Statistics
    Most packages deal with statistics and data analysis.
    Powerful statistical graphics.
    Works well alongside other statistical software.
    Most R users are statistical experts. You can learn modern analysis methods from them by email.
    When you come across a problem nobody has tackled before, you can solve it yourself.

Install and administer R (focus on MS Windows)

How do I get R?
    The informational web site: http://www.r-project.org/
    CRAN, the Comprehensive R Archive Network. The primary site is http://cran.r-project.org/. Mirror sites are available for many countries.
    CRAN sites have binary distributions for Windows 95, 98, ME, NT4, 2000 and XP on Intel, for the Macintosh (System 8.6 to 9.1 and MacOS X), and for several Linux distributions.
    New releases occur frequently, about every 3 months. Be prepared to re-install frequently.
    You can also get it from your friends, teachers, etc.

Download it!
    It is about 20.6 MB in size.

Using Precompiled Binary Distributions: Installing R
    Double-click "rw1091.exe". That is all; it installs like any other standard MS Windows software.

R Console/RGui in MS Windows
    Command box
    Graphics box
    Menu
    Icons

Several concepts in administering R
    Workspace: xxx.RData
    History: xxx.Rhistory
    Package
    Object
    Session

Console
    Run your R code
    Load/save workspace
    Load/save history
    Change your working directory

Add a new package
    Commands:
        library()               load (attach) an installed package from the library
        detach(package:xxx)     detach a package
    Everything can be done from the GUI (except detach()):
        Load a local package
        Install packages from the internet or from local files
        Update local packages from the internet

Packages in the R Environment
    Basic packages: package:methods, package:stats, package:graphics, package:utils, package:base
    Recommended packages: grid; lattice; e1071
    Contributed packages (more than 366 packages nowadays)
    You can see which packages are currently loaded with the command search().

Don't lose your way!
    Three useful system commands:
        getwd()         get the working directory
        setwd()         set the working directory
        list.files()    list the files in a directory/folder

Show the demonstrations of packages/functions
    Commands:
        demo()          demonstrations of R functionality
        example()       run the Examples section from the online help

Getting help
    Several commands: help.start(); help() or ?; help.search(); apropos()
    Internet searching: I like it very much. It seems omnipotent.

Quit R
    Command: q()    terminate an R session

How does R work?
Basic R structure and data manipulation

Basic R working flow (object orientation)
    (Figure from R for Beginners, Emmanuel Paradis)

Object orientation
    Object: a collection of atomic variables and/or other objects that belong together.
    Parlance:
        class: the "abstract" definition of it
        object: a concrete instance
        method: another word for function
        slot: a component of an object

Types of Data in R
    The basic data object is a vector of elements of type:
        numeric: numbers, either floating point or integer
        character: each element is a character string
        logical: each element is TRUE or FALSE
        list: elements can be any type of object, including other lists
    Components of the S language, such as functions, are also vectors.
    Any vector can include the missing data marker NA as an element.
    All vectors have a length and a mode. The functions length and mode return this information, as does the str function.
    A structure consists of a data object plus additional information. Matrices (or arrays, in general) and time series are examples of structures.

Operators

Vectors, Matrices and Arrays
    Commands:
        array(data = NA, dim = length(data), dimnames = NULL)
        matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

Lists
    List vs. vector
        list: an ordered collection of data of arbitrary types.
        vector: an ordered collection of data of the same type.
    Typically, vector elements are accessed by their index (an integer), list elements by their name (a character string). But both types support both access methods.

Factors
    Factors: classification variables.
    If the levels of a factor are numeric (e.g. the treatments are labelled "1", "2", and "3"), it is important to ensure that the data are actually stored as a factor and not as numeric data. Always check this by using summary.

Data frames
    A data frame is supposed to represent the typical data table that researchers come up with, like a spreadsheet.
    It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types. (It is actually a list.)

Subsetting
    Individual elements of a vector, matrix, array or data frame are accessed with "[ ]" by specifying their index or their name.

Using R on MS Windows
Basic statistical analysis with R

Data Input
    From the keyboard, one by one: c(); scan()
    From a file: read.table(); read.csv(); read.csv2(); read.dta(); read.spss()
    With a spreadsheet-like editor: data.entry(); edit(); fix()

Data Edit
    Commands: edit(); fix()
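
The data types, subsetting rules and file input functions listed above can be sketched in a few lines. This is only an illustrative sketch: the object names (heights, treatment, subject, trial) and the temporary CSV file are hypothetical and do not come from the slides.

```r
# Atomic vector: every element has the same type; NA marks a missing value
heights <- c(1.62, 1.75, NA, 1.80)
length(heights)                 # number of elements
mode(heights)                   # storage mode of the vector

# Factor: treatment labels that look numeric but are really categories
treatment <- factor(c(1, 2, 3, 1))
summary(treatment)              # counts per level, not a numeric summary

# List: elements of arbitrary type, accessed by name or by index
subject <- list(id = 7L, name = "demo", scores = c(90, 85))
subject$name                    # access by name
subject[[3]]                    # access by position

# Data frame: a rectangular table whose columns may differ in type
trial <- data.frame(treatment = treatment, height = heights)
trial[2, "height"]              # row 2, column "height"
trial[trial$treatment == "1", ] # all rows belonging to treatment 1

# File round trip (hypothetical temporary file): write a CSV, read it back
tmp <- tempfile(fileext = ".csv")
write.csv(trial, tmp, row.names = FALSE, quote = FALSE)
trial_from_file <- read.csv(tmp)
str(trial_from_file)            # treatment comes back as a number column

# As the Factors slide warns, convert it back to a factor explicitly
trial_from_file$treatment <- factor(trial_from_file$treatment)
```

Reading the file back shows why the check with summary() matters: the treatment labels are stored as numbers until they are converted with factor().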
Tip: edit() can invoke a notepad in the RGui!

Data Description
    Commands: summary(); mean(); sd(); hist(); boxplot()

Probability Distribution
    Four useful prefixes for probability distribution functions:
        dxxx    the density
        pxxx    the CDF
        qxxx    the quantile function
        rxxx    simulation (random deviates)
    Repeated random draws are different! The seed is set by the system. You can set the seed yourself with set.seed().

Statistical Inference
    Commands:
        qxxx()                  the quantile function
        t.test()
        wilcox.test() (stats)
        kruskal.test() (stats)
        var.test(); shapiro.test()
        qqnorm(); qqline()

Analysis of Variance and Regression Analysis
    Commands: anova(); lm()

Experiment Design
    Commands: sample(); power.t.test()

Save Object/Data
    Every R object can be stored into and restored from a file with the commands "save" and "load".
        save(x, file = "x.Rdata")
        load("x.Rdata")
    Importing and exporting data with rectangular tables in the form of tab-delimited text files:
        write.table(x, file = "x.txt", sep = "\t")

Graphics with R

A Friendly R Environment: Rcmdr
    If you don't like a command line environment, the package Rcmdr may be a good choice!

R programming (.R)
Write your own R code

Control Flow
    if (cond) expr
    if (cond) cons.expr else alt.expr
    for (var in seq) expr
    while (cond) expr
    repeat expr
    break
    next

Loops
    The main loop construct in R is for. The commonest use, as in C and other languages, is to count from 1 to n.
        for (i in 1:n) {
            # do something
        }

Leaving loops
    The break and next commands allow the flow of a loop to be altered:
        break jumps out of the loop
        next jumps to the next iteration of the loop

Avoiding Iteration
    The canonical bad R program looks like this:
        # multiply two vectors
        for (i in 1:n) d[i] <- a[i] * b[i]
        # compute the inner product
        s <- 0
        for (i in 1:n) s <- s + d[i]
    The right way to do this is:
        s <- sum(a * b)
    See also: apply(); lapply(); sapply()

Write an R function
    A function definition looks like:
        median <- function(x, na.rm = FALSE) {
            # lots of code ...
            # a return value
        }

More
    Packages
    Objects and methods
    Debugging and optimisation
    Connecting to other packages
    Interfaces to other programming languages or databases
    R+? +R!

Some Resources
    A course (the slides are shown with the R Development Core Group):
        http://faculty.washington.edu/tlumley/Rcourse/
    A paper (for citing R in a publication):
        Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299-314.
    Two URLs:
        http://www.r-project.org
        http://www.ats.ucla.edu/stat/
    Several books:
        Using R for Data Analysis and Graphics: An Introduction. J.H. Maindonald
        An Introduction to R. The R Development Core Team
        simpleR: Using R for Introductory Statistics. John Verzani
        R for Beginners. Emmanuel Paradis
        The R Reference Manual: Base Package. The R Development Core Team

Acknowledgements
    PhD. Qi Liu
    Prof. Naiqing Zhao
    Prof. Gang Pei
    Prof. Yixue Li
    Everyone here

Any questions?
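
As a closing sketch of the "Avoiding Iteration" and "Write an R function" slides, the loop-based and vectorised inner products can be wrapped in two small functions and compared. The function names inner_loop and inner_vec and the random test vectors are hypothetical, chosen only for illustration.

```r
set.seed(1)        # fix the seed so the random vectors are reproducible
n <- 1000
a <- rnorm(n)
b <- rnorm(n)

# Loop version: accumulate the products one element at a time
inner_loop <- function(a, b) {
  s <- 0
  for (i in seq_along(a)) {
    s <- s + a[i] * b[i]
  }
  s                # the value of the last expression is returned
}

# Vectorised version: one elementwise multiplication, one sum
inner_vec <- function(a, b) sum(a * b)

# Both versions agree up to floating-point tolerance
all.equal(inner_loop(a, b), inner_vec(a, b))

# sapply() applies a function to each column without an explicit loop
col_means <- sapply(data.frame(a = a, b = b), mean)
col_means
```

The vectorised form is shorter and leaves the iteration to R's internal code, which is the point the slide makes with s <- sum(a * b).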