S2.4a Sampling and hypothesis tests

Uploaded by: 工**** | Document no.: 571873937 | Upload date: 2024-08-12 | Format: PPT | Pages: 58 | Size: 1.86MB

© Boardworks Ltd 2006

A2-Level Maths: Statistics 2 for OCR

Contents
Introduction to sampling
Sampling from a normal distribution
Calculating from samples
Unbiased estimates
Hypothesis testing on binomial data
Chocolate tasting practical
One-sided hypothesis tests on binomial data
One-sided versus two-sided tests
Critical regions

National census

The British government carries out a census of the entire population of the United Kingdom every 10 years (at the time of writing, most recently in April 2001).

The first census in England was carried out in 1086 with the compilation of the Domesday Book. However, censuses have only been conducted on a regular basis since 1801.

The census provides the government with a detailed picture of the population living in each part of the country (town, city or countryside). The results are used to help plan public services (health, housing, transport and education) for the future.

Introduction to sampling

In statistics we often want to obtain information from a group of individuals or about a group of objects.

The population is the set of all individuals or objects that we wish to study.
A census is an investigation in which information is obtained from every member of the population.
A sampling frame is a list of all members of the population.

Examples:
1. A head teacher is interested in finding out how long her sixth form students spend in part-time employment each week.
The population is the set of all sixth form students in her school. A possible sampling frame would be the registers of sixth form tutor groups.
2. A newspaper is interested in obtaining the views of residents living close to the site of a proposed new airport.
The population might be all adults living within a 10 mile radius of the site. A possible sampling frame could be the local electoral roll.
3. A car company has discovered a fault that affects one of their models of car. The company may wish to know how widespread the problem might be.
The population would be all cars of this particular model that have been produced. A possible sampling frame would be a list of all registered cars of this model provided by the DVLA.

Carrying out a census of the entire population is usually not feasible or sensible. A census is usually costly in terms of money, time and resources.

In addition, some investigations could result in the destruction of the entire population! For example, if a light bulb manufacturer wished to investigate the lifetime of its bulbs, a census would result in the destruction of all the bulbs it produced.

Instead of surveying the whole population, information can be obtained from a sample. The sampling process should be undertaken carefully to ensure that the sample is representative of the entire population. Bias can occur if one section of the population is over- or under-represented.

Question: A local council wishes to know the views of local people on public transport. Criticize each of the following sampling regimes:
1. Ask the people waiting at the town centre bus stop.
2. Leave questionnaires in local libraries for people to fill in.
3. Ask people at the shopping centre on a Thursday morning.

Sampling methods

One way to obtain a fair sample is to use random sampling. This method gives every member of the population an equal chance of being chosen for the sample. A more formal definition of a random sample is as follows:

A sample of size $n$ is called a random sample if every possible selection of size $n$ has the same probability of being chosen.

There are a number of ways in which a random sample can be chosen. One commonly used technique is to use random number tables.

Random number tables

The table below gives a list of random digits:

793 259 976 452 401 234 393 053 225 197 549 628 444 212 885 355 169 905
834 193 439 102 356 206 753 335 713 416 584 438 085 966 235 418 626 411
835 469 807 561 925 290 692 923 229 288 631 523 040 940 642 775 838 281 475

Here is how to use random digits to obtain a sample.

Example: A sample of size 15 is required from a population of size 300.

One possible approach would be to obtain a sampling frame for the population and number every member from 001 to 300. You could then obtain chains of 3 random digits from the tables. If a chain corresponds to a number between 001 and 300, you select that member of the population; otherwise you discard that chain and choose another.

Example (continued): This method is wasteful of random digits, since most chains of 3 digits (all those from 301 to 999, and 000) will be discarded. A more efficient strategy is to assign each member of the population to several chains of random digits:

Population member   Random digits
1                   001  301  601
2                   002  302  602
3                   003  303  603
...                 ...
300                 300  600  900

With this approach, only the chains 901 to 999 and 000 are discarded.

Example (continued): Suppose that we use the 2nd line of random digits in the table above. Then the sample chosen would be:

Random digits   Member selected
834             234
193             193
439             139
102             102
356             56
206             206
753             153
335             35
713             113
416             116
584             284
438             138
085             85
966             (cannot be used)
235             235
418             118
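The multiple-chain scheme above can be sketched in Python. This is a minimal illustration that assumes the 3-digit chains are read left to right. It also assumes that a member selected a second time is skipped, so the sample is drawn without replacement. The slide example does not need that rule, because its 15 selected members are all different. The function name `sample_from_chains` is hypothetical.

```python
def sample_from_chains(chains, population_size=300, sample_size=15):
    """Select population members (numbered 1..population_size) from 3-digit chains.

    Member k is assigned the chains k, k + N, k + 2N, ... up to the largest
    multiple of N below 1000 (for N = 300: 001-300, 301-600, 601-900).
    Chains above that multiple, and 000, are discarded.
    """
    top = (999 // population_size) * population_size  # 900 when N = 300
    chosen = []
    for chain in chains:
        value = int(chain)
        if value == 0 or value > top:
            continue  # e.g. 966 "cannot be used"
        member = (value - 1) % population_size + 1  # 834 -> 234, 600 -> 300
        if member in chosen:
            continue  # assumption: skip repeats so the sample has no duplicates
        chosen.append(member)
        if len(chosen) == sample_size:
            break
    return chosen


# Second line of the random digit table above
row2 = "834 193 439 102 356 206 753 335 713 416 584 438 085 966 235 418 626 411".split()
print(sample_from_chains(row2))
# [234, 193, 139, 102, 56, 206, 153, 35, 113, 116, 284, 138, 85, 235, 118]
```

In software, Python's `random.sample(range(1, 301), 15)` draws a simple random sample of 15 members directly. The table-based version above only mirrors the hand method.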

Sampling from a normal distribution

Suppose that a sample of size $n$ is taken from a $N(\mu, \sigma^2)$ distribution and that the sample mean is $\bar{x}$. If the sampling process were repeated, a different sample would be extracted and a slightly different value for the sample mean would be obtained. The value of the sample mean is therefore subject to sampling variability.

The sample mean therefore has a distribution, known as its sampling distribution. It is possible to show that, when a sample of size $n$ is drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean is:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Example: If a sample of size 40 is taken from a $N(15, 24)$ distribution, then the sampling distribution of the sample mean is:

$$\bar{X} \sim N\left(15, \frac{24}{40}\right) = N(15, 0.6)$$

Notice that the variance of $\bar{X}$ is $\frac{\sigma^2}{n}$. This shows that the sampling variability can be decreased by taking larger samples (i.e., by increasing the value of $n$). The standard deviation of $\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$. This is usually referred to as the standard error of the sample mean.
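The result $\bar{X} \sim N(\mu, \sigma^2/n)$ can be checked by simulation. The sketch below draws many samples of size 40 from $N(15, 24)$. It then compares the mean and variance of the resulting sample means with 15 and $24/40 = 0.6$. The function name, the number of repetitions and the seed are arbitrary illustrative choices. Note that Python's `random.gauss` takes a standard deviation, so the variance 24 is converted to $\sqrt{24}$.

```python
import math
import random
import statistics


def simulate_sample_means(mu, variance, n, reps=10_000, seed=1):
    """Return `reps` sample means, each from a sample of size n drawn from N(mu, variance)."""
    rng = random.Random(seed)      # seeded so the run is reproducible
    sigma = math.sqrt(variance)    # random.gauss expects the standard deviation
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


means = simulate_sample_means(15, 24, 40)
print(statistics.fmean(means))     # should be close to 15
print(statistics.variance(means))  # should be close to 24 / 40 = 0.6
print(math.sqrt(24 / 40))          # standard error sigma / sqrt(n), about 0.775
```

Rerunning with a larger `n` shows the variance of the simulated means shrinking in proportion to $1/n$.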

Calculating from samples

Sample standard deviation

Recall the formula we met earlier for finding the variance of a set of data:

$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2$$

The variance is sometimes called the mean squared deviation (msd). The standard deviation is the square root of this, and is sometimes called the root mean squared deviation (rmsd):

$$\text{rmsd} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

These formulae are normally only used when we wish to calculate the variance or standard deviation using data from the entire population.

When a large population is being studied, data will only be collected for a sample. The sample data is then used to make inferences about the population; in particular, it may be used to estimate the mean and variance of the whole population. The sample mean, $\bar{x}$, is used to estimate the population mean. However, a better estimate of the population variance is given by the following formula, which divides by $n - 1$ rather than $n$:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$$

This is referred to as the sample variance, with its square root being the sample standard deviation, $s$.

Example: A crisp manufacturer carries out regular monitoring of its packing machines by taking samples of 20 packets of crisps. The masses ($x$ g) obtained in one such sample were as follows: [...]

Find the mean and the standard deviation of the masses in this sample of crisp packets.

Note: the question clearly states that the data is from a sample. We will therefore use the formula for the sample standard deviation.

The sample mean is given by: $\bar{x} = \frac{\sum x}{20}$

The sample standard deviation ($s$) is found as follows: $s = \sqrt{\frac{\sum x^2 - 20\bar{x}^2}{19}}$
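The difference between the two formulas can be seen on a small data set. The first is the mean squared deviation, where the sum of squared deviations is divided by $n$. The second is the sample variance, where it is divided by $n - 1$. The sketch below applies both to hypothetical values, not the crisp data. The helper names are illustrative. The results can be cross-checked against Python's `statistics.pvariance`, which divides by $n$, and `statistics.variance`, which divides by $n - 1$.

```python
import math
import statistics


def msd(data):
    """Mean squared deviation: sum of (x - mean)^2 divided by n."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / n


def sample_variance(data):
    """Sample variance s^2 using the shortcut (sum x^2 - n * xbar^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(msd(data), math.sqrt(msd(data)))                          # msd and rmsd: 4.0 2.0
print(sample_variance(data), math.sqrt(sample_variance(data)))  # s^2 and s
```

With very large or very precise values, the first form, $\sum (x - \bar{x})^2$, is numerically safer than the shortcut. For hand calculation, the shortcut is usually quicker.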

Unbiased estimates

Introduction to estimation

A statistic is a quantity that is calculated from a sample of data. Examples include:
the sample mean, $\bar{x}$;
the sample variance, $s^2$ (note that the sample variance uses $n - 1$ instead of just $n$);
the quartiles;
the highest value.

We are particularly interested in finding estimates of the population mean and standard deviation.

It can be shown that the sample mean provides an unbiased estimate of the population mean. That is, if the sampling process were carried out over and over again, the sample mean would on average equal the population mean. Likewise, the sample variance, $s^2$, is an unbiased estimate of the population variance, $\sigma^2$. Its square root, $s$, is not strictly unbiased for $\sigma$. However, it is the estimate of the population standard deviation that is normally used.

Sample mean $\bar{X}$: estimator for the population mean $\mu$
Sample variance $S^2$: estimator for the population variance $\sigma^2$

Note that the formula $\frac{\sum (x - \bar{x})^2}{n}$ gives a biased estimate of the population variance.

Example: An examiner takes a random sample of 12 of the students sitting a particular A-level examination. Their percentage marks were:
55%, 64%, 76%, 48%, 73%, 51%, 67%, 31%, 55%, 85%, 60%, 62%.
Calculate unbiased estimates of the mean and the variance of the marks for all students sitting the exam, and hence estimate the standard deviation.

Here $\sum x = 727$ and $\sum x^2 = 46275$, so the unbiased estimate of the mean is $\bar{x} = \frac{727}{12} = 60.6\%$ (to 3 s.f.).

The sample variance, $s^2 = \frac{46275 - 727^2/12}{11} = 202.8$, gives an unbiased estimate of the population variance. The sample standard deviation is therefore $s = \sqrt{202.8} = 14.2\%$ (to 3 s.f.).
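The exam-marks calculation can be reproduced with Python's `statistics` module, whose `variance` and `stdev` functions divide by $n - 1$. The second part is an illustrative simulation, and its assumptions are chosen only for clarity: a population of 0s and 1s (variance 0.25), samples of size 2 drawn independently with replacement, and an arbitrary seed. Averaging each estimator over many samples shows that dividing by $n$ underestimates $\sigma^2$ on average, while dividing by $n - 1$ does not. The function name `average_variance_estimates` is hypothetical.

```python
import random
import statistics

marks = [55, 64, 76, 48, 73, 51, 67, 31, 55, 85, 60, 62]
xbar = statistics.mean(marks)     # unbiased estimate of the population mean
s2 = statistics.variance(marks)   # divides by n - 1: unbiased estimate of sigma^2
s = statistics.stdev(marks)       # square root of s2
print(round(xbar, 1), round(s2, 1), round(s, 1))  # 60.6 202.8 14.2


def average_variance_estimates(population, n, reps=10_000, seed=2):
    """Average the divide-by-n and divide-by-(n - 1) estimates over many samples."""
    rng = random.Random(seed)
    total_n = total_n_minus_1 = 0.0
    for _ in range(reps):
        sample = rng.choices(population, k=n)            # independent draws, with replacement
        total_n += statistics.pvariance(sample)          # divides by n
        total_n_minus_1 += statistics.variance(sample)   # divides by n - 1
    return total_n / reps, total_n_minus_1 / reps


# Population variance of [0, 1] is 0.25; with n = 2 the divide-by-n
# estimator averages about 0.125, the divide-by-(n - 1) one about 0.25.
print(average_variance_estimates([0, 1], n=2))
```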

Hypothesis testing on binomial data

A simple introductory example

Consider the following simple situation. You suspect that a die is biased towards the number six. In order to test this suspicion, you could perform an experiment in which the die is thrown 20 times. If the die were fair, you would expect about 3 sixes ($20 \times \frac{1}{6} \approx 3.3$). If you obtained a lot more than 3 sixes, then you might decide that there is evidence to support your suspicion. But how do you decide what counts as a suspicious number of sixes?

Consider throwing a fair die 20 times. The probability of obtaining different numbers of sixes is shown in the graph.

The graph shows that, with 20 throws of a fair die, the probability of getting 7 or more sixes is about 0.0371. This means that if a fair die were thrown 20 times over and over again, you would on average obtain 7 or more sixes in fewer than 1 in every 20 experiments.

The figure of 1 in 20 (or 5%) is often taken as a cut-off point: results with probabilities below this level are sometimes regarded as being unlikely to have occurred by chance. However, in situations where more evidence is required, cut-off values of 1% or 0.1% are typically used.

A formal introduction to hypothesis tests

In hypothesis testing we are essentially presented with two rival hypotheses. Examples might include:
"The coin is fair" or "the coin is biased";
"The proportion of local people in favour of a by-pass is 80%" or "the proportion is smaller than 80%";
"The drug has the same effectiveness as an existing treatment" or "the drug is more effective".
These rival hypotheses are referred to as the null and the alternative hypotheses.

The null hypothesis ($H_0$) is often thought of as the cautious hypothesis: it represents the usual state of affairs. The alternative hypothesis ($H_1$) is usually the one that we suspect or hope to be true.

Hypothesis testing is concerned with examining the data collected in experiments, and deciding how likely the result is to have occurred if the null hypothesis is true. The significance level of the test is the chosen cut-off between the results that might plausibly have been obtained by chance if $H_0$ is true and the results that are unlikely to have occurred.

Significance levels that are typically used are 10%, 5%, 1% and 0.1%. These significance levels correspond to tests of different rigour: the lower the significance level, the stronger the evidence provided if $H_0$ is rejected.

Note: It is important to appreciate that it is not possible to prove in statistics that a hypothesis is definitely true. Hypothesis tests can only provide different degrees of evidence in support of a hypothesis. Rejecting $H_0$ at the 10% significance level provides only weak evidence in support of $H_1$. A test at the 0.1% level is much more stringent and can provide very strong evidence.
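The figure of 0.0371 quoted in the introductory die example can be reproduced from the binomial probability formula. Here $X \sim B(20, \tfrac{1}{6})$ is the number of sixes in 20 throws of a fair die. This is a minimal sketch using only `math.comb`, and the helper names are illustrative. It also prints $P(X = x)$ for small values of $x$, the distribution that the graph of the number of sixes describes.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


n, p = 20, 1 / 6  # 20 throws of a fair die; X = number of sixes
for x in range(0, 11):
    print(x, round(binom_pmf(x, n, p), 4))

tail = upper_tail(7, n, p)
print(round(tail, 4))  # 0.0371
print(tail < 0.05)     # True: 7 or more sixes is unlikely at the 5% level
```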

Chocolate tasting practical

Do you think you can taste the difference between branded chocolate and supermarket own-label chocolate? You are going to perform an experiment to find out.

There will be 2 pieces of chocolate to try: one will be a branded make of chocolate, the other will be a supermarket's own brand. Try to identify the branded make.

One-sided hypothesis tests on binomial data

Example: Mr Jones, a candidate in a local election, claims to have the support of 40% of the electorate. A rival candidate, Miss Smith, believes that Mr Jones is exaggerating his level of support. She asks a random sample of 12 local people and discovers that 3 of them support Mr Jones. Carry out a test at the 5% significance level to see whether there is evidence that Mr Jones is exaggerating his level of support.

Solution: We begin by writing down the two rival hypotheses. Let $p$ represent the proportion of the electorate who support Mr Jones.
$H_0: p = 0.4$
$H_1: p < 0.4$
[...]

Examination-style question

$H_0: p = 0.7$ (the new treatment is no more successful than the existing treatment)
$H_1: p > 0.7$ (the new treatment is better than the standard treatment)
Significance level = 1%
Let $X$ be the number of people successfully treated by the new drug. If the null hypothesis is true, then $X \sim B(20, 0.7)$.
The observed data is $x = 19$. Using tables, $P(X \geq 19) = 0.0076 < 1\%$.
We reject the null hypothesis at the 1% level: there is quite strong evidence that the new treatment is more successful.
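The one-sided decision rule used in these examples can be sketched as follows: reject $H_0$ when the tail probability in the direction of $H_1$ is below the significance level. This is a minimal illustration in which the function names are hypothetical. The probabilities are computed directly rather than read from tables. For Mr Jones's data, the relevant tail is $P(X \leq 3)$ with $X \sim B(12, 0.4)$. For the new-treatment data, it is the upper tail $P(X \geq 19)$ with $X \sim B(20, 0.7)$.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 for x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, x + 1))


def one_sided_test(x_obs, n, p0, alpha, tail):
    """Return (tail probability, reject H0?) for a one-sided binomial test.

    tail="lower" for H1: p < p0, tail="upper" for H1: p > p0.
    """
    if tail == "lower":
        prob = binom_cdf(x_obs, n, p0)            # P(X <= x_obs)
    elif tail == "upper":
        prob = 1 - binom_cdf(x_obs - 1, n, p0)    # P(X >= x_obs)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return prob, prob < alpha


# Mr Jones: H0: p = 0.4, H1: p < 0.4, 3 supporters out of 12, 5% level
print(one_sided_test(3, 12, 0.4, 0.05, "lower"))
# New treatment: H0: p = 0.7, H1: p > 0.7, 19 successes out of 20, 1% level
print(one_sided_test(19, 20, 0.7, 0.01, "upper"))
```

With these inputs, the first call gives a tail probability of about 0.2253, which is not below 0.05, so the sketch does not reject $H_0$. The second call gives about 0.0076, which is below 0.01, so it rejects $H_0$.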

One-sided versus two-sided tests

The examples considered so far can all be classified as one-sided tests: we have been testing for either an increase or a decrease in the value of the parameter, $p$.

Sometimes we are not looking specifically for an increase (or decrease) in $p$; instead we may want to examine whether the value of $p$ has changed. In these situations we use a two-sided (or two-tailed) test.

A two-sided hypothesis test carried out at the $\alpha\%$ significance level is, in a sense, two separate one-sided tests. The significance level is therefore shared between these two tests: $\tfrac{1}{2}\alpha\%$ for each tail.

Example: A restaurant has traditionally found that 60% of its customers have been pleased or very pleased with the quality of the food served. A new chef is appointed, and the restaurant management wish to find out whether this has changed the proportion of customers who are happy with their food. The management question 16 diners and discover that 14 of them are pleased or very pleased with their food. Test at the 5% significance level whether there has been a change in the proportion of contented customers.

Solution: Let $p$ represent the proportion of customers pleased or very pleased with the quality of the food served. The hypotheses can be stated as follows:
$H_0: p = 0.6$ (i.e. no change)
$H_1: p \neq 0.6$ (i.e. a change in the proportion)
Significance level: 5% (2.5% for each tail).
Let $X$ represent the number of customers who are pleased or very pleased with their food. Then, under the null hypothesis, $X \sim B(16, 0.6)$.

If $H_0$ were true, we would expect $16 \times 0.6 = 9.6$ customers to be pleased with the food quality. The observed number, 14, is on the high side, so we calculate the upper-tail probability:
$P(X \geq 14) = 1 - P(X \leq 13) = 0.0183 < 2.5\%$
[...]

Examination-style question

Conclusion: We are unable to reject the null hypothesis. The data does not provide enough evidence to suggest that the proportion of candidates passing their driving test at the first attempt has altered.
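In the two-tailed restaurant example, the observed value 14 lies above the expected 9.6. The relevant tail is therefore $P(X \geq 14)$ with $X \sim B(16, 0.6)$, compared with half the significance level. Below is a minimal sketch with a hypothetical helper name. It assumes that the tail to use is the one on the same side of $np_0$ as the observation.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 for x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, x + 1))


def two_sided_test(x_obs, n, p0, alpha):
    """Two-tailed binomial test: compare the tail on the observed side with alpha / 2."""
    expected = n * p0
    if x_obs >= expected:
        prob = 1 - binom_cdf(x_obs - 1, n, p0)  # upper tail P(X >= x_obs)
    else:
        prob = binom_cdf(x_obs, n, p0)          # lower tail P(X <= x_obs)
    return prob, prob < alpha / 2


prob, reject = two_sided_test(14, 16, 0.6, 0.05)
print(round(prob, 4), reject)  # 0.0183 True
```

Since 0.0183 < 0.025, this sketch rejects $H_0$ at the 5% level, indicating a change in the proportion of contented customers.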

Critical regions

The critical (or rejection) region for a hypothesis test is the range of values for which the null hypothesis would be rejected.

Example 1: Police records show that 25% of the vehicles using a stretch of road exceed the speed limit. A new speed camera is installed. The police wish to find out whether this has led to a reduction in the proportion of drivers speeding. The police sample 20 cars driving along the stretch of road.
a) Find the critical region for a test carried out at the 5% significance level.
b) Comment on the implications of the test if the police find 2 speeding drivers.

Solution:
a) $H_0: p = 0.25$, where $p$ = proportion of drivers who speed
$H_1: p < 0.25$
[...]
$P(X \leq 2) = 0.0913 > 5\%$ (so the critical region does not contain 2)
$P(X \leq 1) = 0.0243 < 5\%$
[...]

Examination-style question

$P(X \leq 16) = 0.0468 < 5\%$
$P(X \geq 24) = 1 - P(X \leq 23) = 1 - 0.9726 = 0.0274 < 5\%$
Therefore part of the critical region for the upper tail is $x \geq 24$.
Combining these two parts, the critical region for the whole test is $x \leq 16$ or $x \geq 24$.
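Critical regions can also be found by searching the cumulative binomial probabilities instead of reading tables. The sketch below finds two boundaries. For a lower-tail test, it finds the largest $c$ with $P(X \leq c) < \alpha$. For an upper-tail test, it finds the smallest $c$ with $P(X \geq c) < \alpha$. It then applies the lower-tail search to Example 1, where $X \sim B(20, 0.25)$ under $H_0$. The function names are hypothetical, and the strict "<" mirrors the comparisons used above.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 for x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, x + 1))


def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c) < alpha, or None if P(X = 0) is already >= alpha."""
    c = None
    for x in range(0, n + 1):
        if binom_cdf(x, n, p0) < alpha:
            c = x
        else:
            break  # the cdf only increases from here
    return c


def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X >= c) < alpha, or None if no such c exists."""
    for c in range(0, n + 1):
        if 1 - binom_cdf(c - 1, n, p0) < alpha:
            return c
    return None


# Example 1: H0: p = 0.25, H1: p < 0.25, n = 20, 5% level
c = lower_critical_value(20, 0.25, 0.05)
print(c)                                 # 1, so the critical region is x <= 1
print(round(binom_cdf(1, 20, 0.25), 4))  # 0.0243
print(round(binom_cdf(2, 20, 0.25), 4))  # 0.0913
print(2 <= c)                            # False: 2 speeding drivers is not in the region
```

For a two-tailed test at level $\alpha$, the same two functions would each be called with $\alpha/2$, and the critical region would be the union of the two tails.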
