Statistical Techniques in Marketing Research: Training Materials

Uploaded by: 工**** | Document no.: 591677846 | Upload date: 2024-09-18 | Format: PPT | Pages: 67 | Size: 197KB

Copyright 2008 CIIC & COMR

Background
Statistics is a diverse body of theory and application, ranging from simple averages to complex statistical modeling and multivariate analysis. It plays a key role in Marketing Research, from project design through analysis and interpretation of data.

Though frequently seen as an esoteric and intimidating discipline, the basic concepts in statistics require no more than high school algebra to master. And, like many other skills, hands-on experience is the best teacher (!).

Our focus in this seminar will be on the fundamental concepts and methods that have the widest application in marketing research. We will avoid complex theory and aim to provide the background you need to begin applying these concepts on the job.

Objectives:
To provide a practical working knowledge of fundamental statistical concepts and methods in order to:
1) help you analyze and interpret quantitative Marketing Research data; and
2) broaden client service skills.

Syllabus:
I. Basic Definitions
II. Sampling
III. Types of Data
IV. Summarizing Data
V. Inferential Statistics

I. Basic Definitions
Variable: A quantity which is free to vary, e.g., purchase interest rating.
Univariate Analysis: The investigation of one variable at a time, e.g., the mean purchase interest rating in a product test.
Bivariate Analysis: The investigation of the relationship between two variables, e.g., the correlation between an attribute rating and purchase interest.
Multivariate Analysis: The investigation of the interrelationships among several variables, e.g., the joint relationship between 15 attribute ratings and purchase interest.
Population (universe): All objects (e.g., consumers) in the group of interest.
ex:
- All male beer drinkers in their 30s living in Japan
- All Japanese housewives aged 25-49 who have purchased canned condensed soup in the past 3 months
Sample: A selected subset of the population.
ex:
- 200 male beer drinkers in their 30s
- 500 housewives aged 25-49 who have purchased canned condensed soup in the past 3 months
Census and Sample Survey: A census is the gathering of information about all members of a population (e.g., a survey of all ACNielsen employees). A sample survey is the gathering of information about a selected subset of the population (e.g., a random sample of 400 ACNielsen employees).
Sampling Error: The deviation of a figure obtained from a sample from the true (i.e., population) value.
Inferential Statistics: Used to generalize, or make inferences, about a population from a sample.
ex: A market research survey of 500 housewives aged 25-49 finds that they buy dry soup more frequently than condensed soup. How likely is this to be true for all housewives aged 25-49 in Japan?
Parameters and Statistics: Numbers used to describe a population are called parameters. Numbers used to describe a sample are called statistics.
Objects/Cases: In Marketing Research, these usually refer to respondents, sometimes to brands.
Raw Data and Aggregate Data: Raw data are case- or object-level data, e.g., data for each respondent. Aggregate data are data which have been grouped in some way, and often consist of percentages and means.
Independent and Dependent Variables: An independent variable may sometimes be viewed as a cause and a dependent variable as an effect. More generally, independent variables (e.g., age, income) are used to better understand or predict dependent variables (e.g., purchase likelihood).
ex: Purchase interest in a new product concept differs by respondent age. In this example, age is the independent variable and purchase interest is the dependent variable, since age could affect purchase interest but not the other way around.

Statistical Notation
Some of the notational conventions used in statistics are:
N: number of objects/cases (e.g., respondents) in the population
n: number of objects in the sample (i.e., sample size)
πᵢ: the Greek letter pi; percentage or proportion corresponding to group i
μ: the Greek letter mu; the population arithmetic mean
X̄ (X bar): the sample arithmetic mean
σ: the Greek letter sigma; the population standard deviation
s: the sample standard deviation
Σ: the Greek capital letter sigma; the sum of a series of numbers
*: computer symbol for multiplication: a*b = a × b

III. Types of Data
There are two main types of data, each of which can be further subdivided into two categories. Different types of statistical procedures are appropriate for each type of data.
Non-metric
- nominal
- ordinal
Metric
- interval
- ratio
A basic understanding of the concepts which follow is important for good questionnaire design.

Nominal: The lowest form of data in terms of the information it provides. Examples are male/female, Tokyo/Osaka, user/non-user. No ranking or order of the categories is presumed (unlike, e.g., frequency of use). Usually expressed in percentages or frequencies.
Ordinal: An ordered category. Ordinal data indicate whether an object has more or less of a characteristic than another object, but not how much more. Examples are age groups and heavy/medium/light usage of a product category. Medians, ranks and percentiles can be computed on ordinal data.
Interval: Data are measured in constant units. An example is a numeric rating scale, where 5 - 4 = 4 - 3 = 3 - 2 = 2 - 1; the unit of measurement is 1. There is no true zero, however. Fahrenheit and Celsius temperature scales are interval: one cannot say that 50 degrees is twice as hot as 25 degrees, because the zero on either scale is arbitrary.
ex: 30 Celsius = 86 Fahrenheit, but 15 Celsius = 59 Fahrenheit, not 43.

In practice, most rating scales used in Marketing Research lie somewhere between ordinal and interval. For example, can we really say that the difference between "Very much want to buy" and "Want to buy" is the same as the difference between "Want to buy" and "Can't say either way"?
If the data are judged reasonably close to being interval, it is acceptable to compute means and to treat them as interval for analysis. Statistical procedures designed for ordinal data (non-parametric) and those designed for interval data (parametric) frequently yield similar results.
Ratio: This is the highest form of data. It possesses all properties of interval data and has a true zero. The Kelvin scale is ratio; so are age and income when not categorized.

Some Guidelines
1) Many significance tests, such as the t-test, assume that the data are interval or ratio. While it is not uncommon in practice to employ these methods when the data are non-metric, strictly speaking, non-parametric tests such as the Kruskal-Wallis test are more appropriate.
2) Weights are often assigned to frequency of usage/purchase data in order to compute means, as in the example below:

Frequency of coffee consumption    Weight
Twice a day or more                 3.0
Once a day                          1.0
3-5 times a week                    0.3
1-2 times a week                    0.2
Less than once a week               0.1

Weights such as these are often quite arbitrary, and the resulting means are only rough approximations. In such cases, it may be preferable to treat the data as ordinal rather than interval.

IV. Summarizing Data
Frequency Distribution: One of the most useful means of summarizing data. It can be represented in tabular form, e.g.,

Respondent Age      n      %
Teens              75     15
20s               100     20
30s               125     25
40s               100     20
50s                75     15
60s                25      5
Total             500    100

or in graphic form.
[Figure: Histogram]

Shape of Frequency Distribution
Normal Distribution: This plays a key role in many statistics. Many parametric inferential statistics assume the population is at least approximately normally distributed. Severe departures from normality can make descriptive statistics such as means and standard deviations misleading.
[Figure: Examples of a normal distribution (1-3)]

Departures from Normality
Skewness: When a distribution is asymmetrical, it is skewed. If the bulk of the distribution lies to the left and the longer tail points to the right, it is positively skewed. If instead the bulk lies to the right and the longer tail points to the left, it is negatively skewed.
[Figure: Positively skewed distribution]
[Figure: Negatively skewed distribution]
Kurtosis: When the tails are unusually fat or unusually thin, the distribution is said to be kurtotic.
[Figure: Example of a platykurtic distribution]
[Figure: Example of a leptokurtic distribution]

Measures of Central Location (Center of Data)
Averages: Three kinds of averages are typically used in Marketing Research:
- arithmetic mean
- median
- mode
If a distribution is symmetrical and has a single peak, the mean, median and mode are all the same.
[Figure: Symmetrical distribution; mean, median and mode coincide]
[Figure: Asymmetrical, positively skewed distribution; from left to right: mode, median, mean]

Arithmetic Mean: The most commonly used average for metric data. It is calculated as follows:
$$\bar{X} = \frac{\sum X}{n}$$
ex: The mean of 5, 2, 1, 3 is (5 + 2 + 1 + 3)/4 = 2.75.
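Python's standard `statistics` module can check the relationships between the three averages numerically. The sketch below first reproduces the 2.75 example. It then compares mean, median and mode on two small data sets. Both data sets are hypothetical, chosen only to illustrate a symmetrical shape and a positively skewed shape; they are not from the training material.

```python
from statistics import mean, median, mode

# Worked example from the slide: arithmetic mean = sum of values / number of values
print(mean([5, 2, 1, 3]))  # 2.75

# Hypothetical data sets (not from the training material)
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]        # single peak, mirror-image tails
right_skewed = [1, 2, 2, 2, 3, 3, 4, 6, 9]     # long tail pointing to the right

for name, data in [("symmetric", symmetric), ("positively skewed", right_skewed)]:
    # mode = most frequent value, median = middle ranked value, mean = arithmetic mean
    print(f"{name}: mode={mode(data)}, median={median(data)}, mean={mean(data):.2f}")
```

In the skewed set, the long right tail pulls the mean (3.56) above the median (3), which in turn sits above the mode (2). This matches the ordering shown for a positively skewed distribution.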

Means of grouped data may also be estimated using the following formula:
$$\bar{X} = \frac{\sum X w}{\sum w}$$
where X is the interval midpoint and w (the weight) is the frequency (or percentage).
ex:

Respondent Age      x       n      %
Teens             14.5     75     15
20s               24.5    100     20
30s               34.5    125     25
40s               44.5    100     20
50s               54.5     75     15
60s               64.5     25      5
Total                     500    100

$$\bar{X} = \frac{(75 \times 14.5) + (100 \times 24.5) + (125 \times 34.5) + (100 \times 44.5) + (75 \times 54.5) + (25 \times 64.5)}{500} = 36 \text{ years}$$
Or, if percentages are used as the weights:
$$\bar{X} = \frac{(15 \times 14.5) + (20 \times 24.5) + (25 \times 34.5) + (20 \times 44.5) + (15 \times 54.5) + (5 \times 64.5)}{100} = 36 \text{ years}$$
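Here is a minimal sketch of the grouped-mean formula. It uses the midpoints, counts and percentages from the age table above. The function name `grouped_mean` is hypothetical, introduced only for this illustration.

```python
# Interval midpoints, counts and percentages from the respondent-age table
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
counts = [75, 100, 125, 100, 75, 25]
percents = [15, 20, 25, 20, 15, 5]


def grouped_mean(x, w):
    """Weighted mean sum(x * w) / sum(w) of interval midpoints x with weights w."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


print(grouped_mean(midpoints, counts))    # 36.0
print(grouped_mean(midpoints, percents))  # 36.0
```

The weights enter only as a ratio to their total, so counts and percentages give the same estimate.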

Two drawbacks of the arithmetic mean, however, are:
1) It is sensitive to extreme values (outliers), especially when the number of data points is small. In the earlier hypothetical series of numbers (5, 2, 1, 3), 5 appears to be an outlier. If 3 is substituted for 5, the mean of these numbers decreases from 2.75 to 2.25: (3 + 2 + 1 + 3)/4 = 2.25.
2) A second disadvantage is more general: the mean may be misleading when the data distribution is non-normal.
[Figure: Example of a bimodal distribution, with the modes, mean and median marked]
[Figure: Example of a uniform distribution; mean, median and mode identical]
[Figure: Example of a U-shaped distribution, with the modes, mean and median marked]

Median: The middle value of ordered data. Appropriate for ordinal, interval, and ratio data.
Computational procedure: First, rank the data from smallest value to largest value. Then find the position of the middle value with the following formula:
$$\text{Position of median} = \frac{n + 1}{2}$$
where n is the number of data points.
ex: There are 11 data points (numbers) in the following ranked data set:
6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19
The median is the sixth largest (or smallest) number, since (11 + 1)/2 = 6. Here it is 11.
Note that the above data set has an odd number of values (n = 11). When there is an even number of data points, there will be two middle values. In these instances, the median is the arithmetic mean of the two middle values.
ex: Consider the median of the following data set with 12 data points:
6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19, 30
The median lies between the 6th and 7th largest (or smallest) values, since (12 + 1)/2 = 6.5. These values are 11 and 12, so the median is (11 + 12)/2 = 11.5.
Notice that the addition of an outlier (30) had little impact on the median. By contrast, the mean of the first data set is 11.64 and the mean of the second data set is 13.17. This robustness to outliers is a major reason for using the median rather than the arithmetic mean.
However, the median uses less information about the data than does the mean and is often less informative. Statistical procedures developed for the median (non-parametric methods) are generally less flexible and informative than those which analyze means.
Medians can also be calculated from grouped data.
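The median procedure above can be sketched as follows. `median_by_position` is a hypothetical helper that applies the (n + 1)/2 position rule directly. Python's built-in `statistics.median` should give the same results. The sketch uses the two data sets from the examples, so it also reproduces the mean-versus-median comparison.

```python
from statistics import mean, median


def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on ranked data."""
    data = sorted(values)              # rank from smallest to largest
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    pos = (n + 1) / 2                  # 1-based position of the middle value
    lower = data[int(pos) - 1]         # value at the whole-number part of pos
    if pos.is_integer():
        return lower                   # odd n: a single middle value
    upper = data[int(pos)]             # even n: the next ranked value
    return (lower + upper) / 2         # mean of the two middle values


first = [6, 9, 9, 10, 11, 11, 12, 12, 13, 16, 19]
second = first + [30]                  # same data plus one outlier

for label, data in [("first", first), ("second", second)]:
    print(f"{label}: median={median_by_position(data)} "
          f"(statistics.median: {median(data)}), mean={mean(data):.2f}")
```

Adding the outlier moves the median only from 11 to 11.5. The mean moves from 11.64 to 13.17.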

ex:

Respondent Age      x      n (w)     %
Teens             14.5      75      15
20s               24.5     100      20
30s               34.5     125      25
40s               44.5     100      20
50s               54.5      75      15
60s               64.5      25       5
Total                      500     100

There is an even number of interval midpoints (6), so the median must lie somewhere between the 3rd and 4th values: (n + 1)/2 = 3.5. The two middle categories are therefore 34.5 and 44.5.

The (weighted) mean of 34.5 and 44.5 is
$$\frac{(125 \times 34.5) + (100 \times 44.5)}{225} \approx 38.9$$

Mode
The mode is closest in meaning to the layman's term "average", that is, "typical". It is very commonly used in Marketing Research but rarely referred to by name. It is simply the most frequent value.
ex: Brand X (33%) leads in terms of P3M purchase, followed by Brand Y (21%) and Brand Z (17%). The modal brand is Brand X, purchased by 33%.
ex: The mode of the following data set is 16:
6, 9, 10, 11, 15, 16, 16, 16, 20
Modes can also be obtained from grouped data.
ex:

Respondent Age      x      n (w)     %
Teens             14.5      75      15
20s               24.5     100      20
30s               34.5     125      25
40s               44.5     100      20
50s               54.5      75      15
60s               64.5      25       5
Total                      500     100

Here, the modal age group is 30-39, and the modal age is 34.5, the midpoint of this age range.
The major disadvantages of the mode are:
- It does not lend itself well to inferential statistical methods.
- Sometimes there is no distinct mode. Consider the following data:

Brand    P1M Purchase
A        29%
B        28%
C        27%

There is no meaningful difference in P1M purchase among the three brands.

Measures of Dispersion (Spread of Data)
The most commonly used measures of dispersion are:
- Variance
- Standard Deviation
- Range
- Percentiles, Quartiles, Quintiles, Terciles (N-tiles)
- Inter-quartile range

Variance and Standard Deviation
These are two of the most widely used statistics, and they play a role in most parametric statistical procedures. They are related to one another in the following way: the standard deviation is the square root of the variance, and the variance is the square of the standard deviation.
Formulas:
Variance, sample:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Variance, population:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Standard deviation, sample and population:
$$s = \sqrt{s^2}, \qquad \sigma = \sqrt{\sigma^2}$$
Note that the term n - 1 in the sample statistics is known as the degrees of freedom.

ex: The standard deviation of the hypothetical sample data below is computed as follows:
9, 8, 5, 11, 7, 5
Compute the mean:
$$\bar{X} = \frac{9 + 8 + 5 + 11 + 7 + 5}{6} = \frac{45}{6} = 7.5$$
Calculate the squared deviations from the mean:

X       X̄       X - X̄     (X - X̄)²
9       7.5       1.5        2.25
8       7.5       0.5        0.25
5       7.5      -2.5        6.25
11      7.5       3.5       12.25
7       7.5      -0.5        0.25
5       7.5      -2.5        6.25
Σ 45              0         27.50

Substitute Σ(X - X̄)² and n into the formula:
$$s^2 = \frac{27.5}{6 - 1} = 5.5, \qquad s = \sqrt{5.5} \approx 2.35$$
The term Σ(X - X̄)² is known as the sum of squares.

Some Uses of Standard Deviation and Variance
Averages tell us about the center of a distribution, but nothing about its spread. In a peaked distribution, most observations will fall close to the average. In a flat distribution, on the other hand, the average may have little meaning.
If the distribution of the data is approximately normal, about 68% of the observations lie within ±1 standard deviation of the mean, and about 95% lie within ±1.96 standard deviations of the mean.
Z scores ("standard scores") can be computed so that different types of scales can be compared. For example, ratings collected on a 5-point scale and those collected on a 7-point scale can be analyzed together by expressing each respondent's ratings in standard deviation units from the mean. This is typically done in factor and cluster analysis, for example.
Formula for z scores:
$$z = \frac{X - \bar{X}}{s}$$

Areas Under the Normal Curve
Interval around the mean     Area
±1.65 SD                      90%
±1.96 SD                      95%
±2.58 SD                      99%

Other Measures of Dispersion
Range: The largest value minus the smallest value, e.g., the range of a 5-point purchase interest scale is 4. The range only has meaning with metric data.
N-tiles: Percentiles (100ths), Quintiles (5ths), Quartiles (4ths), Terciles (3rds). Data are ranked according to magnitude and partitioned into equal-sized groups (e.g., 4ths). These are appropriate if the data are at least ordinal.
ex: A commercial ranking higher in terms of overall liking than 55% of the products in the BASES II norm bank, and lower than the other 45%, is in the 55th percentile, the third quintile, the third quartile and the second tercile.
Inter-quartile Range: The difference between the 75th percentile and the 25th percentile, i.e., the range spanned by the middle 50% of the distribution.

Lies, Damned Lies, and Statistics
The research and development department of a paint manufacturer tests 5 cans of each of three new kinds of paint to see which covers the largest area (square meters) per can. It obtains the following results:
Test Product A: 505, 516, 478, 513, 503
Test Product B: 512, 486, 511, 486, 510
Test Product C: 496, 485, 490, 520, 484
A wins! Its mean is the highest: 503.
B wins! Its median is the highest: 510.
C wins! Its sample midrange (the mean of the smallest and largest values) is the highest: 502.

EXERCISES
4.1 What are the mean, median, and mode of the following data set? 33, 35, 32, 32, 35, 30, 33, 34
4.2 A former U.S. president addressed a luncheon meeting of the Executives Club. He spoke for 42 minutes at an average rate of 6,780 words per hour and received a fee of $14,000. What was the average fee per word of his speech?
4.3 An elevator has a rated maximum capacity of 1,818 kg. Is it overloaded if at one time it carries 12 women whose mean weight is 52 kg and 15 men whose mean weight is 82 kg?
4.4 150 housewives are interviewed, and their purchase interest in a new product concept is obtained. What is the mean purchase interest rating?

Very much would like to buy (5)       7%
Would like to buy (4)                23%
Can't say either way (3)             51%
Would not like to buy (2)            13%
Would not like to buy at all (1)      6%

4.5 What are the range and standard deviation of the following data set? 16.2, 15.9, 15.8, 16.1
4.6 Find the mean, variance, and standard deviation of the following distribution of ages of members of a market research company.

Age (years)    Frequency
20-24          11
25-29          24
30-34          30
35-39          18
40-44          11
45-49           5
50-54           1

ANSWERS TO EXERCISES
4.1 Mean: 33; Median: 33; Modes: 32, 33 and 35
4.2 $2.95
4.3 Yes
4.4 3.12
4.5 The range is 0.4, and the standard deviation is 0.18 if the data are from a sample.
4.6 Mean: 32.65; s²: 47.81; s: 6.91
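Several figures quoted in this section can be reproduced with Python's standard `statistics` module. This needs Python 3.8 or later, for `NormalDist` and `multimode`. The figures covered are:
- the normal-curve areas;
- the worked standard deviation example;
- the paint test;
- the answers to exercises 4.1, 4.4, 4.5 and 4.6.

This is a minimal sketch. The `midrange` helper is an assumption of this example rather than part of the original material. So are the class midpoints used for exercise 4.6 (22, 27, ..., 52, the centre of each five-year class).

```python
from statistics import NormalDist, mean, median, multimode, stdev, variance

# Areas under the standard normal curve within +/- k standard deviations
std_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 1.65, 1.96, 2.58):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 3))

# Worked example: sample variance and standard deviation (n - 1 in the denominator)
sd_data = [9, 8, 5, 11, 7, 5]
print(variance(sd_data), round(stdev(sd_data), 2))  # 5.5 2.35

# Paint test: each product "wins" on a different measure of central location
paint = {
    "A": [505, 516, 478, 513, 503],
    "B": [512, 486, 511, 486, 510],
    "C": [496, 485, 490, 520, 484],
}


def midrange(values):
    """Mean of the smallest and largest values (helper assumed for this example)."""
    return (min(values) + max(values)) / 2


for measure in (mean, median, midrange):
    scores = {name: measure(v) for name, v in paint.items()}
    winner = max(scores, key=scores.get)
    print(f"{measure.__name__}: {scores} -> {winner} wins")

# Exercise 4.1: mean, median and all modes
ex41 = [33, 35, 32, 32, 35, 30, 33, 34]
print(mean(ex41), median(ex41), multimode(ex41))  # 33 33.0 [33, 35, 32]

# Exercise 4.4: mean rating, weighting each scale point by its percentage
ratings = {5: 7, 4: 23, 3: 51, 2: 13, 1: 6}
print(sum(r * p for r, p in ratings.items()) / sum(ratings.values()))  # 3.12

# Exercise 4.5: range and sample standard deviation
ex45 = [16.2, 15.9, 15.8, 16.1]
print(round(max(ex45) - min(ex45), 2), round(stdev(ex45), 2))  # 0.4 0.18

# Exercise 4.6: grouped data, assuming class midpoints 22, 27, ..., 52
mids = [22, 27, 32, 37, 42, 47, 52]
freqs = [11, 24, 30, 18, 11, 5, 1]
n = sum(freqs)
gmean = sum(m * f for m, f in zip(mids, freqs)) / n
gvar = sum(f * (m - gmean) ** 2 for m, f in zip(mids, freqs)) / (n - 1)
print(round(gmean, 2), round(gvar, 2), round(gvar ** 0.5, 2))  # 32.65 47.81 6.91
```

The 4.6 variance is computed from the sum of squares divided by n - 1 = 99. It is not the square of the rounded standard deviation (6.91² ≈ 47.75).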
