Key

ALCOHOL.REC

(A) The Univariate Description

(B) Alcohol Consumption by Income Group

The research focus is to determine whether alcohol scores differ by income.

1. Summary statistics: alcohol score by income level (the scaling of alcohol consumption scores is documented in the assignment)

Alcohol Consumption Score by Income Level

Income Level    1 (n = 46)   2 (n = 88)   3 (n = 140)   4 (n = 250)   5 (n = 189)
Mean            2.8          3.9          4.5           3.5           3.6
SD              3.1          4.2          4.2           3.5           2.9

Mean alcohol scores range from 2.8 to 4.5, with income level 3 having the highest mean score and income level 1 the lowest. Standard deviations range from 2.9 to 4.2.

2. Side-by-side quartile plot

Not shown on Web. The graph shows median values of 3 in all five groups. Interquartile ranges vary considerably.

3. Descriptive comparison of group variances

The descriptive statistics (SDs) and side-by-side quartile plot seem to suggest that groups have unequal variances ("heteroscedasticity").

4. Bartlett's Test

Comments: This test confirms the exploratory results.

5. Mean ± se estimates (graph not shown on Web)

Alcohol Consumption Score by Income Level

Income Level    1 (n = 46)   2 (n = 88)   3 (n = 140)   4 (n = 250)   5 (n = 189)
Mean            2.8          3.9          4.5           3.5           3.6
se              0.46         0.44         0.35          0.22          0.21

Comment: Standard error calculations do not assume homoscedasticity.
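As a check on the se row: each standard error is that group's own SD divided by sqrt(n), so no common variance is assumed. A minimal sketch using the rounded SDs from the summary table (because the SDs are rounded to one decimal, the second decimal can differ slightly from the reported se values):

```python
import math

# Income-level summaries from the table in (B): level -> (n, SD).
income_groups = {1: (46, 3.1), 2: (88, 4.2), 3: (140, 4.2), 4: (250, 3.5), 5: (189, 2.9)}


def standard_error(sd, n):
    """se of a group mean from that group's own SD; no common variance assumed."""
    return sd / math.sqrt(n)


ses = {level: standard_error(sd, n) for level, (n, sd) in income_groups.items()}
for level, se in ses.items():
    print(f"Income level {level}: se = {se:.2f}")
```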

After you draw and view the graph, an interesting pattern emerges . . . don't you think?

6. Test of Means or Medians

Since data are assumed to be heteroscedastic, the Kruskal-Wallis test is used.

Comment: Perhaps we should do a power analysis to determine the power of this test (?).
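For reference, the Kruskal-Wallis statistic ranks all observations together, sums the ranks within each group, and compares H with a chi-square distribution on k - 1 df. Below is a minimal standard-library sketch (function names are hypothetical) with average ranks for ties and the usual tie correction. The p-value uses the closed-form chi-square tail that holds only for even df, which covers the 3- and 5-group comparisons in this key (df = 2 and 4) but not two-group pairwise comparisons (df = 1). The toy data are made up; the raw ALCOHOL.REC data are not reproduced here.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("this sketch only handles even degrees of freedom")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = k - 1)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    sum_r2_over_n, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        sum_r2_over_n += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * sum_r2_over_n - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over runs of ties.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)
    return h, chi2_sf_even_df(h, len(groups) - 1)


# Hypothetical scores for three groups (not the ALCOHOL.REC data).
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, p = {p:.4f}")
```

With no ties and completely separated groups this gives H = 7.2 (p ≈ 0.027). As a cross-check of the tail formula, chi2_sf_even_df(12.55, 2) is about 0.0019, matching the K-W p-value reported in the ROOSTER answer.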

7. Summary

Some points to consider:

(C) Alcohol Consumption Score by Age Group

The research focus is to determine whether alcohol scores differ by age group.

1. Summary statistics: alcohol scores by age group

Age Group (Years)   20 to 29 (n = 234)   30 to 42 (n = 231)   43+ (n = 248)
Mean                4.8                  3.6                  2.8
Standard Dev.       3.9                  3.3                  3.2

Comment: Note the trend: mean alcohol scores decrease with age.


2. Side-by-side quartile plot

Not shown on Web. A trend in medians is noted. Interquartile ranges show some variability but are difficult to evaluate.

3. Descriptive comparison of group variances

Hard to evaluate -- there is some discrepancy in SDs and interquartile ranges (as seen in the side-by-side quartile plots), but it is modest.

4. Bartlett's Test

5. Mean ± Standard Error Estimates

Age Group (Years)   20 to 29 (n = 234)          30 to 42 (n = 231)          43+ (n = 248)
Mean                4.83                        3.55                        2.80
Standard Error      sqrt(15.072 / 234) = 0.25   sqrt(11.179 / 231) = 0.22   sqrt(10.556 / 248) = 0.21

6. Test of Means or Medians

We use the Kruskal-Wallis test to avoid violating the assumptions of ANOVA (in particular, the equal variance assumption).

Significant differences are noted between all pairs of age groups.

7. Summary

Both the variability and expected values (means) of alcohol scores differ significantly by age group, with average alcohol consumption inversely associated with age, and greater alcohol consumption variability associated with younger age.

DEERMICE

The research focus is to determine whether weight gain differs by diet.

1. Summary Statistics: Weight Gain by Diet

Weight Gain (grams)   Diet A (Standard Diet), n = 5   Diet B (Junk Food), n = 5   Diet C (Health Food), n = 5
Mean                  11.14                           13.44                       9.14
Standard Deviation    1.27                            0.62                        0.58

2. Side-by-side quartile plot

Not shown on Web.

3. Descriptive comparison of group variances

Hard to evaluate -- the SDs seem to differ (?).

4. Test for Inequality of Population Variances

5. Mean ± standard error estimates

Weight Gain (grams)       Diet A (Standard Diet), n = 5   Diet B (Junk Food), n = 5   Diet C (Health Food), n = 5
Mean Estimate             11.14                           13.44                       9.14
Standard Error Estimate   sqrt(0.780 / 5) = 0.39          0.39                        0.39

(0.780 is the pooled within-group variance, so all three diets share the same standard error.)
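Unlike the alcohol tables, the deer mouse standard errors are identical across groups, which is what a pooled variance produces. With equal n, the pooled within-group variance is simply the average of the three group variances. A minimal sketch using the rounded SDs from the summary table (so it gives about 0.778 rather than 0.780):

```python
import math

# Deer mouse weight-gain SDs (grams) from the summary table; n = 5 per diet.
sds = {"A (standard)": 1.27, "B (junk food)": 0.62, "C (health food)": 0.58}
n_per_diet = 5

# With equal group sizes the pooled within-group variance (MS within)
# is the plain average of the group variances.
pooled_var = sum(sd ** 2 for sd in sds.values()) / len(sds)
se_pooled = math.sqrt(pooled_var / n_per_diet)  # same se for every diet
print(f"pooled variance = {pooled_var:.3f}, pooled se = {se_pooled:.2f}")
```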

6. Test of Means

Comment: Independent t tests were used above. With k = 2 groups, each t test is equivalent to a one-way ANOVA (F = t²).
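To illustrate the equivalence in the comment, the sketch below computes a pooled (equal-variance) two-sample t statistic and a one-way ANOVA F statistic on the same two groups and checks that F = t². The weight-gain values are hypothetical, made up for illustration; the raw DEERMICE data are not reproduced in this key.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def one_way_f(*groups):
    """One-way ANOVA F statistic: MS between / MS within."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical weight gains (grams), made up for illustration only.
diet_a = [10.2, 11.5, 12.0, 10.8, 11.2]
diet_b = [13.1, 13.9, 12.8, 13.6, 13.8]
t = pooled_t(diet_a, diet_b)
f = one_way_f(diet_a, diet_b)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
print(math.isclose(f, t * t))  # True: with k = 2, F equals t squared
```

The equivalence is algebraic: with two groups, SS(between) = [na·nb/(na + nb)](mean difference)² and MS(within) is the pooled variance, so F reduces to t².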

7. Summary

Average weight gain differs significantly by diet type, with junk food associated with the greatest gain (mean = 13.44 g, SD = 0.62 g) and health food associated with the least gain (mean = 9.14 g, SD = 0.58 g).

11.3 ROOSTER

The research problem is to determine whether testosterone levels differ by rooster strain.

a.     Create ROOSTER.REC

b.     Summary statistics: Testosterone Levels by Rooster Strain

Testosterone (µg/dl)   Strain A (n = 6)   Strain B (n = 6)   Strain C (n = 6)
Mean ± SD              432.7 ± 274.0      112.8 ± 10.5       102.0 ± 7.4
(minimum, maximum)     (134, 897)         (98, 126)          (89, 110)

 

c.     Test for Inequality of Variances

H0: σ1² = σ2² = σ3² vs. H1: at least one population variance differs
Let alpha = .05
Bartlett's Chi-square(2, N = 18) = 47.99, p < .0001.
Conclusion: reject H0.
The conclusion of the hypothesis test, combined with the widely varying sample standard deviations (Table, above), suggests that the groups are heteroscedastic. (This is interesting. I wonder why there is more variability in Strain A than in the other strains.) We will therefore proceed under the assumption of unequal variances.
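Bartlett's statistic can be computed from the group variances and sizes alone: T = [(N - k) ln(s²p) - Σ(ni - 1) ln(si²)] / C, with C = 1 + [Σ 1/(ni - 1) - 1/(N - k)] / [3(k - 1)] and s²p the pooled variance. A minimal sketch (the function name is hypothetical), assuming the strain variances are the ones used in the confidence-interval notes (75077.867, 110.167, 55.20); with those inputs it should reproduce the reported statistic of about 47.99.

```python
import math


def bartlett_from_summary(variances, sizes):
    """Bartlett's chi-square statistic from group sample variances and sizes.

    Returns (T, df) with df = k - 1.
    """
    k = len(variances)
    dfs = [n - 1 for n in sizes]
    df_within = sum(dfs)  # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_within
    numerator = df_within * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_within) / (3 * (k - 1))
    return numerator / correction, k - 1


# Rooster strains A, B, C: sample variances from the CI notes, n = 6 each.
t_stat, df = bartlett_from_summary([75077.867, 110.167, 55.20], [6, 6, 6])
p_value = math.exp(-t_stat / 2)  # chi-square upper tail; this closed form is for df = 2 only
print(f"Bartlett chi-square({df}) = {t_stat:.2f}, p = {p_value:.1e}")
```

The same function works for the alcohol data by passing the squared SDs and the group sizes, although a p-value for df = 4 needs a different tail formula.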
 

d.     Test for Inequality of Means

The nonparametric K-W test will be performed. (See comments about heteroscedasticity, above.)
Let alpha = .05
K-W Chi-square(2, N = 18) = 12.55, p = .0019.
Conclusion: reject the null hypothesis of equal means and proceed with pairwise comparisons at alphaBonf = .05/3 = .0167 (so as to maintain an "experiment-wise" alpha of .05; see Reader pp. 11.7 - 11.9). The Kruskal-Wallis procedure will be used for the pairwise comparisons because of the assumed heteroscedasticity. Results are as follows:

Conclusion: Strain A differs from Strain B and Strain C, but there is no significant difference between Strain B and Strain C.
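The Bonferroni adjustment divides the experiment-wise alpha by the number of pairwise comparisons; with three strains there are three pairs, so each comparison is tested at .05/3 ≈ .0167. A minimal sketch that enumerates the pairs and the per-comparison alpha:

```python
from itertools import combinations

strains = ["A", "B", "C"]
pairs = list(combinations(strains, 2))  # every pairwise comparison
alpha_experimentwise = 0.05
alpha_bonf = alpha_experimentwise / len(pairs)  # Bonferroni split across the pairs

for a, b in pairs:
    print(f"Strain {a} vs Strain {b}: compare p-value with alpha = {alpha_bonf:.4f}")
```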
 

e. Power Analysis:

Assumptions: alpha = .05, k = 2, n = 10 per group, df(between) = 1, df(within) = 18, s² = 100.

i.     For a minimal detectable difference (MDD) of 10, phi = 1.58 and power = .58
ii.    For an MDD of 15, phi = 2.37 and power = .89
iii.   For an MDD of 20, phi = 3.16 and power > .98
iv.   What sample size is needed to detect an MDD of 10 with at least 80% power? We know that n = 10 per group won't do the trick (see part i), so we might try to determine the power when n = 11, n = 12, and so on. One enterprising student did just this, and here are her results:
n  phi power
10 1.58 .58
11 1.66 .61
12 1.73 .65
13 1.80 .68
14 1.87 .72
15 1.94 .74
16 2.00 .77
17 2.06 .79
18 2.12 .81
 
The conclusion, therefore, is to use a sample size of 18 per group, which achieves a power of .81.
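The phi column in parts i to iv can be reproduced with phi = sqrt(n · MDD² / (2k · s²)) under the assumptions listed above (k = 2, s² = 100). A minimal sketch (the function name is hypothetical) that recomputes the phi values in the student's table; the power column still has to be looked up from phi and the df, as in the notes, and this sketch does not attempt that.

```python
import math


def phi_for_mdd(n, mdd, k, s2):
    """phi for a minimal detectable difference (MDD) with n per group."""
    return math.sqrt(n * mdd ** 2 / (2 * k * s2))


for n in range(10, 19):
    print(n, f"{phi_for_mdd(n, mdd=10, k=2, s2=100):.2f}")
```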

A rough answer can be obtained by assuming that df(within) is "big" and then looking up the phi value needed to achieve at least 80% power. In this case, phi is approximately 2. Then, using our sample size formula, n = (2²)(2)(2)(100)/(10²) = 16, which is a good approximation to the more accurate estimates above.
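The rough calculation inverts the phi relationship: n = 2k · s² · phi² / MDD². A minimal sketch (the function name is hypothetical), assuming phi ≈ 2 has already been looked up for 80% power with a large df(within):

```python
import math


def rough_n(phi, mdd, k, s2):
    """Per-group n from n = 2 * k * s2 * phi**2 / MDD**2 (large df(within) approximation)."""
    return 2 * k * s2 * phi ** 2 / mdd ** 2


n = rough_n(phi=2, mdd=10, k=2, s2=100)
print(n, math.ceil(n))  # round up to a whole number of animals per group
```

Rounding up with math.ceil only matters when the formula does not return a whole number; here it gives exactly 16.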
 

NEW (Assigned in 3/4 Class): Confidence Intervals for Group Means

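Notes: each interval is mean +/- t × sqrt(s²/n), using that strain's own variance, n = 6, and t(.975, df = 5) = 2.57.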

95% confidence interval for the mean of Strain A = 432.67 +/- (2.57)(sqrt(75077.867 / 6)) = 432.67 +/- 287.48 = (145.2, 720.2)

95% confidence interval for the mean of Strain B = 112.83 +/- (2.57)(sqrt(110.167 / 6)) = 112.83 +/- 11.01 = (101.82, 123.84)

95% confidence interval for the mean of Strain C = 102.00 +/- (2.57)(sqrt(55.20 / 6)) = 102.00 +/- 7.80 = (94.20, 109.80)
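Each interval above is a separate-variance t interval for one group mean. A minimal sketch, assuming t(.975, df = 5) = 2.57 is taken from a t table as in the notes (the Python standard library has no t quantile function). Only Strains A and C are included; Strain B would be entered the same way.

```python
import math

# t(.975, df = 5) = 2.57, read from a t table (n = 6 roosters per strain).
T_975_DF5 = 2.57


def t_interval(mean, variance, n, t_crit=T_975_DF5):
    """Two-sided CI for one group mean using that group's own variance."""
    half_width = t_crit * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


# (mean, sample variance) taken from the notes above.
for strain, (m, var) in {"A": (432.67, 75077.867), "C": (102.00, 55.20)}.items():
    lower, upper = t_interval(m, var, 6)
    print(f"Strain {strain}: ({lower:.2f}, {upper:.2f})")
```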