The research focus is to determine whether alcohol scores differ by income.
Alcohol Consumption Score by Income Level
Income Level | 1 (n = 46) | 2 (n = 88) | 3 (n = 140) | 4 (n = 250) | 5 (n = 189) |
Mean | 2.8 | 3.9 | 4.5 | 3.5 | 3.6 |
SD | 3.1 | 4.2 | 4.2 | 3.5 | 2.9 |
Mean alcohol scores vary from 2.8 to 4.5, with income level 3 having the highest mean score and income level 1 having the lowest mean score. Standard deviations range from 2.9 to 4.2.
Not shown on Web. Graph shows median values of 3 in all five groups. Interquartile ranges vary considerably.
The descriptive statistics (SDs) and side-by-side quartile plot seem to suggest that groups have unequal variances ("heteroscedasticity").
Comments: This test confirms the exploratory results.
Alcohol Consumption Score by Income Level
Income Level | 1 (n = 46) | 2 (n = 88) | 3 (n = 140) | 4 (n = 250) | 5 (n = 189) |
Mean | 2.8 | 3.9 | 4.5 | 3.5 | 3.6 |
SE | 0.46 | 0.44 | 0.35 | 0.22 | 0.21 |
Comment: Standard error calculations do not assume homoscedasticity.
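The standard errors in the table above can be checked from the reported SDs and sample sizes. The following is a minimal sketch, assuming each standard error is the group's own SD divided by the square root of its n (which is what "do not assume homoscedasticity" implies here). The variable and function names are illustrative only. Because the SDs are rounded to one decimal place, the results agree with the table only to within rounding.

```python
import math

# Summary statistics copied from the income-level table above.
sample_sizes = [46, 88, 140, 250, 189]
sds = [3.1, 4.2, 4.2, 3.5, 2.9]


def standard_error(sd, n):
    """Standard error of one group mean, using that group's own SD."""
    return sd / math.sqrt(n)


standard_errors = [standard_error(sd, n) for sd, n in zip(sds, sample_sizes)]

for level, se in enumerate(standard_errors, start=1):
    print(f"Income level {level}: se = {se:.3f}")
```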
After drawing and viewing the graph, an interesting pattern emerges . . . don't you think?
Since data are assumed to be heteroscedastic, the Kruskal-Wallis test is used.
Comment: Perhaps we should do a power analysis to determine the power of the analysis (?).
Some points to consider:
The research focus is to determine whether alcohol scores differ by age group.
Age Group (Years) | 20 to 29 (n = 234) | 30 to 42 (n = 231) | 43+ (n = 248) |
Mean | 4.8 | 3.6 | 2.8 |
Standard Dev. | 3.9 | 3.3 | 3.2 |
Comment: Note trend.
Not shown on Web. Trend in medians noted. Interquartile ranges: some variability, difficult to evaluate.
Hard to evaluate -- some discrepancy in SDs and inter-quartile ranges (as seen in side-by-side quartile plots), but these are modest.
Age Group (Years) | 20 to 29 (n = 234) | 30 to 42 (n = 231) | 43+ (n = 248) |
Mean | 4.83 | 3.55 | 2.80 |
Standard Error | sqrt (15.072 / 234) = 0.25 | sqrt (11.179 / 231) = 0.22 | sqrt (10.556 / 248) = 0.21 |
Let us use the Kruskal-Wallis test so as to avoid a violation of assumptions (in particular, a violation of the equal variance assumption).
Therefore, significant differences are noted all around.
Both the variability and expected values (means) of alcohol scores differ significantly by age group, with average alcohol consumption inversely associated with age, and greater alcohol consumption variability associated with younger age.
The research focus is to determine whether weight gain differs by diet.
Weight Gain (grams) | Diet A (Standard Diet) n = 5 | Diet B (Junk Food) n = 5 | Diet C (Health Food) n = 5 |
Mean | 11.14 | 13.44 | 9.14 |
Standard Deviation | 1.27 | 0.62 | 0.58 |
Not shown on Web.
Hard to evaluate -- seem to differ(?).
Weight Gain (grams) | Diet A (Standard Diet) n = 5 | Diet B (Junk Food) n = 5 | Diet C (Health Food) n = 5 |
Mean Estimate | 11.14 | 13.44 | 9.14 |
Standard Error Estimates | sqrt (0.780 / 5) = 0.39 | 0.39 | 0.39 |
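The single standard error of 0.39 shared by all three diets reflects a pooled variance estimate. As a rough check, and assuming the 0.780 in the table is the within-groups variance (with equal group sizes, simply the average of the three group variances), the value can be rebuilt from the reported SDs. The names below are illustrative, and the rounded SDs reproduce the table only approximately.

```python
import math

# Summary statistics copied from the weight-gain table above.
sds = [1.27, 0.62, 0.58]
n_per_group = 5

# With equal group sizes, the pooled (within-groups) variance is the
# simple average of the group variances.
pooled_variance = sum(sd ** 2 for sd in sds) / len(sds)

# Every group mean gets the same standard error under equal variances.
pooled_se = math.sqrt(pooled_variance / n_per_group)

print(f"pooled variance = {pooled_variance:.3f}")
print(f"pooled standard error = {pooled_se:.2f}")
```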
Comment: Independent t tests were used above. This is equivalent to ANOVA tests with k = 2.
Average weight gain differs significantly by diet type, with junk food associated with the greatest gain (mean = 13.44 gms; sd = 0.62 gms) and health food associated with the least gain (mean = 9.14 gms; sd = 0.58 gms).
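The comment above that an independent t test is equivalent to ANOVA with k = 2 can be illustrated numerically. The sketch below compares Diet A with Diet B using only the summary statistics in the tables, and assumes the two-group pooled variance is used in both tests. It is meant to show that the squared t statistic equals the F statistic, not to reproduce the original test output.

```python
import math

# Summary statistics for Diet A and Diet B, copied from the table above.
n = 5
mean_a, sd_a = 11.14, 1.27
mean_b, sd_b = 13.44, 0.62

# Pooled variance for two equal-sized groups.
pooled_variance = (sd_a ** 2 + sd_b ** 2) / 2

# Independent (equal variance) t statistic.
t_stat = (mean_b - mean_a) / math.sqrt(pooled_variance * (2 / n))

# One-way ANOVA with k = 2 groups, built from the same summaries.
grand_mean = (mean_a + mean_b) / 2
ss_between = n * ((mean_a - grand_mean) ** 2 + (mean_b - grand_mean) ** 2)
ms_between = ss_between / (2 - 1)   # df(between) = k - 1 = 1
ms_within = pooled_variance         # df(within) = 2n - 2 = 8
f_stat = ms_between / ms_within

print(f"t = {t_stat:.3f}, t squared = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```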
The research problem is to determine whether testosterone levels differ by rooster strain.
Testosterone (µg/dl) | Strain A n = 6 | Strain B n = 6 | Strain C n = 6 |
Mean ± SD | 432.67 ± 274.0 | 112.8 ± 10.5 | 102.0 ± 7.4 |
(minimum, maximum) | (134, 897) | (98, 126) | (89, 110) |
H0: sigma-squared1 = sigma-squared2 = sigma-squared3 vs. H1: at least one population variance differs
Let alpha = .05
Bartlett's Chi-square(2, N = 18) = 47.99, p < .0001.
Conclusion: reject H0.
The conclusion to the hypothesis test, combined with the widely varying sample standard deviations
(Table, above), suggests that the groups are heteroscedastic. (This is interesting. I wonder why there's
more variability in Strain A than in the other Strains.) We will therefore proceed under the assumption of
unequal variance.
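Bartlett's statistic can be reconstructed from summary statistics alone. The sketch below uses the textbook form of Bartlett's test and the sample variances quoted in the Notes at the end of this key (75077.867, 110.167, and 55.20, each with n = 6). The function name is hypothetical, and the p value shortcut is valid only because k = 3 gives 2 degrees of freedom, where the chi-square upper tail area is exp(-x / 2).

```python
import math


def bartlett_from_summary(variances, sizes):
    """Bartlett's chi-square statistic from group variances and sizes."""
    k = len(variances)
    dfs = [n - 1 for n in sizes]
    df_within = sum(sizes) - k
    # Pooled variance, weighted by each group's degrees of freedom.
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_within
    numerator = df_within * math.log(pooled) - sum(
        df * math.log(v) for df, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_within) / (3 * (k - 1))
    return numerator / correction


# Sample variances taken from the confidence interval notes.
variances = [75077.867, 110.167, 55.20]
sizes = [6, 6, 6]

statistic = bartlett_from_summary(variances, sizes)
# Upper tail of a chi-square with exactly 2 df (k - 1 = 2).
p_value = math.exp(-statistic / 2)

print(f"Bartlett's chi-square(2) = {statistic:.2f}, p = {p_value:.2e}")
```

Under these assumptions the statistic agrees with the 47.99 reported above.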
The nonparametric K-W test will be performed. (See comments about heteroscedasticity, above.)
Let alpha = .05
K-W Chi-square(2, N = 18) = 12.55, p = .0019.
Conclusion: reject the null hypothesis of equal means and proceed with pairwise comparisons at
alphaBonf = .018 (so as to maintain an "experiment-wise" alpha of .05; see Reader pp. 11.7 - 11.9). The
Kruskal-Wallis procedure will be used because of the assumed heteroscedasticity. Results are as follows:
Conclusion: Strain A differs from Strain B and Strain C, but there is no significant difference between
Strain B and Strain C.
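The p value reported for the overall K-W test can be verified by hand. With three groups the statistic is referred to a chi-square distribution with 2 degrees of freedom, and for exactly 2 df the upper tail area is exp(-x / 2). The helper name below is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math


def chi_square_upper_tail_2df(x):
    """Upper tail area of a chi-square distribution with exactly 2 df."""
    return math.exp(-x / 2)


kw_statistic = 12.55  # K-W chi-square reported above
alpha = 0.05

p_value = chi_square_upper_tail_2df(kw_statistic)
reject_null = p_value < alpha

print(f"p = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```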
Assumptions: alpha = .05, k = 2, n = 10 per group, df(between) = 1, df(within) = 18, s^2 = 100.
i. For a minimal detectable difference (MDD) of 10, phi = 1.58 and power = .58
ii. For a MDD of 15, phi = 2.37 and power = .89
iii. For a MDD of 20, phi = 3.16 and power > .98
iv. What size samples are needed to achieve a MDD of 10? We know that n (per group) of 10 won't do
the trick (see part i.) so, we might try to determine the power when n = 11, n = 12, and so on. One
enterprising student did just this, and here are her results:
n phi power
10 1.58 .58
11 1.66 .61
12 1.73 .65
13 1.80 .68
14 1.87 .72
15 1.94 .74
16 2.00 .77
17 2.06 .79
18 2.12 .81
The conclusion, therefore, is to use a sample size of 18 per group to achieve .81 power.
A rough answer can be obtained by assuming df(within) is "big" and then looking up the phi value needed to
achieve at least 80% power. In this case, phi is approximately equal to 2. Then, use our sample size
formula, n = (2^2)(2)(2)(100)/(10^2) = 16, which is a good approximation to the more accurate estimates,
above.
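The phi values in parts i through iv can be recomputed with a few lines of code. The sketch below assumes the relation phi = sqrt(n * MDD^2 / (2 * k * s^2)), which is inferred by rearranging the sample size formula above rather than quoted from the Reader. The function names are illustrative. The power values themselves come from power charts and are not computed here.

```python
import math


def phi(n, mdd, k, variance):
    """Noncentrality index phi for n per group (relation assumed from the sample size formula)."""
    return math.sqrt(n * mdd ** 2 / (2 * k * variance))


def approximate_n(phi_target, mdd, k, variance):
    """Per-group sample size needed to reach a target phi."""
    return phi_target ** 2 * 2 * k * variance / mdd ** 2


K, VARIANCE = 2, 100

# Parts i to iii: n = 10 per group, MDD of 10, 15, and 20.
for mdd in (10, 15, 20):
    print(f"MDD = {mdd}: phi = {phi(10, mdd, K, VARIANCE):.2f}")

# Part iv: phi for n = 10 through 18 with MDD = 10.
phi_table = {n: round(phi(n, 10, K, VARIANCE), 2) for n in range(10, 19)}
print(phi_table)

# Rough answer: a target phi of about 2 for roughly 80% power.
print(approximate_n(2, 10, K, VARIANCE))
```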
Notes:
95% confidence interval for the mean of Strain A = 432.67 +/- (2.57)(sqrt(75077.867 / 6)) = 432.67 +/- 287.48 = (145.2, 720.2)
95% confidence interval for the mean of Strain B = 112.88 +/- (2.57)(sqrt(110.167 / 6)) = (101.50, 124.16)
95% confidence interval for the mean of Strain C = 102.00 +/- (2.57)(sqrt(55.20 / 6)) = (94.20, 109.8)
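The interval arithmetic in these notes follows the pattern mean +/- t * sqrt(s^2 / n), with each strain using its own variance. A minimal sketch for Strain A and Strain C is shown below. It assumes the critical value 2.57 quoted above (t with 5 degrees of freedom), and the function name is illustrative.

```python
import math


def mean_confidence_interval(mean, variance, n, t_critical):
    """Confidence interval for one group mean using that group's own variance."""
    half_width = t_critical * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


T_CRITICAL = 2.57  # value used in the notes (df = n - 1 = 5)

ci_strain_a = mean_confidence_interval(432.67, 75077.867, 6, T_CRITICAL)
ci_strain_c = mean_confidence_interval(102.00, 55.20, 6, T_CRITICAL)

print(f"Strain A: ({ci_strain_a[0]:.1f}, {ci_strain_a[1]:.1f})")
print(f"Strain C: ({ci_strain_c[0]:.2f}, {ci_strain_c[1]:.2f})")
```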