StatPrimer Chapter 8 Exercise

(1) The National Health and Examination Survey of 1976-80 found that the mean serum cholesterol level of U.S. males aged 20-75 was approximately 210 mg/dl with a standard deviation of approximately 90 mg/dl. Consider the sampling distribution of sample means based on samples of n = 36 drawn from this population.
(A) What is the mean of this sampling distribution?
(B) What is the standard error of the sampling distribution?
(C) Assuming that the central limit theorem holds, describe this distribution of the sample means. (i.e., How is "x bar" distributed, and what is its mean and standard error?)
(D) Find the probability that a given sample mean drawn from this population will be less than 210.
(E) Find the probability that a given sample mean drawn from this population will be less than 225.
(F) Find the probability that a given sample mean will be at least 225.
(G) Would it be "unusual" (defined as occurring less than .05 of the time) to see a sample mean of at least 225?
(H) Now find the probability that a given sample mean from this population will be at least 240.
(I) Would it be "unusual" to see a sample mean of at least 240?

ANSWERS:
(A) 210
(B) SEM = 90 / sqrt(36) = 15
(C) Sample mean ~ N(210, 15^2).
(D) .5000
(E) This sample mean would be one standard error above the expected mean. Thus, P(sample mean < 225) = .8413.
(F) P(sample mean > 225) = 1 - .8413 = .1587.
(G) No, since the probability of this occurrence is greater than .05.
(H) This sample mean would be two standard errors above the expected mean. Thus, P(sample mean > 240) = .0228.
(I) Yes, since the probability of this occurrence is less than .05.
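The answers to exercise (1) can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The variable names are illustrative, and it assumes, as the exercise does, that the central limit theorem holds.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 210, 90, 36          # population mean, population SD, sample size
sem = sigma / sqrt(n)               # standard error of the mean

# Sampling distribution of the sample mean under the central limit theorem
xbar = NormalDist(mu=mu, sigma=sem)

p_below_210 = xbar.cdf(210)         # part D
p_below_225 = xbar.cdf(225)         # part E
p_at_least_225 = 1 - p_below_225    # part F
p_at_least_240 = 1 - xbar.cdf(240)  # part H

print(f"sem = {sem}")
print(f"P(xbar < 210) = {p_below_210:.4f}")
print(f"P(xbar < 225) = {p_below_225:.4f}")
# "Unusual" is defined in the exercise as a probability below .05
for cutoff, p in [(225, p_at_least_225), (240, p_at_least_240)]:
    print(f"P(xbar >= {cutoff}) = {p:.4f}, unusual: {p < 0.05}")
```

Because the normal distribution is continuous, "at least 225" and "more than 225" have the same probability here.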

(2) A population consists of the values: 1, 3, 5, 7, and 9.
(A) Calculate the mean of this population. Show all work.
(B) List all possible samples of size n = 2 from this population. There are 10 such samples.
(C) Calculate the means of the 10 samples identified in part B.
(D) Construct a stem-and-leaf plot of the 10 sample means identified in part C. Plot each sample mean to two significant digits, so that a sample mean of 2 is plotted as 2.0. Use a stem similar to the one below. What do you notice about the shape of this sampling distribution?
|2|
|3|
|4|
|5|
|6|
|7|
|8|
Stem for Part D of Exercise
(E) Calculate the mean of the sampling distribution of means. Show all work. How does this compare to the mean of the population?
(F) Calculate the standard deviation of the initial population and the sampling distribution of means. Show all work. How do these standard deviations compare?
ANSWERS:
(A) µ = (1 + 3 + 5 + 7 + 9) / 5 = 25 / 5 = 5.
(B) All possible samples of n = 2 from the population: {1, 3}, {1, 5}, {1, 7}, {1, 9}, {3, 5}, {3, 7}, {3, 9}, {5, 7}, {5, 9}, {7, 9}.
(C) Means of the 10 possible samples: {2, 3, 4, 5, 4, 5, 6, 6, 7, 8}.
(D) The sampling distribution of the means:

|2|0
|3|0
|4|00
|5|00
|6|00
|7|0
|8|0

Notice that this distribution is mound-shaped.

(E) The mean of the sampling distribution = (2 + 3 + 4 + 5 + 4 + 5 + 6 + 6 + 7 + 8) / 10 = 50 / 10 = 5. This is equal to the mean of the population.
(F) For the initial population: SS = (1 - 5)^2 + (3 - 5)^2 + (5 - 5)^2 + (7 - 5)^2 + (9 - 5)^2 = 16 + 4 + 0 + 4 + 16 = 40; Population variance = SS / N = 40 / 5 = 8; Population standard deviation = sqrt(8) = 2.828.
For the sampling distribution of means: SS = (2 - 5)^2 + (3 - 5)^2 + (4 - 5)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2 + (6 - 5)^2 + (6 - 5)^2 + (7 - 5)^2 + (8 - 5)^2 = 9 + 4 + 1 + 0 + 1 + 0 + 1 + 1 + 4 + 9 = 30
Variance = SS / N = 30 / 10 = 3
Standard deviation = sqrt(3) = 1.732
Notice that the standard deviation of the sampling distribution is less than the standard deviation of the population. It is also less than the population standard deviation (2.828) divided by the square root of each sample's size (sqrt(2) = 1.414), which is 2.000, because the samples were drawn without replacement from a small population. Multiplying 2.000 by the finite population correction factor sqrt[(N - n) / (N - 1)] = sqrt(3 / 4) = 0.866 gives 1.732, the standard deviation of the sampling distribution.
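The whole of exercise (2) is small enough to enumerate by machine. The following is a minimal sketch using only the Python standard library. It assumes the samples are drawn without replacement and without regard to order, which is what produces the 10 samples listed in part B, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [1, 3, 5, 7, 9]
n = 2

# All samples of size 2 drawn without replacement (order ignored)
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

mu = mean(population)                 # population mean
sigma = pstdev(population)            # population SD (divides by N)
sd_of_means = pstdev(sample_means)    # SD of the sampling distribution of means

# Standard error with the finite population correction factor applied
N = len(population)
fpc = sqrt((N - n) / (N - 1))
sem_corrected = sigma / sqrt(n) * fpc

print(len(samples), mu, mean(sample_means))
print(round(sigma, 3), round(sd_of_means, 3), round(sem_corrected, 3))
```

The corrected standard error and the directly computed standard deviation of the 10 sample means agree, which is the point of the finite population correction.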
(3) Calculate the standard error of the means for each of the continuous variables in the FEV.SAV data set.

ANS:
Variable       Standard deviation (s), computed w/SPSS   n     sem = s / sqrt(n)
AGE (years)    2.95                                      654   2.95 / sqrt(654) = 0.115
FEV (l/sec)    .867059                                   654   .867059 / sqrt(654) = 0.0339047
HEIGHT (in)    5.704                                     654   5.704 / sqrt(654) = 0.2230

(4) Calculate the standard error of the proportion for each of the binary categorical variables in the FEV.SAV data set.

ANS:
Variable       Sample proportion   n     SEP = sqrt(pq/n)
SEX (male)     .514                654   sqrt[(.514)(.486)/(654)] = .0195
SMOKE (yes)    .099                654   sqrt[(.099)(.901)/(654)] = .0117
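The standard errors in exercises (3) and (4) follow from two one-line formulas. The sketch below recomputes them from the summary statistics reported in the answer tables. It does not read FEV.SAV itself, and the function and dictionary names are illustrative.

```python
from math import sqrt

n = 654  # sample size reported in both answer tables

# Standard deviations as reported in the exercise (3) answers
sds = {"AGE": 2.95, "FEV": 0.867059, "HEIGHT": 5.704}
# Sample proportions as reported in the exercise (4) answers
props = {"SEX (male)": 0.514, "SMOKE (yes)": 0.099}

def sem(s, n):
    """Standard error of a mean: s / sqrt(n)."""
    return s / sqrt(n)

def sep(p, n):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    return sqrt(p * (1 - p) / n)

for name, s in sds.items():
    print(f"{name}: sem = {sem(s, n):.4f}")
for name, p in props.items():
    print(f"{name}: SEP = {sep(p, n):.4f}")
```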

(5) In an effort to detect hypertension in young children, blood pressure measurements were taken in 30 children aged 5 - 6 years living in a specific community. For these children the mean diastolic blood pressure was found to be 56.2 mm Hg with standard deviation 7.9 mm Hg. (Based on Rosner, 2000, p. 205. #6.31)
(A) What is the standard error of the mean diastolic blood pressure in this population, based on this sample?
(B) What is the margin of error of the sample mean?
(C) It is known from a nationwide study that the mean diastolic blood pressure is 64.2 mm Hg for 5- to 6-year-old children. Is it likely that children in this specific community differ in their blood pressure from this national average? Justify your response.

ANS:

(A) sem = 7.9 / sqrt(30) = 1.44 (B) Margin of error = (2)(1.44) = 2.9, approximately. (C) Two solutions may be applied.

Solution A (from the point of view of the hypothesized population mean, µ): We may think of possible sample means that are likely to be derived from a population with mean 64.2. Most of these sample means will be within one margin of error (±2.9) of the (hypothesized) population mean of 64.2. Therefore, most sample means will lie in the range 61.3 to 67.1. Since our sample mean is 56.2, it is unlikely to have come from the hypothesized population.

Solution B (from the point of view of the observed sample mean, "x bar"): The second solution considers the hypothetical sampling distribution from the point of view of the observed sample mean. That is, the population mean is likely to be within one margin of error (±2.9) of the sample mean: 56.2 ± 2.9 = 53.3 to 59.1. Since the national mean of 64.2 lies outside this range, this suggests that the sample came from a population with a different mean.
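Both solutions to exercise (5) rest on the same two numbers, the standard error and the margin of error. A minimal sketch follows, assuming the margin of error is taken as roughly two standard errors (the convention used for d elsewhere in these answers). A t multiplier with 29 degrees of freedom would give a slightly wider margin. Variable names are illustrative.

```python
from math import sqrt

xbar, s, n = 56.2, 7.9, 30    # sample mean, sample SD, sample size
mu0 = 64.2                    # nationwide (hypothesized) mean

sem = s / sqrt(n)             # standard error of the mean
margin = 2 * sem              # margin of error, using the rough multiplier 2

# Solution A: range of likely sample means around the hypothesized mean
range_a = (mu0 - margin, mu0 + margin)
# Solution B: range of likely population means around the observed mean
range_b = (xbar - margin, xbar + margin)

sample_mean_in_a = range_a[0] <= xbar <= range_a[1]
national_mean_in_b = range_b[0] <= mu0 <= range_b[1]

print(f"sem = {sem:.2f}, margin of error = {margin:.1f}")
print("sample mean inside range A:", sample_mean_in_a)
print("national mean inside range B:", national_mean_in_b)
```

Both checks come out false, which matches the conclusion that this community's mean differs from the national average.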
(6) Much discussion has taken place concerning possible health hazards from exposure to anesthetic gases. In one study of 525 Michigan nurse anesthetists, 7 of 525 women reported having a new malignancy (other than skin malignancy). (Based on Rosner, 2000. p. 205. #6.44)
(A) What is the incidence (proportion) for new malignancies in this sample?
(B) Calculate a standard error for this proportion.
(C) Nationwide data suggest an expected incidence rate of .4% (.004) for new malignancies for women in this age group. Is the incidence proportion in this sample of nurse anesthetists significantly different from this nationwide norm?

ANS:

(6A) p^ = 7 / 525 = .0133

(6B) SEP = sqrt[(.0133)(1 - .0133)/(525)] = 0.0050 and d = (2)(0.0050) = 0.01
(6C) Once again, two solutions may be applied.

Solution A: (From the point of view of the assumed population proportion.) Most sample proportions will be within ±0.01 of the assumed population proportion of 0.004, or between 0.000 (since you can't have a negative proportion) and 0.014. The sample proportion falls within this range of possibility.

Solution B: (From the point of view of the observed sample proportion.) We would expect the incidence of this outcome in the population of nurse anesthetists to be within ±d of the sample proportion. This range is equal to .0133 ± (2)(.0050) = .0133 ± .010 = .0033 to .0233. Since the nationwide norm of .004 is captured by this interval, the sample may very well have come from a population with the nationwide incidence.
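The same two-sided check can be written for a proportion. The sketch below follows the answer's convention d = 2 x SEP and floors the lower limit at zero, since a proportion cannot be negative. Variable names are illustrative. Note that with only 7 events the normal approximation behind this margin is rough, so the result is a guide rather than a formal test.

```python
from math import sqrt

x, n = 7, 525          # new malignancies, sample size
p0 = 0.004             # nationwide incidence proportion

p_hat = x / n
sep = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
d = 2 * sep                           # margin of error, multiplier 2

# Solution A: likely sample proportions around p0, floored at zero
range_a = (max(0.0, p0 - d), p0 + d)
# Solution B: likely population proportions around p_hat
range_b = (p_hat - d, p_hat + d)

p_hat_in_a = range_a[0] <= p_hat <= range_a[1]
p0_in_b = range_b[0] <= p0 <= range_b[1]

print(f"p_hat = {p_hat:.4f}, SEP = {sep:.4f}, d = {d:.3f}")
print("p_hat inside range A:", p_hat_in_a)
print("p0 inside range B:", p0_in_b)
```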

(7) Define each of the following terms:

(A) Sampling distribution of a mean
(B) Central Limit Theorem
(C) Unbiased (of an estimate)
(D) Law of large numbers
(E) Standard error of the mean
(F) Standard error of the proportion
(G) Margin of error

ANS: See Vocabulary list at end of chapter

(8) "Directed Paraphrasing Question."

You do a study in which you estimate a mean cost of treatment per patient of $200 with a margin of error of $150. Explain to a health care manager what this means in plain terms. One of your goals is to convince her that additional money must be spent in order to collect more data.

Exercises

(1) Define the following terms:
(A) Statistical inference
(B) Induction
(C) Estimation
(D) Hypothesis testing
(E) Parameter
(F) Statistic

(G) Alpha level
(H) (1 - alpha)100% confidence interval
(I) Margin of error

ANSWERS:
(A) Statistical inference - generalizing from the sample to the population with calculated degree of certainty.
(B) Induction - a logical process in which we attempt to draw inferences from the particular to the general
(C) Estimation - an inferential method that uses sample statistics to directly determine the probable value of a population parameter.
(D) Hypothesis [significance] testing - an inferential method that provides a way to assess the "statistical significance" of findings, allowing categorical conclusions to specific questions.
(E) Parameter - a statistical characteristic of the population.
(F) Statistic - a mathematical summary of the sample.
(G) Alpha - the chance the researcher is willing to take of not capturing the parameter.
(H) (1 - alpha)100% confidence interval - an interval that has a (1 - alpha)100% chance of capturing the parameter.
(I) Margin of error - the "plus or minus" wiggle room that defines half of the confidence interval.

(2) List three ways in which "x bar" differs from µ. List one way in which they are similar.

ANSWERS: Differences: (1) "X bar" represents the sample mean, whereas µ represents the population mean. (2) "X bar" is a statistic and µ is a parameter. (3) "X bar" is a random variable, whereas µ is a constant. (4) After you collect data, "x bar" is known (calculated) and µ is unknown (and must be inferred). Similarity: Both "x bar" and µ represent arithmetic averages or the center / "expected value" of a distribution.

(3) Suppose that from a population with a variance of 4^2 = 16 we select a sample of n = 25 and find a (sample) mean of 55.5. Assuming that the population is approximately normally distributed,
(A) Describe the sampling distribution of the sample mean.

(B) What is the standard error of the sample mean?
(C) Calculate a 95% confidence interval for the population mean.
(D) What is the margin of error of the above confidence interval?
(E) Calculate a 90% confidence interval for the population mean.
(F) Why is the 90% confidence interval shorter than the 95% confidence interval?

ANSWERS:
(A) "x bar" ~ N(µ, 16/25); note the sampling distribution is not centered on the sample mean, but is centered on the (unknown) population mean.
(B) SE("x bar") = sqrt(16/25) = 0.80
(C) 95% confidence interval for µ = 55.5 ± (1.96)(0.80) = 55.5 ± 1.568 = (53.9, 57.1)
(D) 1.568
(E) 90% confidence interval for µ = 55.5 ± (1.645)(0.80) = 55.5 ± 1.316 = (54.2, 56.8)
(F) Because we are willing to accept a greater chance of failing to capture µ.
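The two intervals in exercise (3) differ only in the critical value. A minimal sketch, assuming a known population variance of 16 as in the answers above, with the z value obtained from `statistics.NormalDist().inv_cdf`; the helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, variance, n, confidence):
    """Confidence interval for mu when the population variance is known."""
    se = sqrt(variance / n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return xbar - margin, xbar + margin, margin

lo95, hi95, m95 = z_interval(55.5, 16, 25, 0.95)
lo90, hi90, m90 = z_interval(55.5, 16, 25, 0.90)
print(f"95%: ({lo95:.1f}, {hi95:.1f}), margin {m95:.3f}")
print(f"90%: ({lo90:.1f}, {hi90:.1f}), margin {m90:.3f}")
```

Lowering the confidence level lowers the critical value, so the 90% margin is smaller than the 95% margin for the same standard error.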

(4) We select a sample of n = 34 from a population with unknown mean and variance and find a sample mean of 25 and sample standard deviation of 5. Based on this information,

(A) Calculate the estimated standard error of the sample mean.
(B) Calculate a 95% confidence interval for µ.
(C) Calculate a 90% confidence interval for µ.

ANS:
(A) se("x bar") = 5 / sqrt(34) = 0.8575.
(B) 95% confidence interval for µ = 25 ± t(33, .975)(0.8575) = 25 ± (2.035)(0.8575) = 25 ± 1.75 = (23.25, 26.75)
(C) 90% confidence interval for µ = 25 ± t(33, .95)(0.8575) = 25 ± (1.692)(0.8575) = 25 ± 1.45 = (23.55, 26.45)
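When the population variance is unknown, the only change is that the critical value comes from a t distribution with n - 1 degrees of freedom. The Python standard library has no t distribution, so this sketch takes the critical values 2.035 and 1.692 from the answers above as inputs rather than computing them; the helper name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for mu using an estimated standard error.

    t_crit must be supplied by the caller (from a t table with n - 1
    degrees of freedom); it is not computed here.
    """
    se = s / sqrt(n)
    margin = t_crit * se
    return xbar - margin, xbar + margin, margin

# Critical values for 33 degrees of freedom, copied from the answers above
lo95, hi95, m95 = t_interval(25, 5, 34, t_crit=2.035)
lo90, hi90, m90 = t_interval(25, 5, 34, t_crit=1.692)
print(f"95%: margin {m95:.3f}, interval ({lo95:.3f}, {hi95:.3f})")
print(f"90%: margin {m90:.3f}, interval ({lo90:.3f}, {hi90:.3f})")
```

The limits may differ from hand-worked answers in the last digit, because the hand calculation rounds the margin before adding and subtracting it.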

(5) In FEV.SAV,

(A) For the variable AGE (years), compute the sample mean and a 95% confidence interval for µ.
(B) For the variable FEV (l/sec), compute the sample mean and a 95% confidence interval for µ.
(C) For the variable HEIGHT (inches), compute the sample mean and a 95% confidence interval for µ.
(D) For the variable SEX (0 = female, 1 = male), compute the number of males (X), the sample size n, the sample proportion ("p hat"), and a 95% confidence interval for the population proportion p.
(E) For the variable SMOKE (0 = non-smoker, 1 = smoker), compute the number of smokers (X), n, the sample proportion ("p hat"), and a 95% confidence interval for the population proportion p.

ANSWERS:
(A) AGE (years): sample mean = 9.93 (95% confidence interval for µ: 9.70, 10.16)
(B) FEV (liters per second): sample mean = 2.6368 (95% confidence interval for µ: 2.5702, 2.7033)
(C) HEIGHT (inches): sample mean = 61.144 (95% confidence interval for µ: 60.706, 61.582)
(D) SEX (male): "p hat" = 318 / 654 = .4862 (95% confidence interval for p: .4473, .5253)
(E) SMOKE (current smoker): "p hat" = 65 / 654 = .099 (95% confidence interval for p: .077, .125).

(6) Suppose that we find that 14 out of 100 sampled subjects are smokers.

(A) What specific pmf might be used to model the distribution of the number of smokers (X) in similar random samples of size n = 100?
(B) Can a normal approximation be applied in this situation? (Justify your response.) If so, how will X be distributed?
(C) What is the point estimate of p, based on this sample?
(D) Assuming the normal approximation holds, describe the sampling distribution of sample proportion.
(E) What is the standard error estimate of the sampling distribution?
(F) Calculate a 95% confidence interval for p.
(G) What is the margin of error for the above confidence interval?
(H) How would you decrease your margin of error?

ANSWERS:
(A) A binomial random variable, so that X ~ b(100, p)
(B) Yes, since if we assume that p is about equal to p hat (14/100), then npq = (100)(.14)(.86) = 12.04, which is greater than 5. We can therefore assume that X ~ N(14, 12.04)
(C) "p hat" = 14 / 100 = .14.
(D) "p hat" ~ N(p, pq/n); notice that the sampling distribution is centered on unknown value p.
(E) se("p hat") = sqrt ((.14)(.86) / (100)) = .0347
(F) 95% confidence interval for p = .14 ± (1.96)(.0347) = .14 ± .0680 = (.07, .21)
(G) margin of error = .0680 (or as they would say on the news, ±7%)
(H) Increasing the sample size would decrease your standard error and hence decrease the margin of error of your estimate.
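Parts B through H of exercise (6) can be traced in a few lines. The sketch below applies the npq > 5 rule of thumb from the answer to part B and then builds the normal-approximation interval; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 14, 100
p_hat = x / n
q_hat = 1 - p_hat

# Rule of thumb from part B: normal approximation is acceptable when npq > 5
npq = n * p_hat * q_hat
approx_ok = npq > 5

se = sqrt(p_hat * q_hat / n)          # estimated standard error of p hat
z = NormalDist().inv_cdf(0.975)       # about 1.96
margin = z * se
ci = (p_hat - margin, p_hat + margin)

print(f"npq = {npq:.2f}, normal approximation ok: {approx_ok}")
print(f"se = {se:.4f}, margin = {margin:.4f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")

# Part H: quadrupling n halves the standard error if p hat stays the same
se_4n = sqrt(p_hat * q_hat / (4 * n))
print(f"se with n = {4 * n}: {se_4n:.4f}")
```

Because the standard error shrinks with the square root of n, halving the margin of error takes four times as many subjects.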

(7) Suppose that 75 people are given an antihypertensive drug and the drug is effective in 20 of the people. By effective we mean that their blood pressure was lowered sufficiently to reduce their risk of heart disease while no serious side effects were encountered as judged from an evaluation taken 1 month after starting the drug.
(A) What is the point estimate of the probability p of the drug's effectiveness?
(B) Could the normal approximation to the binomial be used in estimating a confidence interval for effectiveness parameter p? Show all work.
(C) Calculate a 95% confidence interval for p. Show all work.
(D) Use a Web calculator to determine an exact confidence interval for p. How does this compare with the confidence interval calculated by the normal approximation method?

ANS:

(A) p^ = 20 / 75 = .2667
(B) n(p^)(q^) = (75)(.2667)(1 - .2667) = 14.67, which is greater than 5. Therefore, the normal approximation can indeed be applied.
(C) 95% confidence interval for p = (.2667) ± (1.96)(sqrt[(.2667)(1 - .2667) / (75)]) = .2667 ± (1.96)(.0511) ~= .27 ± .10 = (.17, .37)
(D) The Web calculator gives the 95% confidence interval (.17, .38), which is very close to the confidence interval calculated by the normal approximation method.
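An exact interval of the kind mentioned in part D can also be computed directly from binomial probabilities. The sketch below assumes the Web calculator reports the Clopper-Pearson interval, which is the usual meaning of "exact" but is not stated in the exercise. It finds each limit by bisection on the binomial tail probability, and the helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, lo, hi, tol=1e-10):
    """Root of a monotone function that changes sign between lo and hi."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return (lo + hi) / 2

def exact_ci(x, n, confidence=0.95):
    """Clopper-Pearson interval for a binomial proportion."""
    alpha = 1 - confidence
    # Lower limit: the p for which P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p for which P(X <= x | p) = alpha / 2
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

lower, upper = exact_ci(20, 75)
print(f"exact 95% CI: ({lower:.2f}, {upper:.2f})")
```

Bisection is slower than a closed-form beta quantile but needs nothing beyond the standard library, which keeps the sketch self-contained.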

(8) Data in the table below concern the mean triceps skin-fold thickness in a group of normal men and a group of men with chronic obstructive pulmonary disease (COPD) (source: Rosner, 1995, p. 185, #6.5).
Group    Mean   SD    n
Normal   1.35   0.5   40
COPD     0.92   0.4   32
(A) Calculate a 95% confidence interval for the mean skin-fold thickness for the normal group. Show all work.
(B) Calculate a 95% confidence interval for the mean skin-fold thickness for the COPD group.
(C) Based on these two confidence intervals, would you be confident that the populations differ? Justify your response.

ANS:
(A) 95% confidence interval for µnormal = (sample mean) ± t(n-1, .975)(sem) = 1.35 ± t(39, .975)(0.5/sqrt(40)) = 1.35 ± (2.02)(0.079) = 1.35 ± 0.16 = (1.19, 1.51)
(B) 95% confidence interval for µCOPD = 0.92 ± t(31, .975)(0.4/sqrt(32)) = 0.92 ± (2.04)(0.071) = 0.92 ± 0.14 = (0.78, 1.06)
(C) Since the confidence intervals do not overlap, we may be confident that the population means differ.
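The overlap argument in part C can be made explicit. This sketch recomputes both intervals from the table, using the critical values 2.02 and 2.04 quoted in the answers (they are inputs, not computed), and then tests whether the intervals overlap; the helper name `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(mean, sd, n, t_crit):
    """Interval mean +/- t_crit * sd / sqrt(n); t_crit comes from a t table."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin

# Critical values t(39, .975) and t(31, .975) copied from the answers above
normal = mean_ci(1.35, 0.5, 40, t_crit=2.02)
copd = mean_ci(0.92, 0.4, 32, t_crit=2.04)

# Two intervals overlap when each one starts before the other ends
overlap = normal[0] <= copd[1] and copd[0] <= normal[1]

print("normal:", tuple(round(v, 2) for v in normal))
print("COPD:  ", tuple(round(v, 2) for v in copd))
print("intervals overlap:", overlap)
```

Non-overlap of two 95% intervals is a conservative criterion: it supports a difference between the means, but intervals that do overlap would not by themselves rule one out.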