Continuous Outcome, Paired Samples

Introduction

This chapter considers the analysis of a continuous outcome with data collected by paired samples. An example of this type of sample is a pre-test/post-test sample in which an outcome is measured before and after an intervention. Paired samples can also arise from matching and from non-experimental sequential measurements.

Illustrative example. The data set BED.ZIP contains data on the number of hospital beds (per 1000 residents) in the fifty states and the District of Columbia for 1980 (BED80) and 1986 (BED86). The first 3 records and last record of this data set are:

STATE  BED80 BED86
-----  ----- -----
  1      4.7   4.2
  2      3.9   3.3
  3      4.4   4.0
etc.
 51      3.1   2.6

We want to describe changes in the number of hospital beds over the study interval. As usual, we begin with a careful descriptive analysis.

Descriptive Statistics

We begin by describing each measurement separately. This is accomplished by READing the data set into the current session and issuing separate MEANS commands against the two variables:

EPI6> READ BED.REC
EPI6> MEANS BED80
EPI6> MEANS BED86

Output (not shown to save space) reveals that the mean number of beds (per 1000) decreased from 4.56 in 1980 to 4.23 in 1986. 

Next, we create a new variable (DELTA) with the DEFINE command as follows:

EPI6> DEFINE DELTA ##.#
EPI6> DELTA = BED80 - BED86

When defining DELTA, make certain the numeric field indicator is sufficient to capture all possible values for within-pair differences. For the illustrative example, the variable was defined with the structure ##.# in order to reserve space for the negative sign and decimal values.

Individual differences are LISTed:

STATE BED80 BED86  DELTA
---   ----- -----  -----
  1     4.7   4.2   0.5
  2     3.9   3.3   0.6
  3     4.4   4.0   0.4
etc.
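
For readers working outside Epi Info, the DEFINE and assignment steps can be sketched in Python. This is an illustration only, not Epi Info syntax. It uses just the four records displayed in this chapter (states 1, 2, 3, and 51), and the names records and deltas are made up for the example.

```python
# The four records shown in the text: (STATE, BED80, BED86).
records = [
    (1, 4.7, 4.2),
    (2, 3.9, 3.3),
    (3, 4.4, 4.0),
    (51, 3.1, 2.6),
]

# DELTA = BED80 - BED86, rounded to one decimal place to mirror the ##.# field.
deltas = [round(bed80 - bed86, 1) for _, bed80, bed86 in records]

# List each record with its within-pair difference.
for (state, bed80, bed86), delta in zip(records, deltas):
    print(f"{state:>5} {bed80:>5.1f} {bed86:>5.1f} {delta:>6.1f}")
```

A positive DELTA means the number of beds per 1000 fell between 1980 and 1986.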

Summary statistics for DELTA are computed:

EPI6> MEANS DELTA

Output from this command is:

 DELTA

      Total        Sum       Mean   Variance    Std Dev    Std Err
         51         16      0.324      0.113      0.336      0.047

    Minimum     25%ile     Median     75%ile    Maximum       Mode
     -0.800      0.200      0.400      0.500      0.900      0.200

Therefore, the mean decline is 0.32 (sd = 0.34, n = 51).
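
The summary columns in this output are related to one another: the standard deviation is the square root of the variance, and the standard error is the standard deviation divided by the square root of n. A minimal Python check of those relationships, using only the printed Variance and Total values (the variable names are assumptions made for the example):

```python
import math

n = 51            # Total from the MEANS DELTA output
variance = 0.113  # Variance from the same output

std_dev = math.sqrt(variance)     # standard deviation of DELTA
std_err = std_dev / math.sqrt(n)  # standard error of the mean of DELTA

print(f"Std Dev = {std_dev:.3f}, Std Err = {std_err:.3f}")
# prints: Std Dev = 0.336, Std Err = 0.047
```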

Comments
(1) See the prior unit for comments regarding the reporting and interpretation of descriptive statistics.
(2) The above output (derived by EpiInfo v.6) has a bug. It reports the minimum as -0.8 when in fact it is -1.0.

Confidence Interval for Mean Difference

The point estimator of the expected difference (μd) is the MEAN of DELTA (0.32 for the illustrative example). An interval estimate (i.e., a 95% confidence interval for μd) can be calculated with the formula:

(MEANDELTA) ± (t(n-1, .975))(Std Err)

where MEANDELTA = the mean of the DELTA variable, t(n-1, .975) = the 97.5th percentile of a t distribution with n - 1 degrees of freedom (available from a t table), and Std Err = the standard error of the mean as reported in the EpiInfo output: Std Err = Std Dev / √n. For the illustrative data, the 95% confidence interval for μd = 0.324 ± (t(50, .975))(Std Err) = 0.324 ± (2.01)(0.047) = 0.324 ± 0.094 = (0.23, 0.42).
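
The arithmetic above can be reproduced with a short Python sketch. It takes the mean and standard error as printed in the Epi Info output and the t value of 2.01 quoted in the text, rather than computing the percentile itself. The variable names are assumptions made for the example.

```python
mean_delta = 0.324  # Mean from the MEANS DELTA output
std_err = 0.047     # Std Err from the same output
t_crit = 2.01       # 97.5th percentile of t with 50 df, as quoted in the text

margin = t_crit * std_err  # half-width of the interval
lower = mean_delta - margin
upper = mean_delta + margin

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
# prints: 95% CI: (0.23, 0.42)
```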

Comments
(1) The above interval locates μd with 95% confidence.
(2) The width of the confidence interval is a measure of the estimate's precision.
(3) The method assumes data are free of nonrandom sources of error (i.e., information bias, selection bias, and confounding) and that random error tends toward normality.

p Value (Paired t Test)

To test H0: μd = 0, use the paired t statistic:

tstat = (MEANDELTA) / (Std Err)

Under H0, this statistic has a t sampling distribution with n - 1 degrees of freedom. For the illustrative data, tstat = 0.324 / 0.047 ≈ 6.87 and df = 51 - 1 = 50. (Epi Info works from unrounded values; the rounded figures shown here give 6.89.) The p value is the area in the tail (or tails) of the appropriate t distribution.

Epi Info computes the paired t test when its MEANS command is directed against the DELTA variable:

Student's "t", testing whether mean differs from zero.
T statistic = 6.872,  df =    50   p-value = 0.00000
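
As a rough cross-check outside Epi Info, the test can be sketched in Python using only the standard library. The helper functions t_pdf and two_sided_p are hypothetical names written for this example. The p value is approximated by numerically integrating the t density with Simpson's rule, which is one of several ways to obtain it and is not necessarily how Epi Info computes it.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=10000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even).
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return max(0.0, 1.0 - 2.0 * central)

mean_delta = 0.324  # Mean of DELTA from the Epi Info output
std_err = 0.047     # Std Err from the same output
n = 51

t_stat = mean_delta / std_err  # about 6.89 from these rounded inputs
df = n - 1
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.5f}")
```

Because the inputs are rounded, the t statistic differs slightly from the 6.872 reported by Epi Info. The p value is far below 0.00001 either way.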

Comments
(1) Small p values provide evidence against H0.
(2) The test assumes data are free of nonrandom error (i.e., information bias, selection bias, and confounding) and that the random error distribution of DELTA tends toward normality (i.e., either the data are approximately normal or the sample is large enough for the central limit theorem to have an effect).

Exercises

(1) FLUORIDE.ZIP: Effect of Water Fluoridation on Caries-Free Rates (Data from Osborne, 1980, p. 40). The number of caries-free subjects per 100 children before and after city water fluoridation projects in 16 cities is shown below. Download or create a file with these data and then compare caries-free rates BEFORE and AFTER fluoridation. Compute the mean change and a 95% confidence interval for μd. Summarize your results in narrative form.

REC  BEFORE AFTER
---  ------ -----
  1    18.2  49.2
  2    21.9  30.0
  3     5.2  16.0
  4    20.4  47.8
  5     2.8   3.4
  6    21.0  16.8
  7    11.3  10.7
  8     6.1   5.7
  9    25.0  23.0
 10    13.0  17.0
 11    76.0  79.0
 12    59.0  66.0
 13    25.6  46.8
 14    50.4  84.9
 15    41.2  65.2
 16    21.0  52.0

(2) OATBRAN.ZIP: Oat Bran and Low Density Lipoproteins (Data from Pagano and Gauvreau, 1993, pp. 252-253). A study was conducted to investigate whether oat bran lowers serum cholesterol levels in hypercholesterolemic men. Fourteen individuals were randomly placed on a diet that included either oat bran or corn flakes. Then, subjects were "crossed-over" to the alternative diet. Data are shown below. Analyze these data as you see fit.

    REC  CORNFLK OATBRAN
   ----  ------- -------
      1    4.61    3.84
      2    6.42    5.57
      3    5.40    5.85
      4    4.54    4.80
      5    3.98    3.68
      6    3.82    2.96
      7    5.01    4.41
      8    4.34    3.72
      9    3.80    3.49
     10    4.56    3.84
     11    5.35    5.26
     12    3.89    3.73
     13    2.25    1.84
     14    4.24    4.14

(3) COT-NEW.ZIP: Degradation of Salivary Cotinine (Fictitious data). Cotinine is a by-product of tobacco. When found in saliva, it suggests prior tobacco use or exposure. As part of a study on the use of this method, volunteers smoked a cigarette. Salivary cotinine levels were then monitored 12 and 24 hours post-exposure. Data are shown below. Calculate a 95% confidence interval for the expected change in cotinine levels.

REC   COT12HRS   COT24HRS
---   --------   ---------
 1        83         14
 2        68         27
 3        68         29
 4        98         29
 5        30          4
 6        14          9
 7       141         53
 8        54         16

(4) BPH-SAMP: A minimally invasive therapy for the treatment of Benign Prostatic Hyperplasia (Data from J. Morales, 2000 SJSU Graduate student). Benign prostatic hyperplasia (BPH) is a noncancerous enlargement of the prostate gland that restricts the flow of urine from the bladder. The onset of BPH is associated with aging and is most commonly seen in men over the age of 50. This study looks at quality of life (QoL) and urine flow (MaxFlow) at the start of treatment (TX) and 3 months later (3Mo) in 10 individuals. QoL was ascertained by asking patients to rate their quality of life; the lower the number, the better the quality of life (0 = Delighted, 1 = Pleased, 2 = Mostly Satisfied, 3 = Mixed, 4 = Mostly Dissatisfied, 5 = Unhappy, 6 = Terrible). The MaxFlow variable was measured using a uro-flowmeter. A typical "normal" value for MaxFlow is 19.6 ml/s, with low values indicating obstruction of the urinary path. After 3 months, follow-up measurements were taken for both outcomes (QoL3Mo and MaxFlow3Mo, respectively) to see whether the procedure had a positive impact on patients' symptoms. Data for the 10 patients participating in the study are shown below:
ID QoLTX QoL3Mo MaxFlow TX MaxFlow3Mo
001  7.00  5.00 
011  8.00  18.00 
021  8.10  13.15 
031  8.80  15.55 
041  11.05  8.10 
051  3.50  8.50 
061  9.25  12.25 
071  9.70  5.90 
081  8.25  13.60 
082  10.45  13.10 

(A) Create an Epi Info file with these data.
(B) Describe QoL at the start of treatment.
(C) Describe QoL variable at the 3-month mark.
(D) Describe the change in QoL.
(E) Plot the change in QoL in the form of a stem-and-leaf plot. Interpret your graph.
(G) Test H0: μd = 0.
(H) Analyze the change in MaxFlow using methods you deem appropriate.
