How To Know What To Use 
(Selecting Statistical Methods)

B_Gerstman@compuserve.com

Last update: 5/17/00

Background

Selection of a valid statistical technique depends on a clear understanding of the research question being asked, the way data are collected (sampling methods), and the way variables are measured.

To start, one must understand the nature of the outcome variable. In general, this variable is either:

    1. Continuous
    2. Categorical

You will then want to look at the way the groups were sampled. In a fundamental sense, data may be based on:

    1. A single sample
    2. Paired samples
    3. Independent samples

This, then, provides our framework for selecting an approach. This chart summarizes the most common methods:

| Outcome variable | Sample type | Predictor variable | Graphs (examples) | Summary stats | Main parameter | Most common test |
|---|---|---|---|---|---|---|
| Continuous | Single | None | histogram, stem-and-leaf plot, boxplot | mean, sd, 5-point summary | mean | one-sample t test |
| Continuous | Paired | None | same as above, directed toward paired differences | same as above, directed toward "DELTA" | mean difference (paired) | paired sample t test |
| Continuous | Independent | Categorical | side-by-side boxplots or quartile plots | group means and standard deviations | mean difference (independent) | independent t test or ANOVA |
| Continuous | Independent | Continuous | scatter plot | N/A | correlation or regression coefficients | either ANOVA or t test |
| Categorical | Single | None | usually unnecessary | numerator and denominator counts | proportions | binomial |
| Categorical | Paired | None | usually unnecessary | discordancy rates | odds ratios | McNemar's |
| Categorical | Independent | Categorical | usually unnecessary | incidences (cohort) or exposure proportions (case-control) | relative risks (cohort) or odds ratios (case-control) | chi-square or Fisher's |
| Categorical | Independent | Continuous | usually unnecessary | | odds ratio | logistic regression |

For example, if we want to study the relationship between cholesterol (continuous outcome) and type A and B behavior (categorical predictor), we estimate the independent mean difference and test whether it is significant using an independent t test. If we want to study heart attack risk (categorical outcome) and type A and B behavior (categorical predictor), we compare the incidence of heart attacks in the two groups in the form of a relative risk and test the relationship using a chi-square method. If we want to study the relationship between systolic blood pressure (continuous outcome) and age (continuous predictor), we estimate the correlation between these factors and estimate the average change in blood pressure per year of age using linear regression. (And so on.)
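
The chart can also be read as a simple lookup: pick the outcome type, the sample type, and the predictor type, and read off the most common test. A minimal sketch in Python follows. The names METHODS and suggest_method are made up for this illustration, and the entries are transcribed from the chart rather than taken from any statistical package.

```python
# Lookup keyed by (outcome type, sample type, predictor type).
# METHODS and suggest_method are hypothetical names used for illustration.
METHODS = {
    ("continuous", "single", None): "one-sample t test",
    ("continuous", "paired", None): "paired sample t test",
    ("continuous", "independent", "categorical"): "independent t test or ANOVA",
    ("continuous", "independent", "continuous"): "correlation or regression (ANOVA or t test)",
    ("categorical", "single", None): "binomial test",
    ("categorical", "paired", None): "McNemar's test",
    ("categorical", "independent", "categorical"): "chi-square or Fisher's test",
    ("categorical", "independent", "continuous"): "logistic regression",
}


def suggest_method(outcome, sample, predictor=None):
    """Return the chart's most common test for one combination of inputs."""
    key = (outcome.lower(), sample.lower(), predictor.lower() if predictor else None)
    if key not in METHODS:
        raise ValueError(f"no entry in the chart for {key}")
    return METHODS[key]


# The three examples from the paragraph above.
print(suggest_method("continuous", "independent", "categorical"))   # cholesterol vs. behavior type
print(suggest_method("categorical", "independent", "categorical"))  # heart attack vs. behavior type
print(suggest_method("continuous", "independent", "continuous"))    # blood pressure vs. age
```

A lookup like this only names a starting point; the remarks below still apply before any test is run.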

Remarks

OK. A few parting shots:

    1. Start with data of good quality and know the strength and weakness of the data set in detail.
    2. Make careful description the first step.
    3. "Define" the population being studied as precisely as possible.
    4. Select control groups carefully, for this is one of the most difficult of all judgments to make. As a rule, strive for comparability.
    5. Reduce the data to simple summary descriptions; compare groups in intuitive ways.
    6. The strongest case for a result is one that meets causal criteria.
    7. Always try to determine the role that bias may play in explaining your results.
    8. In assessing an association, do not rely on hypothesis tests of significance. In the words of Sir Austin Bradford Hill, "... there are innumerable situations in which [tests of statistical significance] are totally unnecessary -- because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance."
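
Remark 8 can be illustrated with a little arithmetic. The sketch below uses a two-sample z test with a known standard deviation, which is a simplifying assumption and not one of the methods in the chart. The function name and all numbers (a 0.5 mg/dl difference in cholesterol, a standard deviation of 40 mg/dl) are invented for illustration.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """Two-sided z test for a difference in means, assuming a known common sd."""
    se = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical numbers: the same tiny difference tested at two sample sizes.
z_small, p_small = two_sample_z(mean_diff=0.5, sd=40.0, n_per_group=200)
z_large, p_large = two_sample_z(mean_diff=0.5, sd=40.0, n_per_group=200_000)

print(f"n = 200 per group:     z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 200,000 per group: z = {z_large:.2f}, p = {p_large:.6f}")
```

The difference is the same 0.5 mg/dl in both runs. Only the sample size changes, yet the second run is "significant" while the difference remains too small to matter in practice.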

Sample Review Questions

(1) Fill in the blank: With a continuous outcome, descriptive statistics are based on sums and averages. With a categorical outcome, descriptive statistics are based on _________________ and ___________________.

ANS: counts and proportions

(2) What is the main test used to determine statistical significance when testing a continuous dependent variable from two independent groups?

ANS: An independent t test or, alternatively, ANOVA
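
As a rough illustration of the independent t test, the sketch below computes the equal-variance (pooled) t statistic by hand using only the Python standard library. The function name pooled_t and the data are made up for this example; obtaining a P value would additionally require a t table or a statistics package.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Equal-variance independent t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Made-up values for illustration only.
group_a = [5.0, 6.0, 7.0, 8.0, 9.0]
group_b = [3.0, 4.0, 5.0, 6.0, 7.0]
t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Here the group means are 7 and 5, the pooled variance is 2.5, and the standard error of the difference is 1, so t = 2 with 8 degrees of freedom.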

(3) What type of procedure is used to quantify the relationship between a continuous dependent variable and a continuous independent variable?

ANS: Regression or correlation can be used, depending on whether a true independent variable is present (regression) and whether one wants to predict the average change in Y per unit X (regression) or to describe correlational "fit" (correlation).
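
To make the distinction concrete, the sketch below computes both the least-squares slope (average change in Y per unit X) and the correlation coefficient from the same sums of squares. The function name and the age and blood pressure values are invented for illustration.

```python
import math


def slope_intercept_r(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xx = sum((xi - mean_x) ** 2 for xi in x)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return slope, intercept, r


# Made-up data: age in years and systolic blood pressure in mm Hg.
age = [30, 40, 50, 60, 70]
sbp = [118, 124, 131, 135, 142]
slope, intercept, r = slope_intercept_r(age, sbp)
print(f"slope = {slope:.2f} mm Hg per year, intercept = {intercept:.1f}, r = {r:.3f}")
```

With these numbers the slope is 0.59 mm Hg per year of age and the intercept is 100.5. The slope answers the regression question; r, which is close to 1 here, describes how tightly the points follow the line.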

(4) What type of test is normally used to determine whether there is a statistically significant relationship between a categorical dependent variable and a categorical independent variable?

ANS: A chi-square test.
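
A minimal sketch of the chi-square calculation for a 2-by-2 table follows, again using only the Python standard library. It computes the Pearson statistic without a continuity correction, which is a choice made for this illustration. The function name and the counts (heart attacks in two hypothetical groups of 100) are invented.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    # Expected count = (row total * column total) / grand total.
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    stat = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


# Made-up cohort: rows are exposure groups, columns are (heart attack, no heart attack).
stat, p_value, expected = chi_square_2x2(30, 70, 15, 85)
relative_risk = (30 / 100) / (15 / 100)
smallest_expected = min(min(row) for row in expected)

print(f"relative risk = {relative_risk:.1f}")
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
print(f"smallest expected count = {smallest_expected:.1f}")
```

The smallest expected count is printed because the chi-square test is usually reserved for tables in which all expected counts are at least 5; with smaller counts, Fisher's test is the usual alternative named in the chart.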

(5) List the (two-sided) null hypotheses used by each of the tests addressed in (2) - (4), above.

ANS:
For question (2), the Independent t test H0: µ1 = µ2
For question (3), regression test H0: β1 = 0; correlation test H0: ρ = 0
For question (4), the chi-square test: H0: no association between row and column variables

(6) List the assumptions required by each of the above tests.

ANS: Using short descriptors,
Independent t test / ANOVA: Independence, Normality, Equal Variance
Regression test: Linearity, Independence, Normality, Equal Variance
Chi-square test: Independence, Expected Values >= 5

Exercises

For each study described below, please:
(A) Identify the outcome variable and determine whether it is continuous or categorical.
(B) Determine whether the sampling is single sample, paired sample, or independent samples.
(C) If samples are independent, identify the independent variable and determine whether it is continuous or categorical.
(D) Identify appropriate descriptive and exploratory statistical methods for the problem.
(E) Identify the parameter being estimated and appropriate estimation methods.
(F) List the null and alternative hypotheses, and the name of the most common method used to test the problem.
(G) Identify factors that you would need to determine or assume before you could determine the sample size requirements of such a study.

  1. HDL: An investigator wishes to determine whether high density lipoprotein levels (mg/dl) differ in men and women.
  2. GLAUCOMA: An investigator treats one eye of each patient with bilateral glaucoma with a new drug intended to lower intraocular pressure and the other eye with a placebo. (Intraocular pressure is measured in mm Hg.)
  3. BIRTHWT: An investigator studies the functional relationship between gestational age (weeks) and birth weight (grams).
  4. CARDIAC: An investigator studies the relationship between cardiac output (liters/minute) and body weight (kilograms).
  5. HEADTRAU: An investigator hypothesizes that head trauma during childhood is associated with the development of seizures.
  6. ROGAINE: An investigator treats 50 men with pattern hair loss with either treatment I, II, or III and then studies the hair growth (new follicles per centimeter) associated with each treatment.
  7. WT&BP: An investigator wants to determine whether systolic blood pressure (mm Hg) is related to body weight (kilograms).
  8. CLIN-TRI: An investigator randomizes 100 women with breast cancer to a treatment and 100 women to a standard therapy. Subjects are followed for 5 years to determine whether they survive or not.
  9. ANALGESIC: We want to investigate the relationship between analgesic abuse and kidney disease by studying creatinine levels (mg/dl) in analgesic abusers and in a control group.

