Some of the Basics

B. Gerstman (6/17/99)


Study Design

A common problem encountered during statistical analyses and in student research in general is lack of focus. When analyzing data, one must keep the question that prompted the study in the first place ("the research question") clearly in mind every step of the way, and this question must be articulated concisely and accurately.

Once the research question has been defined, the study must be carefully designed to answer it. This is the critical element in determining the success of your study, for nearly all study disasters are failures of design, not of analysis.

The many issues you will have to consider when designing your study are discussed in the very good Web article by Dallal (1997), Some Aspects of Study Design (http://www.tufts.edu/~gdallal/STUDY.HTM).

Data Collection and Management

Good data are expensive and take a lot of time (and effort) to collect. "Instrumentation" (including study questionnaires) must be carefully designed, tested, and maintained. Survey questions must be simple, direct, unambiguous, and asked in a non-leading way. To encourage accuracy and compliance, questionnaires should be brief (no more than a page or two); nothing is to be taken for granted.

The study protocol must be documented and adhered to. Criteria for dealing with missing and messy data must be outlined before these inevitable problems are encountered. How will a representative sample be achieved? (Use chance mechanisms for subject selection, whenever possible.) How will you deal with subjects who refuse to participate or who are noncompliant with the study protocol? Once data are collected, how will you ensure that data handling and processing errors are prevented? Who is going to enter, document, and clean the data? Is a back-up procedure in place? Your study is only as good as your data, so take care in setting up your study protocol and collecting your data.

Data Description Basics

The first step of analysis is to describe the data using summary statistics, frequency tables, and exploratory plots. In so doing, one hopes to describe the data's shape, location, spread, and associations.
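To make these first steps concrete, here is a minimal sketch in Python; the data values are hypothetical, chosen only for illustration.

```python
# A minimal sketch of basic data description; the data are hypothetical.
from collections import Counter
import statistics

ages = [23, 25, 25, 28, 31, 31, 31, 36, 40, 52]   # hypothetical sample

# Location and spread
print("mean:  ", statistics.mean(ages))
print("median:", statistics.median(ages))
print("stdev: ", statistics.stdev(ages))          # sample standard deviation

# Frequency table with a crude text histogram (shape)
for value, count in sorted(Counter(ages).items()):
    print(f"{value:3d} | {'*' * count}")
```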

The Basis of Inference

R. A. Fisher (1935, p. 39) has said:

For everyone who does habitually attempt the difficult task of making sense of figures is, in fact, essaying a logical process of the kind we call inductive, in that he is attempting to draw inferences from the particular to the general; or, as we more usually say in statistics, from the sample to the population. Such inferences we recognize to be uncertain inferences . . .

Inference is the act of generalizing from the sample to the population with a calculated degree of certainty. Classically, there are two forms of inference: estimation and hypothesis testing. Both are used to infer parameters. Parameters are constants that describe a shape, location, spread, or association within the population. Estimation predicts parameters directly. Hypothesis testing uses a (quasi-)deductive method of inference.

There are two forms of estimation: point estimation and interval estimation. Point estimation provides a single estimate of the parameter. For example, the sample mean ("x bar") is an efficient estimator of the population mean ("mu"). Interval estimation provides a range of values with a known likelihood of capturing the parameter. For example, we might calculate a 95% confidence interval for the population mean "mu." This interval has a 95% chance of capturing the parameter in question.
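By way of illustration, here is a minimal Python sketch of both forms of estimation, using a small hypothetical sample and a t-based interval (the scipy library is a choice of convenience):

```python
# Point and interval estimation of a population mean; the data are hypothetical.
import math
import statistics
from scipy.stats import t

x = [4.1, 5.6, 4.8, 5.2, 6.0, 4.4, 5.1, 5.5]      # hypothetical sample
n = len(x)

xbar = statistics.mean(x)                          # point estimate of mu
se = statistics.stdev(x) / math.sqrt(n)            # standard error of the mean

# 95% interval estimate: xbar +/- t(.975, n-1) * SE
t_crit = t.ppf(0.975, df=n - 1)
print(f"point estimate: {xbar:.2f}")
print(f"95% CI for mu: ({xbar - t_crit * se:.2f}, {xbar + t_crit * se:.2f})")
```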

So what about hypothesis testing? First, we must note that there exists much misunderstanding about hypothesis testing! In reference to this misunderstanding, John Tukey (1991) has written:

Statisticians classically asked the wrong questions -- and were willing to answer with a lie, one that was often a downright lie. They asked "Are the effects of A and B different?" and they were willing to answer "no."
All we know about the world teaches us that the effects of A and B are always different -- in some decimal place - for any A and B. Thus asking "Are the effects different?" is foolish.
What we should be answering first is "Can we tell the direction in which the effects of A differ from the effects of B?" In other words, can we be confident about the direction from A to B? Is it "up," "down" or "uncertain"?
The third answer to this first question is that we are "uncertain about the direction" - it is not, and never should be, that we "accept the null hypothesis."

Yet many researchers fail to distinguish between a negative test and the uncertainty it implies. So where are we to start in clarifying this confusion? Let us start by understanding what we hope to accomplish with our test. In general, the test starts with an assumption of no difference or association in the population. This premise is formalized in the form of a null hypothesis (H0). The goal of the test is to limit false rejections of the null hypothesis. (Nothing more.) A false rejection of the null hypothesis is referred to as a type I error.

Decision        H0 is actually true     H0 is actually false
Retain H0       OK                      Type II Error
Reject H0       Type I Error!           OK

Alpha, or the "significance level" of a test, is the probability we are willing to accept of making a type I error. Classically, alpha is set at .05 (so that we run no more than a 1-in-20 chance of a false rejection). However, other levels of alpha are possible (e.g., .01).
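One way to appreciate what alpha means is by simulation. The sketch below is a hypothetical setup: it draws repeated samples from a population in which H0 is actually true and counts how often a test run at alpha = .05 falsely rejects. The answer should come out close to 5%.

```python
# Simulating the meaning of alpha: when H0 is true, a test run at
# alpha = .05 should falsely reject H0 in about 5% of repeated samples.
import random
from scipy.stats import ttest_1samp

random.seed(1)
ALPHA = 0.05
TRIALS = 10_000

rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population in which H0 (mu = 0) is actually true.
    sample = [random.gauss(0, 1) for _ in range(20)]
    if ttest_1samp(sample, popmean=0).pvalue <= ALPHA:
        rejections += 1                            # a type I error

print("false rejection rate:", rejections / TRIALS)   # close to .05
```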

Our decision about the null hypothesis will be based on a test statistic. When the observed test statistic is unlikely to have come from a population described by the null hypothesis, the null hypothesis will be rejected. This likelihood is quantified in the form of a p value, which represents the probability of observing a test statistic as extreme as or more extreme than the current test statistic, assuming the null hypothesis is true. When the p value gets sufficiently small -- defined as less than or equal to alpha -- the null hypothesis is rejected. This "rule" seems straightforward enough. However, the basis of this rule is so often forgotten that many prominent statisticians have suggested that hypothesis testing be dismissed altogether. (Yes: rid the world of hypothesis testing once and for all.) Why dismiss this well-established procedure, you might ask? Cohen (1994) argues as follows:

Well, among many other things, [hypothesis testing] does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does! What we want to know is "Given these data, what is the probability that H0 is true?" But as most of us know, what it tells us is "Given that H0 is true, what is the probability of these (or more extreme) data?" These are not the same, as has been pointed out many times over the years. So the p value is not a measure of how good the null hypothesis is. It is a measure of how likely the data are, assuming the null hypothesis is true. p values should therefore not be viewed as objective probabilities, since they are based on a premise that may be -- and probably is -- entirely false.

Nevertheless, hypothesis testing remains the sine qua non of inferential statistics, and it is so entrenched that it is not going away any time soon. Moreover, when combined with other forms of information (such as exploratory plots, descriptive statistics, biological reasoning, and so on), hypothesis testing is actually a very good procedure. It is therefore our responsibility to interpret hypothesis testing results properly.
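To see what the testing "rule" looks like in practice, here is a minimal sketch of a one-sample t test computed step by step; the data and the hypothesized mean are hypothetical.

```python
# One-sample t test "by hand"; the data and H0 value are hypothetical.
# H0: mu = 5.0, two-sided alternative, alpha = .05.
import math
import statistics
from scipy.stats import t

x = [5.3, 6.1, 4.9, 5.8, 6.4, 5.5, 6.0, 5.7]      # hypothetical sample
n, mu0, alpha = len(x), 5.0, 0.05

# Test statistic: how far the sample mean falls from H0, in SE units
t_stat = (statistics.mean(x) - mu0) / (statistics.stdev(x) / math.sqrt(n))

# p value: probability of a statistic as extreme as or more extreme than
# the one observed, computed under the assumption that H0 is true
p_value = 2 * t.sf(abs(t_stat), df=n - 1)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "direction uncertain; retain H0")
```

Note that the retention message reflects Tukey's point: a large p value leaves us uncertain about the direction of the effect; it does not establish that H0 is true.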

Power and Sample Size

But what of the retained null hypothesis? Might it be false as well? Of course it can be. A false retention of H0 is called a type II error, and its probability is called "beta."

Decision        H0 is actually true     H0 is actually false
Retain H0       OK                      Type II Error!
Reject H0       Type I Error            OK

The probability of avoiding a type II error is known as "power" (power = 1 - beta). Studies with inadequate power are a waste of time, money, and energy. This is why we should address sample size before collecting data. For an introduction to sample size calculations, see Dallal (1997) at http://www.tufts.edu/~gdallal/SIZE.HTM.
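For a rough sense of how such calculations go, here is a sketch of the standard normal-approximation formula for the sample size needed to compare two means; the sigma and delta values below are hypothetical.

```python
# Sample size per group for comparing two means (normal approximation):
#   n = 2 * (sigma * (z_{1-alpha/2} + z_{power}) / delta)^2
# The inputs (sigma, delta) below are hypothetical.
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    z = NormalDist().inv_cdf                       # standard normal quantiles
    return math.ceil(2 * (sigma * (z(1 - alpha / 2) + z(power)) / delta) ** 2)

# Example: to detect a 5-unit difference when sigma = 10, with a two-sided
# alpha of .05 and 80% power, we need about 63 subjects per group.
print(n_per_group(sigma=10, delta=5))
```

Because n varies with the inverse square of delta, halving the difference you wish to detect quadruples the required sample size, which is why pinning down a realistic effect size is such an important part of study planning.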

Narrative Summary of Results

Abelson, in his excellent book Statistics as Principled Argument (1995), suggests that "the presentation of the inferences drawn from statistical analysis importantly involves rhetoric." The virtues of a good statistician, therefore, involve not only the skills of a good detective but also the skills of a good storyteller. He further states (1995, p. 2):

Somewhere along the line of teaching statistics . . . the importance of good judgement got lost amidst the minutiae of null hypothesis testing. It is all right, indeed essential, to argue flexibly and in detail for a particular case when you use statistics.

Data analysis should not be pointlessly formal. It should make an interesting claim; it should tell a story that an informed audience will care about, and it should do so by intelligent interpretation of appropriate evidence from empirical measurements and observation.

In summarizing our data, our goal is to relate our statistical findings to the question that motivated the study in the first place.

My advice: Speak plainly, be descriptive, and do not embellish.

How To Report Statistics

Reporting and presenting study results is an important part of the statistician's job. In general, we should always use judgement when reporting data and always report findings in a way that is consistent with the precision of the data and with what we wish to learn. With this in mind, good starting points are the reporting guidelines of Bailar and Mosteller (1988), the International Committee of Medical Journal Editors (1988), and the APA Publication Manual (1994).


References

Abelson R. P. (1995). Statistics as Principled Argument. Hillsdale, NJ: Lawrence Erlbaum Associates.

American Psychological Association [APA]. (1994). Publication Manual (4th ed.). Washington, DC: Author.

Bailar, J. C. & Mosteller, F. (1988). Guidelines for statistical reporting in articles for medical journals. Annals of Internal Medicine, 108, 266 - 273.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997 - 1003.

Dallal, G. E. (1997). Sample Size Calculations Simplified. http://www.tufts.edu/~gdallal/SIZE.HTM

Dallal, G. E. (1997). Some Aspects of Study Design. http://www.tufts.edu/~gdallal/STUDY.HTM

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98, 39 - 54.

International Committee of Medical Journal Editors [International Committee]. (1988). Uniform requirements for manuscripts submitted to biomedical journals. Annals of Internal Medicine, 108, 258 - 265.

Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100 - 116.