Errors in Hypothesis Testing

A superintendent in a medium-sized school has a problem. The mathematics scores of her students on nationally standardized achievement tests, such as the SAT and ACT, are lower than the national average. The school board members, who don't care whether the football or basketball teams win or not, are greatly concerned about this deficiency. The superintendent fears that if the situation is not corrected, she will lose her job before long.

As the superintendent was sitting in her office wondering what to do, a salesperson approached with a briefcase and a sales pitch. The salesperson had heard about the problem of the mathematics scores and was prepared to offer the superintendent a "deal she couldn't refuse." The deal was teaching machines to teach mathematics, guaranteed to increase the mathematics scores of the students. In addition, the machines never take breaks or demand a pay increase.

The superintendent agreed that the machines might work, but was concerned about the cost. The salesperson finally wrote some figures. Since there were about 1000 students in the school and one machine was needed for every ten students, the school would need about one hundred machines. At a cost of $10,000 per machine, the total cost to the school would be about $1,000,000. As the superintendent picked herself up off the floor, she said she would consider the offer, but didn't think the school board would go for such a big expenditure without prior evidence that the machines actually worked. Besides, how did she know that the company that manufactures the machines would not go bankrupt in the next year, leaving the school stuck with a million dollars' worth of useless electronic junk?

The salesperson was prepared, and offered to lease the school ten machines for testing purposes for one year at a cost of $500 each. At the end of the year, the superintendent would make a decision about the effectiveness of the machines. If they worked, she would pitch them to the school board; if not, she would return the machines with no further obligation.

An experimental design was agreed upon. One hundred students would be randomly selected from the student population and would be taught using the machines for one year. At the end of the year, the mean mathematics score of those students would be compared to the mean score of the students who did not use the machines. If the means were different enough, the machines would be purchased. (The astute statistics student will recognize this as a nested t-test.)

In order to help decide how different the two means would have to be in order to buy the machines, the superintendent did a theoretical analysis of the decision process. Her analysis is presented in the following decision box.

Decision Boxes in Hypothesis Testing
	"Real World"
DECISION	The machines do NOT work.	The machines work.
Buy the machines. Decide the machines work.	(1.) Type I ERROR probability = α	(4.) CORRECT probability = 1-β "power"
Do not buy the machines. Decide that the machines do not work.	(2.) CORRECT probability = 1-α	(3.) Type II ERROR probability = β

The decision box has the decision that the superintendent must make in the left column. For simplicity's sake, only two possibilities are permitted: either buy all the machines or buy none of the machines. The other two column titles represent "the state of the real world". The state of the real world can never be truly known, because if it were known whether or not the machines worked, there would be no point in doing the experiment. The four "Real World" cells represent various places one could be, depending upon the state of the world and the decision made. Each cell will be discussed in turn.

Cell 1: the machines are bought when in fact they do not work. This is called a Type I error and in this case is very costly ($1,000,000). The probability of this type of error is α, also called the significance level, and is directly controlled by the experimenter. Before the experiment begins, the experimenter sets the value of α. In this example the value of α would be set low, lower than the usual value of .05, perhaps as low as .0001, which means that if the machines did not work, the experimenter would buy them only one time out of 10,000.

Cell 2: the machines are not purchased when in fact they do not work. This is a correct decision, made with probability 1-α.

The relationship between the probabilities in these two cells can be illustrated (see the following figures) using the sampling distribution when the null hypothesis is true. The decision point is set by α, the area in the tail or tails of the distribution. Setting α smaller moves the decision point further into the tails of the distribution, as you can see in the second distribution.
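
The movement of the decision point can be sketched in Python. This is a minimal illustration, assuming the sampling distribution under the null hypothesis is standard normal; the function name `critical_z` is hypothetical and not part of the original text.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool = False) -> float:
    """Decision point on a standard normal null sampling distribution.

    Assumption: the test statistic is a z-score. With a two-tailed test,
    alpha is split evenly between the two tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

# Smaller alpha pushes the decision point further into the tail.
for alpha in (0.10, 0.05, 0.01, 0.0001):
    print(f"alpha = {alpha:<7} one-tailed decision point z = {critical_z(alpha):.3f}")
```

Under this assumption the one-tailed decision point moves from about 1.28 at α=.10 to about 3.72 at α=.0001.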

Cell 3: the machines are not purchased when in fact they work. This is called a Type II error and is made with probability β. The value of β is not directly set by the experimenter, but is a function of a number of factors, including the size of α, the size of the effect, the size of the sample, and the variance of the original distribution. With everything else held constant, the value of β is inversely related to the value of α: the smaller the value of α, the larger the value of β. It can now be seen that setting α to a small value was not done without cost, as the value of β is increased.

Cell 4: the machines are purchased when in fact they work. This is the cell where the experimenter would usually like to be. The probability of making this correct decision is 1-β and is given the name "power." Because α was set low, β would be high, and as a result 1-β would be low. Thus it would be unlikely that the superintendent would buy the machines, even if they did work.
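
The dependence of the Type II error probability on the significance level can be sketched numerically. The sketch below rests on assumptions the text does not make: normally distributed scores with a known common standard deviation, a one-tailed z-test on the difference between the two group means, and purely illustrative numbers (a true effect of 20 points, a standard deviation of 100, and 100 machine-taught students compared with 900 others). The function name `type_ii_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_error(alpha, effect, sigma, n_machine, n_other):
    """Type II error probability for a one-tailed z-test on two means."""
    # Standard error of the difference between the two group means.
    std_error = sigma * sqrt(1 / n_machine + 1 / n_other)
    # Decision point in standard-error units, fixed by alpha under the null.
    z_cutoff = STD_NORMAL.inv_cdf(1 - alpha)
    # Area of the "real effect" sampling distribution below the decision point.
    return STD_NORMAL.cdf(z_cutoff - effect / std_error)

# Illustrative values only; the text gives no effect size or variance.
for alpha in (0.10, 0.05, 0.01, 0.0001):
    b = type_ii_error(alpha, effect=20, sigma=100, n_machine=100, n_other=900)
    print(f"alpha = {alpha:<7} Type II error = {b:.3f}  power = {1 - b:.3f}")
```

With these assumed numbers the Type II error probability is roughly .40 at a significance level of .05 and above .95 at .0001, which is the sense in which a very small significance level stacks the deck against finding an effect.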

The relationship between the probability of a Type II error (β) and power (1-β) is illustrated in the following sampling distribution when there actually was an effect.

The relationship between the size of α and β can be seen in the following illustration, which combines the two previous distributions into overlapping distributions, the top graph with α=.05 and the bottom with α=.01.

The size of the effect is the difference between the center points (μ) of the two distributions. As the size of the effect is increased, the size of beta is decreased.

When the error variance of the scores is decreased and everything else remains constant, the probability of a type II error is decreased, as illustrated here:
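
The effects of effect size and error variance on beta can be sketched with two overlapping normal sampling distributions, one centered on zero (no effect) and one centered on the true effect. The setup is an assumption made for illustration: equal group sizes, a known standard deviation, a one-tailed test, and invented numbers. The function name `beta_from_distributions` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_from_distributions(alpha, effect, sigma, n_per_group):
    """Beta as the area of the effect distribution below the decision point."""
    # Both sampling distributions share the same standard error (assumed).
    std_error = sigma * sqrt(2 / n_per_group)
    null_dist = NormalDist(mu=0, sigma=std_error)
    effect_dist = NormalDist(mu=effect, sigma=std_error)
    # The decision point is located on the null distribution by alpha.
    decision_point = null_dist.inv_cdf(1 - alpha)
    return effect_dist.cdf(decision_point)

# Larger effects lower beta (standard deviation held at an assumed 100).
for effect in (5, 10, 20, 40):
    b = beta_from_distributions(0.05, effect, sigma=100, n_per_group=100)
    print(f"effect = {effect:<3} beta = {b:.3f}")

# Smaller error variance lowers beta (effect held at an assumed 20).
for sigma in (150, 100, 50):
    b = beta_from_distributions(0.05, effect=20, sigma=sigma, n_per_group=100)
    print(f"sigma  = {sigma:<3} beta = {b:.3f}")
```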

The interactive exercise designed to allow exploration of the relationships between alpha, size of effects, size of sample (N), size of error, and beta can now be understood. The values of alpha, size of effects, size of sample, and size of error can all be adjusted with the appropriate scroll bars. When one of these values is changed, the graphs will change and the value of beta will be re-computed. The area representing the value of alpha on the graph is drawn in dark gray. The area representing beta is drawn in dark blue, while the corresponding value of power is represented by the light blue area. Use this exercise to verify:

The size of the increase or decrease in beta is a complex function of changes in all of the other values. For example, changes in the size of the sample may have either small or large effects on beta depending upon the other values. If a large treatment effect and small error are present in the experiment, then changes in the sample size are going to have a small effect on beta.
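
The dependence of the sample-size effect on the other values can be checked with a short Python sketch. It assumes a one-tailed z-test on two equal-sized groups with a known standard deviation; the two scenarios and all numbers in them are invented for illustration, and the function name `beta` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta(alpha, effect, sigma, n_per_group):
    """Type II error probability for a one-tailed two-group z-test (assumed)."""
    z = NormalDist()
    std_error = sigma * sqrt(2 / n_per_group)
    return z.cdf(z.inv_cdf(1 - alpha) - effect / std_error)

# Hypothetical scenarios, not taken from the text.
scenarios = {
    "large effect, small error": {"effect": 60, "sigma": 50},
    "small effect, large error": {"effect": 20, "sigma": 100},
}

for name, values in scenarios.items():
    for n in (25, 100):
        b = beta(0.05, n_per_group=n, **values)
        print(f"{name:<26} N per group = {n:<4} beta = {b:.4f}")
```

With these assumed values, quadrupling the sample barely moves beta in the first scenario, where it is already below .01, but lowers it by more than .2 in the second.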

A Second Chance

As might be expected, in the previous situation the superintendent chose not to purchase the teaching machines, because she had essentially stacked the deck against deciding that there were any effects. When she described the experiment and the result to the salesperson the next year, the salesperson listened carefully and understood why α had been set so low.

The salesperson had a new offer to make, however. Because of an advance in microchip technology, the entire teaching machine had been placed on a single integrated circuit. As a result the price had dropped to $500 a machine. Now it would cost the superintendent a total of $50,000 to purchase the machines, a sum that is quite reasonable.

An analysis of the two types of errors revealed that the cost of a Type I error, buying the machines when they really don't work ($50,000), was now small compared with the loss incurred by a Type II error, not purchasing the machines when in fact they do work. The latter loss is difficult to put into dollars, since it is the cost of the students not learning to their highest potential.

In any case, the superintendent would probably set α to a fairly large value (.10 perhaps) relative to the standard value of .05. This would have the effect of decreasing the value of β and increasing the power (1-β) of the experiment. Thus the decision to buy the machines would be made more often if in fact the machines worked. The experiment was repeated the next year under the same conditions as the previous year, except that α was set to .10.

The results of the significance test indicated that the means were significantly different, the null hypothesis was rejected, and a decision about the reality of effects was made. The machines were purchased, the salesperson earned a commission, the math scores of the students increased, and everyone lived happily ever after.

Q20.1

When doing hypothesis testing in the real world
the true state of the world with respect to the reality of effect is unknown.
the researcher knows when the effects are real, but does not know when they are not real.
the researcher knows when he or she has made a Type I error.
the researcher knows when he or she has made a Type II error.

Q20.2

The scientific method requires
that the reality of effects be tested before they are accepted into the literature.
that politically correct decisions be made when practical.
that effects be assumed to be true before they are found to be false.
certainty of effects before they are accepted into the scientific literature.

Q20.3

The probability of rejecting the null hypothesis whether it is true or not
is generally unknown to the hypothesis tester.
is the significance level (alpha).
is called power.
is the sum of alpha and beta.

Q20.4

The probability of correctly retaining the null hypothesis is a function of
alpha, the size of the effects, and measurement error
alpha
the size of the effects
measurement error

Q20.5

The probability of a type II error is ____ related to the probability of a type I error.
inversely
directly
proportionally
not

Q20.6

Decreasing the probability of a Type I error (alpha) _____ the probability of a Type II error (beta).
increases
decreases
does not affect
inverses

Q20.7

Keeping the size of effects and alpha constant, decreasing the error variance will _____ the size of beta.
decrease
increase
not affect
partition

Q20.8

The reason most researchers do not set the value of alpha extremely low (less than .001) is
the probability of a Type II error is greatly increased.
certainty of that magnitude is unknown in science.
when the null hypothesis is true, it will be rejected.
these values for alpha were not included in statistical tables.

Q20.9

The probability of correctly rejecting the null hypothesis is a function of
alpha, the size of the effects, and measurement error
alpha
the size of the effects
measurement error

Q20.10

The size of beta decreases as the size of
the effects increases.
alpha increases.
sample decreases.
error increases.

Q20.11

When the cost of not rejecting the null when it is false is high relative to the cost of rejecting the null when it is true, the hypothesis tester will
set the value of alpha higher.
set the value of beta higher.
never reject the null hypothesis.
reject the null hypothesis.

The Analysis Generalized to All Experiments

The analysis of the reality of the effects of the teaching machines may be generalized to all significance tests. Rather than buying or not buying the machines, you reject or retain the null hypothesis. In the "real world," rather than the machines working or not working, the null hypothesis is true or false. The following decision box presents the choices representing significance tests in general.

Decision Boxes in Hypothesis Testing
	"Real World"
DECISION	NULL TRUE ALTERNATIVE FALSE No Effects	NULL FALSE ALTERNATIVE TRUE Real Effects
Reject Null Accept Alternative Decide there are real effects.	Type I ERROR probability = α	CORRECT probability = 1-β "power"
Retain Null Retain Alternative Decide that no effects were discovered.	CORRECT probability = 1-α	Type II ERROR probability = β
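
The two columns of the decision box can be estimated by simulation, since in a simulation (unlike the real world) the truth of the null hypothesis is set by the programmer. The sketch below is only illustrative: it assumes normally distributed scores with a known standard deviation, a one-tailed z-test on two groups of 30, and an invented true effect of half a standard deviation. The function name `rejection_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_effect, alpha=0.05, sigma=1.0, n=30, trials=4000, seed=1):
    """Proportion of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_cutoff = NormalDist().inv_cdf(1 - alpha)
    std_error = sigma * sqrt(2 / n)
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(true_effect, sigma) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / std_error
        if z > z_cutoff:
            rejections += 1
    return rejections / trials

# Null true: every rejection is a Type I error, so the rate estimates alpha.
type_i_rate = rejection_rate(true_effect=0.0)
# Null false: every rejection is a correct decision, so the rate estimates power.
power = rejection_rate(true_effect=0.5)

print(f"NULL TRUE   reject: {type_i_rate:.3f}   retain: {1 - type_i_rate:.3f}")
print(f"NULL FALSE  reject: {power:.3f}   retain: {1 - power:.3f}")
```

The first printed row corresponds to the NULL TRUE column of the decision box and the second to the NULL FALSE column; each row sums to one because, within a column, the decision is either to reject or to retain.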

Q20.12

The results of a test of the difference between two sample means are said to be statistically significant when:
the observed difference may be explained by sampling error.
the observed result cannot be accounted for by the effects of the treatment.
the probability of obtaining this difference due to sampling error is small.

Q20.13

When an experimenter selects a particular level of risk (α, alpha)
an experimenter NEVER selects a particular level of alpha.
he/she effectively is setting the probability of accepting H1 when in fact it is true.
he/she effectively is setting the probability of rejecting H0 when in fact it is true.
he/she effectively is setting the probability of retaining H0 when in fact it is false.

Q20.14

The level of significance of an hypothesis test is determined by
the probability of type II error.
Alpha.
Beta.
none of the above.

Q20.15

If the null hypothesis is rejected as being false,
there is some chance a type II error is being made.
a high probability exists that H1 also is false.
the probability associated with H0 must have been insignificant at the 0.01 level.
none of the above.

Q20.16

When an experiment is performed in real life
There is a good chance that the alternative hypothesis is false.
There is a good chance that the null hypothesis is true.
Whether the null hypothesis is true or not depends upon the value of alpha.
The truth value of the null hypothesis is unknown.

Q20.17

In reaching a decision about the null hypothesis, a type I error occurs when:
the null hypothesis is retained when it is true.
the alternative hypothesis is accepted when it is true.
the null hypothesis is retained when the alternative hypothesis is true.
an alternative hypothesis is accepted when the null hypothesis is true.

Q20.18

Which of the following does not belong:
reject the null hypothesis
retain the alternative hypothesis
significance was found
real effects were discovered

Q20.19

If the cost of a Type II error is high relative to the cost of a Type I error, the value of alpha should be set
at .05.
higher than .05.
lower than .05.
the experimenter does not set the value of alpha.

Summary

When doing an hypothesis test, two types of decision errors are possible. The first, called a Type I error, occurs when the null hypothesis is rejected when in fact it is true. The probability of a Type I error is called alpha and symbolized by α. Alpha is directly set by the researcher, with a generally accepted default value of .05. The second type of error is called a Type II error and occurs when the researcher retains the null hypothesis when in fact it is false. The probability of a Type II error is called beta and symbolized by β. The value of beta is indirectly set by the researcher and depends upon four values: alpha, effect size, sample size, and error variance. In general, while the size of alpha is known, the size of beta can only be imprecisely estimated.

It is not difficult to conceive of situations where the default value of .05 for alpha should be abandoned for values that take into account the relative costs of each type of error. Since alpha and beta are inversely related, if the cost of a Type I error is high relative to the cost of a Type II error, then the probability of a Type I error (α) should be set relatively low. If the cost of a Type I error is low relative to the cost of a Type II error, then the probability of a Type I error (α) should be set relatively high.
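
The cost reasoning can be made concrete with a small Python sketch based on the teaching-machine example. The Type I costs ($1,000,000 and $50,000) come from the example; everything else is an assumption made for illustration, in particular the dollar value placed on a Type II error and the standardized effect (true difference divided by its standard error). The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta(alpha, standardized_effect):
    """Type II error probability for a one-tailed z-test (assumed setup)."""
    z_cutoff = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_cutoff - standardized_effect)

def conditional_costs(alpha, cost_type_i, cost_type_ii, standardized_effect):
    """Average cost if the null is true, and average cost if it is false."""
    cost_if_null_true = alpha * cost_type_i
    cost_if_null_false = beta(alpha, standardized_effect) * cost_type_ii
    return cost_if_null_true, cost_if_null_false

ASSUMED_EFFECT = 1.9            # hypothetical standardized effect
ASSUMED_TYPE_II_COST = 500_000  # hypothetical; the example gives no figure

for label, type_i_cost in (("first offer", 1_000_000), ("second offer", 50_000)):
    for alpha in (0.0001, 0.05, 0.10):
        if_true, if_false = conditional_costs(
            alpha, type_i_cost, ASSUMED_TYPE_II_COST, ASSUMED_EFFECT
        )
        print(f"{label:<12} alpha = {alpha:<7} "
              f"cost if null true = ${if_true:>9,.0f}  "
              f"cost if null false = ${if_false:>9,.0f}")
```

Raising alpha increases the first figure and lowers the second, so the choice comes down to which of the two losses the researcher can less afford.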