The normal curve is one of a number of possible models of probability distributions. Because it is widely used and is an important theoretical tool, it merits its own chapter in this book.
The normal curve is not a single curve; rather, it is an infinite number of possible curves, all described by the same algebraic expression:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^{2}}{2\sigma^{2}}}$$
Upon viewing this expression for the first time the initial reaction of the student is usually to panic. Don't. In general it is not necessary to "know" this formula to appreciate and use the normal curve. It is, however, useful to examine this expression for an understanding of how the normal curve operates.
First, some symbols in the expression are simply numbers. These symbols include "2", "π", and "e". The latter two are irrational numbers whose decimal expansions never end, π equaling 3.1416... and e equaling 2.718.... As discussed in the chapter on the review of algebra, it is possible to raise a "funny number", in this case "e", to a "funny power".
The second set of symbols that are of some interest includes the symbol "X", which is a variable corresponding to the score value. The height of the curve at any point is a function of X.
Thirdly, the final two symbols in the equation, "μ" and "σ", are called parameters, or values that, when set to particular numbers, define which one of the infinite number of possible normal curves one is dealing with. The concept of parameters is very important and considerable attention will be given to them in the rest of this chapter. The Greek symbols "μ" and "σ" are often written in English as mu and sigma, respectively.
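For readers who find code easier to read than algebra, the roles of the symbols can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard form of the normal curve, in which the height at X is 1 / (sigma * sqrt(2 * pi)) times e raised to the power -(X - mu)^2 / (2 * sigma^2). The function name `normal_height` and the example values are illustrative only.

```python
import math

def normal_height(x, mu, sigma):
    """Height of the normal curve at score x for the parameters mu and sigma."""
    # "2", pi, and e are plain numbers; x is the variable; mu and sigma are parameters.
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)  # math.exp raises e to a power

# Illustrative member of the family: mu = 70, sigma = 10.
height_at_midpoint = normal_height(70, mu=70, sigma=10)
height_one_sigma_above = normal_height(80, mu=70, sigma=10)
height_one_sigma_below = normal_height(60, mu=70, sigma=10)
print(height_at_midpoint, height_one_sigma_above, height_one_sigma_below)
```

Changing `x` moves along a single curve, while changing `mu` or `sigma` selects a different member of the family.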
The normal curve is called a family of distributions. Each member of the family is determined by setting the parameters (μ and σ) of the model to a particular value (number). Because the μ parameter can take on any value, positive or negative, and the σ parameter can take on any positive value, the family of normal curves is quite large, consisting of an infinite number of members. This makes the normal curve a general-purpose model, able to describe a large number of naturally occurring phenomena, from test scores to the size of the stars.
All the members of the family of normal curves, although different, have a number of properties in common. These properties include: shape, symmetry, tails approaching but never touching the X-axis, and area under the curve.
In statistics, the area under a curve represents theoretical relative frequency or probability. It permits the statistician to make decisions about the world based on a belief about what the world looks like rather than the limited information available in a sample of scores. For example, the statistician would advise the shoe store owner to purchase shoes to stock his or her shelves based on the area under a normal curve model of the world rather than the proportion of individuals in the sample who wore a particular size shoe. Area under a curve may seem like a strange notion to many introductory statistics students, so let's take a brief look at it.
Area is a familiar concept. For example, the area of a square is s², or side squared; the area of a rectangle is length times height; the area of a right triangle is one-half base times height; and the area of a circle is πr², or π * r². It is valuable to know these formulas if one is purchasing such things as carpeting, shingles, etc.
Areas may be added or subtracted from one another to find some resultant area. For example, suppose one had an L-shaped room and wished to purchase new carpet. One could find the area by taking the total area of the larger rectangle and subtracting the area of the rectangle that was not needed, or one could divide the area into two rectangles, find the area of each, and add the areas together. Both procedures are illustrated below:
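The two carpet procedures can be checked with simple arithmetic. The sketch below uses made-up room dimensions (the text gives none) and shows that subtracting the unneeded rectangle and adding two smaller rectangles give the same area.

```python
# Hypothetical L-shaped room: a 20 ft by 15 ft rectangle with an
# 8 ft by 5 ft corner missing. These dimensions are invented for illustration.
outer_width, outer_height = 20, 15
notch_width, notch_height = 8, 5

# Procedure 1: area of the large rectangle minus the rectangle not needed.
area_by_subtraction = outer_width * outer_height - notch_width * notch_height

# Procedure 2: split the L into two rectangles and add their areas.
lower_part = outer_width * (outer_height - notch_height)   # 20 by 10
upper_part = (outer_width - notch_width) * notch_height    # 12 by 5
area_by_addition = lower_part + upper_part

print(area_by_subtraction, area_by_addition)  # 260 260
```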
Finding the area under a curve poses a slightly different problem. In some cases there are formulas that directly give the area between any two points; finding these formulas is what integral calculus is all about. In other cases the areas must be approximated. Whether using integral calculus or approximating methods, the mathematician relies on the idea of adding together the areas of a number of rectangles.
Suppose a curve was divided into equally spaced intervals on the X-axis and a rectangle drawn corresponding to the height of the curve at each of the intervals. The rectangles may be drawn either smaller than the curve, or larger, as in the two illustrations below:
In either case, if the areas of all the rectangles under the curve were added together, the sum of the areas would be an approximation of the total area under the curve. In the case of the smaller rectangles, the sum would be too small; in the case of the larger rectangles, it would be too big. Taking the average of the two sums would give a better approximation, but mathematical methods provide a better way.
A better approximation may be achieved by making the intervals on the X-axis smaller. Such an approximation is illustrated below, more closely approximating the actual area under the curve.
The actual area under the curve may be calculated by making the intervals infinitely small (no distance between the intervals) and then computing the area. If this last statement seems a bit bewildering, you share the bewilderment with millions of introductory calculus students. At this point the introductory statistics student must say "I believe" and trust the mathematician or enroll in an introductory calculus course.
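The rectangle idea can be tried numerically without any calculus. The following sketch adds up "too small" and "too big" rectangles under the standard normal curve between 0 and 1, where the true area is about .3413, and shows both sums closing in on that value as the intervals shrink. The function names are hypothetical, and the shortcut of using the left and right edges of each interval assumes the curve only falls over the range, which holds to the right of the midpoint.

```python
import math

def normal_height(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x (standard form of the expression assumed)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def rectangle_sums(low, high, n_intervals):
    """Return (too_small, too_big) rectangle sums between low and high.

    Assumes the curve is falling across the whole range, so the right edge of
    each interval gives the shorter rectangle and the left edge the taller one.
    """
    width = (high - low) / n_intervals
    too_small = 0.0
    too_big = 0.0
    for i in range(n_intervals):
        left = low + i * width
        right = left + width
        too_big += normal_height(left) * width
        too_small += normal_height(right) * width
    return too_small, too_big

# Smaller intervals squeeze the two sums together.
for n in (4, 16, 64, 1024):
    small, big = rectangle_sums(0.0, 1.0, n)
    print(n, round(small, 4), round(big, 4), round((small + big) / 2, 4))
```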
The standard procedure for drawing a normal curve is to draw a bell-shaped curve and an X-axis. A tick is placed on the X-axis corresponding to the highest point (middle) of the curve. Three ticks are then placed to both the right and left of the middle point. These ticks are equally spaced and include all but a very small portion of the area under the curve. The middle tick is labeled with the value of mu (μ); sequential ticks to the right are labeled by adding the value of sigma (σ). Ticks to the left are labeled by subtracting the value of σ from μ for the three values. For example, if μ = 52 and σ = 12, then the middle value would be labeled with 52, points to the right would have the values of 64 (52 + 12), 76, and 88, and points to the left would have the values 40, 28, and 16. An example is presented below:
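The labeling rule is simple enough to express in one line of Python. The helper name `tick_labels` is hypothetical; the numbers are those of the example above.

```python
def tick_labels(mu, sigma, ticks_per_side=3):
    """Labels for the X-axis ticks: mu in the middle, stepping by sigma each way."""
    return [mu + k * sigma for k in range(-ticks_per_side, ticks_per_side + 1)]

print(tick_labels(52, 12))  # [16, 28, 40, 52, 64, 76, 88]
```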
Differences in members of the family of normal curves are a direct result of differences in values for parameters. The two parameters, μ and σ, each change the distribution in a different manner.
The first, μ, determines where the midpoint of the distribution falls. Changes in μ, without changes in σ, move the distribution to the right or left, depending upon whether the new value of μ is larger or smaller than the previous value, but do not change the shape of the distribution. An example of how changes in μ affect the normal curve is presented below:
Changes in the value of σ, on the other hand, change the shape of the distribution without affecting the midpoint, because σ affects the spread or the dispersion of scores. The larger the value of σ, the more dispersed the scores; the smaller the value, the less dispersed. Perhaps the easiest way to understand how σ affects the distribution is graphically. The distribution below demonstrates the effect of increasing the value of σ:
Since this distribution was drawn according to the procedure described earlier, it appears similar to the previous normal curve, except for the values on the X-axis. This procedure effectively changes the scale and hides the real effect of changes in σ. Suppose the second distribution was drawn on a rubber sheet instead of a sheet of paper and stretched to twice its original length in order to make the two scales similar. Drawing the two distributions on the same scale results in the following graphic:
Note that the shape of the second distribution has changed dramatically, being much flatter than the original distribution. It must not be as high as the original distribution because the total area under the curve must be constant, that is, 1.00. The second curve is still a normal curve; it is simply drawn on a different scale on the X-axis.
A different effect on the distribution may be observed if the value of σ is decreased. Below the new distribution is drawn according to the standard procedure for drawing normal curves:
Now both distributions are drawn on the same scale, as outlined immediately above, except in this case the sheet is stretched before the distribution is drawn and then released in order that the two distributions are drawn on similar scales:
Note that the distribution is much higher in order to maintain the constant area of 1.00, and the scores are much more closely clustered around the value of μ, or the midpoint, than before.
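The trade-off between height and spread can also be seen in numbers. The sketch below uses `NormalDist` from Python's standard `statistics` module with an illustrative midpoint of 70 and three illustrative values of sigma; it prints the height of each curve at its midpoint and the share of the area lying within 10 points of the midpoint.

```python
from statistics import NormalDist

mu = 70  # illustrative midpoint, held fixed while sigma changes
for sigma in (5, 10, 20):
    curve = NormalDist(mu, sigma)
    peak_height = curve.pdf(mu)  # height of the curve at the midpoint
    area_within_10 = curve.cdf(mu + 10) - curve.cdf(mu - 10)
    print(sigma, round(peak_height, 4), round(area_within_10, 4))
```

Doubling sigma halves the height at the midpoint, which is what keeps the total area at 1.00, and a smaller sigma packs more of that area close to the midpoint.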
An interactive exercise is provided to demonstrate how the normal curve changes as a function of changes in μ and σ. The exercise starts by presenting a curve with μ = 70 and σ = 10. The student may change the value of μ from 50 to 90 by moving the scroll bar on the bottom of the graph. In a similar manner, the value of σ can be adjusted from 5 to 15 by changing the scroll bar on the right side of the graph.
Suppose that when ordering shoes to restock the shelves in the store, one knew that female shoe sizes were normally distributed with μ = 7.0 and σ = 1.1. Don't worry about where these values came from at this point; there will be plenty about that later. If the area under this distribution between 7.75 and 8.25 could be found, then one would know the proportion of size eight shoes to order. The values of 7.75 and 8.25 are the real limits of the interval of size eight shoes.
Finding scores and areas on normal curves is easy using the probability calculator; simply select the normal distribution, enter values into the correct boxes, and click on a button. The area or score will be entered in the correct box and a representation of the curve will be drawn in the display area. The initial screen of the probability calculator appears below.
To find the area below 7.75 on a normal curve with mu = 7.0 and sigma = 1.1, complete the following steps.
To find the area between scores, use the probability calculator and the steps outlined below.
1. Click on the selection "Area Between" under the "Normal Distribution".
2. Click on the "Normal Distribution" button.
3. Enter mu, sigma, low score, and high score in the labeled boxes.
4. Click on the button with the arrow pointing to the right.
The steps and the result are presented in the figure below.
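For readers without the probability calculator at hand, the same areas can be sketched in Python. This is not the calculator itself; it is a minimal stand-in that assumes `NormalDist` from the standard `statistics` module, whose `cdf` method returns the area below a score. The shoe-size parameters come from the example above.

```python
from statistics import NormalDist

shoe_sizes = NormalDist(mu=7.0, sigma=1.1)

# Area below a score: the cumulative area to the left of 7.75.
area_below = shoe_sizes.cdf(7.75)

# Area between two scores: area below the high score minus area below the low score.
area_between = shoe_sizes.cdf(8.25) - shoe_sizes.cdf(7.75)

print(round(area_below, 4), round(area_between, 4))
```

The second number is the proportion of size eight shoes implied by the model.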
To find the area above a score, use the probability calculator and the steps outlined below.
1. Click on the selection "Area Above" under the "Normal Distribution".
2. Click on the "Normal Distribution" button.
3. Enter mu, sigma, and value in the labeled boxes.
4. Click on the button with the arrow pointing to the right.
The steps and the result are presented in the figure below.
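The area above a score is whatever remains after the area below it is removed from the total of 1.00. A brief sketch, again assuming Python's `statistics.NormalDist` as a stand-in for the calculator and reusing the shoe-size parameters purely for illustration:

```python
from statistics import NormalDist

shoe_sizes = NormalDist(mu=7.0, sigma=1.1)

# Area above 8.25 = total area (1.00) minus the area below 8.25.
area_above = 1.0 - shoe_sizes.cdf(8.25)

print(round(area_above, 4))
```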
At times it will be necessary to find the combined area in the tail beyond a single score and in the tail beyond the mirror image of that score. For example, in a normal distribution with mu equal to 100 and sigma equal to 3, one might wish to find the total area to the right of a score of 105.3 and to the left of 94.7. The value of 94.7 is the same distance from mu as the score of 105.3, as 105.3 - 100 = 5.3 and 94.7 - 100 = -5.3. This area can be found on the probability calculator using the following steps.
1. Click on the selection "Two-tailed Sig. level" under the "Normal Distribution".
2. Click on the "Normal Distribution" button.
3. Enter mu, sigma, and value in the labeled boxes.
4. Click on the button with the arrow pointing to the right.
The steps and the result are presented in the figure below.
If a value of 94.7 had been entered as the value in the above illustration of the probability calculator, the same result would have occurred.
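The two-tailed area can be sketched in Python as well. The function name `two_tailed_area` is hypothetical, and `statistics.NormalDist` is assumed as a stand-in for the calculator. Because only the distance from mu matters, entering 105.3 or 94.7 gives the same result.

```python
from statistics import NormalDist

def two_tailed_area(value, mu, sigma):
    """Area beyond the value plus the area beyond its mirror image around mu."""
    curve = NormalDist(mu, sigma)
    distance = abs(value - mu)                    # same for 105.3 and 94.7
    one_tail = 1.0 - curve.cdf(mu + distance)     # area in the upper tail
    return 2.0 * one_tail                         # the lower tail is identical by symmetry

from_high_score = two_tailed_area(105.3, mu=100, sigma=3)
from_low_score = two_tailed_area(94.7, mu=100, sigma=3)
print(round(from_high_score, 4), round(from_low_score, 4))
```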
In some applications of the normal curve, it will be necessary to find the scores that cut off some proportion or percentage of area of the normal distribution. One such application is finding two scores that cut off some symmetrical middle area of a given normal curve. The scores form a confidence interval around the middle point set by the value of mu. Confidence intervals are used to give a range of values rather than a single value to a given measure or estimate. Confidence intervals incorporate error or uncertainty into the information that is given to the client or reader. For example, if finding a ninety-five percent (.95 area) confidence interval resulted in a low score of 83.72 and a high score of 95.87, then ninety-five times out of one hundred the true score will fall between those two points.
Other applications, such as finding a percentile rank based on the normal curve, require the ability to find the score value that cuts off a given area of a normal curve. For example, a score value that cuts off the bottom 20 percent (.20 area) of a given normal curve would be said to have a percentile rank of 20. The probability calculator can find both types of scores given an area.
It is important to note, however, that the interpretation of scores obtained from the normal curve in this manner depends on the assumptions underlying their construction being reasonable. In order for a percentile rank based on the normal curve to be a valid estimate, the underlying distribution must be a normal distribution with the correct parameters of mu and sigma.
To find the scores that cut off some proportion or percentage of area of the normal distribution, it is necessary to enter mu, sigma, and a probability value into the Probability Calculator. For example, suppose one wished to know what two scores cut off the middle 75% of a normal distribution with μ = 123 and σ = 23. Such score values are used to find confidence intervals for various scores and statistics. To answer questions of this nature, the following steps can be used in the probability calculator.
1. Click on the selection "Confidence Interval" under the "Normal Distribution".
2. Click on the "Normal Distribution" button.
3. Enter mu, sigma, and Probability in the labeled boxes.
4. Click on the button with the arrow pointing to the left.
The steps and the result are presented in the figure below.
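The same pair of scores can be sketched in Python. A symmetrical middle area of .75 leaves .25 to be split evenly between the two tails, so the low score cuts off the bottom .125 and the high score cuts off the bottom .875. The sketch assumes `statistics.NormalDist`, whose `inv_cdf` method returns the score below which a given area falls; the function name `middle_interval` is hypothetical.

```python
from statistics import NormalDist

def middle_interval(mu, sigma, middle_area):
    """Scores that cut off a symmetrical middle area of a normal curve."""
    curve = NormalDist(mu, sigma)
    tail = (1.0 - middle_area) / 2.0     # area left over in each tail
    low = curve.inv_cdf(tail)            # score with `tail` of the area below it
    high = curve.inv_cdf(1.0 - tail)     # score with `tail` of the area above it
    return low, high

low_score, high_score = middle_interval(mu=123, sigma=23, middle_area=0.75)
print(round(low_score, 2), round(high_score, 2))
```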
In a similar manner, the score value that cuts off the bottom proportion of a given normal curve can be found using the program. For example, a score of 138.52 cuts off the bottom .75 of a normal curve with mu = 123 and sigma = 23. This score was found using the Normal Curve Area program in the following manner.
1. Click on the words "Area Below" under the "Normal Distribution".
2. Click on the "Normal Distribution" button.
3. Enter mu, sigma, and Probability in the labeled boxes.
4. Click on the button with the arrow pointing to the left.
In this case, a score of 138.52 would have a percentile rank of seventy-five in a normal distribution with mu equal to 123 and sigma equal to 23.
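Finding a score from a percentile rank is the reverse of finding an area below a score. A minimal sketch, assuming `statistics.NormalDist` in place of the calculator; the result should land very close to the value reported above, with any small difference in the last digit attributable to rounding in the program.

```python
from statistics import NormalDist

curve = NormalDist(mu=123, sigma=23)

# Score that cuts off the bottom .75 of the curve (percentile rank of 75).
score_at_75th = curve.inv_cdf(0.75)

# Going the other way: the area below that score recovers .75.
area_check = curve.cdf(score_at_75th)

print(round(score_at_75th, 2), round(area_check, 2))
```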
The standard normal curve is the member of the family of normal curves with μ = 0.0 and σ = 1.0. The value of 0.0 was selected because the normal curve is symmetrical around μ and the number system is symmetrical around 0.0. The value of 1.0 for σ is simply a unit value. The X-axis on a standard normal curve is often relabeled in units called z-scores.
There are three areas on a standard normal curve that all introductory statistics students should know. The first is that the total area below (to the left of) 0.0 is .50, as the standard normal curve is symmetrical like all normal curves. This result generalizes to all normal curves in that the total area below the value of mu is .50 on any member of the family of normal curves.
The second area that should be memorized is between z-scores of -1.00 and +1.00. It is .68 or 68%.
The total area between plus and minus one sigma unit on any member of the family of normal curves is also .68.
The third area is between z-scores of -2.00 and +2.00 and is .95 or 95%.
This area (.95) also generalizes to plus and minus two sigma units on any normal curve.
Knowing these areas allows computation of additional areas. For example, the area between a z-score of 0.0 and 1.0 may be found by taking 1/2 the area between z-scores of -1.0 and 1.0, because the distribution is symmetrical between those two points. The answer in this case is .34 or 34%. The same logic gives the same answer for the area between 0.0 and -1.0, because the standard normal distribution is symmetrical around the value of 0.0.
The area below a z-score of 1.0 may be computed by adding .34 and .50 to get .84. The area above a z-score of 1.0 may now be computed by subtracting the area just obtained from the total area under the distribution (1.00), giving a result of 1.00 - .84 or .16 or 16%.
The area between -2.0 and -1.0 requires additional computation. First, the area between 0.0 and -2.0 is 1/2 of .95 or .475. Because the .475 includes too much area, the area between 0.0 and -1.0 (.34) must be subtracted in order to obtain the desired result. The correct answer is .475 - .34 or .135.
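The hand arithmetic above can be set beside areas computed directly from the curve. The sketch assumes `statistics.NormalDist`, which with no arguments is the standard normal curve. The hand values rest on the rounded areas of .68 and .95, so they agree with the computed ones only to about two decimal places.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mu = 0.0, sigma = 1.0

# Hand arithmetic from the three memorized areas.
area_0_to_1 = 0.68 / 2                        # half of the area between -1 and +1
area_below_1 = 0.50 + area_0_to_1             # .84
area_above_1 = 1.00 - area_below_1            # .16
area_neg2_to_neg1 = 0.95 / 2 - area_0_to_1    # .475 - .34 = .135

# The same areas computed from the curve itself.
exact_below_1 = z.cdf(1.0)
exact_above_1 = 1.0 - z.cdf(1.0)
exact_neg2_to_neg1 = z.cdf(-1.0) - z.cdf(-2.0)

print(round(area_below_1, 3), round(exact_below_1, 4))
print(round(area_above_1, 3), round(exact_above_1, 4))
print(round(area_neg2_to_neg1, 3), round(exact_neg2_to_neg1, 4))
```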
Using a similar kind of logic to find the area between z-scores of .5 and 1.0 will result in an incorrect answer because the curve is not symmetrical around .5. The correct answer must be something less than .17, because the desired area is on the smaller side of the total divided area. Because of this difficulty, such areas are best found using the probability calculator. Entering the following information will produce the correct answer:
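As a check on the reasoning, the same area can be computed in Python, again assuming `statistics.NormalDist` as a stand-in for the calculator. The result can be compared with the naive guess of .17 obtained by halving the area between 0.0 and 1.0.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve

# Area between z-scores of .5 and 1.0.
area_half_to_one = z.cdf(1.0) - z.cdf(0.5)

# Naive guess from halving the area between 0.0 and 1.0.
naive_guess = 0.34 / 2

print(round(area_half_to_one, 4), naive_guess)
```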
The following formula is used to transform a score on a given normal distribution into a z-score on the standard normal distribution:

$$z = \frac{X - \mu}{\sigma}$$

It was much more useful when areas between and below scores were available only in tables of the standard normal distribution. It is included here both for historical reasons and because it will appear in a different form later in this text.
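The effect of the transformation can be demonstrated with a short sketch. It assumes the usual z-score form, z = (X - mu) / sigma, and uses `statistics.NormalDist`; the helper name `to_z_score` and the shoe-size numbers are illustrative. The area below a score on its own curve matches the area below its z-score on the standard normal curve, which is why a single table once served every member of the family.

```python
from statistics import NormalDist

def to_z_score(x, mu, sigma):
    """Distance of x from mu, measured in sigma units."""
    return (x - mu) / sigma

mu, sigma = 7.0, 1.1          # illustrative parameters (shoe sizes)
score = 8.25
z_score = to_z_score(score, mu, sigma)

area_on_original_curve = NormalDist(mu, sigma).cdf(score)
area_on_standard_curve = NormalDist().cdf(z_score)

print(round(z_score, 3), round(area_on_original_curve, 4), round(area_on_standard_curve, 4))
```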
The normal curve is an infinite number of possible probability models called a family of distributions. Each member of the family is described by setting the parameters (μ and σ) of the distribution to particular values. The members of the family are similar in that they share the same shape, are symmetrical, and have a total area underneath of 1.00. They differ in where the midpoint of the distribution falls, determined by μ, and in the variability of scores around the midpoint, determined by σ. The area between any two scores and the scores which cut off a given area on any given normal distribution can be easily found using the Probability Calculator provided with this text.