The sampling distribution is a theoretical distribution of a sample statistic. While the concept of a distribution of a set of numbers is intuitive for most students, the concept of a distribution of a set of statistics is not. Therefore distributions will be reviewed before the sampling distribution is discussed.
The sample distribution is the distribution resulting from the collection of actual data. A major characteristic of a sample is that it contains a finite (countable) number of scores, a number represented by the letter N. For example, suppose that the following data were collected:
32  35  42  33  36  38  37  33  38  36  35  34  37  40  38  36  35  31  37  36  33
36  39  40  33  30  35  37  39  32  39  37  35  36  39  33  31  40  37  34  34  37
These numbers constitute a sample distribution. Using the procedures discussed in Chapter 5, "Frequency Distributions," the following histogram can be constructed to picture this data:
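The same picture can be produced by machine. Below is a minimal Python sketch, assuming the matplotlib library is available, that builds a histogram of the 42 scores listed above; the integer bin boundaries are one plausible choice, not the only one.

```python
import matplotlib.pyplot as plt

# The 42 scores from the sample distribution above.
scores = [32, 35, 42, 33, 36, 38, 37, 33, 38, 36, 35, 34, 37, 40, 38, 36, 35, 31, 37, 36, 33,
          36, 39, 40, 33, 30, 35, 37, 39, 32, 39, 37, 35, 36, 39, 33, 31, 40, 37, 34, 34, 37]

# One bin per integer score from 30 to 42 pictures the sample distribution.
plt.hist(scores, bins=range(30, 44), edgecolor="black")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.title("Sample Distribution (N = 42)")
plt.show()
```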
In addition to the frequency distribution, the sample distribution can be described with numbers, called statistics. Examples of statistics are the mean, median, mode, standard deviation, range, and correlation coefficient, among others. Statistics, and procedures for computing them, were discussed in detail in earlier chapters.
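As a quick illustration, the following sketch uses Python's standard-library statistics module to compute several of these statistics for the sample above.

```python
import statistics

scores = [32, 35, 42, 33, 36, 38, 37, 33, 38, 36, 35, 34, 37, 40, 38, 36, 35, 31, 37, 36, 33,
          36, 39, 40, 33, 30, 35, 37, 39, 32, 39, 37, 35, 36, 39, 33, 31, 40, 37, 34, 34, 37]

print("N      =", len(scores))
print("Mean   =", round(statistics.mean(scores), 2))
print("Median =", statistics.median(scores))
print("Mode   =", statistics.mode(scores))
print("Stdev  =", round(statistics.stdev(scores), 2))  # sample standard deviation
print("Range  =", max(scores) - min(scores))
```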
If a different sample were taken, different scores would result. The relative frequency polygon would be different, as would the statistics computed from the second sample. However, there would also be some consistency in that while the statistics would not be exactly the same, they would be similar. To achieve order in this chaos, statisticians have developed probability models.
Probability models exist in a theoretical world where complete information is available. As such, they can never be known except in the mind of the mathematical statistician. If an infinite number of infinitely precise scores were taken, the resulting distribution would be a probability model of the population.
The probability model may be described with pictures (graphs) which are analogous to the relative frequency polygon of the sample distribution. The two graphs below illustrate two types of probability models, the uniform distribution and the normal curve.
As discussed earlier in Chapter 9, "The Normal Curve," probability distributions are described by mathematical equations that contain parameters. Parameters are variables that change the shape of the probability model. By setting these parameters equal to numbers, a particular member of that family of probability models results.
A critical aspect of statistics is the estimation of parameters with sample statistics. Sample statistics are used as estimators of the corresponding parameters in the population model. For example, the mean and standard deviation of the sample are used as estimates of the corresponding population parameters $\mu$ and $\sigma$. Mathematical statistics texts devote considerable effort to defining what makes a parameter estimation procedure good or poor.
Note the "-ing" on the end of "sample" in "sampling distribution." It looks and sounds similar to the sample distribution, but in reality the concept is much closer to that of a population model.
The sampling distribution is a theoretical distribution of a sample statistic. It is a model of a distribution of scores, like the population distribution, except that the scores are not raw scores, but statistics. It is a thought experiment. "What would the world be like if a person repeatedly took samples of size N from the population distribution and computed a particular statistic each time?" The resulting distribution of statistics is called the sampling distribution of that statistic.
For example, suppose that a sample of size sixteen (N=16) is taken from some population. The mean of the sixteen numbers is computed. Next a new sample of sixteen is taken, and the mean is again computed. If this process were repeated an infinite number of times, the distribution of the now infinite number of sample means would be called the sampling distribution of the mean.
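The infinite thought experiment cannot be run on a computer, but a large number of repetitions approximates it. The sketch below assumes a normal population with hypothetical parameters $\mu = 36$ and $\sigma = 3$ (values chosen only for illustration) and repeatedly draws samples of size N = 16.

```python
import random
import statistics

# Hypothetical population parameters; any values would illustrate the idea.
MU, SIGMA = 36, 3
N = 16            # size of each sample
TRIALS = 10_000   # stands in for the "infinite" number of repetitions

sample_means = []
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The collected means approximate the sampling distribution of the mean.
print("Mean of the sample means:     ", round(statistics.mean(sample_means), 3))
print("Std. dev. of the sample means:", round(statistics.stdev(sample_means), 3))
```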
Every statistic has a sampling distribution. For example, suppose that instead of the mean, medians were computed for each sample. The infinite number of medians would be called the sampling distribution of the median.
Just as the population models can be described with parameters, so can the sampling distribution. The expected value (analogous to the mean) of a sampling distribution will be represented here by the symbol $\mu$ (mu). The $\mu$ symbol is often written with a subscript to indicate which sampling distribution is being discussed. For example, the expected value of the sampling distribution of the mean is represented by the symbol $\mu_{\bar{X}}$, that of the median by $\mu_{Md}$, and so forth. The value of $\mu_{\bar{X}}$ can be thought of as the mean of the distribution of means. In a similar manner, the value of $\mu_{Md}$ is the mean of a distribution of medians. They are not really means, because it is not possible to find a mean when $N = \infty$, but they are the mathematical equivalent of a mean.
Using advanced mathematics, in a thought experiment, the theoretical statistician often discovers a relationship between the expected value of a statistic and the model parameters. For example, it can be proven that the expected value of both the mean and the median, $\mu_{\bar{X}}$ and $\mu_{Md}$, is equal to $\mu$. When the expected value of a statistic equals a population parameter, the statistic is called an unbiased estimator of that parameter. In this case, both the mean and the median would be unbiased estimators of the parameter $\mu$.
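This unbiasedness can be illustrated by simulation. The sketch below again assumes a hypothetical normal population with $\mu = 36$ and $\sigma = 3$; the average of the simulated sample means and the average of the simulated sample medians both land close to $\mu$.

```python
import random
import statistics

MU, SIGMA, N, TRIALS = 36, 3, 16, 10_000  # hypothetical population and sample sizes

means, medians = [], []
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Both averages should come out close to MU, illustrating unbiasedness.
print("Average of sample means:  ", round(statistics.mean(means), 3))
print("Average of sample medians:", round(statistics.mean(medians), 3))
```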
A sampling distribution may also be described with a parameter corresponding to a variance, symbolized by $\sigma^2$. The square root of this parameter is given a special name, the standard error. Each sampling distribution has a standard error. In order to keep them straight, each has a name tagged on the end of "standard error" and a subscript on the $\sigma$ symbol. The standard deviation of the sampling distribution of the mean is called the standard error of the mean and is symbolized by $\sigma_{\bar{X}}$. Similarly, the standard deviation of the sampling distribution of the median is called the standard error of the median and is symbolized by $\sigma_{Md}$.
In each case the standard error of a statistic describes the degree to which the computed statistics will differ from one another when calculated from samples of the same size drawn from similar population models. The larger the standard error, the greater the difference between the computed statistics. Consistency is a valuable property in the estimation of a population parameter, so the statistic with the smallest standard error is preferred as the estimator of the corresponding population parameter, everything else being equal. Statisticians have proven that in most cases the standard error of the mean is smaller than the standard error of the median. Because of this property, the mean is the preferred estimator of $\mu$.
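For a normal population model, a well-known large-sample approximation puts the standard error of the median at about $\sqrt{\pi/2}$ (roughly 1.25) times the standard error of the mean. The sketch below compares the two using hypothetical values $\sigma = 3$ and N = 16.

```python
import math

SIGMA, N = 3, 16  # hypothetical population standard deviation and sample size

# Standard error of the mean: sigma divided by the square root of N.
se_mean = SIGMA / math.sqrt(N)

# Large-sample approximation for the standard error of the median when
# sampling from a normal population: sqrt(pi / 2) times the SE of the mean.
se_median = math.sqrt(math.pi / 2) * se_mean

print("Standard error of the mean:  ", round(se_mean, 3))    # 0.75
print("Standard error of the median:", round(se_median, 3))  # about 0.94
```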
The sampling distribution of the mean is a distribution of sample means. This distribution may be described with the parameters $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$.
These parameters are closely related to the parameters of the population distribution, with the relationship being described by the Central Limit Theorem. The Central Limit Theorem essentially states that the mean of the sampling distribution of the mean ($\mu_{\bar{X}}$) equals the mean of the population model ($\mu$), and that the standard error of the mean ($\sigma_{\bar{X}}$) equals the standard deviation of the population model ($\sigma$) divided by the square root of N as the sample size gets infinitely large ($N \rightarrow \infty$). In addition, the sampling distribution of the mean will approach a normal distribution. The following equations summarize these relationships:

$$\mu_{\bar{X}} = \mu$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$
The astute student probably noticed, however, that the sample size would have to be infinitely large ($N = \infty$) in order for these relationships to hold exactly. In theory this is a fact; in practice, an infinite sample size is impossible. Fortunately, the Central Limit Theorem is very powerful: in most situations encountered by behavioral scientists, the theorem works reasonably well with an N greater than 10 or 20. Thus, it is possible to closely approximate what the distribution of sample means looks like, even with relatively small sample sizes.
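A simulation makes the theorem concrete. The sketch below assumes a decidedly non-normal population, uniform on the interval 30 to 42 (hypothetical values), and shows that with N = 20 the observed mean and standard deviation of the sample means already sit close to the values the Central Limit Theorem predicts.

```python
import random
import statistics

N, TRIALS = 20, 10_000  # modest sample size, many repetitions

# Uniform population on [30, 42]: mean = 36, sigma = (42 - 30) / sqrt(12).
POP_MEAN = 36
POP_SIGMA = 12 / (12 ** 0.5)

means = [statistics.mean(random.uniform(30, 42) for _ in range(N))
         for _ in range(TRIALS)]

print("Predicted mean of means: ", POP_MEAN)
print("Observed mean of means:  ", round(statistics.mean(means), 3))
print("Predicted standard error:", round(POP_SIGMA / N ** 0.5, 3))
print("Observed standard error: ", round(statistics.stdev(means), 3))
```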
The importance of the Central Limit Theorem to statistical thinking cannot be overstated. Most of hypothesis testing and sampling theory is based on this theorem. In addition, it provides a justification for using the normal curve as a model for many naturally occurring phenomena. If a trait, such as intelligence, can be thought of as a combination of relatively independent events, in this case both genetic and environmental, then it would be expected that the trait would be normally distributed in a population.
The purpose of the microcomputer simulation exercise SIM-SAM is to demonstrate how a sampling distribution is created. The following figure shows the opening screen:
Although it is possible to skip directly to the Test Mode, you should first spend some time familiarizing yourself with the Learn Mode.
The Learn Mode screen appears as:
To run a simulation, select a distribution, a value for either the Range or Sigma, and a sample size, and then click the Sample button. Use the scrollbars to change the values for Range, Sigma, and Sample Size. When you click the Sample button, the computer generates 100 samples of the selected size, computes the mean for each sample, and then draws the sampling distribution of the mean below the population model. You should verify that the sampling distribution changes as a function of the type of population model, the variability of the population model, and the sample size. In addition, verify that the shape of the sampling distribution of the mean approaches a normal distribution as the sample size increases, no matter what the population model looks like.
When you understand and are comfortable with SIM-SAM's Learn Mode, click Exit and proceed to the Test Mode. The Test Mode screen looks like this:
On each trial, you are presented with a population model and a sample size. You must guess which of the four potential sampling distributions will be closest to the sampling distribution of the mean that is generated by the computer. Click the button next to the graph you select, and the computer will generate 100 samples, compute the mean for each sample, draw the sampling distribution of the mean in the labeled box, and compare the observed sampling distribution of the mean with each of the four possibilities. Using a measure of "goodness of fit", the computer will select the distribution that is closest to the actual distribution. If that distribution is the one that you selected, both the trial counter and the correct counter will be incremented by one; if not, only the trial counter will be incremented.
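The exact "goodness of fit" measure used by the program is not described here; one simple possibility is a least-squares comparison of bin frequencies, sketched below with entirely hypothetical counts.

```python
# A minimal sketch of one plausible "goodness of fit" rule: pick the
# candidate whose bin frequencies are closest, in a least-squares sense,
# to the observed sampling distribution.  All counts here are hypothetical.
observed = [1, 4, 10, 25, 30, 20, 8, 2]

candidates = {
    "A": [5, 10, 15, 20, 20, 15, 10, 5],
    "B": [2, 5, 12, 24, 28, 18, 9, 2],
    "C": [12, 12, 13, 13, 13, 13, 12, 12],
    "D": [30, 25, 18, 12, 8, 4, 2, 1],
}

def misfit(candidate, observed):
    """Sum of squared differences between two sets of bin frequencies."""
    return sum((c - o) ** 2 for c, o in zip(candidate, observed))

best = min(candidates, key=lambda name: misfit(candidates[name], observed))
print("Closest candidate:", best)  # "B" for these hypothetical counts
```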
The number of points given in this assignment will be the number appearing in the "Maximum Correct" box, with a maximum of eight. When you are satisfied with the score, click on Exit and Update Score.
The sampling distribution, a theoretical distribution of a sample statistic, is a critical component of hypothesis testing. The sampling distribution allows the statistician to hypothesize about what the world would look like if a statistic were calculated an infinite number of times.
A sampling distribution exists for every statistic that can be computed. Like models of relative frequency distributions, sampling distributions are characterized by parameters, two of which are the theoretical mean and standard deviation. The theoretical standard deviation of the sampling distribution is called the standard error and describes how much variation can be expected given that certain conditions are met, such as a particular sample size.
Of considerable importance to statistical thinking is the sampling distribution of the mean, a theoretical distribution of sample means. A mathematical theorem, called the Central Limit Theorem, describes the relationship of the parameters of the sampling distribution of the mean to the parameters of the probability model and the sample size. The Central Limit Theorem is the theoretical foundation of the gaming and insurance industries, and it is also the foundation for hypothesis tests that measure the size of an effect using means.