Imagine, if you will, that you have just been elected mayor of a medium-sized city. You like your job; people recognize you on the street and show you the proper amount of respect. You are always being asked out to lunch and dinner, etc. You want to keep your job as long as possible.
In addition to electing you mayor, the electorate voted for a new income tax at the last election. In an unprecedented show of support for your administration, the amount of the tax was left unspecified, to be decided by you (this is a fantasy!). You know the people of the city fairly well, however, and they would throw you out of office in a minute if you taxed them too much. If you set the tax rate too low, the effects might not be as immediate, as it takes some time for the city and fire departments to deteriorate, but your removal would be just as certain.
You have a pretty good idea of the amount of money needed to run the city. You do not, however, have more than a foggy notion of the distribution of income in your city. The IRS, being the IRS, refuses to cooperate. You decide to conduct a survey to find the necessary information.
Since there are approximately 150,000 people in your city, you hire 150 students to conduct 1000 surveys each. It takes considerable time to hire and train the students to conduct the surveys. You decide to pay them $5.00 a survey, a considerable sum when the person being surveyed is a child with no income, but not much for the richest man in town who employs an army of CPAs. The bottom line is that it will cost approximately $750,000, or three-quarters of a million dollars, to conduct this survey.
After a considerable period of time has elapsed (because it takes time to conduct that many surveys), your secretary announces that the task is complete. Boxes and boxes of surveys are placed on your desk.
You begin your task of examining the figures. The first one is $33,967, the next is $13,048, the third is $309,339, and so on. Now the capacity of human short-term memory is approximately five to nine chunks (7 plus or minus 2; Miller, 1963). What this means is that by the time you are examining the tenth income, you have forgotten one of the previous incomes, unless you put the incomes in long-term memory. Placing 150,000 numbers in long-term memory is slightly overwhelming, so you do not attempt that task.
In an alternative ending to the fantasy, suppose you had at one time in your college career made it through the first half of an introductory statistics course. This part of the course covered the descriptive function of statistics, that is, procedures for organizing and describing sets of data.
Basically, there are two methods of describing data: pictures and numbers. Pictures of data are called frequency distributions and make the task of understanding sets of numbers cognitively palatable. Summary numbers may also be used to describe other numbers, and are called statistics. An understanding of what two or three of these summary numbers mean allows you to have a pretty good understanding of what the distribution of numbers looks like. In any case, it is easier to deal with two or three numbers than with 150,000.
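To make the two methods concrete, here is a minimal sketch in Python. The first three incomes are the figures quoted in the story; the other seven are made up for illustration, as are the bin width and the name frequency_distribution. It builds a crude text "picture" of the data and computes three common summary numbers.

```python
import statistics
from collections import Counter

# First three values come from the story; the rest are hypothetical.
incomes = [33967, 13048, 309339, 21500, 47200, 18900, 52750, 8400, 27300, 64100]

BIN_WIDTH = 25000  # assumed bin width, chosen only for this illustration


def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Summary numbers (statistics) that describe the whole set.
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
spread = statistics.stdev(incomes)  # sample standard deviation

# A text "picture" of the data: one star per income in each bin.
distribution = frequency_distribution(incomes, BIN_WIDTH)
for lower_edge, count in distribution.items():
    upper_edge = lower_edge + BIN_WIDTH - 1
    print(f"${lower_edge:>7,} to ${upper_edge:>7,}: {'*' * count}")

print(f"mean = {mean_income:,.2f}, median = {median_income:,.2f}, sd = {spread:,.2f}")
```

Notice that the single very large income pulls the mean well above the median, which is one reason more than one summary number is usually reported.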
After organizing and describing the data, you make a decision about the amount of tax to implement. Everything seems to be going well until an investigative reporter from the local newspaper prints a story about the three-quarters of a million dollar cost of the survey. The irate citizens immediately start a recall petition. You resign the office in disgrace before you are thrown out.
If you had only completed the last part of the statistics course in which you were enrolled, you would have understood the basic principles of the inferential function of statistics. Using inferential statistics, you can take a sample from the population, describe the numbers of the sample using descriptive statistics, and infer the population distribution. Granted, there is a risk of error involved, but if the risk can be minimized, the savings in time, effort, and money are well worth the risk.
In the preceding fantasy, suppose that rather than surveying the entire population, you randomly selected 1000 people to survey. This procedure is called sampling from the population and the individuals selected are called a SAMPLE. If each individual in the population is equally likely to be included in the sample, the sample is called a random sample.
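As a rough sketch of what drawing a random sample might look like, the following Python fragment picks 1000 of 150,000 citizens so that each is equally likely to be chosen. The citizen ID numbers, the variable names, and the fixed seed are assumptions made for illustration; the story does not say how the selection was actually carried out.

```python
import random

POPULATION_SIZE = 150_000  # approximate number of people in the city
SAMPLE_SIZE = 1_000        # number of people to survey

# Hypothetical: every citizen is identified by a number from 0 to 149,999.
citizen_ids = range(POPULATION_SIZE)

# A fixed seed makes the draw repeatable; a real survey would not need one.
rng = random.Random(42)

# random.sample draws without replacement, so no citizen is picked twice
# and every citizen has the same chance of being included.
sample_ids = rng.sample(citizen_ids, SAMPLE_SIZE)

print(f"Selected {len(sample_ids)} of {POPULATION_SIZE:,} citizens")
```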
Now, instead of 150 student surveyors, you only need to hire 10 surveyors, who each survey 100 citizens. The time taken to collect the data is a fraction of that taken to survey the entire population. Equally important, now the survey costs approximately $5000, an amount that the taxpayers are more likely to accept.
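The arithmetic behind the two price tags can be checked with a few lines of Python. The function name survey_cost is made up for this sketch; the figures are the ones given in the story.

```python
COST_PER_SURVEY = 5.00  # dollars paid per completed survey, from the story


def survey_cost(surveyors, surveys_each, cost_per_survey=COST_PER_SURVEY):
    """Return (number of surveys, total cost in dollars)."""
    total_surveys = surveyors * surveys_each
    return total_surveys, total_surveys * cost_per_survey


# Surveying everyone: 150 students, 1000 surveys each.
census_n, census_cost = survey_cost(150, 1000)

# Surveying a sample: 10 surveyors, 100 surveys each.
sample_n, sample_cost = survey_cost(10, 100)

print(f"Entire population: {census_n:,} surveys, ${census_cost:,.0f}")
print(f"Sample:            {sample_n:,} surveys, ${sample_cost:,.0f}")
print(f"The sample costs 1/{census_cost / sample_cost:.0f} as much")
```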
At the completion of the data collection, the descriptive function of statistics is still needed to organize and describe the 1000 numbers, but an additional analysis must be carried out to generalize (infer) from the sample to the population.
Some reflection on your part suggests that it is possible that the sample contained 1000 of the richest individuals in your city. If this were the case, then the estimate of the amount of income to tax would be too high. Equally possible is the situation where 1000 of the poorest individuals were included in the survey (the bums on skid row), in which case the estimate would be too low. These possibilities exist through no fault of yours or the procedure utilized. They are said to be due to chance; a distorted sample just happened to be selected.
The beauty of inferential statistics is that the amount of probable error, or likelihood of either of the above possibilities, may be specified. In this case, the possibility of either of the above extreme situations actually occurring is so remote that it may be dismissed. However, the chance that there will be some error in your estimation procedure is pretty good. Inferential statistics will allow you to specify the amount of error with statements like, "I am 95 percent sure that the estimate will be within $200 of the true value." You are willing to trade the risk of error and inexact information because the savings in time, effort, and money are so great.
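One common way of arriving at a statement of this kind is a 95 percent confidence interval for the mean. The sketch below is only an illustration under stated assumptions: the city's incomes are simulated from a lognormal distribution with made-up parameters, and the margin of error uses the usual large-sample normal approximation (1.96 standard errors). Neither choice comes from the story.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Assumption: a simulated population of 150,000 incomes.
# The lognormal parameters (10.3, 0.8) are invented for illustration.
population = [rng.lognormvariate(10.3, 0.8) for _ in range(150_000)]

# Draw a random sample of 1000 incomes without replacement.
sample = rng.sample(population, 1_000)

# Describe the sample, then infer from it.
n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Large-sample 95 percent margin of error (normal approximation).
margin = 1.96 * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Estimated mean income: ${sample_mean:,.0f}")
print(f"95 percent sure the true mean is within ${margin:,.0f} of that")
print(f"Interval: ${lower:,.0f} to ${upper:,.0f}")
```

The size of the margin depends on how spread out the incomes are and on the sample size, so the figure printed here need not resemble the $200 in the example statement. In this simulation the true population mean is available as statistics.mean(population), which lets you check whether the interval captured it; in a real survey it would remain unknown.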
At the conclusion of the fantasy a grateful citizenry makes you king (or queen). You receive a large salary increase and are elected to the position for life. You may continue this story any way that you like at this point ...