Chapter 12
Score Transformations

If a student, upon viewing a recently returned test, found that he or she had made a score of 33, would that be a good score or a poor score? Based only on the information given, it would be impossible to tell. The 33 could be out of 35 possible questions and be the highest score in the class, or it could be out of 100 possible points and be the lowest score, or anywhere in between. The score that is given is called a raw score. The purpose of this chapter is to describe procedures to convert raw scores into transformed scores, which give meaning to the numbers and allow comparisons between scores made on different scales.

Why Do We Need to Transform Scores?

Converting raw scores into transformed scores serves two purposes: it gives meaning to the scores, and it allows direct comparison between scores made on different scales.

The transformations discussed in this section belong to two general types: percentile ranks and linear transformations. Percentile ranks have the advantage that the average person finds them easy to understand and interpret. However, percentile ranks also have an unfortunate statistical property that makes their use generally unacceptable among the statistically sophisticated. This chapter focuses on percentile ranks. Linear transformations are the topic of Chapter 13.

     

Percentile Ranks Based on the Sample

A percentile rank is the percentage of scores that fall below a given score. For example, a raw score of 33 on a test might be transformed into a percentile rank of 98 and interpreted as "You did better than 98% of the students who took this test." In that case the student would feel pretty good about the test. If, on the other hand, a percentile rank of 3 was obtained, the student might wonder what he or she was doing wrong.

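The procedure for finding the percentile rank is as follows:

1. Rank order the scores from lowest to highest.
2. Find the percentage of scores falling below the given score.
3. Find the percentage of scores falling at the given score.
4. Add the percentage below to one-half the percentage at the score.

The result is the percentile rank for that score.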

It's actually easier to demonstrate and perform the procedure than it sounds. For example, suppose the obtained scores from 11 students were:

33 28 29 37 31 33 25 33 29 32 35

You want to know the percentile rank for the score of 31. The first step would be to rank order the scores from lowest to highest.

25 28 29 29 31 32 33 33 33 35 37

Computing the percentage falling below a score of 31, for example, gives 4/11 = .364, or 36.4%. The 4 in the numerator reflects that four scores (25, 28, 29, and 29) were less than 31. The 11 in the denominator is N, the number of scores. The percentage falling at a score of 31 is 1/11 = .0909, or 9.09%; the numerator is the number of scores with a value of 31, and the denominator is again the number of scores. One-half of 9.09 is 4.55. Adding the percentage below to one-half the percentage within yields a percentile rank of 36.4 + 4.55, or 40.95%. The computations are illustrated in the figure below.

Similarly, for a score of 33, the percentile rank would be computed by adding the percentage below (6/11 = .5454, or 54.54%) to one-half the percentage within (1/2 * 3/11 = .1364, or 13.64%), producing a percentile rank of 68.18%. The 6 in the numerator of the percentage below indicates that 6 scores were smaller than a score of 33, while the 3 in the percentage within indicates that 3 scores had the value 33. All three scores of 33 would have the same percentile rank of 68.18%. The computations are illustrated in the figure below.

The preceding procedure can be described in an algebraic expression as follows:

Computational Formula for Percentile Ranks
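
Written out from the worded procedure, the computation can be sketched as follows. The symbols are chosen here for illustration: f_b is the number of scores below the given score, f_w is the number of scores at that value, and N is the total number of scores.

\[
PR = \frac{f_b + \tfrac{1}{2} f_w}{N} \times 100
\]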

Application of this algebraic procedure to the score values of 31 and 33 would give the following results:

Illustration of Percentile Rank

Computing Percentile Ranks
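
Taking the frequency below plus one-half the frequency at the score, dividing by N = 11, and multiplying by 100, the arithmetic for the two scores can be sketched as:

\[
\begin{aligned}
PR_{31} &= \frac{4 + \tfrac{1}{2}(1)}{11} \times 100 \approx 40.91 \\
PR_{33} &= \frac{6 + \tfrac{1}{2}(3)}{11} \times 100 \approx 68.18
\end{aligned}
\]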

Note that these results are within rounding error of the percentile rank computed earlier using the procedure described in words.

When computing the percentile rank for the smallest score, the frequency below is zero (0), because no scores are smaller than it. Using the formula to compute the percentile rank of the score of 25:

Computing a Percentile Rank
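
With a frequency below of zero, one score at the value 25, and N = 11, the arithmetic works out roughly as:

\[
PR_{25} = \frac{0 + \tfrac{1}{2}(1)}{11} \times 100 \approx 4.55
\]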

Computing the percentile rank for the largest score, 37, gives:

Computing Percentile Ranks
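
Here ten scores fall below 37, one score has the value 37, and N = 11, so the arithmetic works out roughly as:

\[
PR_{37} = \frac{10 + \tfrac{1}{2}(1)}{11} \times 100 \approx 95.45
\]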

These last two cases demonstrate that a percentile rank computed in this way can never be equal to or less than zero, nor equal to or greater than 100. With a larger number of scores, however, percentile ranks could come closer to zero or one hundred than those obtained here.

The percentile ranks for all the scores in the example data may be computed as follows:

Percentile Ranks based on the Sample
Score            25    28    29    29    31    32    33    33    33    35    37
Percentile Rank  4.5   13.6  27.3  27.3  40.9  50.0  68.2  68.2  68.2  86.4  95.5
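
For readers who prefer to check the table by computer, the procedure can be sketched in a few lines of Python. This is only an illustration, not part of the original text; the function name sample_percentile_ranks is hypothetical.

```python
from collections import Counter

def sample_percentile_ranks(scores):
    """Percentile rank = (frequency below + half the frequency at the score) / N * 100."""
    n = len(scores)
    counts = Counter(scores)      # how many times each distinct score occurs
    ranks = {}
    below = 0                     # running count of scores smaller than the current value
    for value in sorted(counts):
        within = counts[value]    # number of scores equal to this value
        ranks[value] = (below + 0.5 * within) / n * 100
        below += within
    return ranks

scores = [33, 28, 29, 37, 31, 33, 25, 33, 29, 32, 35]
ranks = sample_percentile_ranks(scores)
for value, rank in ranks.items():
    print(f"{value}: {rank:.2f}")
```

Tied scores, such as the three scores of 33, share a single entry and therefore a single percentile rank.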

         

Percentile Ranks Based on the Normal Curve

The percent of area below a score on a normal curve with a given mu and sigma provides an estimate of the percentile rank of a score. The mean and standard deviation of the sample estimate the values of mu and sigma. Percentile ranks can be found with the Probability Calculator by entering the mean, standard deviation, and score in the mu, sigma, and score boxes of its Area Below/Normal Curve option.

25 28 29 29 31 32 33 33 33 35 37

In the example raw scores given above, the sample mean is 31.364 and the sample standard deviation is 3.414. Entering these values, together with a score of 29, in the Normal Curve Area program would yield a percentile rank based on the normal curve of 24%, as demonstrated below.

Computing a Percentile Rank

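To use the Probability Calculator to find percentile ranks based on the normal curve, select the Area Below/Normal Curve option and enter the sample mean, the sample standard deviation, and the score in the mu, sigma, and score boxes.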

Percentile ranks based on normal curve area for all the example scores are presented in the table below.

Percentile Ranks based on the Normal Curve
Score 25 28 29 29 31 32 33 33 33 35 37
Percentile Rank 3 16 24 24 46 57 68 68 68 86 95
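
The same areas can be approximated without the Probability Calculator. The sketch below is an illustration only; it assumes Python's standard statistics module, with the sample mean and the sample standard deviation (n - 1 denominator) standing in for mu and sigma.

```python
from statistics import NormalDist, mean, stdev

scores = [25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37]

m = mean(scores)     # sample mean, about 31.364
s = stdev(scores)    # sample standard deviation (n - 1 denominator), about 3.414

# Normal curve whose mu and sigma are estimated by the sample statistics
model = NormalDist(mu=m, sigma=s)

# Area below each distinct score, expressed as a percentage
normal_ranks = {x: model.cdf(x) * 100 for x in sorted(set(scores))}

for x, pr in normal_ranks.items():
    print(f"{x}: {pr:.0f}")
```

Rounded to whole numbers, the values should agree with the table above.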


Computing Percentile Ranks Based on the Normal Curve with SPSS

Percentile ranks based on normal area can be easily computed using SPSS. The first step is to enter the scores in a variable in a data file. In the example below, the variable has been labeled "x".

A raw data file in SPSS.

To find the mean and standard deviation, and to add a variable of standard scores to the data file, choose Analyze/Descriptive Statistics as shown:

 Commands generating descriptive statistics in SPSS.

Then click Descriptives, and the Descriptives dialog box appears. The variable "x" will appear in the left-hand box and should be moved to the right-hand box by clicking the directional button between the boxes. Note that the Save Standardized Values as Variables box has been checked. Your screen should now look like this:

The Descriptives command in SPSS.

When you are ready, click on the OK button. This command produces two results. The first is a table of means and standard deviations for the "x" variable in the output screen. The second is the addition of a second variable, called "zx" in the data table. In general, the new variable name will be the original variable with a "z" in front of it. The data file will now look as follows.

The SPSS data file after the Descriptives command.

The next step is to find the area that falls below the value of "x" on a normal curve with mu and sigma equal to the mean and standard deviation of the scores, respectively. This is done in SPSS by means of the Compute command (click Transform/Compute). First, enter a name for the variable to be computed in the Target Variable box. In this case the name selected is "prnormal," a shortened form of "percentile rank using the normal distribution." Then, to place an algebraic expression in the Numeric Expression box, select the function you want from the Functions list by double-clicking it. In this case, the numeric expression uses the CDFNORM function, which returns the area below a standard score on the normal curve. In the parentheses following CDFNORM is the variable just created, "zx." The result is in decimal form and may be converted to a percentage by multiplying by 100. Be sure that the "* 100" falls outside the right parenthesis, so that the expression reads CDFNORM(zx) * 100. Click on OK and all values of "zx" will be converted to values of "prnormal".

Computing percentile ranks based on the normal curve using SPSS

The result is a new variable called "prnormal" that is included in the data table.

Results of computing percentile ranks based on the normal curve using SPSS
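
For readers without SPSS, the two steps (saving standard scores, then taking the area below each one) can be imitated in Python. This is a rough sketch, not SPSS syntax; the list names zx and prnormal simply mirror the SPSS variable names used above.

```python
from statistics import NormalDist, mean, stdev

x = [25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37]

m, s = mean(x), stdev(x)   # sample mean and sample standard deviation

# Step 1: standard scores, analogous to the saved variable "zx"
zx = [(value - m) / s for value in x]

# Step 2: area below each standard score on the standard normal curve,
# times 100, analogous to CDFNORM(zx) * 100
standard_normal = NormalDist()   # mu = 0, sigma = 1
prnormal = [standard_normal.cdf(z) * 100 for z in zx]

for value, z, pr in zip(x, zx, prnormal):
    print(f"{value:>3} {z:7.3f} {pr:6.2f}")
```

Standardizing first and then using the standard normal curve gives the same areas as using a normal curve with mu and sigma set to the sample mean and standard deviation.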


Comparing the Two Methods of Computing Percentile Ranks

The astute student will observe that the percentile ranks based on the normal curve differ somewhat from the percentile ranks based on the sample. That is because the two procedures give percentile ranks that are interpreted somewhat differently.

Comparing the Two Methods of Computing Percentile Ranks
Raw Score         25    28    29    29    31    32    33    33    33    35    37
Sample %ile       4.5   13.6  27.3  27.3  40.9  50.0  68.2  68.2  68.2  86.4  95.5
Normal Area %ile  3     16    24    24    46    57    68    68    68    86    95

The percentile rank based on the sample describes where a score falls relative to the scores in the sample distribution. That is, if a score has a percentile rank of 34 using this procedure, then it can be said that 34% of the scores in the sample distribution fall below it.

The percentile rank based on the normal curve, on the other hand, describes where the score falls relative to a hypothetical model of a distribution. That is, a score with a percentile rank of 34 using the normal curve says that 34% of an infinite number of scores obtained using a similar method will fall below that score. The additional power of this last statement is not bought without cost, however: the assumption must be made that the normal curve is an accurate model of the sample distribution, and that the sample mean and standard deviation are accurate estimates of the model parameters mu and sigma. If one is willing to buy these assumptions, then the percentile rank based on normal area describes the relative standing of a score within an infinite population of scores.

An Unfortunate Property

Percentile ranks, as the name implies, are a system of ranking. Using the system destroys the interval property of the measurement system. That is, if the scores could be assumed to have the interval property before they were transformed, they would not have the property after transformation. The interval property is critical to interpreting most of the statistics described in this text (for example, the mean, standard deviation, and variance), so transformation to percentile ranks does not permit meaningful analysis of the transformed scores.

If an additional assumption of an underlying normal distribution is made, not only do percentile ranks destroy the interval property, but they also destroy the information in a particular manner. If the scores are distributed normally then percentile ranks underestimate large differences in the tails of the distribution and overestimate small differences in the middle of the distribution. This is most easily understood in an illustration:

Distorting a Distribution with Percentile Ranks

In the above illustration two standardized achievement tests with mu = 500 and sigma = 100 were given. In the first, an English test, Suzy made a score of 500 and Johnny made a score of 600, thus there was a one hundred point difference between their raw scores. On the second, a Math test, Suzy made a score of 800 and Johnny made a score of 700, again a one hundred point difference in raw scores. It can be said, then, that the differences in the scores on the two tests were equal, one hundred points each.

When converted to percentile ranks, however, the differences are no longer equal. On the English test Suzy receives a percentile rank of 50 while Johnny gets an 84, a difference of 34 percentile rank points. On the Math test, Johnny's score is transformed to a percentile rank of about 97.7 while Suzy's percentile rank is about 99.9, a difference of only about two percentile rank points.

It can be seen, then, that a difference in percentile ranks has a different meaning depending upon whether it occurs in the middle or in the tails of a normal distribution. Differences in the middle of the distribution are magnified, while differences in the tails are minimized.
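
The size of the distortion can be checked numerically. The following sketch is an illustration only; it assumes Python's standard statistics module and a normal curve with mu = 500 and sigma = 100. It uses exact normal-curve areas, so its output may differ slightly from rounded figures.

```python
from statistics import NormalDist

# Normal curve for a standardized test with mu = 500 and sigma = 100
test = NormalDist(mu=500, sigma=100)

def percentile_rank(score):
    """Percentage of the normal curve's area falling below the score."""
    return test.cdf(score) * 100

# Both raw-score differences are exactly 100 points
english_gap = percentile_rank(600) - percentile_rank(500)   # middle of the distribution
math_gap = percentile_rank(800) - percentile_rank(700)      # upper tail

print(f"English: {percentile_rank(500):.1f} vs {percentile_rank(600):.1f}, gap {english_gap:.1f}")
print(f"Math:    {percentile_rank(700):.1f} vs {percentile_rank(800):.1f}, gap {math_gap:.1f}")
```

The same one hundred raw-score points span roughly 34 percentile rank points near the middle of the curve and only about 2 in the tail.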

This unfortunate property, the destruction of the interval property, precludes the use of percentile ranks by sophisticated statisticians. Percentile ranks will remain in widespread use for interpreting scores to the layman, but the statistician must help in explaining and interpreting them. Because of this unfortunate property, a different type of transformation is needed, one that does not destroy the interval property. This leads directly into the topic of the next chapter, "Linear Transformations."


       

Summary

Transformed scores make meaningful sense out of raw scores, allowing for interpretation and comparison of the numbers. There are two general types of transformations: percentile ranks and linear transformations. This chapter discussed percentile ranks in detail.

A percentile rank based on the sample is the percentage of scores that fall below a given score. After converting a raw test score to a percentile rank based on the sample, one could say that Maryann scored better than 78% of the students who took the test.

A percentile rank based on the normal curve is the percentage of scores in a hypothetical distribution that fall below a given score. After converting a raw test score to a percentile rank based on the normal curve, one could say that Maryann scored better than 83% of the students who would ever take the test. This powerful statement can only be made if certain assumptions are true, namely that the true distribution is a normal curve and that the sample mean and standard deviation are accurate estimates of the true parameters mu and sigma of the normal curve.

Both the percentile rank based on the sample and the percentile rank based on the normal curve destroy the interval property of the measurement system. If the interval property was satisfied before the transformation, it will not be so after the transformation. If the underlying distribution is a normal curve, the interval property is destroyed in a particular manner, with percentile ranks near the middle of the distribution meaning less change than percentile ranks in the tails of the distribution.