Score Transformations

If a student, upon viewing a recently returned test, found that he or she had made a score of 33, would that be a good score or a poor score? Based only on the information given, it would be impossible to tell. The 33 could be out of 35 possible questions and be the highest score in the class, or it could be out of 100 possible points and be the lowest score, or anywhere in between. The score that is given is called a raw score. The purpose of this chapter is to describe procedures to convert raw scores into transformed scores, which give meaning to the numbers and allow comparisons between scores made on different scales.

Why Do We Need to Transform Scores?

The transformations discussed in this section belong to two general types: percentile ranks and linear transformations. Percentile ranks are advantageous in that the average person has an easier time understanding and interpreting their meaning. However, percentile ranks also have a rather unfortunate statistical property that makes their use generally unacceptable among the statistically sophisticated. This chapter will focus on percentile ranks. Linear transformations are the topic of Chapter 13.

Q12.1

To ensure that a raw score of 57 means the same thing on two different tests, a ______ would be employed.
score transformation
percentile rank transformation
linear transformation
statistical analysis

Q12.2

One purpose of score transformations is to
allow comparisons of scores made on tests with different scales.
summarize and describe samples of scores.
predict some future score.
allow researchers to make rational decisions about the reality of effects.

Q12.3

The two major categories of score transformations are
linear and percentile rank transformations.
regression models and hypothesis testing.
standard scores and regression models.
asymptotic hypo geometric and linear transformations.

Percentile Ranks Based on the Sample

A percentile rank is the percentage of scores that fall below a given score. For example, a raw score of 33 on a test might be transformed into a percentile rank of 98 and interpreted as "You did better than 98% of the students who took this test." In that case the student would feel pretty good about the test. If, on the other hand, a percentile rank of 3 was obtained, the student might wonder what he or she was doing wrong.

It's actually easier to demonstrate and perform the procedure than it sounds. For example, suppose scores were obtained from 11 students, and you want to know the percentile rank for the score of 31. The first step would be to rank order the scores from lowest to highest:

25 28 29 29 31 32 33 33 33 35 37

Computing the percentage falling below a score of 31, for example, gives the value 4/11 = .364 or 36.4%. The four in the numerator reflects that four scores (25, 28, 29, and 29) were less than 31. The 11 in the denominator is N, or the number of scores. The percentage falling at a score of 31 would be 1/11 = .0909 or 9.09%, the numerator being the number of scores with a value of 31 and the denominator again being the number of scores. One-half of 9.09 would be 4.55. Adding the percentage below to one-half the percentage within would yield a percentile rank of 36.4 + 4.55 or 40.95%. The computations are illustrated in the figure below.

Similarly, for a score of 33, the percentile rank would be computed by adding the percentage below (6/11 = .5454 or 54.54%) to one-half the percentage within (1/2 * 3/11 = .1364 or 13.64%), producing a percentile rank of 68.18%. The 6 in the numerator of percentage below indicates that 6 scores were smaller than a score of 33, while the 3 in the percentage within indicates that 3 scores had the value 33. All three scores of 33 would have the same percentile rank of 68.18%. The computations are illustrated in the figure below.

Application of this algebraic procedure to the score values of 31 and 33 would give the following results:
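One way to write the procedure algebraically, consistent with the verbal description of "percentage below plus one-half the percentage within," is sketched here. The symbols f_b (frequency below), f_w (frequency within), and N (number of scores) are notation assumed for this sketch rather than taken from the text.

```latex
\begin{align*}
PR &= \frac{f_b + \tfrac{1}{2} f_w}{N} \times 100 \\[4pt]
PR_{31} &= \frac{4 + \tfrac{1}{2}(1)}{11} \times 100 \approx 40.91 \\[4pt]
PR_{33} &= \frac{6 + \tfrac{1}{2}(3)}{11} \times 100 \approx 68.18
\end{align*}
```

Carrying the division out in a single step avoids rounding the intermediate percentages, which is why the value for 31 differs in the second decimal place from the step-by-step figure.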

Note that these results are within rounding error of the percentile rank computed earlier using the procedure described in words.

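When computing the percentile rank for the smallest score, the frequency below is zero (0), because no scores are smaller than it. Using the formula to compute the percentile rank of the score of 25 gives (0 + 1/2 * 1)/11 * 100 = 4.55.

Similarly, when computing the percentile rank for the largest score, the frequency below includes every score except those at that value. For the score of 37, ten scores fall below it and one falls at it, giving (10 + 1/2 * 1)/11 * 100 = 95.45.

These two cases demonstrate that a score can never have a percentile rank equal to or less than zero, or equal to or greater than 100. If the number of scores were increased, however, percentile ranks closer to zero or one hundred than those obtained here would be possible.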
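The bounds can be made explicit with a short derivation. This is a sketch under assumed notation: f_w is the number of scores tied at the extreme value and N is the number of scores.

```latex
\begin{align*}
PR_{\min} &= \frac{0 + \tfrac{1}{2} f_w}{N} \times 100 = \frac{50\, f_w}{N} > 0 \\[4pt]
PR_{\max} &= \frac{(N - f_w) + \tfrac{1}{2} f_w}{N} \times 100 = 100 - \frac{50\, f_w}{N} < 100
\end{align*}
```

With f_w = 1 and N = 11 these give approximately 4.55 and 95.45. As N grows while f_w stays fixed, the two values move toward 0 and 100 without ever reaching them.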

The percentile ranks for all the scores in the example data may be computed as follows:

Percentile Ranks based on the Sample
Score	25	28	29	29	31	32	33	33	33	35	37
Percentile Rank	4.6	13.6	27.3	27.3	40.9	50	68.2	68.2	68.2	86.4	95.4
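The whole table can be reproduced with a few lines of code. The following is a minimal sketch of the "percentage below plus one-half the percentage within" procedure; the function name `percentile_ranks_sample` is hypothetical, chosen only for this illustration.

```python
from collections import Counter


def percentile_ranks_sample(scores):
    """Map each distinct score to its percentile rank based on the sample."""
    n = len(scores)
    counts = Counter(scores)  # frequency within, for each distinct score
    ranks = {}
    below = 0  # running frequency below the current score value
    for value in sorted(counts):
        within = counts[value]
        # percentage below plus one-half the percentage within
        ranks[value] = (below + 0.5 * within) / n * 100
        below += within
    return ranks


scores = [25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37]
sample_ranks = percentile_ranks_sample(scores)
for value, rank in sample_ranks.items():
    print(f"{value}: {rank:.2f}")
```

Tied scores share a single entry in the result, so all three scores of 33 receive the same percentile rank. The printed values should agree with the table to within rounding in the last digit.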

Q12.4

The computational procedure for finding the percentile rank based on the sample might be best described as
the percentage below plus one-half the percentage within.
the area below a normal curve model at the score value times one hundred.
the sum of the scores divided by the number of scores.
rank order the scores, find the range, and divide the range by the number of desired intervals.

Q12.5

In the set of data {11 12 12 14 15 15 15 18 19 19 19 21} the frequency within of a score of 19 would be:
3
19
8
11

Q12.7

In computing the percentile rank based on the sample, the highest score in the sample
will have the largest percentile rank, but always less than 100.
will have a percentile rank of 100.
will have a percentile rank that is computationally inaccessible to the statistically challenged.
will need a slightly different computational formula than the rest of the scores.

Q12.8

The percentile rank based on the sample of the highest score
will become closer to 100 as the sample size increases.
will equal 100.
will be equal to 97.5.
will become smaller as the frequency below increases.

Percentile Ranks Based on the Normal Curve

The percent of area below a score on a normal curve with a given mu and sigma provides an estimate of the percentile rank of a score. The mean and standard deviation of the sample estimate the values of mu and sigma. Percentile ranks can be found using the Area Below/Normal Curve option of the Probability Calculator by entering the mean, standard deviation, and score in the mu, sigma, and score boxes.

In the example raw scores given above, the sample mean is 31.364 and the sample standard deviation is 3.414. Entering the appropriate values for a score of 29 in the Normal Curve Area program would yield a percentile rank based on the normal curve of 24%, as demonstrated below.

Percentile ranks based on normal curve area for all the example scores are presented in the table below.

Percentile Ranks based on the Normal Curve
Score	25	28	29	29	31	32	33	33	33	35	37
Percentile Rank	3	16	24	24	46	57	68	68	68	86	95
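For readers without access to the Probability Calculator, the same areas can be approximated in code. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; it assumes the sample standard deviation is computed with N - 1 in the denominator, which reproduces the value 3.414 quoted above. The function name `percentile_rank_normal` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

scores = [25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37]

sample_mean = mean(scores)  # estimate of mu
sample_sd = stdev(scores)   # estimate of sigma (N - 1 denominator)
model = NormalDist(mu=sample_mean, sigma=sample_sd)


def percentile_rank_normal(score):
    """Area below `score` on the fitted normal curve, times one hundred."""
    return model.cdf(score) * 100


normal_ranks = {s: percentile_rank_normal(s) for s in sorted(set(scores))}
for score, rank in normal_ranks.items():
    print(f"{score}: {rank:.0f}")
```

Unlike the sample-based procedure, this computation never counts scores; it depends on the data only through the mean and standard deviation.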

Q12.9

The computational procedure for finding the percentile rank based on the normal curve might be best described as
the percentage below plus one-half the percentage within.
the area below a normal curve model at the score value times one hundred.
the sum of the scores divided by the number of scores.
rank order the scores, find the range, and divide the range by the number of desired intervals.

Q12.10

The computational procedure for finding the percentile rank based on the normal curve
destroys the interval property of the original scores.
requires no assumptions about the underlying theoretical distribution.
preserves the rational zero property of measurement.
is so difficult to use that it is seldom found.

Q12.11

The percentile rank based on the normal curve
describes the relative position of a score within a hypothetical probability model.
requires no assumptions about the underlying theoretical distribution.
describes the relative position of a score within a sample of scores.
is so difficult to use that it is seldom found.

Q12.12

The percentile rank based on a normal curve will accurately describe the position of a score within a hypothetical population given
the mean and standard deviation are accurate estimates of mu and sigma.
the mean and standard deviation are accurate estimates of the intercept and slope.
the underlying distribution is an asymptotic uniform distribution.
the mean is approximately three times the size of the standard deviation.

Computing Percentile Ranks Based on the Normal Curve with SPSS

Percentile ranks based on normal area can be easily computed using SPSS. The first step is to enter the scores in a variable in a data file. In the example below, the variable has been labeled "x".

To find the mean and standard deviation, and to add a variable of standard scores to the data file, choose Analyze/Descriptive Statistics as shown:

Then click Descriptives, and the Descriptives dialog box appears. The variable "x" will appear in the left-hand box and should be moved to the right-hand box by clicking the directional button between the boxes. Note that the Save Standardized Values as Variables box has been checked. Your screen should now look like this:

When you are ready, click on the OK button. This command produces two results. The first is a table of means and standard deviations for the "x" variable in the output screen. The second is the addition of a second variable, called "zx," to the data table. In general, the new variable name will be the original variable name with a "z" in front of it. The data file will now look as follows.

The next step is to find the area that falls below the value of "x" on a normal curve with mu and sigma equal to the mean and standard deviation of the scores, respectively. This is done in SPSS by means of the Compute command (click Transform/Compute). First, enter a name for the variable to be computed in the Target Variable box. In this case the name selected is "prnormal," a shortened form of "percentile rank using the normal distribution." Then, to place an algebraic expression in the Numeric Expression box, select the function you want from the Functions list by double-clicking it. In this case, the numeric expression will use the CDFNORM function, which returns the area below the standard normal curve. In the parentheses following the CDFNORM function is the variable just created, "zx." The result will be in decimal form, which may be converted to a percentage by multiplying by 100. Be sure that the "* 100" falls outside the right parenthesis, so that the expression reads CDFNORM(zx) * 100, and that it is entered exactly as shown. Click on OK and all values of "zx" will be converted to values of "prnormal".

The result is a new variable called "prnormal" that is included in the data table.
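The two SPSS steps, saving standardized values and then taking the area below each one, can be mirrored outside SPSS. The sketch below is an illustration in Python, not SPSS syntax. The helper `cdf_standard_normal` is a hypothetical stand-in for the role CDFNORM plays in the text, and the variable names `zx` and `prnormal` are borrowed from the example.

```python
import math
from statistics import mean, stdev


def cdf_standard_normal(z):
    """Area below z on the standard normal curve (mean 0, standard deviation 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


x = [25, 28, 29, 29, 31, 32, 33, 33, 33, 35, 37]

# Step 1: standardized values, assuming the N - 1 standard deviation
x_mean, x_sd = mean(x), stdev(x)
zx = [(value - x_mean) / x_sd for value in x]

# Step 2: area below each standardized value, expressed as a percentage
prnormal = [cdf_standard_normal(z) * 100 for z in zx]

for value, z, pr in zip(x, zx, prnormal):
    print(f"x={value}  zx={z:6.3f}  prnormal={pr:5.1f}")
```

Because each score is standardized first, the area is taken on the standard normal curve; this gives the same result as finding the area below the raw score on a normal curve with mu and sigma set to the sample mean and standard deviation.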

Comparing the Two Methods of Computing Percentile Ranks

The astute student will observe that the percentile ranks based on the normal curve are somewhat different from those called percentile ranks based on the sample. That is because the two procedures give percentile ranks that are interpreted somewhat differently.

Comparing the Two Methods of Computing Percentile Ranks
Raw Score	25	28	29	29	31	32	33	33	33	35	37
Sample %ile	4.6	13.6	27.3	27.3	40.9	50	68.2	68.2	68.2	86.4	95.4
Normal Area %ile	3	16	24	24	46	57	68	68	68	86	95

The percentile rank based on the sample describes where a score falls relative to the scores in the sample distribution. That is, if a score has a percentile rank of 34 using this procedure, then it can be said that 34% of the scores in the sample distribution fall below it.

The percentile rank based on the normal curve, on the other hand, describes where the score falls relative to a hypothetical model of a distribution. That is, a score with a percentile rank of 34 using the normal curve says that 34% of an infinite number of scores obtained using a similar method will fall below that score. The additional power of this last statement is not bought without cost, however, in that the assumption must be made that the normal curve is an accurate model of the sample distribution, and that the sample mean and standard deviation are accurate estimates of the model parameters mu and sigma. If one is willing to buy these assumptions, then the percentile rank based on normal area describes the relative standing of a score within an infinite population of scores.

An Unfortunate Property

Percentile ranks, as the name implies, are a system of ranking. Using the system destroys the interval property of the measurement system. That is, if the scores could be assumed to have the interval property before they were transformed, they would not have the property after transformation. The interval property is critical to interpreting most of the statistics described in this text, such as the mean, standard deviation, and variance; thus transformation to percentile ranks does not permit meaningful analysis of the transformed scores.

If an additional assumption of an underlying normal distribution is made, not only do percentile ranks destroy the interval property, but they also destroy the information in a particular manner. If the scores are distributed normally then percentile ranks underestimate large differences in the tails of the distribution and overestimate small differences in the middle of the distribution. This is most easily understood in an illustration:

In the above illustration two standardized achievement tests with mu = 500 and sigma = 100 were given. In the first, an English test, Suzy made a score of 500 and Johnny made a score of 600, thus there was a one hundred point difference between their raw scores. On the second, a Math test, Suzy made a score of 800 and Johnny made a score of 700, again a one hundred point difference in raw scores. It can be said then, that the differences on the scores on the two tests were equal, one hundred points each.

When converted to percentile ranks, however, the differences are no longer equal. On the English test Suzy receives a percentile rank of 50 while Johnny gets an 84, a difference of 34 percentile rank points. On the Math test, Johnny's score is transformed to a percentile rank of about 97.7 while Suzy's percentile rank is about 99.9, a difference of only about two percentile rank points.

It can be seen, then, that a percentile rank has a different meaning depending upon whether it occurs in the middle of the distribution or the tails of a normal distribution. Differences in the middle of the distribution are magnified; differences in the tails are minimized.
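The Suzy and Johnny illustration can be checked numerically. The sketch below assumes a normal curve with mu = 500 and sigma = 100 and uses `NormalDist` from Python's standard library; the function name `percentile_rank` is hypothetical. Exact normal-curve areas may differ slightly from rounded figures quoted in the text.

```python
from statistics import NormalDist

# Assumed model for both standardized tests: mu = 500, sigma = 100
test_model = NormalDist(mu=500, sigma=100)


def percentile_rank(score):
    """Percentile rank based on the normal curve for a raw test score."""
    return test_model.cdf(score) * 100


# English test: Suzy 500, Johnny 600 (100 raw points apart, near the middle)
english_gap = percentile_rank(600) - percentile_rank(500)

# Math test: Johnny 700, Suzy 800 (100 raw points apart, in the upper tail)
math_gap = percentile_rank(800) - percentile_rank(700)

print(f"English gap: {english_gap:.1f} percentile rank points")
print(f"Math gap:    {math_gap:.1f} percentile rank points")
```

The same one hundred raw points correspond to roughly 34 percentile rank points near the middle of the distribution and roughly 2 in the tail, which is the loss of the interval property described above.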

This unfortunate property, the destruction of the interval property, precludes the use of percentile ranks by sophisticated statisticians. Percentile ranks will remain in widespread use in order to interpret scores to the layman, but the statistician must help in explaining and interpreting such scores. A different type of transformation is therefore needed, one which does not destroy the interval property. This leads directly into the topic of the next chapter, "Linear Transformations".

Q12.13

The percentile rank based on the sample compared to the percentile rank based on the normal curve
requires fewer assumptions about the world.
will produce identical results.
will result in larger transformed scores.
should be used in conjunction with a higher-order transformational procedure.

Q12.14

How do percentile ranks have a different meaning depending upon whether they occur in the middle or the tails of a normal distribution?
Differences in the middle of the distribution are magnified, differences in the tails are minimized.
Differences in the middle of the distribution are minimized, differences in the tails are maximized.
Differences in the middle of the distribution are linear, differences in the tails are non-linear.
Differences in the middle of the distribution are three times more likely to be significant than differences in the tails of the distribution.

Q12.15

A percentile rank transformation given that the underlying distribution is a normal curve will
overestimate small differences in the tails of the distribution.
underestimate large differences in the middle of the distribution.
both overestimate small differences in the tails of the distribution and underestimate large differences in the middle of the distribution.
neither overestimate small differences in the tails of the distribution nor underestimate large differences in the middle of the distribution.

Q12.16

Some pretest and posttest scores are given below for four students. Assuming that all scores were normally distributed, which student made the greatest improvement?
David: Pretest 10th percentile, Posttest 19th percentile
Karen: Pretest 20th percentile, Posttest 29th percentile
Morgan: Pretest 60th percentile, Posttest 69th percentile
Sherri: Pretest 90th percentile, Posttest 99th percentile

Summary

Transformed scores make meaningful sense out of raw scores, allowing for interpretation and comparison of the numbers. There are two general types of transformations: percentile ranks and linear transformations. This chapter discussed percentile ranks in detail.

A percentile rank based on the sample is the percentage of scores that fall below a given score. After converting a raw test score to a percentile rank based on the sample, one could say, for example, that Maryann scored better than 78% of the students who took the test.

A percentile rank based on the normal curve is the percentage of scores in a hypothetical distribution of scores that fall below a given score. After converting a raw test score to a percentile rank based on the normal curve, one could say, for example, that Maryann scored better than 83% of the students who would ever take the test. This powerful statement can only be made if certain assumptions are true, namely that the true distribution is a normal curve and that the sample mean and standard deviation are accurate estimates of the true parameters mu and sigma of the normal curve.

Both the percentile rank based on the sample and the percentile rank based on the normal curve destroy the interval property of the measurement system. If the interval property was satisfied before the transformation, it will not be so after the transformation. If the underlying distribution is a normal curve, the interval property is destroyed in a particular manner, with a given difference in percentile ranks near the middle of the distribution corresponding to a smaller difference in raw scores than the same difference in percentile ranks in the tails of the distribution.