Louis Guttman’s Contributions to Classical Test Theory

Donald W. Zimmerman

Carleton University

Richard H. Williams

University of Miami

Bruno D. Zumbo

University of British Columbia

Donald Ross

New York Psychiatric Institute

            Louis Guttman (1916-1987) is widely recognized for his many outstanding contributions to quantitative methods in the social sciences. Papers by Guttman began to appear as early as the 1930’s and 1940’s and continued in abundance over a span of a half century and more. His competence encompassed the following areas: scaling theory, partial order scalogram analysis (POSA), multidimensional scalogram analysis (MSA), facet theory and the related analysis of methods of theory construction and establishment of scientific lawfulness, similarity structure analysis (SSA), factor analysis, including the RADEX concept, and reliability theory (Dancer, 1990; Johnson and Kotz, 1997). Some of the research specialties listed above were created de novo by Guttman in the course of his own investigations and were in time pursued by many other researchers in the social sciences.

            These methods, except the last, are well known to quantitative theorists in sociology, and, to a lesser extent, to workers in psychology and education. Even students in introductory courses know something of the “Guttman scale” or “scalogram.” Psychology texts sometimes mention the Guttman scale in passing, after previously having defined the four scales explicated by S.S. Stevens (nominal, ordinal, interval, and ratio) that were thought to be of primary importance in psychological research. In expanded treatments, Likert and Thurstone scaling usually are given more attention than Guttman’s approach.

            Just before his death in 1987, Guttman participated in a debate concerned with the relevance of factor analysis to the study of group differences. Initially, he submitted a rather long rejoinder to a paper by Arthur Jensen (1985) that had been published in Behavioral and Brain Sciences. The rejoinder was rejected by the journal with suggestions for revision, but Guttman never revised it. Later, the manuscript was resuscitated by the editor of Multivariate Behavioral Research and published in that journal, where it ignited some controversy. It came to be known as “Guttman’s last paper” (Mulaik, 1992; Guttman, 1992). Interestingly, in the paper Guttman, a major innovator in factor analysis, argued that factor analysis is misunderstood when employed in the study of group differences and actually is not relevant to those studies. An entire issue of Multivariate Behavioral Research in 1992 was devoted to commentaries and discussion of “Guttman’s last paper.” See Gustafsson (1992), Guttman (1992), Jensen (1992a, 1992b), Loehlin (1992a, 1992b), Roskam and Ellis (1992), and Schönemann (1992a, 1992b) for further details.

            The present paper is concerned with another sphere in which Guttman made extensive contributions: the classical theory of educational and psychological tests. Unfortunately, these contributions have been somewhat neglected. Most psychologists are familiar with the major figures in the field, beginning with Charles Spearman in the earliest years of the twentieth century, but not too many are aware of Guttman’s work. We shall see that Guttman initiated a line of investigation that diverged from the mainstream approaches of Spearman and others who extended Spearman’s approach. These ideas were presented in two papers in Psychometrika in 1945 and 1953.

            Beginning in the 1960’s, Guttman’s work in test theory became more widely known, although it has seldom been given the credit it deserves. The present paper focuses attention on Guttman’s theoretical contributions and how they are related to the better known theories of Lord, Novick, Cronbach, Rozeboom, and others who became prominent in the later part of the century. Our primary goal is to outline Guttman’s definitions of basic theoretical concepts underlying psychological testing, as well as his axiomatics and proof of important theorems, and demonstrate their superiority to alternative approaches. We shall be concerned largely with the theoretical treatment of true scores, error scores, test reliability, test validity, parallel tests, attenuation produced by error of measurement, coefficient alpha, and how these concepts are interrelated.

Guttman’s Definitions of True Scores, Error Scores, and Test Reliability

            In 1966, Melvin Novick published a paper in the Journal of Mathematical Psychology entitled “The axioms and principal results of classical test theory,” in which he described the status of the theory as follows:

“The classical theory of mental tests has a long and distinguished history of application to the technology of test construction and test utilization. The most detailed statement of this theory appears in Gulliksen (1950). This theory, however, suffers from some imprecision of statement so that, from time to time, controversies arise that appear to raise embarrassing questions concerning its foundations.” (Novick, 1966, p. 1).

In the same paper, Novick presented a carefully crafted set of axioms for test theory and affirmed that his methodology could be viewed as “following an approach due to Guttman” and that it “fills the gap left by Guttman.” He also referred to “Guttman’s perhaps incomplete discussion.” Furthermore, Novick recognized that “this aspect of Guttman’s work seems to have been ignored by subsequent writers.”

            Unfortunately, this neglect persisted. Many authors, even to the present day, have discussed the expected-value concept of true score and other central test-theory ideas in the context of the work of Lord and Novick published in 1966 and 1968, and of later authors dealing with mental test theory, without being aware of Guttman’s earlier contribution.

            Test theory in the form presented by Gulliksen (1950) in Theory of Mental Tests was based on the work of Spearman (1904, 1910). Gulliksen stated on the first page of his book: “Nearly all of the basic formulas that are particularly useful in test theory are found in Spearman’s early papers.” However, the well-known notation and terminology that postulated observed scores consisting of a sum of true scores and error scores was introduced by George Udny Yule in a letter to Spearman (reported by Guilford in 1936). In his formulation of Spearman’s ideas, Yule assumed that error scores have a mean of zero and are uncorrelated with true scores.

Later, Spearman suggested that Yule had made more assumptions than were needed, and Guttman confirmed this suspicion many years afterward. Nevertheless, during the first four decades of the twentieth century almost all psychologists, educators, and other social scientists in England and the USA concerned with mental test theory adopted Yule’s terminology and notation (see also Yule, 1912; Yule and Kendall, 1937).

            Guttman’s first departure from conventional wisdom concerned the mathematical procedure by which true scores and error scores are defined in the model. In 1945 he published a paper in Psychometrika entitled “A basis for analyzing test-retest reliability.” This title is somewhat misleading, because the paper was not concerned with practical methods of finding reliability by administering a test twice. Rather it was a wide-ranging treatment of the basic postulates of the classical test-theory model. Perhaps the title “An alternative to Spearman” would have been more descriptive.

            Test theorists regarded an observed score as a sum of true and error components, $X = T + E$, where the possible values of these three variables extended over an entire group of examinees. Next, they proceeded to derive the familiar expressions for reliability,

$$\rho_{XX'} \;=\; \frac{\sigma_T^{2}}{\sigma_X^{2}} \;=\; 1 - \frac{\sigma_E^{2}}{\sigma_X^{2}}.$$

Yule and many other people believed that it is necessary to assume that true scores and error scores are uncorrelated, that the mean error score is zero, and that error scores are uncorrelated with each other. Otherwise, it would be necessary to incorporate unwieldy additional terms in the expressions for the variance of a sum, and related formulas, such as

$$\sigma_X^{2} \;=\; \sigma_T^{2} + \sigma_E^{2} + 2\,\rho_{TE}\,\sigma_T\,\sigma_E,$$

the last term of which could not be allowed to vanish.

            In his 1945 paper, Guttman focused attention on the variability that would arise, not from a single administration of a test to a group of examinees, but from a single examinee taking a test on repeated occasions. From this point of view, the observed score of individual i can be written as $X_i$ and the variance of that individual’s observed scores over hypothetical independent, repeated measurements as $\sigma_{X_i}^{2}$. Furthermore, Guttman defined an individual’s true score as the expectation, or expected value, of that individual’s observed score with respect to that same distribution,

$$T_i \;=\; E(X_i).$$

In this single subscript notation, i = 1,2, …, n, where n is the total number of persons taking the same test.

            An individual’s error score is $e_i = X_i - T_i$, where the lower case e is used here to prevent confusion with the expectation symbol E. Error is simply the difference between a random variable and its expectation. Incidentally, this notation is not Guttman’s, although the conceptual development is the same, as will be explained in more detail later.

            Two things are notable about this approach. First, there is nothing special or unusual about this notion of a retest or a repeated measurement itself, which is a purely hypothetical concept. It has the same status as the notion of independent, repeated tosses of a coin or repeated throws of a die. In probability and statistics, it is convenient to conceptualize a sequence of independent, identically distributed random variables, $X_1, X_2, X_3, \ldots$, to represent random samples. At this stage in the construction of the model, it is not necessary to say anything about whether or not repeated administrations of a test are feasible in practice, or whether or not scores on such practical occasions are independent. Second, because each $T_i$ is a constant, $\sigma_{e_i}^{2} = \sigma_{X_i}^{2}$, that is, the variance of an individual’s error score equals the variance of the same individual’s observed score.
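            A one-line derivation, in the single-subscript notation adopted here rather than Guttman’s, makes the second point explicit and shows in addition that each individual’s error score has expectation zero: because $T_i = E(X_i)$ is a constant for a fixed person,

$$E(e_i) = E(X_i - T_i) = E(X_i) - T_i = 0, \qquad \sigma_{e_i}^{2} = E\big[(e_i - E(e_i))^{2}\big] = E\big[(X_i - E(X_i))^{2}\big] = \sigma_{X_i}^{2}.$$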

            Guttman’s ideas during this early period in evolution of the discipline were harmonious with the mathematics of probability and statistics. They were less compatible with the development undertaken by test theorists who followed in the path of Spearman. On the other hand, it is likely that neither Guttman nor the mathematical statisticians of the time would have pursued these ideas in depth without the previous work of Spearman and others in attending to the special concerns of psychological testing.

            Next, Guttman proceeded to define reliability as

$$\rho_{XX'} \;=\; 1 - \frac{E\!\left(\sigma_{X_i}^{2}\right)}{\sigma_X^{2}},$$

where the expectation now is taken over all the individual observed variances (or, what is the same, the individual error variances) in the entire group of examinees and $\sigma_X^{2}$ denotes the variance of observed scores over the entire group. It is easy to show that the expectation is equal to the more familiar total error variance, $\sigma_E^{2}$, and that Guttman’s definition is equivalent to the one given by Yule, Gulliksen, and others.
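            The equivalence asserted in the last sentence can be verified briefly, again in the notation used here rather than Guttman’s own. Because every individual’s error score has expectation zero, the group mean of the error scores is also zero, and therefore

$$E\!\left(\sigma_{X_i}^{2}\right) \;=\; E\!\left(\sigma_{e_i}^{2}\right) \;=\; E\big[\,E\!\left(e_i^{\,2}\mid i\right)\big] \;=\; \sigma_E^{2},$$

so that $1 - E(\sigma_{X_i}^{2})/\sigma_X^{2} = 1 - \sigma_E^{2}/\sigma_X^{2} = \sigma_T^{2}/\sigma_X^{2}$, which are the familiar expressions given earlier.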

            The Guttman approach has the noteworthy advantage of not requiring an assumption that true scores and error scores are uncorrelated. Unlike earlier developments, the zero correlation between true scores and error scores can be derived from the initial definitions in the model and is not an empirical assumption. Later, Novick (1966) and Lord and Novick (1968) emphasized this fact and adopted the same concept of true score as the expectation of an individual’s observed score. They referred to the distribution of repeated measurements as a propensity distribution and incorporated it into the model in essentially the same way that Guttman had done. Lord and Novick defined test reliability as $\rho_{XT}^{2}$, the squared correlation between observed scores and true scores over the entire group. All four of the definitions of reliability given above are mathematically equivalent when the true score is defined as above, except for the detail that the Lord and Novick expression is undefined if $\sigma_T^{2} = 0$.

            Some further comments about notation are in order. Instead of the one used above, Guttman actually employed a rather cumbersome triple subscript notation, using the indices i, j, and k, where i denoted items or subtests, j persons, and k trials in a sequence of repeated trials. The introduction of items and subtests into the model will be discussed later. Lord and Novick (1968) used a similar unwieldy notation in defining a true score and test reliability.

            It should be recognized that “repeated measurements” (or “repeated trials” in probability theory) are usually implicit in statistical models and do not require special notation. For example, if X is a random variable having two outcomes, 0 and 1, with probabilities .5 and .5, one writes simply $E(X) = .5$ and not, say, $E(X_k) = .5$, for the expected value of X. An index referring to repeated trials is unnecessary, because a random variable incorporates the concept of a sample space and sample points with assigned probabilities in its definition. Nevertheless, the explicit identification of test scores and their components as random variables that can be related to definitions in probability theory and sampling theory was an important step.

            The decomposition of a test score into components, X = T + E, postulated by Spearman, Yule, and writers of early textbooks at first seemed straightforward. In probability theory it is commonplace to investigate a random variable that is the sum of other random variables defined on the same probability space. In test theory, however, ambiguity arises in conceptualizing the probability space on which T and E are defined. It is necessary to consider a space that encompasses the variability of persons in a group in addition to the variability of individual measurements.

            In discussing a single score, it is natural to focus attention on the variance of the propensity distribution: An individual measurement is precise if $\sigma_{X_i}^{2}$ is small relative to some predetermined standard. But the conventional reliability coefficient does not indicate the precision of an individual measurement. The Guttman formula indicates that reliability is the complement of the ratio $E(\sigma_{X_i}^{2})/\sigma_X^{2}$, which can be regarded as the average imprecision of a group of scores relative to the total variability. Guttman’s approach is a bridge between classical test theory and recent approaches that replace the concept of reliability with that of precision or information as employed in item response theory (Collins, 1996; Kane, 1996; Mellenbergh, 1996, 1999; Raykov, 1992, 1999).

            Guttman brought to the study of tests and measurements a type of abstract thinking that was hardly new in probability and mathematical statistics, but which aided in the clear formulation of problems that test theorists faced. His definitions of true scores and error scores, with emphasis on an abstract “propensity” distribution of hypothetical repeated measurements, are one example. However, it is somewhat curious that Guttman rejected the notion of parallel tests because of practical difficulties associated with the concept (1945, 1953). In criticizing this notion, he remarked that each parallel test form would have a different reliability coefficient, which is not true when parallelism is defined with more precision than was done during that period. Later, Novick (1966), Lord and Novick (1968), and others showed that parallel measurements can be abstractly defined and integrated with the rest of the theory in an unambiguous way. In fact, the concept is essentially the same as that of exchangeable random variables in probability theory. See, for example, De Finetti’s (1975) introduction of this notion and other texts in probability theory (Loève, 1963; Rényi, 1970).

Guttman’s Contributions to the Theory of Composite Measurements

            Over the years, a great deal has been written about coefficient alpha, also called Cronbach’s alpha (Cronbach, 1951), as a formula for determining the reliability of composite measurements. These “composites” may consist of individual test items, or subtests which themselves contain more than one item. Coefficient alpha has played a major role in practical methods for determining reliability, sometimes being the only method reported in published studies. It has also been considered to be an indication of the homogeneity of test items and has been related to the factor structure of a test. Many research studies have attempted to explicate the meaning of the formula and to find conditions under which it is suitable for use in practice. However, test theorists are still undecided to this day about the circumstances under which coefficient alpha is appropriate.
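            Although the formula is familiar, it may be useful to display it here in a standard form (the notation is ours, not Cronbach’s or Guttman’s). For a composite of k items or subtests $Y_1, Y_2, \ldots, Y_k$ with total score $X = Y_1 + Y_2 + \cdots + Y_k$, coefficient alpha is

$$\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k}\sigma_{Y_j}^{2}}{\sigma_X^{2}}\right),$$

an expression that is algebraically identical to one of the lower bounds, $\lambda_3$, derived in Guttman’s 1945 paper discussed below.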

            Less widely recognized is the fact that Guttman contributed extensively to this line of investigation when it was first initiated in the 1930’s and 1940’s, before the name “coefficient alpha” was in vogue. In recent decades, some test theorists have lost sight of the early work of Kuder and Richardson (1937) and Rulon (1939), as well as that of Guttman, contained in the same 1945 paper cited earlier. The “Kuder-Richardson formula 20” and the “Kuder-Richardson formula 21,” as they were dubbed in the literature, resulted from attempts to determine reliability from a single administration of a test. This quest held considerable appeal for many people, because the test-retest and parallel-forms methods were time-consuming and expensive. The Kuder-Richardson approach was seized upon as a way of reducing testing time and saving money in the important task of assessing reliability.

            Guttman called attention to some rather simple properties of statistical estimates based on random samples which reveal that this search is doomed to failure. He stated

"A fundamental fact concerning unreliability is that, in general, it cannot be estimated from only a single trial. Two or more trials are needed to prove the existence of variation in the score of a person on an item, and to estimate the extent of such variation if there is any. The experimental difficulties in obtaining independent trials have led to many attempts to estimate the reliability of a test from only a single trial by bringing in various hypotheses. Such hypotheses usually do not afford a real solution, since ordinarily they cannot be verified without the aid of at least two independent trials, which is precisely what they are intended to avoid." (Guttman, 1945).

Nevertheless, this argument did not stop other theorists from trying to find ways to estimate reliability from a single administration of a test. Coefficient alpha eventually became enormously popular despite Guttman’s pessimism.

            In the same 1945 paper, Guttman derived several formulas for lower bounds on test reliability that were indeed applicable to a single administration of a test. In a footnote in his review of Gulliksen’s Theory of mental tests, Guttman expressed the notion succinctly as follows:

“We shall mean by retest theory what Cronbach calls ‘hypothetical retest with zero time.’ If a test is actually repeated twice on the same population, then each trial has its own retest coefficient, since the situation may change between trials. The kind of coefficient we call here ‘retest’ implies no change in situation. ‘No change’ can always be guaranteed by making but one empirical trial under the conditions of interest; then one can use a formula for computing a lower bound to the retest coefficient. Lower bound formulas give correct information about what would happen in an infinite number of trials under unchanged conditions, and fortunately require but a single empirical trial for calculating their statistics.” (Guttman, 1953a).

            Later, Novick and Lewis (1967) employed lower bounds, as well as the same mathematical techniques used by Guttman, to derive necessary and sufficient conditions under which coefficient alpha equals test reliability. The meaning of the Novick and Lewis proof has not been widely appreciated in subsequent studies of coefficient alpha, which continued to view it as an estimate of a parameter. In one sense, their result reveals that under specified conditions the value of coefficient alpha does in fact equal test reliability as defined in classical test theory. (See the discussion in the last section.)
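            The Novick and Lewis condition can be stated compactly (the notation is ours, not theirs): given uncorrelated errors, coefficient alpha equals the reliability of the composite if and only if the components are essentially tau-equivalent, that is, if their true scores differ only by additive constants,

$$T_a \;=\; T_b + c_{ab} \qquad \text{for every pair of components } a \text{ and } b,$$

where the $c_{ab}$ are constants; otherwise alpha is a lower bound to the reliability.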

            On the other hand, Guttman’s reservations were confirmed in the sense that these conditions are quite restrictive and probably violated frequently. Furthermore, as Guttman realized, the value of coefficient alpha depends to a great extent on sampling and fluctuates from one test administration to another. It is only the population value of coefficient alpha, estimated from sample data on one trial, that is mathematically equal to test reliability as independently defined, that is, “what would happen in an infinite number of trials under unchanged conditions.”

The veracity of Guttman’s approach to reliability coefficients and lower bounds based on item statistics was borne out in later studies (Ten Berge and Zegers, 1978; Jackson, 1979; Raju, 1979). Nevertheless, many textbooks in psychology and education, as well as practical investigations of test reliability, continued to regard coefficient alpha as an estimator.

            In the years to follow, many test theorists began to think of coefficient alpha as a measure of internal consistency or of item homogeneity and became less confident that it estimates test reliability in the manner of the classical methods that require repeated test administrations or multiple forms. Also, Monte Carlo studies disclosed considerable variability as well as bias when the conditions identified by Novick and Lewis are violated (Zimmerman, Zumbo, and Lalonde, 1993). Finally, the coefficient was found to be inaccurate if error scores on test items or subtests are correlated, as will be explained below. In reading Guttman’s 1945 and 1953 papers today, one has the impression that some false starts and blind alleys could have been avoided if more attention had been paid to them at the time they were written.

Guttman’s Recognition of Correlated Errors of Measurement

            In 1953 Guttman published a paper entitled “Reliability formulas that do not assume experimental independence” that addressed an assumption that had previously been standard in test theory. “Experimental independence” connotes statistical independence, and consequently zero correlation, among some of the components of test scores discussed in the previous two sections. During the early period of test theory, investigators often tried to support this kind of assumption by empirical arguments, that is, by reasoning about what could or could not happen when a test is administered. A typical argument was “it is reasonable to believe that a does not influence b during a testing session, and therefore the error correlation is zero.”

            As mentioned before, Guttman emphasized that it is not necessary to stipulate that true scores and error scores are uncorrelated as an axiom in the model. Previously, test theorists had taken $\rho_{TE} = 0$ to be a required assumption. On the contrary, if the “expected value” concept of true score is adopted initially, the zero correlation between true scores and error scores can be derived as a theorem. The proof of this result is quite simple, although it had been overlooked by earlier investigators.
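            In the notation used here, the proof amounts to one application of iterated expectation: because $E(e_i) = 0$ for every person and $T_i$ is a constant for a fixed person,

$$\operatorname{Cov}(T, E) \;=\; E(TE) - E(T)\,E(E) \;=\; E\big[\,T_i\,E(e_i \mid i)\,\big] - E(T)\cdot 0 \;=\; 0,$$

and hence $\rho_{TE} = 0$ whenever the correlation is defined.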

            At the same time, Guttman recognized that this reasoning cannot be extended to error scores on two distinct tests or two parts of the same test. If the subscripts 1 and 2 refer to distinct tests or parts of a test, then $\rho_{T_1E_2} = 0$ can be proved from the axioms, but $\rho_{E_1E_2} = 0$ cannot. Therefore, any formula in test theory depending on the proposition $\rho_{E_1E_2} = 0$ for its proof is not a tautology, but must be considered to be conditional on the truth of that proposition.
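            Written out in the same notation, the covariance of two observed scores decomposes as

$$\sigma_{X_1X_2} \;=\; \sigma_{T_1T_2} + \sigma_{T_1E_2} + \sigma_{E_1T_2} + \sigma_{E_1E_2},$$

in which the two middle terms vanish by the argument just given, but the final term vanishes only if uncorrelated errors are assumed.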

            In his 1953 paper, Guttman observed that the reliability formulas for composite tests mentioned in the last section, including the Kuder-Richardson formulas, do in fact depend on a zero correlation between the error scores of test items. In that paper, he extended the approach to lower bounds taken in his 1945 paper and derived formulas that do not involve the assumption of uncorrelated error scores, although these formulas were not widely used subsequently. Nevertheless, this step was important in clarifying the status of the theory from a purely mathematical point of view and in encouraging caution in the application of test-theory formulas in practice. Later, similar caution was recommended by Rozeboom (1966) and by Zimmerman and Williams (1977, 1980). It should also be noted that some familiar and widely-used validity formulas relating two tests, including Spearman’s correction for attenuation, are subject to the same limitations (Zimmerman and Williams, 1997).

            Guttman deserves credit for coming to grips head on with correlated errors of measurement. Unfortunately, many subsequent investigators, including Lord and Novick, Cronbach, and others, did not heed his warnings and played down the role of correlated errors in test theory. However, Rozeboom (1966) in his book Foundations of the theory of prediction, remarked

“…we have already scoffed at the thought that measurement errors for multiple items included on the same testing occasion would remain undefiled by correlational agreement.” (Rozeboom, 1966, p. 427)

Also:

“…however pleasant a mathematical pastime it may be to shuffle through the internal statistics of a compound test in search of a formula which gives the closest estimate of the test’s reliability under conditions of uncorrelated errors, this is for practical applications like putting on a clean shirt to rassle a hog.” (Rozeboom, 1966, p. 415).

And also:

“If internal-consistency measures are to attain status in reliability theory by honest merit rather than by irrelevant mathematical elegance, explicit provision must be made for the effects of correlated measurement errors.” (Rozeboom, 1966, p. 438)

            At this juncture, the status of the assumption of uncorrelated error scores was similar to that of the assumption $\rho_{TE} = 0$ during the period prior to the work of Guttman, Lord, and Novick. Test theorists did not derive the zero correlation from axioms, but were compelled to adopt it and sometimes tried to justify it by empirical arguments. Guttman attempted to bypass it altogether in the theory of composite measurements by deriving useful formulas that did not depend on it.

            A similar approach was taken by Zimmerman and Williams (1977, 1980), who incorporated the correlation $\rho_{E_1E_2}$, where $E_1$ and $E_2$ are the conventional group error scores, into the theory as a full-fledged variable. That is, they allowed $\rho_{E_1E_2}$ to remain in algebraic equations derived from the initial axioms and accepted at face value the results of calculations based on the modified equations. Consequently, one might expect the results to be accurate even if errors were defiled by correlational agreement. This strategy, however, was hindered by the difficulty of estimating the value of $\rho_{E_1E_2}$ in testing situations. After a half century, the jury is still out on how crucial this assumption might be in testing practice and whether or not it introduces significant inaccuracy into applications that depend on it.

            Many subsequent investigators were inclined to ignore the problem entirely by supposing that correlations among error scores are small and have no practical significance. Guttman had already pointed out a limitation of this point of view:

“The basic and exact formulas developed here are mathematical identities or tautologies. As such, they have no immediate practical use. Their importance lies in the fact that they provide a universal framework from which different practical formulas are easily derived. For a formula to be practical, it has to make some assumption as to the nature of the experimental dependence among items. Our tautologies show the exact mathematical nature, location, and role of such assumptions.” (Guttman, 1953b, pp. 225-226).

Decades later, Zimmerman and Williams (1977, 1980) emphasized that in one common situation the influence of correlated errors becomes greater as test reliability becomes smaller. Consider the following modified Spearman attenuation formula, which is a tautology containing the correlation $\rho_{E_1E_2}$:

$$\rho_{T_1T_2} \;=\; \frac{\rho_{X_1X_2}}{\sqrt{\rho_{11}\,\rho_{22}}} \;-\; \rho_{E_1E_2}\sqrt{\frac{(1-\rho_{11})(1-\rho_{22})}{\rho_{11}\,\rho_{22}}},$$

where $\rho_{T_1T_2}$ is the correlation between true scores, $\rho_{X_1X_2}$ is the correlation between observed scores, and $\rho_{11}$ and $\rho_{22}$ are the reliability coefficients of the two tests.

            Table 1 gives an indication of how reliability influences the accuracy of a calculation. The entries in the table are the discrepancy between the value of the correlation between true scores, $\rho_{T_1T_2}$, calculated using the Spearman attenuation formula, and the correct value obtained from the tautology presented above. That is, the entries are the value of the second term of the above equation, and the discrepancy vanishes when $\rho_{E_1E_2} = 0$. The value of $\rho_{22}$ is fixed at .50 (upper section) or .75 (lower section), and that of $\rho_{11}$ varies between .10 and .90 in increments of .20.

-----------------------------------------------------------------------------------

Insert Table 1 about here

-----------------------------------------------------------------------------------

            Note that the discrepancy does not depend on $\rho_{X_1X_2}$, but is decidedly a function of $\rho_{11}$ and $\rho_{22}$. When the reliability of a predictor is .90 and that of a criterion is .75, correlated errors have some influence; the discrepancy is .048 when $\rho_{E_1E_2}$ = .25. However, the discrepancy is .144 when $\rho_{22}$ = .75 and $\rho_{11}$ = .50. And for lower values of the reliability coefficients, the influence of correlated errors is overwhelming, even when $\rho_{E_1E_2}$ is as small as .05 or .10. Certainly, perusal of the table demonstrates that Guttman’s and Rozeboom’s qualms about correlated errors were not unfounded.
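            A reader who wishes to check the discrepancy term numerically can do so in a few lines. The following sketch is ours and is not taken from any of the papers cited; the function and variable names are arbitrary. It reproduces the two entries just mentioned.

```python
import math

def attenuation_discrepancy(rel_predictor, rel_criterion, rho_error):
    """Second term of the modified Spearman attenuation formula: the amount
    by which the classical correction overstates the true-score correlation
    when error scores are correlated."""
    return rho_error * math.sqrt(
        (1 - rel_predictor) * (1 - rel_criterion) / (rel_predictor * rel_criterion)
    )

# Entries discussed in the text (Table 1, lower section, with rho_E1E2 = .25)
print(round(attenuation_discrepancy(0.90, 0.75, 0.25), 3))  # 0.048
print(round(attenuation_discrepancy(0.50, 0.75, 0.25), 3))  # 0.144
```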

Concluding Comments

            This paper has focused mainly on Guttman’s original contributions to the basic theory of educational and psychological test scores. His work in this area is not as well known as his work in scaling, factor analysis, facet theory, and other areas of application of quantitative methods to the social sciences. It is the contention of the present authors that Guttman’s innovations in test theory are of equal importance to those in the better known fields mentioned above. Increased appreciation of his work certainly would be highly desirable. Careful examination of the 1945 and 1953 papers by present-day investigators would throw considerable light on the axiomatics and structure of classical test theory.

            In the years following Guttman’s 1945 and 1953 papers, references to his work have been scarce, although Lord, Novick, and Rozeboom referred to it in many places. It is difficult to discern how much of the classic Lord and Novick book “Statistical theories of mental test scores” was influenced directly by Guttman’s earlier ideas, but apart from trivial differences in notation, the mathematical development in the early chapters certainly has much in common with it. On the other hand, Guttman’s ideas concerning lower bounds, composite measurements, and correlated errors have not come into prominence and are infrequently cited in textbooks. Two informative exceptions are the recent books of Traub (1994) and McDonald (1999), which discuss the lower bounds to composite reliability presented in Guttman’s 1945 paper.

            Many present day psychologists are familiar with the important historical figures in mathematical statistics and applied statistical methods, including pioneers such as Karl Pearson, R.A. Fisher, and Jerzy Neyman. Early in the last century, the methods of hypothesis testing and estimation that came to be employed extensively in psychological research began to diverge from the correlational methods of Spearman and other test theorists. The same divergence occurred in educational research. Quantitative theorists steeped in one of these areas generally were not too concerned about the other. In one sense, Guttman’s work bridged the gap between these two quantitative domains. His competence emanated from the common ground of the two approaches, the mathematics of probability and statistics, and it paved the way for the better known advances to follow.


References

Collins, L.M. (1996). Is reliability obsolete? A commentary on “Are simple gain scores obsolete?” Applied Psychological Measurement, 20, 289-292.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Dancer, L.S. (1990). Louis Guttman obituary. American Psychologist, 45, 773-774.

De Finetti, B. (1975). Theory of probability, Vol. 2. New York: Wiley.

Guilford, J.P. (1936). Psychometric methods. New York: McGraw-Hill.

Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

Gustafsson, J.E. (1992). The relevance of factor analysis for the study of group differences. Multivariate Behavioral Research, 27, 239-247.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282.

Guttman, L. (1953a). A special review of Harold Gulliksen, Theory of mental tests. Psychometrika, 18, 123-130.

Guttman, L. (1953b). Reliability formulas that do not assume experimental independence. Psychometrika, 18, 225-239.

Guttman, L. (1992). The irrelevance of factor analysis for the study of group differences. Multivariate Behavioral Research, 27, 175-204.

Jackson, P.H. (1979). A note on the relation between coefficient alpha and Guttman’s “split-half” lower bounds. Psychometrika, 44, 251-252.

Jensen, A.R. (1985). The nature of black-white differences on various psychometric tests: Spearman’s hypothesis. Behavioral and Brain Sciences, 8, 193-219.

Jensen, A.R. (1992a). Spearman’s hypothesis: Methodology and evidence. Multivariate Behavioral Research, 27, 225-233.

Jensen, A.R. (1992b). More on psychometric g and “Spearman’s hypothesis.” Multivariate Behavioral Research, 27, 257-260.

Johnson, N.L., & Kotz, S. (Eds.) (1997). Louis Guttman. In Leading personalities in the statistical sciences from the seventeenth century to the present. New York: Wiley, pp. 112-116.

Kane, M. (1996). The precision of measurements. Applied Measurement in Education, 9, 355-379.

Kuder, G.F., & Richardson, M.W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160.

Loehlin, J.C. (1992a). Guttman on factor analysis and group differences: A comment. Multivariate Behavioral Research, 27, 235-237.

Loehlin, J.C. (1992b). On Schönemann on Guttman on Jensen, via Lewontin. Multivariate Behavioral Research, 27, 261-263.

Loève, M. (1963). Probability theory (3rd ed.). New York: Van Nostrand.

Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

McDonald, R.P. (1999). Test theory: A unified treatment. Hillsdale, NJ: Lawrence Erlbaum Associates.

Mellenbergh, G.J. (1996). Measurement precision in test score and item response models. Psychological Methods, 1, 293-299.

Mellenbergh, G.J. (1999). A note on simple gain score precision. Applied Psychological Measurement, 23, 87-89.

Mulaik, S.A. (1992). Guttman’s “last paper”: A commentary and discussion. Multivariate Behavioral Research, 27, 173-174.

Novick, M.R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1-18.

Novick, M.R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32, 1-13.

Raju, N.S. (1979). Note on two generalizations of coefficient alpha. Psychometrika, 44, 347-349.

Raykov, T. (1992). Structural models for studying correlates of change. Australian Journal of Psychology, 44, 102-115.

Raykov, T. (1999). Are simple change scores obsolete? An approach to studying correlates and predictors of change. Applied Psychological Measurement, 23, 120-126.

Rényi, A. (1970). Foundations of probability. San Francisco: Holden-Day.

Roskam, E.E., & Ellis, J. (1992). Reaction to other commentaries. Multivariate Behavioral Research, 27, 249-252.

Rozeboom, W.W. (1966). Foundations of the theory of prediction. Homewood, IL: Dorsey.

Rulon, P.J. (1939). A simplified procedure for determining the reliability of a test by split-halves. Harvard Educational Review, 9, 99-103.

Schönemann, P.H. (1992a). Extension of Guttman’s result from g to PC1. Multivariate Behavioral Research, 27, 219-224.

Schönemann, P.H. (1992b). Second round commentary on Guttman. Multivariate Behavioral Research, 27, 253-256.

Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.

Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3, 271-295.

Traub, R.E. (1994). Reliability for the social sciences. Thousand Oaks, CA: Sage.

Ten Berge, J.M.F., & Zegers, F.E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 43, 575-579.

Yule, G.U. (1912). On the methods of measuring the association between two attributes. Journal of the Royal Statistical Society, 75, 579-642.

Yule, G.U., & Kendall, M.G. (1937). An introduction to the theory of statistics. (11th ed.) London: Griffin.

Zimmerman, D.W., & Williams, R.H. (1977). The theory of test validity and correlated errors of measurement. Journal of Mathematical Psychology, 16, 135-152.

Zimmerman, D.W., & Williams, R.H. (1980). Is classical test theory "robust" under violation of the assumption of uncorrelated errors? Canadian Journal of Psychology, 34, 227-237.

Zimmerman, D.W., & Williams, R.H. (1997). Properties of the Spearman correction for attenuation for normal and realistic non-normal distributions. Applied Psychological Measurement, 21, 253-279.

Zimmerman, D.W., Zumbo, B.D., & Lalonde, C. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational and Psychological Measurement, 53, 33-49.


Table 1.

Discrepancy between the true score correlation calculated using the Spearman attenuation formula and using the modified formula, as a joint function of the reliability of a predictor, $\rho_{11}$, and the correlation between error scores, $\rho_{E_1E_2}$, for fixed values of the reliability of a criterion, $\rho_{22}$.

                                     $\rho_{11}$
$\rho_{E_1E_2}$        .100      .300      .500      .700      .900

$\rho_{22}$ = .50
     .050              .150      .076      .050      .033      .017
     .100              .300      .153      .100      .065      .033
     .150              .450      .229      .150      .098      .050
     .200              .600      .306      .200      .130      .067
     .250              .750      .381      .250      .164      .083

$\rho_{22}$ = .75
     .050              .087      .044      .029      .019      .010
     .100              .173      .088      .058      .038      .019
     .150              .260      .132      .087      .057      .029
     .200              .346      .176      .115      .076      .038
     .250              .433      .220      .144      .094      .048