Abstracts of selected earlier publications (before 1996)

Durable Secondary Reinforcement: Method and Theory (1957). Psychological Review, 64, 373-383.

A two-stage secondary reinforcement procedure was investigated. The first stage consisted of intermittent association of a previously neutral stimulus with a primary reinforcer. The second stage consisted of establishment of an operant response by presentation of the secondary reinforcer itself on an intermittent schedule. This two-stage procedure produced a higher rate of responding and longer-lasting responding than the typical method of invariably associating a stimulus with a primary reinforcer and presenting it continuously after an operant response.

Sustained Performance in Rats Based on Secondary Reinforcement (1959). Journal of Comparative and Physiological Psychology, 52, 353-358.

Animals were first given food reinforcement in a straight-alley runway, using a variable-ratio scheduling procedure, so that the running behavior could be elicited on many nonreinforced trials. A bar was then inserted into the starting box, and the animals could produce the onset of an extinction trial by pressing it. No further food reinforcement was given, but it proved possible to develop bar-pressing behavior and maintain it at high strength on variable-ratio and fixed-ratio schedules, solely by releasing a run to the empty goal box. An animal's performance lasted from 10 to 14 one and one-half hour daily sessions, during which thousands of responses were made. At first, the behavior was comparable to that generated by variable-ratio and fixed-ratio schedules with typical stepwise response patterns. Although the runway behavior became disrupted during the test phase of the experiment, a fragment of it—jumping from the starting box into the runway—persisted much longer. Control groups indicated that neither escape from the starting box, nor incidental properties of stimuli, could account for these behavioral changes.

Intermittent Reinforcement of Discriminatively Controlled Responses and Runs of Responses (1960). Journal of the Experimental Analysis of Behavior, 3, 83-91.

Intermittent reinforcement of the connection between a discrete external stimulus and reinforcement was studied in a free-operant situation. In one schedule, the first response following presentation of a discriminative stimulus was reinforced intermittently. In the other schedule, fixed-ratio responding following the discriminative stimulus was reinforced intermittently. It was found that schedules of this type have effects similar to those of partial reinforcement in a runway. The discriminative stimulus continues to maintain control consistently, although the latency of the first response following the stimulus increases and becomes somewhat more variable. Precise stimulus control of runs of ten to twenty responses can be maintained by gradually increasing the ratio following the discriminative stimulus.

Influence of Conditions at the Time of Reinforcement on the Strength of a Secondary Reinforcement Effect (1963). Psychological Reports, 13, 747-752.

Animals were trained to run through a guillotine door into an adjacent compartment for water reinforcement. They were then given access to the compartment as a consequence of bar-pressing, but no water. At the time of testing for bar-pressing various changes were made in the type of response required. For other groups of animals variations were made in the relative sizes of the two compartments during both training and testing. It was found that "getting into the starting box" was as reinforcing as "getting out of the starting box," that learning in the testing situation did not occur unless previous water reinforcement had been given in one of the compartments, and that changes in the type of response required at the time of testing did not diminish the learning effect.

Functional Laws and Reproducible Processes in Behavior (1963). Psychological Record, 13, 163-173.

A distinction was made between two methodological points of view relevant to behavior theory. Examples from physics and biology were used to illustrate the distinction, and the mathematical basis of the distinction in the phase space of statistical thermodynamics was discussed. It was proposed that the uniqueness of behavior arises from the special forms of interactions of many variables in complex systems of variables with the passage of time, rather than because of the influence of special or novel functional laws of the type described by the elementary equations of physics.

A Conceptual Approach to Some Problems in Mental Retardation (1965). Psychological Record, 15, 175-183.

A description of relationships between organism and environment, and of changes in these relationships in development, was suggested as a background for analysis of problems in mental retardation. The role of genetic factors and of environmental factors in mental retardation was discussed from this perspective. The indirect nature of connections between specific deficiencies of the organism and abnormalities in behavior was stressed.

Concurrent Schedules of Primary and Conditioned Reinforcement in Rats (1969). Journal of the Experimental Analysis of Behavior, 12, 261-268.

Rats responded on a fixed-interval schedule during which a 3-sec stimulus preceded each water reinforcement. The stimulus was then scheduled concurrently for responses on the same lever according to either a variable interval or a variable ratio. Although water reinforcement continued on a fixed-interval schedule, the pattern of responding became typical of a variable-interval or variable-ratio schedule. When the 3-sec stimulus was presented on a variable-interval or variable-ratio schedule, but was omitted on the fixed-interval schedule, the response rate decreased. When the stimulus occurred after the same time periods as those of the variable-interval schedule, but at least 7-sec after the last response, the rate decreased. The rate became higher when the fixed-interval schedule was discontinued and each presentation of the 3-sec stimulus was followed by water on a variable-interval schedule. When both water and the 3-sec stimulus were discontinued for a period of time, resulting in extinction of the lever response, and the 3-sec stimulus alone was then presented on a variable-interval or variable-ratio schedule after lever responses, rate increased and then gradually decreased.

Expected Values of Correlated Measurements and Correction for Attenuation (1970). Psychological Reports, 26, 907-911.

The correlation between scores on two measurements or test procedures in a population is attenuated by variability of the individual measurements. If an individual's scores are uncorrelated random variables, X and Y, the "correction for attenuation" provides an estimate of the correlation between the expected values EX and EY in a population. However, if an individual's scores are correlated random variables, a plausible alternative in some situations, the usual correction formula is not applicable. This paper derives a general formula which includes correlation between expected values of correlated measurements and which reduces to the usual correction for attenuation in the case of uncorrelated measurements.
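
For readers unfamiliar with the usual correction mentioned above, the following minimal sketch shows the classical formula, which divides the observed correlation by the square root of the product of the two reliability coefficients. It covers only the uncorrelated case; the general formula derived in the paper is not reproduced here. The function name and the numerical values are illustrative assumptions.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation (uncorrelated errors assumed).

    r_xy  : observed correlation between the two measurements
    rel_x : reliability coefficient of the first measurement
    rel_y : reliability coefficient of the second measurement
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Illustrative values only (not taken from the paper).
corrected = correct_for_attenuation(0.42, 0.70, 0.80)
print(round(corrected, 3))
```

With perfectly reliable measurements (both coefficients equal to 1) the function returns the observed correlation unchanged.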

Patterns of Responding in Second-order Chained Schedules (1970). Psychonomic Science, 20, 137-139.

Responding on one lever under a fixed-interval schedule (FI 100 sec) produced an exteroceptive stimulus, and responding on another lever under a different schedule (FI 50 sec or FR 1) terminated the stimulus. Under a second-order schedule, every second termination of the stimulus produced primary reinforcement (FR 2 of entire sequence). Patterns of responding in the initial component of the chain were similar to those maintained by presentation of brief stimuli in second-order schedules. When the schedule in the terminal component was changed and primary reinforcement became less frequent, rate in the initial component decreased, while the pattern of responding under the second-order schedule was maintained.

Patterns of Responding in Conditioned Reinforcement Schedules Superimposed on Primary Reinforcement Schedules (1971). Psychonomic Science, 23, 379-380.

Rats responded on a 2-min fixed-interval schedule in which lights were associated with water reinforcement at the end of each interval. Rate changes resulting from superimposed conditioned reinforcement schedules were examined under conditions in which momentary stimulus effects were minimized. When a variable-interval schedule of presentation of lights, without water, was superimposed on the fixed-interval schedule only during alternate intervals between water reinforcements, overall rate of responding increased in all intervals, including those in which stimuli were not added. When lights, without water, were presented on a similar variable-interval schedule during alternate intervals, only after at least 6-sec had elapsed since a response (DRO schedule), overall rate decreased in all intervals. When a single response-dependent stimulus presentation was added in the middle of each interval between water reinforcements, rate decreased in the periods just after the added stimulus and increased in periods just before the next scheduled presentation of the stimulus.

Rate Changes after Unscheduled Omission and Presentation of Reinforcement (1971). Journal of the Experimental Analysis of Behavior, 15, 261-270.

Changes in response rate similar to frustration effects were studied in a two-lever situation. Responding on one lever on a fixed-interval schedule produced access to water for 5-sec. and an exteroceptive stimulus. In the presence of this stimulus, responding on another lever on a fixed-interval schedule produced access to water for 5-sec. and terminated the stimulus. Occasional omission of a previously scheduled reinforcer after responding on the first lever resulted consistently in increases in rate on the second lever during the immediately succeeding interval. In another procedure, occasional presentation of a previously unscheduled reinforcer after responding on the first lever resulted consistently in decreases in rate on the second lever during the immediately succeeding interval. Changes occurred after the first omissions or presentations and were about the same in magnitude as the procedure continued over several sessions. Typically, an increase or decrease in rate was maintained throughout an entire 100-sec. interval. Changes in rate on the second lever of approximately the same magnitude also occurred when rate on the first lever was near-zero under a schedule that differentially reinforced behavior other than lever pressing.

Discrete Operant Discrimination Maintained by Conditioned Reinforcement (1972). Psychonomic Science, 28, 33-36.

Operant discrimination in rats in a two-lever situation was maintained by conditioned reinforcement. A response on Lever 1 in the presence of a tone turned on two lights and turned off the tone; subsequently, a response on Lever 2 in the presence of the lights produced water. Later, a response on Lever 1 during the tone produced lights, while a response on Lever 2 turned off the lights but did not produce water. Concurrently, the lights were presented at regular intervals on an independent schedule, not immediately after the tone nor after responding on either lever, and on these occasions responses on Lever 2 again produced water. In this situation, the latency of response on Lever 1 after the tone turned out to be a sensitive measure of discriminative properties of the tone.

Two Concepts of "True Score" in Test Theory (1975). Psychological Reports, 36, 795-805.

Two concepts of "true score" in test theory are examined. Under one concept, the true score is identified with the expected value of the observed score, and it follows that reliability is the ratio of true variance to observed variance. Under the other concept, the true score is a constant which is not necessarily equal to the expected value of the observed score, and it follows that reliability is not necessarily equal to the ratio of true variance to observed variance. Axioms are presented which encompass both points of view, and explicit formulas relating the two kinds of true scores are derived by representing all scores and components of scores as random variables with the same associated probability space.

Probability Spaces, Hilbert Spaces, and the Axioms of Test Theory (1975). Psychometrika, 40, 395-412.

A branch of probability theory that has been studied extensively in recent years, the theory of conditional expectation, provides just the concepts needed for mathematical derivation of the main results of the classical test theory with minimal assumptions and greatest economy in the proofs. The collection of all random variables with finite variance defined on a given probability space is a Hilbert space; the function that assigns to each random variable its conditional expectation is a linear operator; and the properties of the conditional expectation needed to derive the usual test-theory formulas are general properties of linear operators in Hilbert space. Accordingly, each of the test-theory formulas has a simple geometric interpretation that holds in all Hilbert spaces.

The Theory of Test Validity and Correlated Errors of Measurement (with Richard H. Williams, 1977). Journal of Mathematical Psychology, 16, 135-152.

The zero correlation between true scores and error scores is a consequence of initial definitions in the classical test theory model. However, a zero correlation between error scores on two tests or test items is an independent assumption that does not follow from initial definitions and is possibly violated in practice. If this correlation is non-zero, the usual validity formulas are modified. This paper presents formulas for the general case in which no assumptions are made about error score correlations.

A Simple Duality Principle in Test Theory (1979). Journal of Mathematical Psychology, 20, 256-262.

It is demonstrated that theorems in test theory have corresponding dual theorems which are obtained by exchanging true scores and error scores, as well as reliability coefficients and their complements, in both the hypothesis and the conclusion. A formula that does not conform to the principle cannot be an identity in the classical test theory model, but must be based on additional assumptions in the hypothesis that perhaps are not immediately apparent. The usefulness of the principle is indicated, and its origin in the mathematical formalism underlying the theory is discussed.

Quantum Theory and Interbehavioral Psychology (1979). Psychological Record, 29, 473-485.

A close relationship between concepts in modern quantum physics and the point of view of interbehavioral psychology is becoming increasingly apparent. The quantum mechanical treatment of causality and of probability, the status of the uncertainty principle, and the inseparability of object and measuring instrument in the theory are harmonious with ideas expressed in Kantor's interbehavioral psychology early in this century. Furthermore, the interbehavioral approach throws light on some unresolved issues in quantum theory related to the role of the observer in measurement and the meaning of the correspondence principle.

A Note on the Correlation of Gains and Initial Status (with Richard H. Williams, 1982). Journal of General Psychology, 107, 203-207.

By writing the familiar equation for the correlation of gains and initial status in a form which reveals that the correlation does not depend on the standard deviations of pretest and posttest scores separately, but only on their ratio, it is possible to exhibit a one-parameter family of functions which throws new light on this relationship. The same family of functions provides information about the correlation between the true components of gains and initial status, as well as information about the correlation between the error components of gains and initial status. Experimental independence of pretest scores and posttest scores is incompatible with experimental independence of pretest scores and gain scores.
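
The dependence on the ratio of standard deviations can be sketched as follows. Writing k for the ratio of the posttest standard deviation to the pretest standard deviation, the standard textbook expression for the correlation between initial status X and gain Y - X reduces to a function of the pretest-posttest correlation and k alone. The function names and the symbol k are illustrative assumptions, not notation taken from the paper.

```python
import math


def gain_initial_corr_full(r_xy, sd_x, sd_y):
    """Correlation between pretest X and gain Y - X, from both SDs."""
    numerator = r_xy * sd_y - sd_x
    denominator = math.sqrt(sd_x ** 2 + sd_y ** 2 - 2.0 * r_xy * sd_x * sd_y)
    return numerator / denominator


def gain_initial_corr(r_xy, k):
    """Same correlation written with k = sd_y / sd_x only.

    Undefined when r_xy == 1 and k == 1, because the gain then has
    zero variance.
    """
    return (r_xy * k - 1.0) / math.sqrt(1.0 + k * k - 2.0 * r_xy * k)


# The two forms agree whenever sd_y / sd_x equals k (illustrative values).
print(gain_initial_corr_full(0.6, 2.0, 5.0))
print(gain_initial_corr(0.6, 2.5))

# One-parameter family: fix r_xy and vary k.
for k in (0.5, 1.0, 2.0, 4.0):
    print(k, round(gain_initial_corr(0.5, k), 3))
```

In this sketch the correlation is negative when k is less than 1 / r_xy, zero when k equals 1 / r_xy, and positive beyond that point.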

Gain Scores in Research Can be Highly Reliable (with Richard H. Williams, 1982). Journal of Educational Measurement, 19, 149-154.

The common belief that gain scores are unreliable is based on certain assumptions about the values of parameters in a well known formula for the reliability of differences. In this paper we show that a reliability coefficient calculated from the formula can be high, provided one makes other assumptions about the values of pretest and posttest reliability coefficients and standard deviations. Furthermore, there is reason to believe that the revised assumptions are more realistic than the usual ones in testing practice.
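
A minimal sketch of the well-known formula for the reliability of differences illustrates the point. With equal standard deviations and a high pretest-posttest correlation the coefficient is low, but other parameter combinations give a much higher value. The function name and all numerical values are illustrative assumptions and are not taken from the paper.

```python
def difference_reliability(rel_x, rel_y, r_xy, sd_x, sd_y):
    """Reliability of the difference score D = Y - X.

    rel_x, rel_y : reliability coefficients of pretest and posttest
    r_xy         : correlation between observed pretest and posttest scores
    sd_x, sd_y   : standard deviations of pretest and posttest scores
    """
    cross = 2.0 * r_xy * sd_x * sd_y
    true_var = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - cross
    observed_var = sd_x ** 2 + sd_y ** 2 - cross
    return true_var / observed_var


# Conventional assumptions: equal SDs, high pretest-posttest correlation.
low = difference_reliability(0.8, 0.8, 0.7, 1.0, 1.0)

# Alternative assumptions: unequal SDs, lower pretest-posttest correlation.
high = difference_reliability(0.8, 0.8, 0.4, 1.0, 2.0)

print(round(low, 3), round(high, 3))
```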

The Universe—An Unscientific Concept (1982). Psychological Record, 32, 337-347.

The grand conception of a self-contained totality of phenomena, or "universe," which is prominent in scientific writing, but which extends far beyond actual human contact with objective events, is inadequate as a theoretical construct and devoid of empirically verifiable consequences. It is inconsistent with inferences from the history of physical science about how theories might be generalized in the future and also incongruous with current knowledge of biological and psychological evolution. In particular, the insights of interbehavioral psychology cast doubt on the common belief that scientific research is progressing toward a final and completed system of natural laws. Both on an atomic and an astronomical scale of distances, it has been discovered repeatedly that apparently autonomous, isolated systems participate in interrelations with other events in more inclusive domains. There is no scientific support, theoretical or empirical, for the belief that this continuing trend eventually will terminate. Furthermore, uncritical acceptance of the existence of a "universe" can impede scientific progress, because it implicitly places limits on the kinds of inquiries that may be undertaken.

The Relative Error Magnitude in Three Measures of Change (with Richard H. Williams, 1982). Psychometrika, 47, 141-147.

Formulas for the standard error of measurement of three measures of change—simple difference scores, residualized difference scores, and the base-free measure introduced by Tucker, Damarin, and Messick—are derived. Equating these formulas by pairs yields additional explicit formulas which provide a practical guide for determining the relative error of the three measures in any pretest-posttest design. The functional relationship between the standard error of measurement and the correlation between pretest and posttest observed scores remains essentially the same for each of the three measures despite variations in other test parameters (reliability coefficients, standard deviations), even when pretest and posttest errors of measurement are correlated.

On the High Predictive Potential of Change and Growth Measures (with Richard H. Williams, 1982). Educational and Psychological Measurement, 42, 961-968.

Formulas in the statistical theory of test scores have led some psychometricians to believe that measures of change and growth have questionable value in research. However, certain combinations of parameters, when substituted into these formulas, yield reliable change scores and high non-spurious correlations between change scores and independent criterion scores, even when pretest scores are not good predictors of either changes or the criterion. Because of mathematical constraints, these particular combinations of parameters are the ones to be expected in research designs if valid and reliable changes in individuals' test scores do occur. Accordingly, it is possible for measures of change and growth to have excellent predictive value, as investigators in many fields have taken for granted, and conversely, independent variables can be excellent predictors of changes. Although criteria which are highly correlated with changes are difficult to discover empirically, their existence cannot be ruled out by statistical arguments alone.

A Note on the Completeness of the Scientific Method (1984). Psychological Record, 34, 175-179.

It has often been suggested that human knowledge is separated into non-interacting spheres, such as things known by faith as opposed to things known by reason, or scientific knowledge as opposed to artistic insight. Increasingly, however, science reveals the interdependence of phenomena of nature. Since all varieties of human activity are themselves products of biological evolution, ultimately embedded in the natural world and interconnected, it is impossible to assign any psychological process to a transcendent realm with its own privileged vantage point and special methods of inquiry. It is proposed that the term "completeness" refer to the unrestricted application of the scientific method to any conceivable sphere of inquiry.

Variability of Deviation IQ's Based on Multiple-Choice Test Scores (1985). Educational and Psychological Measurement, 45, 745-751.

A computer program simulated guessing on multiple-choice test items and calculated deviation IQ's from observed scores which contained a guessing component. The entire procedure was replicated 5000 times for each of a variety of assumptions about the number of test items, the number of choices per item, and the distribution of the number of items "known" by examinees before guessing. Extensive variability in deviation IQ's due entirely to chance was found. The degree of variability was sensitive to the scale location of the distribution of items "known" but not to the shape or variance of that distribution. For all distributions, the degree of variability consistently depended inversely on the number of items and inversely on the number of choices per item.
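
The kind of simulation described above can be sketched roughly as follows. This is not the original program: the norm group, the uniform distribution of items "known," the deviation IQ scale (mean 100, standard deviation 15), and all function names are assumptions made for illustration. An examinee who knows a fixed number of items guesses on the rest, and the spread of the resulting deviation IQ's reflects chance alone.

```python
import random
import statistics


def observed_score(known, n_items, n_choices, rng):
    """Items known plus items answered correctly by blind guessing."""
    p_correct = 1.0 / n_choices
    lucky = sum(1 for _ in range(n_items - known) if rng.random() < p_correct)
    return known + lucky


def iq_spread_from_guessing(n_items=50, n_choices=4, n_examinees=2000,
                            replications=500, seed=1):
    """Standard deviation of one examinee's deviation IQ due to guessing."""
    rng = random.Random(seed)
    # Hypothetical norm group: items "known" uniform over the middle range.
    known_counts = [rng.randint(n_items // 5, 4 * n_items // 5)
                    for _ in range(n_examinees)]
    norm_scores = [observed_score(k, n_items, n_choices, rng)
                   for k in known_counts]
    norm_mean = statistics.fmean(norm_scores)
    norm_sd = statistics.pstdev(norm_scores)
    # One examinee who knows half the items, retested many times.
    known = n_items // 2
    iqs = [100.0 + 15.0 * (observed_score(known, n_items, n_choices, rng)
                           - norm_mean) / norm_sd
           for _ in range(replications)]
    return statistics.pstdev(iqs)


spread_two_choices = iq_spread_from_guessing(n_choices=2)
spread_five_choices = iq_spread_from_guessing(n_choices=5)
print(spread_two_choices, spread_five_choices)
```

Under these assumptions the spread is larger with two choices per item than with five, in line with the inverse dependence on the number of choices reported in the abstract.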

Note on the Reliability of Experimental Measures and the Power of Significance Tests (with Richard H. Williams, 1986). Psychological Bulletin, 100, 123-124.

The statistical theory of the power of significance tests, combined with the classical theory of the reliability of measurement, reveals that the power of a statistical test sometimes increases and sometimes decreases as the reliability coefficient of a dependent variable increases. A seeming paradox that has been discussed extensively arises because the relation between statistical power and the reliability coefficient is not a functional relation unless another variable—either true variance or error variance—remains constant. This fact explains why authors have reached different conclusions about how reliability influences significance tests.
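
The point can be illustrated with a short calculation. The sketch below assumes a one-sided, one-sample z test with a fixed raw mean difference, which is a simplification chosen for illustration and not necessarily the test considered in the paper. Reliability is raised from .5 to .8 in two ways: by shrinking error variance with true variance held constant, and by enlarging true variance with error variance held constant.

```python
import math
from statistics import NormalDist


def reliability(var_true, var_error):
    """Classical reliability: true variance over observed variance."""
    return var_true / (var_true + var_error)


def z_test_power(effect, var_true, var_error, n, alpha=0.05):
    """Power of a one-sided one-sample z test for a raw mean shift."""
    z = NormalDist()
    sd_observed = math.sqrt(var_true + var_error)
    critical = z.inv_cdf(1.0 - alpha)
    return 1.0 - z.cdf(critical - effect * math.sqrt(n) / sd_observed)


# Case A: true variance fixed at 4, error variance reduced from 4 to 1.
power_a_before = z_test_power(1.0, 4.0, 4.0, n=25)
power_a_after = z_test_power(1.0, 4.0, 1.0, n=25)

# Case B: error variance fixed at 1, true variance raised from 1 to 4.
power_b_before = z_test_power(1.0, 1.0, 1.0, n=25)
power_b_after = z_test_power(1.0, 4.0, 1.0, n=25)

print(reliability(4.0, 4.0), reliability(4.0, 1.0), power_a_before, power_a_after)
print(reliability(1.0, 1.0), reliability(4.0, 1.0), power_b_before, power_b_after)
```

In both cases reliability rises from .5 to .8, yet power increases in case A and decreases in case B, because observed variance moves in opposite directions.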

A Note on the Inadequacy of Percentage Grading Scales for Almost All Ability Distributions (1992). Canadian Psychology, 34:2.

This study investigated distributions of letter grades (A,B,C,D,F) assigned to test scores according to the percentages adopted by Canadian universities, assuming realistic score distributions of ten different shapes with various means and standard deviations. The grade distributions corresponding to 83 out of 90 score distributions were highly anomalous, and the remaining 7 were far from ideal. Therefore, in the majority of practical testing situations, the percentage grading method is inadequate because of purely statistical properties of a scale based on fixed percentages.

Significance Testing of Correlation Using Scores, Ranks, and Modified Ranks (with Bruno D. Zumbo, 1993). Educational and Psychological Measurement, 53, 897-904.

A computer simulation study compared significance tests of correlation coefficients calculated from initial scores, from ranks assigned by the Spearman method, and from three kinds of modified ranks in which N sample values were replaced by N/2, N/3, and N/4 integers ("modular ranks"). Tests based on the initial scores are more powerful than those based on the various ranks for normal distributions, whereas the reverse is true for mixed-normal, exponential, and Cauchy distributions. Probabilities of Type I and Type II errors are unaffected by reduction in the number of ranks. Implications of these findings for the classification of rank correlation as a nonparametric correlation method are discussed.

Coefficient Alpha as an Estimate of Test Reliability Under Violation of Two Assumptions (with Bruno D. Zumbo and Coralie Lalonde, 1993). Educational and Psychological Measurement, 53, 33-49.

Through use of computer simulation, the central tendency and variability of coefficient alpha were examined under violation of two assumptions made in the derivation of the formula. When assumptions were satisfied, the mean value of coefficient alpha was extremely close to the population reliability coefficient, but values were highly variable. This result was independent of the shape of the population distribution of test scores. Coefficient alpha underestimated reliability under violation of the assumption of essential tau-equivalence of subtest scores and overestimated reliability under violation of the assumption of uncorrelated error scores. In both cases, the bias of the estimates varied systematically with the degree of violation of assumptions, while the variability of the estimates remained constant. All these results were independent of the number of persons and the number of subtests.
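
For reference, the statistic under study can be computed as in the following minimal sketch, which uses the standard formula for coefficient alpha based on subtest variances and the variance of total scores. The function name, data layout, and example data are assumptions for illustration; the simulation design of the study is not reproduced.

```python
import statistics


def coefficient_alpha(subtests):
    """Coefficient alpha from k subtests.

    subtests : list of k lists, each holding the scores of the same
               n persons on one subtest (same person order in each list).
    """
    k = len(subtests)
    if k < 2:
        raise ValueError("at least two subtests are required")
    # Total score for each person, summing across subtests.
    totals = [sum(person_scores) for person_scores in zip(*subtests)]
    sum_subtest_var = sum(statistics.variance(scores) for scores in subtests)
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1.0 - sum_subtest_var / total_var)


# Illustrative data: four persons, two subtests.
alpha = coefficient_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
print(alpha)
```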

Effect of Nonindependence of Sample Observations on Some Parametric and Nonparametric Statistical Tests (with Richard H. Williams and Bruno D. Zumbo, 1993). Communications in Statistics—Simulation and Computation, 22(3), 779-789.

The probabilities of Type I and Type II errors of the Student t test, as well as the Wilcoxon-Mann-Whitney test, are grossly inflated or deflated by violation of within-group and between-group independence of sample observations. A modified t formula with two correlation terms eliminated these changes for samples from normal distributions. The same formula applied to ranks replacing the initial scores eliminated changes resulting from both non-normality and nonindependence for various heavy-tailed distributions.

Reliability of Measurement and Power of Significance Tests Based on Differences (with Richard H. Williams and Bruno D. Zumbo, 1993). Applied Psychological Measurement, 17, 1-9.

The power of significance tests based on difference scores is indirectly influenced by the reliability of the measures from which differences are obtained. Reliability depends on the relative magnitude of true score and error score variance, but statistical power is a function of the absolute magnitude of these components. Explicit power calculations reaffirm the paradox put forward by Overall and Woodward (1975, 1976)—that significance tests of differences can be powerful even if the reliability of the difference scores is zero. This anomaly arises because power is a function of observed score variance but is not a function of reliability unless either true score variance or error score variance is constant. Provided that sample size, significance level, directionality, and the alternative hypothesis associated with a significance test remain the same, power always increases when population variance decreases, independently of reliability.

Mimicking Properties of Nonparametric Rank Tests Using Scores that are not Ranks (1993). Journal of General Psychology, 120, 509-516.

In a computer simulation study, random samples from a uniform density were substituted for each of two independent samples from normal and various non-normal densities. This procedure was compared to conventional ranking and to Bell and Doksum's (1965) procedure, which substituted random normal deviates for initial sample values. After performing the Student t test, the program transformed the initial scores and performed additional t tests on ranks, random uniform scores, and random normal scores. For several distributions, the test on random normal scores was more powerful than the others, consistent with known asymptotic results. The probabilities of Type I and Type II errors of the test on random uniform scores were nearly the same as those of the Wilcoxon-Mann-Whitney test, for all distributions examined.

A Note on Interpretation of Formulas for the Reliability of Differences (1994). Journal of Educational Measurement, 31, 143-147.

It is widely recognized that the reliability of a difference score depends on the reliability of the constituent scores and their intercorrelation. Authors often use a well-known identity to express the reliability of a difference as a function of the reliabilities of the components, assuming that the intercorrelation remains constant. This approach is misleading, because the familiar formula is a composite function in which the intercorrelation of the components itself is a function of reliability. An alternative formula, containing the correlation between true scores, instead of the correlation between observed scores, provides more useful information and yields values that are not quite as anomalous as the ones usually obtained.
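
One way to write a formula of this kind, assuming uncorrelated errors in the classical model, is to replace the observed correlation by the true-score correlation multiplied by the square root of the product of the reliabilities. The sketch below follows that substitution; it is an illustration of the idea and may differ in form from the formula given in the paper. Function and parameter names are assumptions.

```python
import math


def difference_reliability_from_true_corr(rel_x, rel_y, true_corr,
                                          sd_x=1.0, sd_y=1.0):
    """Reliability of a difference, holding the true-score correlation fixed.

    The observed correlation is not held constant; it is recomputed as
    true_corr * sqrt(rel_x * rel_y), so it changes with the reliabilities.
    """
    r_xy = true_corr * math.sqrt(rel_x * rel_y)
    cross = 2.0 * r_xy * sd_x * sd_y
    true_var = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - cross
    observed_var = sd_x ** 2 + sd_y ** 2 - cross
    return true_var / observed_var


# Illustrative values: true-score correlation fixed at .5.
for rel in (0.6, 0.7, 0.8, 0.9):
    value = difference_reliability_from_true_corr(rel, rel, 0.5)
    print(rel, round(value, 3))
```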

A Note on Modified Rank Correlation (1994). Journal of Educational and Behavioral Statistics, 19, 357-362.

The Spearman rank correlation method separately assigns ranks to sample values of each of two variables, X and Y, and substitutes squared differences between the ranks into a computational formula derived from Pearson correlation. The present note examines an alternate procedure which separately transforms the X and Y values to standard scores, ranks the combined standard scores in a single sequence, and calculates the Pearson correlation between the ranks corresponding to the initial scores. A simulation study reveals that significance tests of correlation based on this method effectively control Type I and Type II error probabilities and are slightly more powerful than the Spearman method for normal and various non-normal distributions and for sample sizes ranging from 8 to 30. This note discusses some advantages of the modified procedure.
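
The alternate procedure described above can be sketched as follows: standardize X and Y separately, rank the 2N standard scores in one combined sequence, and correlate the ranks belonging to the X values with those belonging to the Y values. The handling of ties by average ranks and all function names are assumptions for illustration; the abstract does not specify them.

```python
import math
import statistics


def standardize(values):
    """Convert scores to standard scores within one variable."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def average_ranks(values):
    """Ranks from 1 upward, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for position in range(i, j + 1):
            ranks[order[position]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(ss_a * ss_b)


def modified_rank_correlation(x, y):
    """Pearson correlation of ranks taken from the pooled standard scores."""
    n = len(x)
    pooled = standardize(x) + standardize(y)
    ranks = average_ranks(pooled)
    return pearson(ranks[:n], ranks[n:])


print(modified_rank_correlation([3, 1, 4, 1, 5, 9, 2, 6],
                                [2, 7, 1, 8, 2, 8, 1, 8]))
```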

Return to Donald W. Zimmerman's home page