The next type of discrepancy calculation involves more comprehensive expectancy formulas including some combination of variables (usually IQ and perhaps CA, MA, years in school [YS], or grade age [GA]). The USOE (1976) SDL formula provides an example:
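Under that proposal, a severe discrepancy level (SDL) was deemed to exist when achievement fell at or below 50% of expected grade level, the cut-off being calculated as SDL = CA (IQ / 300 + 0.17) - 2.5, where CA is chronological age and the result is expressed in grade-equivalent units.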
Earlier examples were provided by Bond and Tinker (1973), Harris (1975), and Johnson and Myklebust (1967). The Bond and Tinker formula is
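Reading Expectancy = (years in school × IQ / 100) + 1.0, expressed in grade-equivalent units. For example, a child with an IQ of 110 who has completed 3 years of school would have an expected reading grade of (3 × 1.10) + 1.0 = 4.3.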
The underlying logic of the Bond and Tinker formula seems confounded: an IQ score was included to account for unequal learning rates, but the constant (1.0) makes this point moot because it negates the differential effects of IQ during the first 6 years of life (Dore-Boyce, Misner, & McGuire, 1975). To remedy this confounding, one set of proposed formulas (Horn, 1941) assigned different weights to MA and CA so that the formula could be applied at four different age ranges, presumably negating the problem of unequal learning rates. Without some modification of this sort, the Bond and Tinker (as well as the Harris) formulas are poor predictors that over- and underidentify students with low and high IQs, respectively (Alspaugh & Burge, 1972; Rodenborn, 1974; Simmons & Shapiro, 1968).
The formula proposed by Johnson and Myklebust (1967) introduced the problem of interpreting ratio scores in determining discrepancy level. The Johnson and Myklebust formula calculates an expectancy level (EGL = [MA + CA + GA] / 3), but instead of a direct comparison with the actual grade level (EGL - AGL), discrepancy is calculated from a ratio score ([AGL / EGL] × 100), with a value less than 90 considered significant.
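For example, a student with an expectancy level of 5.0 and an actual grade level of 4.2 would obtain a quotient of (4.2 / 5.0) × 100 = 84, a value below the 90 cutoff that would therefore be judged a significant discrepancy.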
Because of the absence of an absolute zero and equal intervals, ratio scores do not possess inherent meaning. Only extreme scores are meaningful on what is really an ordinal scale, and a value such as 90 cannot be interpreted to mean 90% of average, for example. The situation is further complicated by the variable standard deviations (SDs) across age levels, which means that the significance of a given discrepancy ratio will vary from one grade to another. The difficulties caused by such SD variability were demonstrated by Macy, Baker, and Kosinski (1979), who found that the Johnson and Myklebust (1967) discrepancy quotients were quite variable across different combinations of age, grade, and content areas.
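To illustrate the latter point, an identical absolute deficit of 0.8 grade yields markedly different quotients at different points in school: an actual level of 1.2 against an expectancy of 2.0 produces a quotient of 60, whereas an actual level of 7.2 against an expectancy of 8.0 produces a quotient of 90, a value that would not be flagged at all.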
The expectancy formula approach to discrepancy calculation has been roundly criticized. McLeod (1979) discussed the negative influence of measurement errors and regression: "Regression means that if scores on two tests are positively correlated, as are intelligence, reading, arithmetic, and spelling scores, then individuals who obtain a particular score on one test will on the average obtain a score nearer to the population average, i.e., regress toward the mean on the other test" (p. 324). Hoffman (1980) suggested that the theoretical problems surrounding regression were not well understood, which led to "considerable uncertainty and possibly confusion among many professionals as to what the data mean at an applied level" (p. 11). If not considered, regression effects lead to increased possibility of misclassification, as pointed out by Thorndike (1963):
If a simple difference between aptitude and achievement standard scores, or a ratio of achievement to aptitude measure, is computed, the high aptitude group will appear primarily to be "underachievers" and the low aptitude group to be "overachievers." For this reason it is necessary to define "underachievement" as discrepancy of actual achievement from the predicted value, predicted upon the basis of the regression equation between aptitude and achievement. A failure to recognize this regression effect has rendered questionable, if not meaningless, much of the research in "underachievement" (p. 13).
The questionable reliability associated with some tests used in determining discrepancy almost ensures the presence of regression effects (Coles, 1978; Thurlow & Ysseldyke, 1979). The test validity question is captured in what Kelley (1927) long ago labeled the "jingle and jangle" fallacy--the assumption that tests with the same names measure similar functions, or that tests with different names measure different functions. Hanna, Dyck, and Holen (1979) focused their criticism on the psychometric difficulties associated with age- and grade-equivalent scores. The many associated problems made the expectancy approach a less than optimal means of determining and interpreting a "significant" discrepancy (Davis & Shepard, 1983). L. R. Wilson, Cone, Busch, and Allee (1983) discussed the incorrect assumption that achievement follows a linear growth pattern which results in an inherent bias when discrepancy is defined as a fraction of some expected achievement value because of different slopes in the patterns.
When used in practice, the expectancy formula approach to discrepancy "yielded strikingly disparate results in terms of the number of children identified as learning disabled by each" (Forness, Sinclair, & Guthrie, 1983, p. 111). In actuality, the resulting prevalence rates ranged from 1% to 37% (Sinclair, Guthrie, & Forness, 1984). Confounding this variability was the additional finding that in a sample of students deemed eligible for LD programs, 64% were not identified by any expectancy formula (Sinclair & Alexson, 1986). Finally, O'Donnell (1980) found that a discrepancy derived from an expectancy formula was not a distinctive characteristic of LD and was equally likely to be found among other students with disabilities.
Although discrepancy methods were the object of contentious debate, discrepancy continued to be reinforced as a primary criterion for LD identification (e.g., Chalfant, 1985) mainly because of a desire to reduce the reliance on clinical judgment in LD diagnosis (see Meehl, 1954). Thus, the continued use of discrepancy in the LD diagnostic process required improved methodology.
The first problem requiring attention concerned the types of test scores included in discrepancy formulas. Age-equivalent scores (e.g., MA), for example, lack a consistent unit of measurement. More problematic are grade-equivalent (GE) scores, which ignore both the dispersion of scores about the mean and the nonequivalent regression lines between grade and test scores across grade levels and content areas (Gulliksen, 1950). Consequently, exact values are difficult to obtain, and GEs usually involve an excess of extrapolation, especially at the upper and lower ends of a scale. The difficulties are compounded because scores falling between testing periods (often 1 year apart) must be interpolated, a calculation based on the invalid assumption of a constant learning rate. The result is that different achievement tests do not yield comparable GEs. For example, a seventh-grade student who is 2 years below grade level in reading will receive quite different percentile rankings (a possible range of 12 percentile ranks) depending on the reading achievement measure used (Reynolds, 1981). When included in discrepancy formulas, GEs from different tests assessing different academic areas may distort comparisons and exaggerate small performance differences (Berk, 1981). The problem of GE comparability is thus significant, and, by grade 8, GE scores may possess essentially no meaning (Hoover, 1984).
The problems associated with GEs may be partially remedied by the use of standard scores, which hold the advantage of being scaled to a constant mean (M) and SD and thus permit more accurate and precise interpretation. Nevertheless, Clarizio and Phillips (1986) pointed out potential limitations of standard scores: (a) no basis for comparisons across grade levels, (b) possible distortions in profile comparisons, and (c) inconsistency of unit size caused by within-grade variability. Although their use provides advantages over GEs, standard scores also need to be interpreted cautiously.
Standard score (SS) discrepancy methods typically involve a direct comparison between common metrics for intellectual ability and academic achievement (Elliot, 1981; Erickson, 1975; Hanna et al., 1979). For LD determination, the standard scores for ability (IQ) and achievement most often have an M = 100 and SD = 15, with the SDL criterion usually being a minimum 15-point IQ-achievement difference.
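For example, with both measures scaled to M = 100 and SD = 15, a student with an IQ of 108 and a reading standard score of 90 shows an 18-point difference and would meet a 15-point criterion, whereas a classmate with the same IQ and a reading score of 95 (a 13-point difference) would not.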
Although advancing discrepancy calculation, the SS procedure is not without limitations. One problem surrounds the invalid assumption that, on average, IQ and achievement scores should be identical (e.g., a child with an IQ of 115 should have a reading or math achievement score of 115). This assumption would be true only if IQ and achievement were perfectly correlated (r = 1.00). The actual correlation is about 0.60, which means that the expected achievement for an IQ of 130 is actually about 118, not 130. With below-average IQs, an opposite effect occurs (i.e., an IQ of 85 actually has an expected achievement level of about 91). Thus, the SS approach to discrepancy will always possess a systematic bias (Thorndike, 1963). For LD identification, this means the overidentification of high-ability underachievers and the underidentification of low-ability students who may in fact be LD.
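These expected values follow from the regression of achievement on IQ: with both measures scaled to M = 100 and SD = 15, expected achievement = 100 + r (IQ - 100). With r = 0.60, an IQ of 130 yields 100 + 0.60(30) = 118, and an IQ of 85 yields 100 + 0.60(-15) = 91; comparing observed achievement against the IQ itself rather than against these regressed expectations produces the bias just described.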
The less-than-perfect correlation between ability and achievement measures also produces measurement errors that may influence the resulting difference scores. When different IQ and achievement tests are used in calculating discrepancy, particular test combinations will identify more students as LD than will other combinations (Bishop & Butterworth, 1980; Jenkins & Pany, 1978). Measurement error also affects the inherent meaning of score comparisons because the unique elements of each test may not be measured. Hopkins and Stanley (1981) illustrated the substantial overlapping variance possible between ability and achievement tests. Across grade levels, on average, 47% of the variance overlaps, which means that nearly half of the score variability reflects skills common to both measures, making it questionable whether "true" differences are being revealed.
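For perspective, a 47% overlap in variance corresponds to an ability-achievement correlation of about 0.69, because shared variance equals the squared correlation (0.69 × 0.69 ≈ 0.47).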
The SS approach produces a difference score that is presumably an index of discrepancy. The difference score, however, often lacks adequate reliability, resulting in uncertainty about whether the difference could have occurred by chance (Feldt, 1967; Payne & Jones, 1957). For example, the acceptable individual reliabilities of most IQ and achievement tests (about 0.90) produce a difference score with a reliability of only about 0.75. Measurement error is again the primary factor producing this unreliability (see Cronbach, Gleser, Nanda, & Rajaratnam, 1972), which ultimately may distort the discrepancy score, as discussed by Thorndike (1963), who concluded that
if nothing but the errors of measurement in the predictor and criterion were operating, we could still expect to get a spread of discrepancy scores represented by a standard deviation of half a grade-unit. We would still occasionally get discrepancies between predicted and actual reading level of as much as a grade and a half. This degree of "underachievement" would be possible as a result of nothing more than measurement error (p. 9).
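The 0.75 figure cited above follows from the classical formula for the reliability of a difference score, r_diff = (r_xx + r_yy - 2r_xy) / (2 - 2r_xy); assuming two tests each with a reliability of 0.90 and the IQ-achievement correlation of about 0.60 noted earlier, r_diff = (1.80 - 1.20) / (2 - 1.20) = 0.75.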
Algozzine and Ysseldyke (1981a), using various IQ-achievement test correlations, demonstrated the significantly lower reliabilities of difference scores compared with the reliabilities of either of the tests on which they were based. Using the standard error of measurement (SEM) (a theoretical range around the presumed true score), Schulte and Borich (1984) also demonstrated the unreliability of difference scores. The calculated SEMs of difference scores were substantial and would significantly influence the type and rate of errors made in LD identification. In an empirical analysis, Salvia and Clark (1973) showed how "the standard error of measurement for deficit scores is sufficiently large to preclude rigid adherence to deficits as a criterion for learning disabilities" (p. 308). Reynolds (1981) showed that it is possible to determine the significance of the difference between two scores, but doing so is a time-consuming process and does not fully answer the question of where to set the cut-off (i.e., criterion) score for LD identification (Schulte & Borich, 1984).
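The magnitude involved can be illustrated with tests scaled to SD = 15 and reliability 0.90: each test has an SEM of 15 × √(1 - 0.90) ≈ 4.7 points, so the SEM of the difference score is √(4.7² + 4.7²) ≈ 6.7 points. An observed 15-point discrepancy thus carries a 68% confidence band of roughly 8 to 22 points, a range wide enough to move a student across almost any cut-off score.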