Kenneth A. Kavale, University of Iowa
Learning Disabilities Summit: Building a Foundation for the Future White Papers
With SS methods being problematic, alternative means of calculating discrepancy were considered. Shepard (1980) suggested a regression discrepancy method to remedy many of the existing problems. The measurement error associated with IQ and achievement measures ensures that statistical regression will occur, especially when dealing with IQ levels outside of a 95-105 range. The regression method involves calculating equations for IQ and achievement where "The anticipated [expected] achievement score is the norm for children of the same ability, grade level, and sex" (Shepard, 1980, p. 80). Measurement error makes a "true" score indeterminate, and its value may be expressed through the SEM, a range surrounding the obtained score. The formula includes the SD of the test and its reliability estimate (r_xx) and is computed from

SEM = SD √(1 − r_xx)
The SEM is then used to calculate a CI that reflects a range within which the "true" score might be found. The formula is

CI = x ± z(SEM)

where x is the obtained score and z is the normal curve value corresponding to the chosen confidence level (e.g., 95% level = 1.96).
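The SEM and CI computations can be sketched in a few lines of code (a minimal illustration; the function names and the sample values for SD and reliability are mine, not the chapter's):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(obtained, sd, reliability, z=1.96):
    """CI around an obtained score: obtained +/- z * SEM."""
    margin = z * sem(sd, reliability)
    return obtained - margin, obtained + margin

# Example: obtained score of 100 on a test with SD = 15 and reliability .90
# (illustrative values, not from the chapter).
low, high = confidence_interval(100, 15, 0.90)
print(round(sem(15, 0.90), 2))          # SEM of about 4.74
print(round(low, 2), round(high, 2))    # 95% CI around the obtained score
```

With a reliability of .90, the 95% CI spans roughly 90.7 to 109.3, illustrating how even a reliable test leaves a sizable band of uncertainty around the obtained score.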
The standard error of estimate (SEE) is a statistic similar to the SEM that is used when one of two independent scores is used to predict the second. Essentially, the SEE places a CI around the predicted score. The formula is

SEE = SD_y √(1 − r_xy²)

where SD_y is the standard deviation of the achievement test and r_xy² is the squared correlation between IQ and achievement.
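The SEE computation is equally brief (a sketch with illustrative values of my own choosing):

```python
import math

def see(sd_y, r_xy):
    """Standard error of estimate: SEE = SD_y * sqrt(1 - r_xy**2)."""
    return sd_y * math.sqrt(1.0 - r_xy ** 2)

# Illustrative values (mine): achievement SD = 15, IQ-achievement r = .60
print(round(see(15, 0.60), 2))  # 12.0
```

Note that the stronger the IQ-achievement correlation, the smaller the SEE, and hence the tighter the CI around the predicted achievement score.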
Because the correlation between IQ and achievement is not perfect, regression effects will operate (i.e., individuals who obtain an extreme score on one test will, on average, obtain a score closer to the population mean on the second test). The predicted achievement score may be adjusted for regression effects if both IQ and achievement test scores are expressed as SS with M = 100 and SD = 15: the IQ-achievement correlation is multiplied by the obtained IQ minus the mean of the IQ test, and the product is then added to the mean of the achievement test (100), as follows:

ŷ = r_xy(x − 100) + 100

where ŷ is the predicted achievement score and x is the obtained IQ score.
The complete regression procedure thus includes (a) measuring IQ (x), (b) predicting the achievement level (ŷ), (c) measuring actual achievement (y), (d) establishing confidence intervals (CIs) around the predicted achievement score using the SEE, and (e) comparing the predicted and actual achievement scores to determine whether a significant difference exists. For both IQ and achievement, standard scores (M = 100; SD = 15) are typically used in the formulas.
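The five-step procedure can be sketched end to end (a hedged illustration: the function name and sample values are mine, and the CI uses the standard SEE formula SD_y √(1 − r_xy²)):

```python
import math

def regression_discrepancy(iq, actual_achievement, r_xy,
                           mean=100.0, sd=15.0, z=1.96):
    """Steps (a)-(e): predict achievement from IQ, build a CI around the
    prediction using the SEE, and flag a significant discrepancy when the
    actual score falls below the CI's lower bound."""
    predicted = r_xy * (iq - mean) + mean     # (b) expected achievement
    see = sd * math.sqrt(1.0 - r_xy ** 2)     # standard error of estimate
    lower = predicted - z * see               # (d) CI lower bound
    upper = predicted + z * see
    discrepant = actual_achievement < lower   # (e) significance check
    return predicted, (lower, upper), discrepant

# Illustrative case (values are mine): IQ 110, actual achievement 70, r = .60
predicted, (lower, upper), discrepant = regression_discrepancy(110, 70, 0.60)
print(round(predicted, 1), round(lower, 2), discrepant)
```

Here the predicted achievement is 106, the lower bound of the 95% CI is about 82.5, and the actual score of 70 falls below it, so a significant discrepancy would be flagged.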
An example of the regression method shows how it is used to determine the presence of a discrepancy. To illustrate, assume a student with a measured IQ of 115 and an r_xy of 0.54 (usually derived from the available research literature). With these values, the predicted achievement score is calculated to be 108.1. At the 95% level, the z-value of 1.96 is multiplied by the error term to obtain a value of 19.93. A CI is then constructed by adding and subtracting 19.93 from the predicted achievement score of 108.1, yielding a CI of 88.17-128.03. If the student's actual achievement score (y) is 85, it falls below the lower end of the CI (88.17) and a significant discrepancy is said to exist.
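The worked numbers can be retraced in code. One caution: the half-width of 19.93 is reproduced only if the error term is taken as SD √(1 − r_xy); with the SEE formula SD √(1 − r_xy²) the half-width would be roughly 24.7. The sketch below simply follows the arithmetic as given (small rounding differences from the chapter's figures are expected):

```python
import math

iq, r_xy, y = 115, 0.54, 85
mean, sd, z = 100.0, 15.0, 1.96

predicted = r_xy * (iq - mean) + mean        # 0.54 * 15 + 100 = 108.1
half_width = z * sd * math.sqrt(1.0 - r_xy)  # about 19.93, as in the chapter
lower, upper = predicted - half_width, predicted + half_width

print(round(predicted, 1))
print(round(lower, 2), round(upper, 2))
print(y < lower)  # True: the actual score of 85 falls below the lower bound
```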
The actual calculations may be aided by computer programs that compute significant IQ-achievement discrepancies using a regression approach (e.g., McDermott & Watkins, 1985; Reynolds & Snow, 1985; Watkins & Kush, 1988). The computer programs reduce mathematical error and may be used to create tables for various combinations of IQ and achievement tests as exemplified in the Iowa Regression Discrepancy Tables (Iowa Department of Public Instruction, 1981).
L. R. Wilson and Cone (1984) argued that the regression discrepancy method provides a "best fit" line for empirically establishing expected achievement values at various IQ levels, and "because regression is a real-world phenomenon, the equation automatically adjusts expected academic scores so that they are less extreme" (p. 99). Evans (1990) discussed six advantages of the regression discrepancy method including (a) determining whether IQ-achievement score differences are due to random error or real, nonchance differences, (b) determining expected achievement score based on individual IQ scores and the correlation between intelligence and achievement, (c) defining discrepancy as the difference between expected and actual achievement score, (d) measuring discrepancy in terms of the SD of the discrepancy difference score, (e) taking into account the SEM of the discrepancy by considering measurement error of IQ and achievement tests, and (f) determining if the discrepancy falls in a predetermined critical ("severe") range when measurement error is considered.
The regression discrepancy method still possesses some practical difficulties, however. Ideally, the regression equation calculated would be based on IQ and achievement scores obtained from large-scale random sampling from the population of interest. Because this is not usually feasible, population statistics for the correlations between individual IQ scores and specific achievement scores, the M and SD of the population IQ, and the M and SD for each specific achievement score must be estimated. With estimated values, the resulting equations may possess errors that limit generalizability [see methods proposed by Woodbury (1963) and McLeod (1979)], but with best estimates and noncontroversial assumptions about linear relationships and normal distributions, "[t]he regression equation approach provides the best method for determining academic discrepancy because unlike other approaches, it considers regression, measurement errors, and evidence" (L. R. Wilson & Cone, 1984, p. 107, emphasis added).
Although the regression discrepancy method provides the best answer to the question, "Is there a severe discrepancy between this child's score on the achievement measure and the average achievement score of all other children with the same IQ as this child?" (Reynolds, 1985, p. 40), another practical difficulty remains. A regression equation requires the choice of a value to denote "severity level" but the vagaries surrounding LD make this choice uncertain. The most usual value chosen is two SDs (gleaned from the historical two SDs below the mean IQ level used for the diagnosis of mental retardation [MR]), but while presumably meeting a criterion of "relative infrequency" in the population, the value remains uncertain because of the lack of a true prevalence rate for LD. The uncertainty may produce classification errors of two types: false positive (i.e., identifying a student as LD when he or she is not, in fact, LD) and false negative (i.e., failing to detect real LD). Shepard (1980) suggested that "it is likely that the Regression Discrepancy Method falsely labels more normal children as LD than it correctly identifies children who really have a disorder. At the same time, errors of overidentification do not assume that all real instances of LD will be detected" (p. 88).
Cone and Wilson (1981) analyzed the four basic methods of quantifying a discrepancy and concluded that SS and regression equation methods are preferred. This conclusion has been affirmed in other comparative analyses of discrepancy methods (e.g., Bennett & Clarizio, 1988; Braden & Weiss, 1988; Clarizio & Phillips, 1989).
The primary difficulty with regression equation methods lies in the numerous and complex calculations required, which may be further complicated by assessment instruments that do not meet acceptable psychometric standards, as well as by other technical problems (e.g., calculating the correlation between measures or choosing a proper incidence figure). Berk (1984), in an analysis of discrepancy methods, urged caution because of questions surrounding reliability and validity of outcomes. In a similar analysis, Reynolds (1984-1985) validated the use of regression equation models but noted possible confusion in choosing one type of regression equation over another:
Case a will be far too restrictive and is conceptually illogical in several regards: It will create a more homogeneous group of children; however, LD is characterized by the individuality of the learner, not traits or characteristics held in common with other children. Objections to application of case b are less conceptual than mathematical. Routinely applying both models and accepting qualification by either introduces a significantly greater probability of finding a severe discrepancy when none actually exists than does the application of either model....Using both models with all children will then not aid in reducing the conceptual confusion in the field as might application of a uniform model (p. 465).
Even the most defensible method of discrepancy calculation (i.e., SS and regression equation) remains less than perfect with respect to optimal psychometric and statistical considerations. The problems are exacerbated by the many different measurement models that might be employed (see Willson & Reynolds, 1984-1985) and the curious situation involving the fact that as these models become more defensible statistically, they become more complicated to use in practice (Boodoo, 1984-1985). Consequently, actual diagnostic practice in the LD field lags behind state-of-the-art statistical models, which almost makes discrepancy "an atheoretical, psychologically uninformed solution to the problem of LD classification" (Willson, 1987, p. 28).
The technical problems create real-world difficulties. Ross (1992a, 1992b), in a survey of school psychologists, found that fewer than 10% were able to correctly evaluate whether four sets of ability-achievement scores reflected chance measurement differences or reliable, nonchance differences. Barnett and Macmann (1992) attributed much of the inaccuracy in discrepancy interpretation to basic misunderstandings surrounding test interpretation: statistical significance, confidence intervals, and measurement error. For example, Macmann, Barnett, Lombard, Belton-Kocher, and Sharpe (1989) found classification agreement rates ranging from 0.57 to 0.86 with different discrepancy calculation methods. When different achievement measures were used in the same calculations, however, the classification agreement rates fell to a range of 0.19 to 0.47. When both ability and achievement measures varied, agreement rates were consistently below 0.25 (Clarizio & Bennett, 1987). Thus, on average, only about 1 in 4 students deemed to possess a "severe" discrepancy would be identified as such with different sets of ability and achievement test scores. Macmann and Barnett (1985) affirmed this finding in a computer simulation study that concluded that "the identification of a severe discrepancy between predicted and actual achievement was disproportionately related to chance and instrument selection" (p. 371). The consequences become even more problematic in cases where more than one achievement test was administered, and the lowest score among them was used in discrepancy calculation. Sobel and Kelemen (1984) showed how this situation will likely result in a difference between the proportion of students actually classified LD and the proportion originally expected. For example, in the case of three achievement measures administered and the lowest score selected, the original criterion value of 6.68% LD cases identified would increase to 12.2%.
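The inflation Sobel and Kelemen describe can be illustrated with a small simulation (a sketch under my own assumptions: three achievement measures intercorrelated at roughly .70 via a common-factor model, and a cutoff of z = −1.5, which flags about 6.68% of cases on any single measure; the exact inflated rate depends on the intercorrelations):

```python
import math
import random

random.seed(1)
RHO = 0.70     # assumed intercorrelation among the three achievement tests
CUTOFF = -1.5  # flags ~6.68% of cases on a single measure
N = 200_000

single, lowest_of_three = 0, 0
for _ in range(N):
    factor = random.gauss(0, 1)  # shared "achievement" factor
    scores = [math.sqrt(RHO) * factor
              + math.sqrt(1 - RHO) * random.gauss(0, 1)
              for _ in range(3)]
    if scores[0] < CUTOFF:       # one test administered
        single += 1
    if min(scores) < CUTOFF:     # lowest of three scores used for eligibility
        lowest_of_three += 1

print(round(single / N, 4), round(lowest_of_three / N, 4))
```

The simulated single-test rate lands near the nominal 6.68%, while selecting the lowest of three correlated scores flags a markedly larger share of students, which is the mechanism behind the rise toward the 12.2% figure reported in the text.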
The inherent variability associated with discrepancy calculation is compounded by findings showing instability in discrepancy scores over time. O'Shea and Valcante (1986) found that SDL comparisons between groups of students with LD and low achieving students without LD differed significantly from grade 2 to grade 5. The groups appeared to develop diverging SDLs over time with increasingly larger differences for students with LD in language and mathematics compared to reading but, nevertheless, the SDL for reading doubled from grade 2 to grade 5. White and Wigle (1986), in a large-scale evaluation of school-identified students with LD, found four different patterns of discrepancy over time. The largest group (40%) revealed no ability-achievement discrepancy at initial placement or at reevaluation. The next largest groups demonstrated either a pattern of being discrepant at placement but not at reevaluation or, conversely, a pattern of not being discrepant at initial placement but discrepant at reevaluation. The smallest group showed a discrepancy at both placement and reevaluation. Considering that discrepancy is a primary identification criterion for LD, its instability over time is a source of concern, but the problem appears endemic. For example, Shepard and Smith (1983) reported that only 43% of a statewide sample of school-identified students with LD met strict identification criteria, with discrepancy being the primary criterion.
An early survey of 3,000 students with LD in Child Service Demonstration Centers showed that the average discrepancy was only about 1 year, leading to the conclusion that "[t]his discrepancy can be interpreted as a moderate retardation, rather than a severe disability" (Kirk & Elkins, 1975, p. 34). In a later similar analysis, Norman and Zigmond (1980) applied the federal (1976) SDL formula and found that, on average, 47% of students met the SDL criterion. For children aged 6 to 10 years (the likely age range of identification), less than 40% met the SDL criterion while the percentage for students aged 15 to 17 was 68%. Although providing greater confidence in the LD classification of the older children, the smaller percentage of younger children meeting the SDL criterion raises questions about the validity of their LD classification.
Shepard and Smith (1983) suggested that "the validity of LD identification cannot be reduced to simplistic statistical rules" (p. 125), but the inconsistent application of existing criteria creates significant difficulties in the LD diagnostic process. Shepard, Smith, and Vojir (1983), using a "discrepancy criterion," found that 26% of identified students with LD in Colorado revealed no discrepancy while 30% revealed a significant discrepancy with the use of any reading or math test. When validated with a second achievement measure, 5% of all students with LD had a significant discrepancy on two math tests while 27% revealed a significant discrepancy on two reading tests. Thus, not only was the discrepancy criterion not validated, but a "below grade level" criterion was not affirmed either; "Many LD pupils were not achieving below grade level as measured by standardized tests" (p. 317).
In contrast, Cone, Wilson, Bradley, and Reese (1985) found that 75% of a school-identified LD population in Iowa met the required discrepancy criterion. As this LD population continued in school, achievement levels became increasingly discrepant. In a later analysis, L. R. Wilson, Cone, Bradley, and Reese (1986) found that the identified students with LD were clearly different from other students with mild disabilities in Iowa (e.g., MR and behavior disorders [BDs]): "The main factor providing differentiations was discrepancy between achievement and ability" (p. 556). They concluded that students with LD were primarily underachievers, not simply low achievers.
In a later analysis, Valus (1986a) found 64% of identified students with LD to be significantly underachieving. In a large-scale analysis of Iowa's LD population, Kavale and Reese (1992) found that 55% met the discrepancy criterion. In different locales, the percentage of students with LD meeting the discrepancy criterion ranged from 32% to 75%. Thus, in any LD population, there will be a significant proportion who do not meet a significant discrepancy criterion, and, because of possible differences in interpretation, considerable variability in the proportions that do meet the discrepancy criterion across settings.
The finding of significant inconsistencies about the percentage of students meeting the discrepancy criterion is common among studies analyzing identified LD populations. For example, McLeskey (1989) found that 64% of an Indiana LD population met the discrepancy criterion, but this figure was achieved only after more rigorous and stringent state guidelines for LD identification were implemented. The 64% figure was almost double the 33% found in an earlier study (McLeskey & Waldron, 1991). In general, about one third of identified LD samples have been found not to meet the stipulated discrepancy criterion (e.g., Bennett & Clarizio, 1988; Dangel & Ensminger, 1988; Furlong, 1988).
Shepard and Smith (1983) referred to the approximately one third of identified students with LD as "clinical cases," meaning that their eligibility was a discretionary judgment made by a multidisciplinary team (MDT) which was at variance with the statistical (i.e., discrepancy) information. This situation may occur because (a) the LD may have caused ability level (i.e., IQ) to decline, and if achievement remained at a comparatively low level, then a discrepancy would not exist; (b) intact skills permitted the student to "compensate" for the effects of LD, which means that achievement test scores may reveal an increase while ability level remained constant; or (c) a "mild" discrepancy was present but could be explained by factors such as limited school experience, poor instructional history, behavior problems, or second-language considerations. The essential question remains: Are such students "truly" LD, or is the inconsistency between team decisions and statistical status "truly" misclassification?
The many vagaries associated with "system identification" (Morrison, MacMillan, & Kavale, 1985) are the primary reason for the difficulty in decisions about the presence or absence of LD (Frame, Clarizio, Porter, & Vinsonhaler, 1982). In analyses of MDT decisions, it appears that LD identification criteria, especially the primary criterion of severe discrepancy, were neither rigorously nor consistently applied (Epps, McGue, & Ysseldyke, 1982; Furlong & Yanagida, 1985; Furlong & Feldman, 1992). The difficulties begin with the lack of uniformity across educational agencies in setting "severe" discrepancy criterion levels (Perlmutter & Parus, 1983; Thurlow & Ysseldyke, 1979) which are then often exacerbated by differences in interpreting existing guidelines (Thurlow, Ysseldyke, & Casey, 1984; Valus, 1986b). The misapplication of criteria in LD identification procedures is further complicated by external pressures that might include the desire of MDTs to provide special education services, the request of general education teachers to remove difficult-to-teach students, and parental demands for LD placement (e.g., Algozzine & Ysseldyke, 1981b; Sabatino & Miller, 1980; Ysseldyke, Christenson, Pianta, & Algozzine, 1983).
When LD is viewed as primarily a socially constructed disability (Gelzheiser, 1987), the many external pressures often become primary considerations because a criterion like SDL is viewed as too child-centered in a medical model sense and does not permit examination of complex contextual interactions presumed relevant for valid diagnosis (Algozzine & Ysseldyke, 1987). Gerber and Semmel (1984) even argued that an instructional perspective rather than a statistical one should be the basis for determining LD eligibility. They suggested that the teacher become the "test" for determining whether a student has a "real" learning problem. Under such circumstances, it is not surprising to find that MDTs often do not "bother with the data" (Ysseldyke, Algozzine, Richey, & Graden, 1982).
The "clinical cases" of LD represent, at best, a "functional" LD because even though deemed eligible, the students in question really did not meet stipulated identification criteria, with discrepancy often being the most tangible criterion not met. The failure to meet stipulated criteria, however, raised serious questions about the reliability and validity of "clinical diagnoses" of LD (Shepard, 1983). It was, therefore, not surprising to find that judges were not able to differentiate students with LD based solely on an examination of test scores (Epps, Ysseldyke, & McGue, 1984).