Beyond IQ: When Theory Trumps Science: a Critique of the PSW Model for SLD Identification

Sunday, June 12, 2016

McGill, R. J., & Busse, R. T. (2016). When Theory Trumps Science: a Critique of the PSW Model for SLD Identification. Contemporary School Psychology, 1-9.

In this paper, McGill & Busse criticize the use of PSW operational definitions of learning disabilities (such as the Flanagan/CHC operational definition).

An operational definition does not define the concept itself (an operational definition of learning disability (LD) does not define the essence of LD). Rather, it portrays the way in which the concept is measured (what we should actually do in order to determine whether a child is learning disabled or not).

Each operational definition of learning disability has advantages and disadvantages, and we can find papers criticizing each definition. It's important to be familiar with the critique. The mere existence of critique does not make the use of an operational definition wrong. It's important to weigh the critique of each definition and to choose the definition whose critique carries the least weight.

Now we turn to the paper, with remarks and explanations by me in green.

Within the professional literature, there is growing support for educational agencies to adopt an approach to SLD identification that emphasizes the importance of an individual’s pattern of cognitive and achievement strengths and weaknesses (PSW). The Flanagan/CHC definition of learning disability is one of these operational definitions. Cognitive strengths and weaknesses can manifest in CHC abilities such as Fluid Ability, Short Term Memory, Long Term Storage and Retrieval, Visual Processing, Auditory Processing, Processing Speed, and Comprehension Knowledge. Achievement strengths and weaknesses can manifest in tests measuring Reading Decoding/Reading Comprehension/Writing/Written Expression/Math Calculations/Math Reasoning.

The Flanagan definition is based on five major criteria: A. Significantly poor performance on achievement tests. B. One or more of the CHC cognitive abilities is significantly below average. C. Concordance/linkage between the poor area of achievement and the low cognitive ability. The low cognitive ability can explain/be the cause of the poor achievement area. D. Other cognitive abilities are intact. E. Exclusionary factors are not the primary reason for the poor performance on the achievement tests.
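The five criteria can be read as a simple decision rule. The following is only a rough sketch of that logic, not the Flanagan procedure itself: the cutoff of 85, the `Profile` structure, the ability-to-achievement links and the "majority intact" reading of criterion D are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set

# ASSUMPTION: standard scores (mean 100, SD 15) below 85 count as
# "significantly low". This cutoff is illustrative, not taken from the paper.
LOW_CUTOFF = 85


@dataclass
class Profile:
    achievement: Dict[str, int]   # achievement area -> standard score
    cognitive: Dict[str, int]     # CHC ability -> standard score
    links: Dict[str, Set[str]]    # achievement area -> abilities assumed related to it
    exclusionary_primary: bool = False  # True if exclusionary factors explain the problem


def meets_psw_criteria(p: Profile, cutoff: int = LOW_CUTOFF) -> bool:
    # A. at least one achievement area is significantly low
    low_achievement = {a for a, score in p.achievement.items() if score < cutoff}
    # B. at least one cognitive ability is significantly low
    low_cognitive = {c for c, score in p.cognitive.items() if score < cutoff}
    if not low_achievement or not low_cognitive:
        return False
    # C. a low ability is linked to a low achievement area
    linked = any(p.links.get(a, set()) & low_cognitive for a in low_achievement)
    # D. ASSUMPTION: "other abilities are intact" is read here as
    #    "fewer than half of the measured abilities are low"
    mostly_intact = len(low_cognitive) < len(p.cognitive) / 2
    # E. exclusionary factors are not the primary reason
    return linked and mostly_intact and not p.exclusionary_primary


# Hypothetical child: weak basic reading, weak auditory processing, rest average.
child = Profile(
    achievement={"basic_reading": 78, "math_calculation": 101},
    cognitive={
        "auditory_processing": 80,
        "fluid_ability": 102,
        "processing_speed": 98,
        "comprehension_knowledge": 100,
    },
    links={"basic_reading": {"auditory_processing"}},
)
print(meets_psw_criteria(child))
```

Writing the rule out this way makes it visible how many judgment calls (the cutoff, the links, what counts as "intact") sit inside each criterion.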

In 2014, the California Association of School Psychologists released a position paper endorsing this approach. As a vehicle for examining the PSW model, the authors respond critically to three fundamental positions taken in the position paper: (a) diagnostic validity for the model has been established; (b) cognitive profile analysis is valid and reliable; and (c) PSW data have adequate treatment utility. The authors conclude that at the present time there is insufficient support within the empirical literature to support adoption of the PSW method for SLD identification.

Prior to IDEA (Individuals with Disabilities Education Act) 2004, federal regulations emphasized the primacy of the discrepancy model, wherein Specific Learning Disability was operationalized as a significant discrepancy between an individual’s achievement and their cognitive ability (Full Scale IQ score). This model was heavily criticized. First, it causes "waiting for failure", because such a discrepancy can be demonstrated only in third grade, when the child "achieves" a two-year discrepancy between his IQ score and his performance in reading/writing/math. Second, it causes under-identification of learning disabilities in adolescence. LD affects Full Scale IQ (for instance, a learning disabled child reads less, thus his comprehension knowledge is less developed, which lowers his IQ). Thus in adolescence a learning disabled child is less likely to show a discrepancy between his IQ score and his achievement scores. This makes it impossible to diagnose him as learning disabled even though he is.

In contrast to previous legislation, IDEA 2004 permitted local educational agencies the option of selecting between the discrepancy method and alternatives such as response-to-intervention (RTI). The RTI model defined learning disability as persistent low achievement (in reading/writing/math) despite adequate intervention. This model permits but does not require a psychological assessment for differential diagnosis from intellectual disability, language disorder and emotional/behavioral disorders. This point (the lack of differential diagnosis) came under a lot of criticism. Over the last decade, RTI has been widely embraced within the technical literature and adopted as an SLD classification model by many educational agencies across the country, resulting in renewed concern regarding the validity of identification approaches that deemphasize the role of cognitive testing.

PSW models were developed in response to these problems. There are several such models which are quite similar to each other: (a) the concordance/discordance model (C/DM; Hale and Fiorello 2004), (b) the Cattell-Horn-Carroll operational model (CHC; Flanagan et al. 2011), and (c) the discrepancy/consistency model (D/CM; Naglieri 2011). It is noteworthy that, although the models differ with respect to their theoretical orientations and the statistical formulae used to identify patterns of strengths and weaknesses, all three PSW models share at least three core assumptions as related to the diagnosis of Specific LD: (a) evidence of cognitive weaknesses must be present, (b) an academic weakness must also be established, and (c) there must be evidence of spared (i.e., not indicative of a weakness) cognitive-achievement abilities. The authors go on to briefly discuss each model. This will not be done here.

Now the authors criticize PSW models on three points:

Critical Assumption One: Diagnostic Validity for the Model Has Been Established

Stuebing et al. (2012) investigated the diagnostic accuracy of several PSW models and reported high diagnostic specificity (a high percentage of children who do not have LD and are correctly identified as not having LD) across all models. However, the models had low to moderate sensitivity (a low to moderate percentage of children who have LD and are identified as having LD). Only a very small percentage of the population (1%-2%) met criteria for specific learning disabilities using these models. Clinically, it feels like the CHC definition does lower the percentage of children identified as LD, but other definitions, like DSM-5, inflate the percentage of children identified as LD. We have no way of knowing what the "real" percentage of LD in the population is, because every study that assesses this percentage is conducted in light of some operational definition, usually a relatively "inflating" one.

Since there are no objective criteria with which we can know who is really learning disabled, Stuebing et al. cannot argue that "these models had low to moderate sensitivity". The only thing that Stuebing et al. can say is that PSW models identify fewer children as learning disabled than other models. But we cannot know whether this fact makes PSW models better or worse at identifying LD.

Kranzler et al. (2016) examined the broad cognitive abilities of the Cattell-Horn-Carroll theory held to be meaningfully related to basic reading, reading comprehension, mathematics calculation, and mathematics reasoning across age groups. Results of analyses of 300 participants in three age groups (6–8, 9–13, and 14–19 years) indicated that the XBA method (a method for implementing the PSW approach) is very reliable and accurate in detecting true negatives (children who are not LD and are correctly identified as not having LD). The model identified 92% of children who were not LD as not having LD. However, Kranzler et al. found the model to have quite low sensitivity, indicating that this method is very poor at detecting true positives (children who are LD and are correctly identified as having LD). Only 21% of children who were LD were identified as having LD according to this model.
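To make the two percentages concrete, here is a small sketch of how sensitivity and specificity are computed, and of what they imply for a child who is flagged by the model. The cell counts and the 10% base rate are made up purely for illustration (they are not the study's data), and the calculation takes the study's reference classification of who "is LD" at face value.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of children who are LD that the model flags as LD."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of children who are not LD that the model clears."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(sens: float, spec: float, base_rate: float) -> float:
    """Probability that a flagged child is LD, given an assumed base rate."""
    flagged_and_ld = sens * base_rate
    flagged_not_ld = (1 - spec) * (1 - base_rate)
    return flagged_and_ld / (flagged_and_ld + flagged_not_ld)


# HYPOTHETICAL counts chosen only to reproduce the reported 21% and 92%.
sens = sensitivity(true_pos=21, false_neg=79)
spec = specificity(true_neg=92, false_pos=8)

# HYPOTHETICAL base rate: assume 10% of the tested children are LD.
ppv = positive_predictive_value(sens, spec, base_rate=0.10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, PPV={ppv:.2f}")
```

The predictive value depends heavily on the assumed base rate, which is exactly the quantity that cannot be pinned down without an agreed operational definition.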

A brief peek at Kranzler et al's study makes me wonder at the way Kranzler et al determined which abilities were related to which areas of achievement. For example, at age 6-8, Kranzler et al considered the broad abilities Comprehension Knowledge, Long Term Storage and Retrieval, Processing Speed and Short Term Memory as related to basic reading skills. Kranzler et al say they took these links out of McGrew and Wendling's 2010 study. But that study (presented on slide no. 11 in the second presentation of the Intelligence and Cognitive Abilities presentation series on the right hand column of this blog) found that the narrow ability Phonological Coding is also related to basic reading skills at age 6-8! Generally, McGrew and Wendling recommend using combinations of broad and narrow abilities to predict performance in different areas of achievement. Kranzler et al used only broad abilities.

Thus it may be that the low sensitivity found in Kranzler et al's study results from the fact that Kranzler et al did not implement McGrew and Wendling's findings accurately (I have to say this very carefully since I did not read the entire Kranzler paper).

Critical Assumption Two: Cognitive Profile Analysis Is Reliable and Valid

In order to identify LD according to the Flanagan method we have to use cognitive ability/index scores. If they are not reliable, the identification of LD will not be reliable either.

Significant questions have been raised about the long-term stability and structural and incremental validity of factor-level measures from intelligence tests. Structural validity investigations using exploratory factor analysis have revealed conflicting factor structures from those reported in the technical manuals of contemporary cognitive measures, which indicates that these instruments may be overfactored (Frazier and Youngstrom 2007). Additionally, the long-term stability and diagnostic utility of these indices have been found wanting. The authors cite a study by Watkins and Smith (2013), who investigated the long-term stability of the WISC-IV with a sample of 344 students twice evaluated for special education eligibility at an average interval of 2.84 years. Test-retest reliability coefficients for the Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), Processing Speed Index (PSI), and the Full Scale IQ (FSIQ) were .72, .76, .66, .65, and .82, respectively. As far as I know, good reliability is considered to be above 0.7. Thus the WMI and the PSI were found to be not reliable in this research and the VCI and PRI had low reliability. However, 25% of the students earned FSIQ scores that differed by 10 or more points, and 29%, 39%, 37%, and 44% of the students earned VCI, PRI, WMI, and PSI scores, respectively, that varied by 10 or more points. Given this variability, Watkins and Smith argue that it cannot be assumed that WISC-IV scores will be consistent across long test-retest intervals for individual students.
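The link between a test-retest coefficient and the share of students whose scores move by 10 or more points can be approximated with a back-of-the-envelope calculation. The sketch below is not Watkins and Smith's analysis. It assumes that scores at both testings are normally distributed with a standard deviation of 15 and no change in the mean, so that the difference between the two scores has a standard deviation of 15·√(2(1 − r)).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_changing_by(points: float, r: float, sd: float = 15.0) -> float:
    """Expected share of retest differences of at least `points` in size.

    ASSUMPTIONS: equal SD at both testings, no mean shift, normality.
    """
    sd_of_difference = sd * math.sqrt(2.0 * (1.0 - r))
    return 2.0 * (1.0 - normal_cdf(points / sd_of_difference))


# Stability coefficients as reported in the paragraph above.
coefficients = {"VCI": 0.72, "PRI": 0.76, "WMI": 0.66, "PSI": 0.65, "FSIQ": 0.82}

for index_name, r in coefficients.items():
    share = share_changing_by(10, r)
    print(f"{index_name}: r = {r:.2f} -> about {share:.0%} expected to shift by 10+ points")
```

Under these simplifying assumptions the expected shares fall in roughly the same range as the observed 25% to 44%, which suggests that coefficients of this size are enough to produce such shifts without any further explanation.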

In light of this study we ought to give up using index scores and use only the FSIQ score as the lesser evil. In the context of LD this brings us back to the discrepancy model.

However, it's possible that during the 2.84 years in which these children received special education services, their cognitive abilities improved. This could explain the instability of the indices in this study. Does this instability exist in the general population? In other intelligence tests?

Critical Assumption Three: PSW Methods Have Adequate Treatment Utility

Despite many attempts to validate group by treatment interactions, the efficacy of interventions focused on cognitive deficits remains speculative and unproven. Particularly noteworthy are the findings obtained from a recent meta-analysis of the efficacy of academic interventions derived from neuropsychological assessment data by Burns et al. (2016). In contrast to the effects attributed to more direct measures of academic skill, it was found that the effects of interventions developed from cognitive data were consistently small (g=0.17). As a result, Burns et al. (2016) concluded, "the current and previous data indicate that measures of cognitive abilities have little to no [emphasis added] utility in screening or planning interventions for reading and mathematics".
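To get a feel for how small g=0.17 is, the effect size can be translated into more familiar terms. This is only an illustrative conversion under the usual assumptions of normally distributed outcomes with equal variances in both groups. The 15-point scale is an assumption chosen because achievement tests are typically reported as standard scores, not something taken from Burns et al.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_superiority(g: float) -> float:
    """Chance that a randomly chosen treated child outscores a randomly
    chosen comparison child (ASSUMES normality and equal variances)."""
    return normal_cdf(g / math.sqrt(2.0))


def in_standard_score_points(g: float, sd: float = 15.0) -> float:
    """Effect size expressed in points on a scale with the given SD
    (ASSUMPTION: a standard-score scale with SD 15)."""
    return g * sd


g = 0.17  # pooled effect reported in the paragraph above
print(f"probability of superiority: {probability_of_superiority(g):.3f}")
print(f"difference in standard-score points: {in_standard_score_points(g):.2f}")
```

An effect of zero would give a probability of superiority of exactly 0.5, so this conversion shows how close to a coin flip an effect of this size is.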

Do other operational definitions of learning disability lead to more efficient interventions?

To summarize, McGill & Busse criticize the CHC/PSW LD definition in three ways: A. the diagnostic validity of the model is weak. B. index scores are not reliable and valid. C. the model does not lead to efficient interventions.

Are you convinced?