Thursday, July 27, 2017

Prof. Ryan McGill's reply

Here is Prof. Ryan McGill's reply to my post about his paper "(Re)Examining Relations between CHC Broad and Narrow Cognitive Abilities and Reading Achievement":


First, I think it is important to point out that I am not opposed to CHC theory, nor to recommended applications of CHC (e.g., XBA, variants of the PSW model), as a matter of course. CHC has many positives; most notably, it has provided a common nomenclature by which we can discuss issues related to cognitive abilities and the instruments that measure them. My concern is primarily with the measurement and underlying psychometric integrity of these constructs. Theory can be instructive, but it is not an appropriate source of validity evidence. Whereas CHC theory posits 7-9 broad abilities at Stratum II, independent researchers have had a difficult time locating many of these dimensions on contemporary IQ tests. Again, this is not to say that these dimensions are not real; they just may not be measured well, if at all, on some tests. Thus, the scores that are provided to clinicians and presented as capable of clinical interpretation for such dimensions will be of little clinical utility.

With respect to the WJ-III/WJ-IV, multiple studies (e.g., Dombrowski, 2013; Dombrowski, McGill, & Canivez, 2014) using the very factor-analytic procedures recommended by Carroll (1993) suggest a different structure for the instrument. Specifically, the publisher-suggested CHC model was not supported. Instead, a three- to four-factor solution with complexly determined factors, theoretically inconsistent cross-loadings, and subtest migration was preferred. Given that the WJ has been the preeminent reference instrument for CHC theory development and refinement since 2001, and remains the only commercial ability measure purporting to measure all CHC broad abilities, this should concern us, as structural validity is necessary but, by itself, insufficient for construct validity.
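To make that apportionment concrete, here is a minimal sketch in Python of a Schmid-Leiman orthogonalization, the variance-apportioning step at the heart of the procedures Carroll recommended. Every loading below is a hypothetical illustration, not an estimate from the WJ or any other battery.

# Minimal sketch of a Schmid-Leiman orthogonalization.
# All loadings are hypothetical illustrations, not estimates from any test.
import numpy as np

# First-order pattern matrix: six subtests loading on two oblique broad factors.
L1 = np.array([[0.75, 0.05],
               [0.70, 0.10],
               [0.65, 0.15],
               [0.10, 0.70],
               [0.05, 0.75],
               [0.15, 0.60]])

# Second-order loadings of the two broad factors on g.
gamma = np.array([0.85, 0.80])

# Each subtest's g loading is its first-order loading carried up
# through the higher-order structure ...
g_load = L1 @ gamma
# ... and the residualized broad-factor loadings are what remains of
# each first-order factor after g is partialled out.
broad_load = L1 * np.sqrt(1.0 - gamma**2)

print("g variance per subtest:    ", np.round(g_load**2, 2))
print("broad variance per subtest:", np.round((broad_load**2).sum(axis=1), 2))

With these toy numbers, g accounts for roughly two to three times as much subtest variance as the residualized broad factors, which is the general pattern the studies cited above keep recovering.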

A growing corpus of psychometric research has ably demonstrated that virtually all commercial ability measures are good measures of psychometric g and that this level of interpretation (i.e., FSIQ) has the most psychometric support. Unfortunately, this same body of research suggests that we should have less confidence in clinical interpretation of Stratum II and Stratum III abilities. Again, this is a discrepancy between what we want to do and what the mathematics indicate that we can do. Broad abilities are saturated with g variance. You may argue that this is trivial, but I disagree…it creates a profound interpretive confound. How do you determine what is influencing examinee performance (complexly determined factors are even more difficult to deal with)? In fact, Carroll (1995) insisted that this apportionment of variance be carried out so that clinicians wouldn't be tempted to over-interpret cognitive measures and go down blind alleys. If one wants to focus their interpretive weight at the Stratum II level of measurement, then there needs to be enough target variance captured by those constructs…pure and simple. If the scores tell us little more than g/FSIQ, it is difficult for me to envision how they would be of much use.
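The arithmetic of that confound is simple enough to sketch with made-up loadings: for a standardized broad-ability score $X$ with general loading $\lambda_g$ and residualized broad-factor loading $\lambda_F$,

\[
  \sigma^2_X \;=\; \lambda_g^2 + \lambda_F^2 + u^2,
  \qquad
  \lambda_g = .80,\ \lambda_F = .30
  \;\Rightarrow\;
  \lambda_g^2 = .64,\ \lambda_F^2 = .09 .
\]

On those illustrative numbers, nearly two-thirds of the score's variance reflects g, and less than a tenth reflects the broad ability the score is named for.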

This is where the problem lies: many broad ability measures simply contain insufficient target construct variance for confident clinical interpretation. The issue cannot be resolved by simply ignoring g or by suggesting that the construct is inconsequential…positions that are frequently encountered in the CHC literature. As noted by Cucina and Howardson (2016), “an important element that is missing in CHC but present in Carroll’s work is the incorporation of the magnitude of the unique factor loadings [what is accounted for by S2/S3 abilities]. Under the Three Stratum Theory, the magnitude of the non-g loadings is low and this is made quite clear. We are unaware of any CHC publications that recognize these low magnitudes” (p. 13).

Although you suggest that knowing about broad abilities will lead to better interventions, scientific support for this position has long been found wanting. Long ago, Cronbach and Snow (1977) indicated that the search for aptitude-by-treatment interactions (ATIs) was akin to entering a “hallway of mirrors,” and not much has improved in the last 40 years. Lest I be accused of pontificating, I will acknowledge that there are many researchers who suggest the exact opposite. So what gives? I think it is worth noting that a white paper by Shinn and colleagues (2010) evaluated the quality of the evidence base for an LD position paper supporting this position and found that 73% of the citations listed were commentary articles, non-empirical books and book chapters, literature reviews, and case studies. More recently, Burns et al. (2016) conducted a meta-analysis and found that the effect sizes associated with interventions derived from cognitive/neuropsychological data were mostly trivial. In sum, the best interventions target underlying academic weaknesses; consideration of discrete cognitive skills does not seem to help much. Of course, at this point, someone usually invokes phonological processing. My answer to that is sure, but couldn’t you also get that same information from a comprehensive achievement battery (most of which now provide an estimate of phonological skills) or from more parsimonious CBM measures?

Look, I get it: there is comfort in all this cognitive profile analysis stuff. I was taught to do it just like everyone else. We get a reinforcing effect from it; looking at a child’s profile of scores and speculating about what X or Y may mean is, for many of us, the very embodiment of being a school/educational psychologist. I think it would be really cool if we could do these things, I really do; however, the underlying psychometrics suggest that we probably can’t, and certainly not for lack of trying over the last 50 years! Plus, there are all the well-documented issues with clinical judgement and our inability as clinicians to effectively deal with complex information (i.e., a multitude of psychoeducational scores) in the presence of uncertainty. This is another discussion in and of itself, albeit a vitally important one. On these issues, the eminent works of famed psychologists Paul Meehl, Amos Tversky, and Daniel Kahneman have been particularly influential. I believe that if more judgement-psychology research were stressed in clinical training programs, practitioners would view these approaches to test interpretation and clinical decision-making with more skepticism.

In sum, I think we and our charges would all be better off if we focused more energy on the “school” in school psychology and placed less emphasis on profile analysis, which in my view is nothing more than psychometric phrenology at this point (credit goes to Stefan Dombrowski for coining the term).

