Wednesday, June 6, 2018

Does significant factor score variability render FSIQ invalid?




McGill, R. J. (2016). Invalidating the full scale IQ score in the presence of significant factor score variability: Clinical acumen or clinical illusion? Archives of Assessment Psychology, 6(1), 49-79. http://assessmentpsychologyboard.org/journal/index.php/AAP/article/viewFile/74/59

There is a positive correlation between intelligence subtest scores, as well as between broad ability scores.  This correlation reflects the common factor that all subtests and ability factors measure – g (which is estimated by the full scale IQ, or FSIQ).  Every test or broad ability measures both g and skills that are specific to that test or ability.  For example, Block Design or Visuospatial Processing measures both g and visual processing.

The positive correlation between the subtests and between the broad ability scores is a double-edged sword: on the one hand, it enables us to measure g.  On the other hand, its existence means that the broad abilities are not orthogonal.  One wonders to what extent broad abilities measure different things rather than a single thing.  As Horn (1991) cautioned long ago, attempting to disentangle the different features of cognition is akin to “slicing smoke.”

Broad abilities differ in the extent to which they are good measures of g and in the extent to which they are good measures of the specific construct they are supposed to measure.  If Visual Processing in a specific intelligence test measures almost nothing but g, it will be hard to interpret that Visual Processing score as representing the child's visuospatial ability.

The FSIQ score is very reliable, stable over time, and has predictive validity.  However, many psychologists think that the FSIQ is invalid if there is significant variability among test scores, and many researchers support this view.  Hale and Fiorello (2004) write: “you should never [emphasis added] report an IQ score whenever there is significant subtest or factor variability…and any interpretation of that IQ score would be considered inappropriate”.   The Technical and Interpretive Manuals for the Wechsler Scales (Wechsler, 2008; 2014) say that for the FSIQ to be interpreted, the variability between the lower-order factor scores must not exceed a priori thresholds denoting varying degrees of statistical and clinical significance (e.g., 15-20 standard score points). If meaningful variability is observed, users are encouraged to forego clinical interpretation of the FSIQ and place all of their interpretive weight on the profile of obtained factor scores. However, no validity evidence has been provided in the Technical and Interpretive Manuals for the Wechsler Scales, or for rival measurement instruments, to support these interpretive procedures.

The KABC-II measures the processing and cognitive abilities of children and adolescents between the ages of 3 years and 18 years. KABC-II utilizes a dual-theoretical foundation featuring the CHC psychometric model of broad and narrow abilities and Luria’s neuropsychological theory of cognitive processing. The CHC interpretive model for ages 7-18 features 10 core subtests, which combine to yield five first order factor scale scores (Short-Term Memory, Long-Term Storage and Retrieval, Visual Processing, Fluid Reasoning, and Crystallized Ability), and a second-order Fluid Crystallized Index (FCI) that is thought to represent psychometric g. Each CHC factor scale is composed of two subtest measures. 

Kaufman, Lichtenberger, Fletcher-Janzen, & Kaufman (2005) write: “If the variability between indexes on the KABC-II [difference between highest and lowest score] is 23 points or greater, then the meaningfulness of the global score is diminished. In such cases we encourage examiners to focus interpretation on the profile of scale indexes and to not interpret the global score”.  Again, this rule is not supported by any empirical evidence.

Ryan J. McGill, an assistant professor in the school psychology program at The College of William and Mary, sought such evidence.  We've encountered McGill's work before on this blog.  I very much appreciate his efforts to make his research accessible to practicing psychologists.  Beyond the issues mentioned above, McGill also asked what predicts achievement better – the FSIQ or the broad ability scores?

In the paper cited above, McGill describes a study conducted with the KABC-II standardization sample.  Participants were 2,025 children and adolescents aged 7-18 – all of the standardization-sample participants in that age range.

McGill selected participants who presented with a 23-point or greater discrepancy between their highest and lowest CHC factor standard scores.  A total of 1,209 participants ages 7-18 (59% of the total normative sample for that age range) presented with this discrepancy.  This means that, for 59% of the sample, CHC factor score profile variability allegedly renders the FCI global composite invalid.

Had McGill used a 15-point discrepancy criterion instead, the percentage of flagged participants would have been well above 59%.
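The discrepancy rule itself is simple arithmetic. A minimal sketch of how such a flag works (the scores and factor labels below are invented for illustration; the actual study applied the rule to the KABC-II standardization data):

```python
# Hedged sketch of the "23-point spread" interpretive rule: flag a profile
# as too variable for global-score interpretation when the gap between the
# highest and lowest factor standard scores meets or exceeds the threshold.

def fci_flagged(factor_scores, threshold=23):
    """Return True if max - min of the factor scores >= threshold,
    i.e. the profile would conventionally be deemed too variable
    to interpret the global composite (FCI)."""
    scores = list(factor_scores)
    return max(scores) - min(scores) >= threshold

# Hypothetical CHC factor standard scores for one child:
profile = {"Gsm": 88, "Glr": 102, "Gv": 115, "Gf": 97, "Gc": 110}
print(fci_flagged(profile.values()))  # 115 - 88 = 27 >= 23, so True
```

Lowering `threshold` to 15, as some interpretive guidelines suggest for other batteries, naturally flags an even larger share of any sample.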

McGill found that in this group with significant variability in broad ability scores, the KABC-II structure (five broad ability factors plus the FCI) remained valid.  All KABC-II subtests were saliently and properly associated with their theoretical factors, demonstrating desirable simple structure.  Thus, the FCI is valid and interpretable even when factor scores are significantly variable.

The g factor accounted for about 30% of the total variance (the dispersion of test scores) and about 50-60% of the common variance (the variance the measures share – their common factor). Each of the broad abilities (Crystallized Ability, Short-Term Memory, Long-Term Storage and Retrieval, Visual Processing, and Fluid Reasoning) accounted for an additional 2% (Fluid Reasoning) to 9% (Short-Term Memory) of the total variance, and between 3-4% (Fluid Reasoning) and 16% (Short-Term Memory) of the common variance. This means that additional consideration of Short-Term Memory may provide users with useful information about individual performance beyond g when significant levels of scatter are observed.  The four remaining broad abilities add much less useful information beyond g.  When psychologists interpret at the broad ability level, they don't always consider the significant influence of g on the child's performance in that ability.
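To see where "share of total variance" versus "share of common variance" comes from, here is a small sketch of the standard bifactor-style computation: each share is a sum of squared factor loadings. The loadings below are invented, chosen only so the result roughly echoes the proportions the post reports; they are not McGill's estimates.

```python
# Hedged sketch: explained-variance shares in a bifactor-style model.
# Each of 10 hypothetical subtests loads on g and on one group factor.
# All loadings are invented for illustration.

g_loadings = [0.60, 0.55, 0.50, 0.58, 0.52, 0.56, 0.54, 0.57, 0.53, 0.55]
group_loadings = [0.55, 0.45, 0.50, 0.48, 0.52, 0.46, 0.54, 0.49, 0.51, 0.50]

n = len(g_loadings)  # with standardized subtests, total variance = n
g_var = sum(l ** 2 for l in g_loadings)          # variance explained by g
group_var = sum(l ** 2 for l in group_loadings)  # variance explained by group factors
common = g_var + group_var                       # all factor-explained ("common") variance

print(f"g share of total variance:  {g_var / n:.2f}")      # prints 0.30
print(f"g share of common variance: {g_var / common:.2f}")  # prints 0.55
```

The gap between the two shares is the point of the paragraph above: g can account for only ~30% of all score dispersion yet still dominate (~55%) the variance the subtests have in common, leaving the group factors comparatively little unique signal.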

What predicts achievement better – g or the broad abilities?  To answer this question, McGill used the KABC-II achievement tests. The FCI accounted for large and statistically significant effects across the reading, math, and written language indicators. Although the incremental predictive contributions of the CHC factor scores across all achievement variables were statistically significant, the effect size estimates were consistently small, with only the Crystallized Ability factor contributing anything beyond trivial effects (9%) in the reading model for ages 13-18.
