McGill, R. J. (2016). Invalidating the full scale IQ score in the presence of significant factor score variability: Clinical acumen or clinical illusion? Archives of Assessment Psychology, 6(1), 49-79. http://assessmentpsychologyboard.org/journal/index.php/AAP/article/viewFile/74/59
There is a positive correlation between intelligence subtest scores, as well as between broad ability scores. This correlation reflects the common factor that all subtests and ability factors measure: g (which is estimated by the full scale IQ, or FSIQ). Every subtest or broad ability measures both g and skills that are specific to that test or ability. For example, Block Design or Visuospatial Processing measures both g and visual processing.
The positive correlation between the subtests and between the broad ability scores is a double-edged sword: on the one hand, it enables us to measure g; on the other hand, its existence means that the broad abilities are not orthogonal. One wonders to what extent the broad abilities measure different things rather than a single thing. As Horn (1991) cautioned long ago, attempting to disentangle the different features of cognition is akin to “slicing smoke.”
Broad abilities differ in the extent to which they are good
measures of g and in the extent to which they are good measures of the specific
construct they are supposed to measure.
If Visual Processing in a specific intelligence test measures almost nothing but g, it will be hard to interpret the Visual Processing score as representing the child's visuospatial ability.
The FSIQ score is very reliable, stable over time, and has predictive validity. However, many psychologists believe that the FSIQ is invalid when there is significant variability among the scores that compose it, and many researchers support this view. Hale and Fiorello (2004) write:
“you should never [emphasis added] report an IQ score whenever there is significant subtest or factor variability…and any interpretation of that IQ score would be considered inappropriate.”
The Technical and Interpretive Manuals for the Wechsler Scales
(Wechsler, 2008; 2014) say that for the FSIQ to be interpreted, the variability
between the lower-order factor scores must not exceed a priori thresholds,
denoting varying degrees of statistical and clinical significance (e.g., 15-20
standard score points). If meaningful variability is observed, users are
encouraged to forego clinical interpretation of the FSIQ and focus all of their
interpretive weight on the profile of obtained factor scores. However, no
validity evidence has been provided in the Technical and Interpretive Manuals
for the Wechsler Scales or other rival measurement instruments to support these
interpretive procedures.
The KABC-II measures the processing and cognitive abilities of children and adolescents between the ages of 3 and 18 years. The KABC-II utilizes a dual-theoretical foundation featuring the CHC psychometric model of broad and narrow abilities and Luria’s neuropsychological theory of cognitive processing. The CHC interpretive model for ages 7-18 features 10 core subtests, which combine to yield five first-order factor scale scores (Short-Term Memory,
Long-Term Storage and Retrieval, Visual Processing, Fluid Reasoning, and
Crystallized Ability), and a second-order Fluid Crystallized Index (FCI) that
is thought to represent psychometric g. Each CHC factor scale is composed of
two subtest measures.
Kaufman, Lichtenberger, Fletcher-Janzen, and Kaufman (2005) write:
“If the variability between indexes on the KABC-II [difference between highest and lowest score] is 23 points or greater, then the meaningfulness of the global score is diminished. In such cases we encourage examiners to focus interpretation on the profile of scale indexes and to not interpret the global score.” Again, this rule is not supported by any empirical evidence.
Ryan J. McGill, an assistant professor in the school psychology program at The College of William and Mary, sought such evidence. We have encountered McGill's work on this blog before, and I very much appreciate his efforts to make his research accessible to practicing psychologists. Beyond the issues mentioned above, McGill also asked which predicts achievement better: the FSIQ or the broad ability scores.
In the paper cited above, McGill describes a study conducted with the KABC-II standardization sample. Participants were 2,025 children and adolescents aged 7-18: all of the children and adolescents who participated in the standardization sample in that age range.
McGill selected participants who presented with a discrepancy of 23 points or more between their highest and lowest CHC factor standard scores. A total of 1,209 participants ages 7-18 (59% of the normative sample for that age range) presented with this discrepancy. In other words, the CHC factor score profile variability of 59% of the sample allegedly renders the FCI global composite invalid. Had McGill used a 15-point discrepancy criterion instead, the percentage of flagged participants would have been considerably higher than 59%.
McGill found that in this group with significant variability in broad ability scores, the KABC-II structure (five broad ability factors and the FCI) remained valid. All KABC-II subtests were saliently and properly associated with their theoretical factors, demonstrating desirable simple structure. Thus, the FCI is valid and interpretable even when factor scores are significantly variable.
The g factor accounted for about 30% of the total variance (the dispersion of test scores) and about 50-60% of the common variance (the variance the measures share). Each of the broad abilities (Crystallized Ability, Short-Term Memory, Long-Term Storage and Retrieval, Visual Processing, and Fluid Reasoning) accounted for an additional 2% (Fluid Reasoning) to 9% (Short-Term Memory) of the total variance, and between 3-4% (Fluid Reasoning) and 16% (Short-Term Memory) of the common variance. This means that additional consideration of Short-Term Memory may provide users with useful information about individual performance beyond g when significant levels of scatter are observed. The four remaining broad abilities add much less useful information beyond g. When psychologists interpret at the broad ability level, they do not always consider the significant influence of g on the child's performance in that ability.
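The distinction between total and common variance can be made concrete with a toy bifactor-style decomposition. The loadings below are hypothetical, chosen only so that the resulting proportions roughly mirror the figures reported above; they are not McGill's actual estimates:

```python
# Hypothetical standardized loadings for 10 subtests on the general
# factor (g) and on their respective broad-ability group factors.
g_loadings     = [0.60, 0.55, 0.50, 0.58, 0.52, 0.56, 0.54, 0.57, 0.53, 0.55]
group_loadings = [0.50, 0.48, 0.52, 0.49, 0.51, 0.50, 0.47, 0.53, 0.49, 0.50]

var_g     = sum(l ** 2 for l in g_loadings)      # variance explained by g
var_group = sum(l ** 2 for l in group_loadings)  # variance explained by broad abilities
total_var = len(g_loadings)                      # each standardized subtest has variance 1

# Share of *total* variance: uniqueness (error + specificity) stays in the denominator.
print(round(var_g / total_var, 2))            # roughly 0.30
# Share of *common* variance: only g and group variance are in the denominator.
print(round(var_g / (var_g + var_group), 2))  # roughly 0.55
```

The same quantity (variance due to g) looks modest against total variance and dominant against common variance, which is why the two percentages in the paragraph above differ so sharply.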
Which predicts achievement better: g or the broad abilities? To answer this question, McGill used the KABC-II achievement tests. The FCI accounted for large and statistically significant effects across the reading, math, and written language indicators. Although the incremental predictive contributions of the CHC factor scores were statistically significant across all achievement variables, the corresponding effect sizes were consistently small, with only the Crystallized Ability factor contributing anything beyond trivial effects (9%) in the reading model for ages 13-18.