Beyond IQ: Why profile analysis and ipsative scoring may be a diagnostic sin

Welcome! This blog is intended to provide assessment resources for educational and other psychologists.

The material is CHC-oriented, but not exclusively so.

The blog features selected papers, my own presentations, and other materials.

If you're new here, I suggest reading the presentation series in the right-hand column, "Intelligence and Cognitive Abilities".

Wednesday, November 25, 2015

Why profile analysis and ipsative scoring may be a diagnostic sin

"Ben's ability to analyze and synthesize an abstract stimulus is lower than his ability to analyze and synthesize a meaningful stimulus (block design – 8, object assembly – 12)". Sound familiar? This kind of analysis is called profile analysis, or subtest discrepancy analysis. It's very tempting to conduct, because it gives us the feeling that we have meaningful things to say about the child.

In the 1940s, leading psychologists were engaged in profile analysis. During that period, theories were developed about the clinical meaning of different subtest profiles (profiles that were thought to identify specific diagnostic groups: learning disabilities, emotional disabilities and so on).

Here is an example of a profile analysis done by David Wechsler himself in 1944: "White, male, age 15, 8th grade. Continuous history of stealing, incorrigibility and running away. Several admissions to Bellevue Hospital, the last one after suicide attempt. While on wards persistently created disturbances, broke rules, fought with other boys and continuously tried to evade ordinary duties. Psychopathic patterning: Performance higher than Verbal, low Similarities, low Arithmetic, sum of Picture Arrangement plus Object Assembly greater than sum of scores on Blocks and Picture Completion".

A popular method of profile analysis is ipsative scoring: the scaled scores of the subtests are added and divided by the number of subtests to get the child's average score. An ipsative score is then computed for each subtest by subtracting the child's average from his scaled score on that subtest. Psychologists who use this method assume that scores which deviate significantly from the child's personal average are important clinical indicators of the child's "strengths" and "weaknesses", and the weaknesses are assumed to be caused by learning disabilities. The focus of this kind of analysis is on identifying discrepancies within the child himself.
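To make the arithmetic of ipsative scoring concrete, here is a minimal Python sketch. It computes each deviation as the subtest score minus the child's own mean, so negative values mark relative "weaknesses". The subtest names other than Block Design and Object Assembly, the example scores, and the 3-point cutoff for flagging a deviation are hypothetical and chosen only for illustration; published ipsative procedures use their own critical values.

```python
# A minimal sketch of ipsative scoring. Assumptions (hypothetical, not from the
# source): the example scores and the 3-point cutoff used to flag deviations.
from statistics import mean


def ipsative_deviations(scaled_scores: dict[str, int]) -> dict[str, float]:
    """Return each subtest's deviation from the child's own mean scaled score."""
    personal_mean = mean(scaled_scores.values())
    return {name: score - personal_mean for name, score in scaled_scores.items()}


def flag_ipsative(scaled_scores: dict[str, int], cutoff: float = 3.0) -> dict[str, str]:
    """Label subtests as relative 'strength' or 'weakness' using a hypothetical cutoff."""
    labels = {}
    for name, dev in ipsative_deviations(scaled_scores).items():
        if dev >= cutoff:
            labels[name] = "strength"
        elif dev <= -cutoff:
            labels[name] = "weakness"
        else:
            labels[name] = "-"
    return labels


if __name__ == "__main__":
    # Hypothetical profile: the child's mean here is 11.
    profile = {"Vocabulary": 13, "Similarities": 12, "Block Design": 8,
               "Object Assembly": 12, "Coding": 10}
    print(ipsative_deviations(profile))
    print(flag_ipsative(profile))
```

Note that the procedure flags Block Design (8) as a "weakness" purely because it sits 3 points below this child's own mean, even though a scaled score of 8 is within the average range for the child's age. Nothing in the calculation refers to population norms.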

This kind of analysis assumes that scatter among subtest scores is typical of people who have learning disabilities or emotional or neurological problems, and that a flat profile is typical of "normal" people. In other words, a person who functions normally is expected to obtain similar scores on all subtests.

This assumption is wrong. According to the American standardization samples of the WISC-IV and the WAIS-III, only 3-4% of people have a completely flat subtest profile or a profile in which the subtest scaled scores differ by only one point. In other words, scatter among subtest scores is normal.
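The definition of a "flat" profile can be stated precisely as a check on the profile's range. The Python sketch below assumes that "flat" means the highest and lowest subtest scaled scores differ by at most one point, which is one reading of the sentence above. The example profiles are hypothetical. The code only applies the definition; the 3-4% figure comes from the standardization data and is not reproduced here.

```python
# A minimal sketch, assuming a "flat" profile means the highest and lowest
# subtest scaled scores differ by at most one point (one reading of the text).
def profile_range(scaled_scores: list[int]) -> int:
    """Scatter measured as the highest minus the lowest subtest scaled score."""
    return max(scaled_scores) - min(scaled_scores)


def is_flat(scaled_scores: list[int], max_range: int = 1) -> bool:
    """True if the profile's scatter does not exceed max_range points."""
    return profile_range(scaled_scores) <= max_range


print(is_flat([10, 10, 11, 10, 11, 10]))  # True: range is 1
print(is_flat([8, 12, 10, 13, 9, 11]))    # False: range is 5
```

By this definition, most children in a standardization sample would fail the "flat" test, which is exactly why scatter alone cannot mark a profile as atypical.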

For years we've learned to look for discrepancies between single subtest scores, between index scores, or between the "verbal IQ" and the "performance IQ". Yet many people who have a learning disability do not have a significant discrepancy between "PIQ" and "VIQ". The opposite is also true: many people who are not learning disabled do have a significant discrepancy between "PIQ" and "VIQ". The same holds for discrepancies between single subtest scores: such discrepancies are neither a necessary nor a sufficient condition for a learning disability diagnosis. Moreover, there is no subtest profile that can identify a specific diagnostic group.

Even when the difference between two subtest scores or two index scores is statistically significant (that is, unlikely to be the result of measurement error or chance), it does not follow that the difference is clinically significant or an indication of a disability. Statistically significant differences are not necessarily rare, or even meaningful. Some psychologists therefore look at how often a discrepancy of that size occurs in the general population: rare discrepancies, found in less than 10% of the general population, are considered significant in learning disability assessment. But this analysis still often fails to compare the scores to population norms. As noted above, large discrepancies between subtest scaled scores are common. If the lower score in such an intra-individual comparison (for example, a comparison between the scores of two subtests) is within or above normal limits, we cannot treat it as an indicator of disability, even if it is much lower than the rest of the child's subtest scores. That's because an average (or above-average) ability is, by definition, not a disability. It's hard to argue that a scaled score of 10 indicates a learning disability simply because all of the child's other scores are 13 or above. There is no basis for the belief that average abilities in some areas combined with above-average abilities in others indicate a learning disability.
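The distinction between a statistically significant difference and a normative weakness can be sketched in a few lines of Python. The critical difference uses the standard error of the difference between two scores on the same metric, SD × √(2 − r_xx − r_yy), multiplied by z = 1.96 for the 95% level. The reliability coefficients in the example, and the normal-limits floor of 7 on the scaled-score metric (one standard deviation below the mean of 10), are assumptions for illustration, not values from any specific test manual. The sketch does not model base rates, which require the standardization data.

```python
# A minimal sketch contrasting statistical significance with a normative check.
# Assumptions (hypothetical): reliabilities of .85, the 95% level (z = 1.96),
# and a normal-limits floor of 7 on the scaled-score metric (mean 10, SD 3).
import math

SCALED_SD = 3.0


def critical_difference(r_xx: float, r_yy: float, z: float = 1.96) -> float:
    """Smallest score difference unlikely to be due to measurement error alone.

    Standard error of the difference: SD * sqrt(2 - r_xx - r_yy).
    """
    se_diff = SCALED_SD * math.sqrt(2.0 - r_xx - r_yy)
    return z * se_diff


def interpret_pair(low: int, high: int, r_xx: float, r_yy: float,
                   normal_floor: int = 7) -> str:
    """Describe a pair of scaled scores both statistically and normatively."""
    significant = (high - low) >= critical_difference(r_xx, r_yy)
    within_normal_limits = low >= normal_floor
    if significant and within_normal_limits:
        return "statistically significant, but the lower score is within normal limits"
    if significant:
        return "statistically significant, and the lower score is below normal limits"
    return "not statistically significant"


print(round(critical_difference(0.85, 0.85), 2))  # 3.22
print(interpret_pair(10, 14, 0.85, 0.85))
```

With these assumed reliabilities, a 10 against a 14 is a statistically significant difference, yet the lower score is still average. This is the situation the paragraph above describes, where significance alone says nothing about disability.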

To borrow Flanagan's analogy about Michael Jordan: Michael Jordan has a superb ability to play basketball, but it's not reasonable to assume that all his athletic skills are developed to the same degree. His ability to play baseball and golf is much weaker than his ability to play basketball, even though he is still better than average at both. It would be ludicrous to argue that Michael Jordan has an athletic disability because he plays baseball and golf "only" at a good rather than a superb level! Significant variability between subtest scores is normal, and the expectation of a flat profile is unfounded.

Drawing conclusions from a discrepancy between two subtest scores rests on the premise that conclusions can be drawn from the score of a single subtest. But a single subtest is not a reliable measure of the cognitive construct or ability it is supposed to measure; for example, the Vocabulary subtest alone cannot adequately measure comprehension knowledge. To measure a broad cognitive ability properly, one has to use at least two qualitatively different measures. Comprehension knowledge, for instance, should be measured with at least two subtests, each tapping a different aspect (a different narrow ability) of comprehension knowledge. In some cases three subtests are needed, especially when there is a statistically significant difference between the scores on the two subtests that were used, or when we want to assess the ability more broadly and in greater depth.
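The "two subtests, sometimes three" rule can be sketched as a small decision function in Python. Several simplifications here are assumptions, not the published cross-battery procedure. Scores are on the scaled-score metric (mean 10, SD 3). The broad-ability estimate is a simple average, which only approximates a properly normed composite. A spread of 3 or more scaled-score points (one standard deviation) between two subtests is used as a stand-in for "the two scores disagree", rather than a formal significance test. The subtest names in the example are hypothetical.

```python
# A minimal sketch of estimating a broad ability from two or three subtests.
# Assumptions (hypothetical): a simple average stands in for a normed composite,
# and a 3-point (1 SD) spread between two subtests triggers a third subtest.
from statistics import mean
from typing import Optional


def broad_ability_estimate(subtests: dict[str, int],
                           max_spread: int = 3) -> tuple[Optional[float], str]:
    """Return (estimate, note) for a broad ability measured by 2-3 subtests."""
    if not 2 <= len(subtests) <= 3:
        raise ValueError("measure a broad ability with two or three subtests")
    scores = list(subtests.values())
    spread = max(scores) - min(scores)
    if len(scores) == 2 and spread >= max_spread:
        # The two narrow-ability measures disagree; one more is needed.
        return None, "scores disagree; administer a third subtest"
    return mean(scores), "interpretable"


print(broad_ability_estimate({"Vocabulary": 9, "Similarities": 11}))
print(broad_ability_estimate({"Vocabulary": 7, "Similarities": 12}))
```

The design point is the same as in the paragraph above: the unit of interpretation is the broad ability, and a single subtest, or two that disagree, is not treated as an adequate estimate of it.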

Instead of comparing single subtests with one another, Flanagan suggests comparing the child's scores on the broad cognitive abilities, each measured by two to three subtests, to population norms. Instead of looking for relative weaknesses (the child compared to himself) in single subtests, we should look for normative weaknesses (the child compared to the norms for his age) in the broad cognitive abilities. If, for example, the child's scores on tests of processing speed are significantly lower than the population mean (a scaled score below 7, that is, more than one standard deviation below the mean of 10), the child may have a processing speed disability. Remember, however, that even a significantly low score on a broad cognitive ability measured by several subtests does not indicate a learning disability unless the child also meets the defining criteria for learning disabilities.

Source:

Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2007). Essentials of cross-battery assessment (2nd ed.). John Wiley & Sons.