On the Asymmetry of g
Martin G. Evans
Faculty of Management
University of Toronto
Toronto Ontario Canada
Abstract
In this paper I explore the strength of general intelligence at different levels of g. By
undertaking a confirmatory factor analysis of a standard ability measure based on a large sample
(the ASVAB) at high and low levels of g, I show that at lower levels of g the factor structure is
quite similar to, though not as well defined as, the factor structure for the complete sample;
however, at higher levels of g, the factor structure shows some difference. This implies that
clever people are clever in quite different ways, while those with low g have intellectual deficits
across the board. The dominance of g in the factor structure is due to this similarity of scores at
the bottom end of the set of abilities. The prevalence of specific abilities is due to the variety of
ways in which people can demonstrate high ability. [1]
Over the years there has been a great deal of controversy over the existence of g. On the
one hand, a number of scholars (Gottfredson, 1986; Jensen, 1986; Terman, 1916) have argued
that a single (and inherited) ability underlies the varied competencies demonstrated by
individuals in their everyday lives. In psychometric terms this means that a single general factor
underlies the many specific abilities observed in everyday behavior. This position has most
recently been argued in the more popular literature in The Bell Curve (Herrnstein & Murray,
1994). On the other hand, an equally numerous group (Gardner, 1983; Horn, 1985; Sternberg,
1988; Thurstone, 1938) has argued that there is a set of different abilities with only
a modest underlying single factor structure. Stephen Jay Gould's The Mismeasure of Man (1981)
is the most recent popularization of this position.
The arguments between the proponents of both positions have raged on for over 50 years
with an inconclusive outcome. The recent publication of The Bell Curve (Herrnstein & Murray,
1994) has exacerbated this discussion. Most of the furore about that book has been around the
issue of the degree of heritability of g, and, more acrimoniously, whether heritable within-group
differences in g can be generalized to inter-group differences. The aim of this paper is more
restricted: to explore whether, in an IQ measure with 10 highly g-loaded components (Herrnstein
& Murray, 1994), there are differences in the factor structure at different levels of g.
Gardner (1983) has spoken most eloquently for the multidimensional position.
He states:
there is persuasive evidence for the existence of several relatively autonomous human
intelligence competences, abbreviated hereafter as "human intelligences." ...
The exact nature and breadth of each "frame" [competence or intelligence] has
not so far been satisfactorily established, nor has the precise number of intelligences been fixed.
But the conviction that there exist at least some intelligences, that these are relatively
independent of one another, and that they can be fashioned and combined in a multiplicity of
adaptive ways by individuals and cultures, seems to me to be increasingly difficult to deny.
(pp. 8-9, italics in the original)
On the other hand, Jensen (1986) puts the case for g most strongly. He argues (pp.
318-319) from a measurement perspective that even if there is a multifaceted set of abilities, in order
to fit measures of these abilities to a simple structure (Thurstone's criterion), a non-orthogonal
factor structure solution (oblique solution) must be generated; as a consequence, these correlated
factors can be subjected to a second-order factor analysis with the consequent emergence of g.
Thus g results from a strong correlation between the scores on a series of ability tests.
I have, of course, stated the very extreme positions. Most scholars of intelligence take an
intermediate position similar to that of Vernon (1969), who argues that both g and specific
intelligences (usually two or three -- mathematical, verbal, mechanical) are necessary for
understanding the competencies that people bring to their daily lives.
The point made by Jensen, Herrnstein and Murray, and others is that people who are high
in ability can do many things well. It occurred to me, after re-reading Davis' (1971) work on the
generation of "interesting" hypotheses, that perhaps these scholars were focusing their attention
at the wrong end of the distribution: perhaps the emergence of g is due to the likelihood that
less intellectually endowed people have low endowments in all the abilities, whereas those
with higher g might be well endowed with two or three abilities, and less well endowed in
others. [2] Were this to be the case, then we would expect the average intercorrelations between the
tests and the factor structure of the tests to vary as we examined that structure at different levels
of g. At low levels of g we would expect to find a factor structure very similar to that resulting
from an analysis of the whole sample (with a full range of g). This is because we expect
people in the lower range of g to have similar scores on all the specific abilities. On the other
hand, at moderately high and high levels of g, we expect there to be quite a different factor
structure, as people at the high end will be quite high on some subscales but may be low or
average on others. [3] So a relatively high g can be obtained in a variety of different ways. By
exploring intelligence tests (such as the ASVAB) for groups of people with different levels of g,
we can test this hypothesis.
It turns out that this idea is not new. Spearman (1927) talked about a law of
diminishing effect that demonstrated a weakening of the intercorrelations among scales at a high
level of g (see also Deary & Pagliari, 1991). Detterman and Daniel (1989) have demonstrated
this with respect to modern tests of intelligence, the WAIS-R and the WISC-R. They find
correlations between subtests of about 0.30 for the high ability subjects and of about 0.60 for low
ability subjects (see Nesselroade & Thompson, 1995, for a contrary finding).
This relatively simple idea has the potential, if supported in this study and replicated on
other data sets, of shedding light on the controversy between the g theorists and their opponents
who espouse a view of multiple intelligences. Furthermore, Detterman and Daniel (1989) argue
that understanding these differences may ultimately contribute to our understanding of the
broader social issues implicated in the debates over the nature of intelligence.
Method
I undertook a reanalysis of the 1981 ability data from the National Longitudinal Survey of
Youth (NLSY). The ability data consisted of the ten subscales of the Armed Services
Vocational Aptitude Battery (ASVAB): General Science (GenSci) [4], Arithmetic Reasoning
(ArithR) [5], Word Knowledge (WordKnow) [6], Paragraph Comprehension (ParaComp) [7],
Numerical Operations (NumOps) [8], Coding Speed (CodeSp) [9], Automobile and Shop Information
(Auto) [10], Mathematics Knowledge (MathK) [11], Mechanical Comprehension (MechComp) [12],
and Electronics Information (Elec) [13]. A subset of these tests (Arithmetic Reasoning,
Mathematics Knowledge, Paragraph Comprehension, and Word Knowledge) makes up the Armed Forces
Qualifying Test (AFQT), a test whose components are known to have high correlations with g
(between 0.8 and 0.9).
In the analysis to follow, I had intended to use both a broad scale based on the whole
ASVAB and the narrower scale based on the AFQT as two indicators of general ability,
gASVAB and gAFQT; the narrower measure, being made up of tests with higher g-loadings, would have
provided a more rigorous test of my hypothesis. However, Detterman and Daniel (1989) have
pointed out that using a composite of the tests to be factored as a classification variable creates an
artifact of negative correlations between the variables: within any range, a higher score on
one test has to be balanced by a lower score on another test in order to keep the total score within
the range. This is indeed the case, so an alternative strategy had to be implemented. What is
needed is an independent measure of g, available for a large number of individuals, to provide a
means of classifying these individuals. I employed three strategies to accomplish this end. One
maximizes the size of the sample by using a subscale of the ASVAB; the other two maximize the
independence of the indicator of g by using some alternative measures that were available in this
set of data.
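The composite artifact can be illustrated with a small simulation. The sketch below is not part of the original analysis: it assumes four hypothetical tests that each load 0.8 on a common factor, classifies people by the sum of those same tests, and compares the mean inter-test correlation in the full sample with that in a narrow middle band of the composite. The function names, the loading, and the width of the band are all illustrative assumptions.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_intercorrelation(rows):
    """Average correlation over all pairs of columns (tests) in rows."""
    cols = list(zip(*rows))
    pairs = list(combinations(range(len(cols)), 2))
    return sum(pearson(cols[i], cols[j]) for i, j in pairs) / len(pairs)


def simulate(n=20000, k=4, loading=0.8, seed=1):
    """Return (full-sample mean r, mean r within the middle 10% of the composite)."""
    rng = random.Random(seed)
    unique = (1 - loading ** 2) ** 0.5
    rows = []
    for _ in range(n):
        g = rng.gauss(0, 1)  # hypothetical common factor
        rows.append([loading * g + unique * rng.gauss(0, 1) for _ in range(k)])
    # Classify on the composite of the very tests being correlated.
    rows.sort(key=sum)
    band = rows[int(0.45 * n):int(0.55 * n)]
    return mean_intercorrelation(rows), mean_intercorrelation(band)


full_r, band_r = simulate()
print(f"full sample: {full_r:.2f}, middle band of composite: {band_r:.2f}")
```

In the full sample the tests correlate at about 0.64 by construction, while inside the band the correlations turn negative. An independent classification variable avoids this.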
Method I: Use of an ASVAB subscale.
With only two scales measuring arithmetic ability and two scales measuring verbal
ability, it would not be a sensible strategy to use one of these as an indicator of g even though they
are highly correlated with g. Using one of these scales would have made it difficult for a verbal
factor or a quantitative factor to emerge in the subsequent analyses. There was not another g test
available with a large sample of individuals. Accordingly I compromised by choosing the
practical scale (Electronics Information) which had the highest loading (0.84) on g in this
sample. This loading is as strong as the lower of the numerical and verbal test loadings (0.83 and
0.84 respectively). This would seem to be a reasonable, though not perfect, substitute for g. [14]
The major problem with this measure as a substitute for g is the difference between men and
women on this dimension: men score more highly. Accordingly, all analyses were
undertaken separately for the male and female subsamples.
The correlations among the remaining scales were computed, separately for the male and female
samples, for the top 5% and bottom 5% of people on the Electronics Information scale. These
correlations were then corrected for restriction of range so that both the high and low group
correlations were made equivalent to what those correlations would have been in the full range
sample (Detterman & Daniel, 1989). This helps assure that any differences in correlation are not due
to differences in the range of values on each test. Average corrected correlations were computed
and compared.
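The text does not state which correction formula was used. As one illustration, when a group is selected on a third variable z (here, Electronics Information) and the correlation of interest is between two other tests x and y, a textbook option is Thorndike's Case 3 correction. The sketch below implements that formula as an assumption about the kind of correction involved; the function and argument names are hypothetical, and the numbers in the usage line are invented.

```python
from math import sqrt


def correct_case3(r_xy, r_xz, r_yz, sd_z_full, sd_z_group):
    """Thorndike Case 3 correction for selection on a third variable z.

    r_xy, r_xz and r_yz are the correlations observed in the selected group;
    sd_z_full and sd_z_group are the standard deviations of z in the full
    sample and in the selected group.
    """
    k = (sd_z_full / sd_z_group) ** 2 - 1
    numerator = r_xy + r_xz * r_yz * k
    denominator = sqrt((1 + r_xz ** 2 * k) * (1 + r_yz ** 2 * k))
    return numerator / denominator


# Invented example: z has half the full-sample spread inside the group.
print(round(correct_case3(0.20, 0.30, 0.40, sd_z_full=2.0, sd_z_group=1.0), 3))
```

When the two standard deviations are equal the function returns the observed correlation unchanged, which is the behaviour one would want from any such correction.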
To test the similarity of factor structures, following Nesselroade and Thompson (1995), the
original covariances in the two groups were subjected to a comparative, between-group LISREL
analysis that examined whether or not the top and bottom groups fit the same model. I fit
the data to a single factor model for two reasons. First, I want to test the proposition that a single
factor underlies the variety of intelligences. Second, psychometrically this restricts the number of
parameters that can vary in the LISREL model. With only a single factor, there can be no
question of intercorrelations between factors, so the only source of variation is the loadings of
the indicators on that single factor. This greatly simplifies the analysis and the interpretation of
the results.
Method II: Combined data from several g measures.
To obtain an indicator of g that is independent of the ASVAB, I took advantage of the richness of
the data in the NLSY. In addition
to the ASVAB data, the researchers gathered from the participants' schools data on previously
taken intelligence tests. A wide variety of tests had been completed by different participants in
the study: California Test of Mental Maturity, Otis-Lennon Mental Ability Test,
Lorge-Thorndike Intelligence Test, Henmon-Nelson Test of Mental Maturity, Kuhlmann-Anderson
Intelligence Test, Differential Aptitude Test, Coop School and College Ability Test,
Stanford-Binet Intelligence Scale, and Wechsler Intelligence Scale for Children. These tests were
taken at different grades and different ages by different individuals and were sometimes reported
as raw scores and sometimes as percentiles. I proceeded as follows. To control for age
effects, I classified individuals by the age at which they took the test. For some tests (CTMM,
Otis-Lennon, DAT, Lorge-Thorndike) that had large samples I was able to use a fine-grained
categorization; for others I was only able to generate median splits. I then chose the top 5% and
the bottom 5% in each age range and accumulated them to give me a high and low sample of
individuals based upon each test. Four of the tests gave reasonable sample sizes (c. 40 people). I
restricted further analyses to these. I computed correlations between all the subtests of the
ASVAB for the high and low IQ groups based on each of the four IQ tests. These correlations
were then corrected for restriction of range. The corrected correlations were then averaged across
the four tests (a formal meta-analysis showed that some subtest pairs had high variance in the
correlations remaining after correcting for sampling error). The pooled correlation matrices for
the high and low IQ groups were then averaged (across the subtests of the ASVAB) and the
mean correlations for the high and low IQ groups were compared. A
comparative between-group LISREL analysis was also performed on these data. As there were no
gender differences on these indicators of g and sample sizes were small, only data for the whole
sample were analyzed.
Method III: The use of a sample based on the Otis-Lennon indicator of g.
The Otis-Lennon Mental Ability Test was the test taken by the largest number of
individuals. Accordingly, I chose the top and bottom 10% of persons taking this test, computed
correlations among the subtests of the ASVAB (correcting for range restriction), and compared
the mean correlations between the high and low IQ groups. After also computing means and
standard deviations for each subgroup, I subjected the covariances (based on uncorrected
correlations) to a between-group confirmatory factor analysis using LISREL.
Sample
The original sample consisted of 12,686 young persons who were surveyed for the
National Longitudinal Survey of Youth. The 1981 administration of the ASVAB missed some
772 subjects so that the subsequent analysis is based upon a sample size of 11,914. I selected for
this analysis only those individuals who were 16 years of age or older at the time they completed
the ASVAB so as to include a sample of sufficient maturity that g would have stabilized. This
resulted in a sample of 8,575.
In summary, for the first analysis, I stratified the sample into two large groupings based
upon their scores on the Electronics Information scale. I chose the bottom 5% of the male sample
and the bottom 5% of the female sample to represent the less intelligent groups; I chose the same
proportion at the top of the curve to represent the higher IQ members of the sample. The actual
numbers of subjects in each group do not conform exactly to this ideal owing to some
"lumpiness" in the distribution.
For my second analysis I created two groups based upon all the IQ data available from the
respondents' schools: one high on one or more of the tests, the other low on one or more of the
tests. A person was selected to be in the low group for a test if he or she was in the lowest tenth
of the percentile scores for any of these tests or in the lowest 10% of
the raw scores on one or more of the tests. Similar decision rules were used to identify the high IQ
group. Some people (n = 14) who scored high on one test and low on another were excluded from
the analysis.
A third analysis was performed using the test taken by the largest number (n = 1191) of the
survey respondents: the Otis-Lennon Mental Ability Test. Those in the highest and lowest 10%
were selected for analysis.
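The "lumpiness" mentioned for the first analysis arises because a short test yields only a few distinct scores, so a cutoff rarely isolates exactly 5% of a sample. The sketch below shows one possible selection rule (an assumption, not necessarily the rule used here): take everyone at or below the score of the person sitting at the 5% mark. The scores are invented.

```python
def bottom_group(scores, percent=5):
    """Everyone scoring at or below the score found at the given percentile rank."""
    ordered = sorted(scores)
    k = (len(ordered) * percent + 99) // 100  # ceiling of n * percent / 100
    cutoff = ordered[max(k, 1) - 1]
    return [s for s in scores if s <= cutoff]


# Invented scores for 100 people; ties at the cutoff enlarge the group.
scores = [0] * 3 + [1] * 4 + [2] * 93
group = bottom_group(scores)
print(len(group), "of", len(scores))
```

Here the nominal 5% group contains 7 people, because four people are tied at the cutoff score.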
Results
High and Low on Electronics Information
In this analysis I separated the sample into two gender based subgroups before
intercorrelating the nine remaining ASVAB scales. In the top 5% of the men, the corrected
average correlation was 0.54; in the bottom 5% of the men it was 0.75. The differences for
females were more striking: for the top group, 0.39; for the bottom group, 0.85. I then took the
original covariance matrices to check whether or not the same single factor solution fitted. This
was accomplished by comparing two CFAs. In the first, the same structure of the model (single
factor, each subscale free to load on the factor) was imposed, but the parameters were allowed
to vary between groups. In the second, the parameters for each subscale were constrained to be
the same for both the high and low groups. No significant difference in the fit of the models would
support the existence of the same model at both ends of the g range. A significant difference would
support the hypothesis advanced here that the structure of g differs at the two ends of the
continuum. This is what I found: in the male subsample, chi-square for the constrained model was
453.68 (df 63) and for the unconstrained model 327.62 (df 54); the
difference in chi-square of 126.06 (df 9) was statistically significant (p < .001). Similar results
were found for the female sample (constrained 410.57, df 63; unconstrained 290.60, df 54; difference
119.97, df 9, p < .001).
High and Low groups on a variety of IQ tests.
In this analysis I have the advantage of combining a subset of data based upon people's
scores on a number of independent IQ tests. This gives me the ability to use the full ASVAB (all
ten subscales) in the analysis at the cost of having a severely limited sample size. The average
corrected correlation for the high g group was 0.59; for the low g group it was 0.71.
The difference is not as dramatic as in the previous analysis, but still significant. The chi-square
for the constrained model was 343.64 (df 80) and for the unconstrained model
324.00 (df 70); the difference in chi-square of 19.64 (df 10) was statistically significant
(p < .05).
High and Low on the Otis.
In this analysis I use a single IQ test to carry out the classification. This again gives me the
ability to use the full ASVAB (all ten subscales) in the analysis and allows me to use subjects drawn
from the extremes of a single test, ensuring comparability. As the sample size was only 1190, I
used the top 10% and bottom 10% of the sample. At the top end of the Otis, the average
correlation was 0.35; at the low end, the average correlation was 0.50. The matrices were
significantly different. The chi-square for the constrained model was 222.50 (df 80) and for the
unconstrained model 199.08 (df 70); once again the difference in chi-square
of 23.42 (df 10) was statistically significant (p < .01).
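The nested-model comparisons in this section can be checked by hand: the difference between the constrained and unconstrained chi-squares is itself referred to a chi-square distribution whose df is the difference in degrees of freedom. The sketch below recomputes the differences from the fit statistics reported for each analysis. It uses closed-form series for the chi-square upper tail so that only the Python standard library is needed; the helper names and group labels are illustrative.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series over odd factorials.
    term = sqrt(2 * x / pi) * exp(-x / 2)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return erfc(sqrt(x / 2)) + total


def difference_test(chi_constrained, df_constrained, chi_free, df_free):
    """Chi-square difference, df difference, and p-value for nested models."""
    d_chi = chi_constrained - chi_free
    d_df = df_constrained - df_free
    return d_chi, d_df, chi2_sf(d_chi, d_df)


# Constrained and unconstrained fit statistics as reported in the text.
reported = {
    "men (Electronics)": (453.68, 63, 327.62, 54),
    "women (Electronics)": (410.57, 63, 290.60, 54),
    "several IQ tests": (343.64, 80, 324.00, 70),
    "Otis-Lennon": (222.50, 80, 199.08, 70),
}
for label, stats in reported.items():
    d_chi, d_df, p = difference_test(*stats)
    print(f"{label}: diff = {d_chi:.2f}, df = {d_df}, p = {p:.4g}")
```

Under these assumptions all four differences fall beyond the conventional .05 threshold.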
Conclusion and Implications
The evidence reported in this paper suggests that the factor structure changes over the
range of g. I have undertaken three analyses with different strengths and weaknesses: one with a
good sized sample but a weak indicator of g for classifying the subjects, a second based on a
mixture of indicators of g, and the third with a single test indicator of g but with a small sample
size. Each analysis came up with the same results: the average correlations and the factor
structure differ at the two ends of the continuum. The scales are more highly intercorrelated at
the bottom end of the spectrum of g than at the top end.
The main weakness of this study lies in the nature of the ASVAB. It is a relatively short test
and does not discriminate well at the high end of the spectrum. This lack of discrimination has
two effects that counterbalance each other. On the one hand, it gives a number of persons top
scores on all tests, which would inflate the correlation in the same way as in my simulation (see
footnote 2); on the other hand, the restricted range attenuates the relationships. My correction
for restriction of range leaves the enhancing factor still operating, so that the differences
between high and low groups found here are likely to be underestimates. It might also be argued
that the test will have
lower reliability at the high end of the spectrum. Were this to be the case, this would reduce the
average correlations between tests at the high end of the spectrum relative to those at the lower
end of the spectrum. To equate correlations at the high and low end of the spectrum, the
reliabilities of the tests at the high end would have to be 70% of the reliabilities of the same tests
at the low end. A difference of this magnitude in such a well validated test seems implausible.
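The 70% figure is not derived in the text. One way to arrive at a number of that size, offered here as an assumption rather than as the author's calculation, is the classical attenuation formula, under which an observed correlation equals the true correlation times the square root of the product of the two reliabilities. If both reliabilities shrink by the same factor k at the high end, the observed correlation shrinks by k as well, so k can be read off as the ratio of the high-group to the low-group average correlation. The function names and the reliabilities of 0.90 below are hypothetical.

```python
from math import sqrt


def observed_r(true_r, rel_x, rel_y):
    """Classical attenuation: observed correlation given two reliabilities."""
    return true_r * sqrt(rel_x * rel_y)


def reliability_ratio_needed(r_high, r_low):
    """Factor k by which both reliabilities must shrink to turn r_low into r_high."""
    return r_high / r_low


# Average correlations reported for the high and low Otis-Lennon groups.
k = reliability_ratio_needed(0.35, 0.50)
print(f"required reliability ratio: {k:.2f}")

# Check with hypothetical low-end reliabilities of 0.90 for both tests.
low = observed_r(0.60, 0.90, 0.90)
high = observed_r(0.60, 0.90 * k, 0.90 * k)
print(f"ratio of observed correlations: {high / low:.2f}")
```

On this reading the Otis-Lennon averages give a ratio of 0.70, which matches the size of the figure quoted in the text.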
Two additional programs of research are required. The first is to assess the generalizability of
this finding. Detterman and Daniel (1989) point to scattered sightings of the phenomenon in the
literature. Careful reanalysis of large data sets using a variety of standard tests is required. The
second program is more applied: to explore predictive or job related validity. The conventional
finding (Hunter, 1986) is that job tailored or job related tests do not predict better than g. We
now need to explore whether or not there is differential validity between general and specific
tests at high or low levels of g. At its lower range, we would anticipate that g would be a good
predictor of job performance. However, at higher levels, it is expected that the individual facets
would have better validity.
The implications for the understanding of intelligence are important. The possibility of a
breakdown of g at higher levels of intelligence, even with a narrow range of tests (as in the
ASVAB), means that we have to reexamine the nature of intelligence. Hunt (1995) has argued
that these findings support an information processing view of intelligence. [15] There may be a single
driving factor at the low end, but it may manifest in a variety of different ways at the top end.
More work on exploring the behavior of g at different levels is required.
Author's Note
Vinay Kanetkar helped with the LISREL analyses; Kai Lamertz generated the correlation
matrices based on various IQ tests. I thank them as well as Joel Baum, Howard Gardner, Linda
Gottfredson, Hugh Gunz, Lloyd Humphreys, Diane Irvine, Arthur Jensen, Gary Latham, Andy
Mitchell, Malcolm Ree, Frank Schmidt, and Jacob Siegel for their helpful comments on earlier
drafts of this paper. Kathrine Evans added the html tags for the footnotes in this paper.
References
Davis, M. S. (1971). That's interesting! Philosophy of the Social Sciences, 1,
309-344.
Deary, I. J., & Pagliari, C. (1991). The strength of g at different levels of ability: Have
Detterman and Daniel rediscovered Spearman's "Law of Diminishing Returns"? Intelligence, 15,
247-250.
Detterman, D. K., & Daniel, M. H. (1989). Correlates of mental tests with each other and
with cognitive variables are highest for low IQ groups. Intelligence, 13, 349-359.
Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York, NY:
Basic Books.
Gottfredson, L. S. (1986). The societal consequences of the g factor in employment. Journal
of Vocational Behavior, 29, 379-410.
Herrnstein, R. J., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in
American Life. New York, NY: Free Press.
Horn, J. L. (1985). Remodelling old models of intelligence. In B. Wolman (Ed.), Handbook
of Intelligence: Theories, measurements, and applications (pp. 267-300). New York: Wiley.
Hunt, E. (1995). The role of intelligence in modern society. American Scientist, 83,
356-368.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job
performance. Journal of Vocational Behavior, 29, 340-362.
Jensen, A. R. (1986). g: Artifact or reality. Journal of Vocational Behavior, 29,
301-331.
Larson, G. E., & Wolfe, J. H. (1994). Validity results for g from an expanded test base.
Intelligence, 20, 15-25.
Nesselroade, J. R., & Thompson, W. W. (1995). Selection and related threats to group
comparisons: An example comparing factorial structures of higher and lower abilities of adult
twins. Psychological Bulletin, 117, 271-284.
Ree, M. J., & Carretta, T. R. (1994). Factor analysis of ASVAB: Confirming a Vernon-like
structure. Educational and Psychological Measurement, 54, 457-461.
Spearman, C. E. (1927). The Abilities of Man. London, UK: Macmillan.
Sternberg, R. J. (1988). The Triarchic Mind: A New Theory of Human Intelligence. New
York, NY: Penguin.
Terman, L. M. (1916). The Measurement of Intelligence. Boston, MA:
Houghton-Mifflin.
Thurstone, L. L. (1938). Primary Mental Abilities (Psychometric Monographs, No. 1).
Chicago, IL: University of Chicago Press.
Vernon, P. E. (1969). Intelligence and Cultural Environment. London, UK: Methuen.
[1] One can visualize this in three dimensions by imagining the various abilities as flowers
arranged in a narrow vase -- at the bottom they are bound together tightly, at the top they spread
out broadly!
[2] Before undertaking an analysis with real data, the plausibility of this position was
examined using simulated data. A set of ten orthogonal variables (with mean 100 and sd 15, to
simulate the classic IQ scores) was generated for a set of 1000 records. As expected, the scales
were uncorrelated (one correlation pair, by chance, reached a significance level of p = .07). A
factor analysis of these correlations produced a set of ten factors each explaining between 8.5%
and 11.7% of the variance. I then added to the data set a group of twenty records that were at the
bottom of each of the ten scales (by adding twenty people with mean of 60 and sd 3). The picture
changed dramatically: all ten scales were modestly correlated (between .10 and .19); the factor
analysis produced a single factor with modest factor loadings (between .40 and .50) and a
modest amount of variance explained (21.9%). Considering the original orthogonal construction
of the data set, this is a surprising finding and one that encouraged me to undertake the
subsequent analysis using real ability data.
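A minimal sketch of a simulation along the lines of footnote 2 is given below. It is a reconstruction under stated assumptions, not the original code: it uses Python's standard random module, reports the mean pairwise correlation rather than a factor analysis, and the seed and function names are arbitrary.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_pairwise_r(columns):
    """Average correlation over all pairs of scales."""
    pairs = list(combinations(range(len(columns)), 2))
    return sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def simulate(seed=0, n_main=1000, n_low=20, n_scales=10):
    """Mean inter-scale correlation before and after adding a uniformly low group."""
    rng = random.Random(seed)
    # Ten independent IQ-like scales: mean 100, sd 15.
    main = [[rng.gauss(100, 15) for _ in range(n_scales)] for _ in range(n_main)]
    # Twenty extra records that are low on every scale: mean 60, sd 3.
    low = [[rng.gauss(60, 3) for _ in range(n_scales)] for _ in range(n_low)]
    before = mean_pairwise_r(list(zip(*main)))
    after = mean_pairwise_r(list(zip(*(main + low))))
    return before, after


before, after = simulate()
print(f"mean r before: {before:.3f}, after adding the low group: {after:.3f}")
```

With these settings the mean correlation after adding the low group should come out near 0.12, in line with the range of .10 to .19 reported in the footnote.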
[3] Except at the very highest levels of g where persons will have high
scores on all the subscales due to the mathematical construction of g as the sum of the
subscale scores.
[4] This is a 25 item knowledge test of physical and biological sciences. It seems
to tap both verbal ability (Larson & Wolfe, 1994) and technical knowledge
(Ree & Carretta, 1994).
[5] This is a 30 item arithmetic word problem test. It taps one's numerical abilities.
[6] This is a 35 item vocabulary test using synonyms or words embedded in questions. This test
taps the individual's verbal abilities.
[7] This is a 35 item vocabulary test using synonyms or words embedded
in questions. This also taps the individual's verbal abilities.
[8] This is a 50 item speeded addition, subtraction, multiplication, and division test using one or
two digit numbers. It taps the speed of accurate performance. Numerical Operations and Coding
Speed are the only two timed tests in the battery.
[9] This is an 84 item speeded test requiring the recognition of number strings arbitrarily
associated with words in a table. Numerical Operations and Coding Speed are the only two timed
tests in the battery.
[10] This is a 25 item knowledge test of automobiles, shop
practices, tools, and tool use. This taps technical ability in this area.
[11] This is a 25 item test of algebra, geometry, fractions, decimals, and exponents. This
taps mathematical ability.
[12] This is a 25 item test of mechanical and physical principles. It
taps into one's acquired knowledge in this area.
[13] This is a 20 item test about electronics, radio, and electrical principles. It taps the
person's acquired knowledge in this area.
[14] One reviewer asked the very pertinent question: would you like your intelligence
assessed on your knowledge of electronics? The answer of course is "No" unless no other
information is available, in which case I might reluctantly acquiesce on the basis that that was a
better indicator than random assignment of an IQ score. Fortunately, the argument here will not
rest solely on the use of Electronics Information as the only indicator of g. Other
analyses will be based upon broader tests. This particular analysis has the advantage of large
numbers. Future research should undertake an analysis in which a subset of highly loading items
from each test is used to construct an instrument for categorizing the subjects into
different levels of ability. The analysis can then be safely undertaken on the remaining items.
[15] I found out late in the preparation of this paper that Hunt and his colleagues have also
reanalysed the ASVAB, but no data are reported.