On the Asymmetry of g



Martin G. Evans
Faculty of Management
University of Toronto
Toronto Ontario Canada





Abstract

In this paper I explore the strength of general intelligence at different levels of g. By undertaking a confirmatory factor analysis of a standard ability measure based on a large sample (the ASVAB) at high and low levels of g, I show that at lower levels of g the factor structure is quite similar to, though not as well defined as, the factor structure for the complete sample; however, at higher levels of g, the factor structure shows some difference. This implies that clever people are clever in quite different ways, while those with low g have intellectual deficits across the board. The dominance of g in the factor structure is due to this similarity of scores at the bottom end of the set of abilities. The prevalence of specific abilities is due to the variety of ways in which people can demonstrate high ability. 1
Over the years there has been a great deal of controversy over the existence of g. On the one hand, a number of scholars (Gottfredson, 1986; Jensen, 1986; Terman, 1916) have argued that a single (and inherited) ability underlies the varied competencies demonstrated by individuals in their everyday lives. In psychometric terms this means that a single general factor underlies the many specific abilities observed in everyday behavior. This position has most recently been argued in the more popular literature in The Bell Curve (Herrnstein & Murray, 1994). On the other hand, an equally numerous group (Gardner, 1983; Horn, 1985; Sternberg, 1988; Thurstone, 1938) has argued that there is a set of different abilities with only a modest underlying single factor structure. Stephen Jay Gould's The Mismeasure of Man (1981) is the most recent popularization of this position.

The arguments between the proponents of the two positions have raged for over 50 years with an inconclusive outcome. The recent publication of The Bell Curve (Herrnstein & Murray, 1994) has intensified this discussion. Most of the furore about that book has centered on the degree of heritability of g and, more acrimoniously, on whether heritable within-group differences in g can be generalized to between-group differences. The aim of this paper is more restricted: to explore whether, in an IQ measure with 10 highly g-loaded components (Herrnstein & Murray, 1994), there are differences in the factor structure at different levels of g.

Gardner (1983) has spoken most eloquently for the multidimensional position. He states:

there is persuasive evidence for the existence of several relatively autonomous human intelligence competences, abbreviated hereafter as "human intelligences." ... The exact nature and breadth of each "frame" [competence or intelligence] has not so far been satisfactorily established, nor has the precise number of intelligences been fixed. But the conviction that there exist at least some intelligences, that these are relatively independent of one another, and that they can be fashioned and combined in a multiplicity of adaptive ways by individuals and cultures, seems to me to be increasingly difficult to deny. (p. 8-9, italics in the original).



On the other hand, Jensen (1986) puts the case for g most strongly. He argues (pp. 318-319) from a measurement perspective that even if there is a multifaceted set of abilities, in order to fit measures of these abilities to a simple structure (Thurstone's criterion), a non-orthogonal factor structure solution (oblique solution) must be generated; as a consequence, these correlated factors can be subjected to a second-order factor analysis with the consequent emergence of g. Thus g results from a strong correlation between the scores on a series of ability tests. I have, of course, stated the very extreme positions. Most scholars of intelligence take an intermediate position similar to that of Vernon (1969), who argues that both g and specific intelligences (usually two or three -- mathematical, verbal, mechanical) are necessary for understanding the competencies that people bring to their daily lives.

The point made by Jensen, Herrnstein and Murray, and others is that people who are high in ability can do many things well. It occurred to me, after re-reading Davis' (1971) work on the generation of "interesting" hypotheses, that perhaps these scholars were focusing their attention at the wrong end of the distribution: perhaps the emergence of g was due to the likelihood of the less intellectually endowed people having low endowments in all the abilities, whereas those with higher g might be well endowed with two or three abilities, and less well endowed in others. 2 Were this to be the case, then we would expect the average intercorrelations between the tests and the factor structure of the tests to vary as we examined that structure at different levels of g. At low levels of g we would expect to find a factor structure very similar to that resulting from an analysis of the whole sample (with a full range of g), because we expect people in the lower range of g to have similar scores on all the specific abilities. On the other hand, at moderately high and high levels of g, we expect quite a different factor structure, as people at the high end will be quite high on some subscales but may be low or average on others. 3 So a relatively high g can be obtained in a variety of different ways. By exploring intelligence tests (such as the ASVAB) for groups of people with different levels of g, we can test this hypothesis.
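The simulation described in footnote 2 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the original procedure: the seed, the function names, and the use of the mean pairwise correlation in place of a factor analysis are all assumptions made here for brevity.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_intercorrelation(records):
    """Average correlation over all pairs of scales (columns)."""
    cols = list(zip(*records))
    pairs = list(combinations(range(len(cols)), 2))
    return sum(pearson(cols[i], cols[j]) for i, j in pairs) / len(pairs)

def simulate(n_main=1000, n_low=20, n_scales=10, seed=0):
    """Ten independent IQ-like scales, then the same data plus a small
    group that scores low on every scale (mean 60, sd 3)."""
    rng = random.Random(seed)  # seed is arbitrary (assumption)
    main = [[rng.gauss(100, 15) for _ in range(n_scales)] for _ in range(n_main)]
    low = [[rng.gauss(60, 3) for _ in range(n_scales)] for _ in range(n_low)]
    return mean_intercorrelation(main), mean_intercorrelation(main + low)

before, after = simulate()
print(f"mean r, independent scales only:      {before:.3f}")
print(f"mean r, with uniformly low group added: {after:.3f}")
```

With these settings the mean correlation moves from about zero to a modest positive value, in line with the range of .10 to .19 reported in footnote 2, even though the twenty added records are under 2% of the data.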

It turns out that this idea is not new. Spearman (1927) described a law of diminishing returns: a weakening of the intercorrelations among scales at high levels of g (see also Deary & Pagliari, 1991). Detterman and Daniel (1989) have demonstrated this with respect to modern tests of intelligence, the WAIS-R and the WISC-R. They find correlations between subtests of about 0.30 for the high ability subjects and of about 0.60 for low ability subjects (see Nesselroade & Thompson, 1995, for a contrary finding).

This relatively simple idea has the potential, if supported in this study and replicated on other data sets, of shedding light on the controversy between the g theorists and their opponents who espouse a view of multiple intelligences. Furthermore, Detterman and Daniel (1989) argue that understanding these differences may ultimately contribute to our understanding of the broader social issues implicated in the debates over the nature of intelligence.

Method


I undertook a reanalysis of the 1981 ability data from the National Longitudinal Survey of Youth (1981). The ability data consisted of the ten subscales of the Armed Services Vocational Aptitude Battery (ASVAB): General Science (GenSci) 4, Arithmetic Reasoning (ArithR) 5, Word Knowledge (WordKnow) 6, Paragraph Comprehension (ParaComp) 7, Numerical Operations (NumOps) 8, Coding Speed (CodeSp) 9, Automobile and Shop Information (Auto) 10, Mathematics Knowledge (MathK) 11, Mechanical Comprehension (MechComp) 12, and Electronics Information (Elec) 13. A subset of these tests -- Arithmetic Reasoning (ArithR), Mathematics Knowledge (MathK), Paragraph Comprehension (ParaComp), and Word Knowledge (WordKnow) -- makes up the Armed Forces Qualification Test (AFQT), a test whose components are known to have high correlations with g (between 0.8 and 0.9).

In the analysis to follow, I had intended to use both a broad scale based on the whole ASVAB and the narrower scale based on the AFQT as two indicators of general ability: gASVAB and gAFQT. The narrower measure, being made up of tests with higher g-loadings, would have provided a more rigorous test of my hypothesis. However, Detterman and Daniel (1989) have pointed out that using a composite of the tests to be factored as a classification variable creates an artifact of negative correlations between the variables -- because, within any range, a higher score on one test has to be balanced by a lower score on another test in order to keep the total score within the range. This is indeed the case, so an alternative strategy had to be implemented. What is needed is an independent measure of g, available for a large number of individuals, to provide a means of classifying these individuals. I employed three strategies to accomplish this end. One maximizes the size of the sample by using a subscale of the ASVAB; the other two maximize the independence of the indicator of g by using some alternative measures that were available in this set of data.
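The composite-selection artifact described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the ASVAB: the ten "tests" are independent standard normal variables, and the sample size, seed, and function names are invented for the illustration. Selecting the top 5% on the sum of the tests should push their intercorrelations below zero.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_intercorrelation(records):
    """Average correlation over all pairs of tests (columns)."""
    cols = list(zip(*records))
    pairs = list(combinations(range(len(cols)), 2))
    return sum(pearson(cols[i], cols[j]) for i, j in pairs) / len(pairs)

def composite_selection_demo(n=20000, n_tests=10, top_fraction=0.05, seed=1):
    """Compare mean intercorrelations in the full sample and in the group
    selected as the top `top_fraction` on the composite (sum of all tests)."""
    rng = random.Random(seed)  # seed is arbitrary (assumption)
    people = [[rng.gauss(0, 1) for _ in range(n_tests)] for _ in range(n)]
    people.sort(key=sum)                      # order by composite score
    top = people[int(n * (1 - top_fraction)):]
    return mean_intercorrelation(people), mean_intercorrelation(top)

full_r, top_r = composite_selection_demo()
print(f"mean r, full sample:        {full_r:.3f}")
print(f"mean r, top 5% on composite: {top_r:.3f}")
```

Because the tests here are uncorrelated by construction, any negative mean correlation in the selected group is produced by the selection rule alone, which is the reason for classifying on an indicator that is not a composite of the tests being factored.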

Method I: Use of an ASVAB subscale.


With only two scales measuring arithmetic ability and two scales measuring verbal ability, it would not be a sensible strategy to use one of these as an indicator of g even though they are highly correlated with g. Using one of these scales would have made it difficult for a verbal factor or a quantitative factor to emerge in the subsequent analyses. There was not another g test available with a large sample of individuals. Accordingly I compromised by choosing the practical scale (Electronics Information) which had the highest loading (0.84) on g in this sample. This loading is as strong as the lower of the numerical and verbal test loadings (0.83 and 0.84 respectively). This would seem to be a reasonable, though not perfect, substitute for g. 14 The major problem with this measure as a substitute for g is the difference between men and women on this dimension: men score more highly. Accordingly, all analyses were undertaken separately for the male and female subsamples.

The correlations (male sample, female sample) were computed for the top 5% and bottom 5% of people based upon the Electronics Information scale. These correlations were then corrected for restriction of range so that both the high and low group correlations were made equivalent to what those correlations would have been in the full range sample (Detterman & Daniel, 1989). This helps assure that any differences in correlation are not due to differences in the range of values on each test. Average corrected correlations were computed and compared. To test the similarity of factor structures, following Nesselroade and Thompson (1995), the original covariances in the two groups were subjected to a comparative, between-group LISREL analysis that examined whether or not the top and bottom groups fit the same model. I fit the data to a single factor model for two reasons. First, I want to test the proposition that a single factor underlies the variety of intelligences. Second, psychometrically this restricts the number of parameters that can vary in the LISREL model. With only a single factor, there can be no question of intercorrelations between factors, so the only source of variation is the loadings of the indicators on that single factor. This greatly simplifies the analysis and the interpretation of the results.
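The text does not state which correction formula was applied. As one possibility, the widely used correction for direct range restriction on one variable (often called Thorndike's Case 2) can be sketched as below; the function name and the example numbers are hypothetical and are not taken from the ASVAB data.

```python
from math import sqrt

def correct_for_range_restriction(r_restricted, sd_full, sd_restricted):
    """Thorndike Case 2 correction (assumed here, not confirmed by the paper).

    r_restricted  -- correlation observed in the range-restricted group
    sd_full       -- standard deviation of the selection variable, full sample
    sd_restricted -- standard deviation of the same variable in the subgroup
    """
    u = sd_full / sd_restricted
    return (r_restricted * u) / sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))

# Hypothetical example: r = .30 in a subgroup whose sd is half the full-range sd.
corrected = correct_for_range_restriction(0.30, sd_full=15.0, sd_restricted=7.5)
print(f"corrected r = {corrected:.3f}")
```

When the two standard deviations are equal the function returns the observed correlation unchanged, and the corrected value grows as the subgroup's range narrows relative to the full sample.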

Method II: Combined data from several g measures.


To obtain an independent indicator of g, I took advantage of the richness of the data in the NLSY. In addition to the ASVAB data, the researchers gathered from the participants' schools data on previously taken intelligence tests. A wide variety of tests had been completed by different participants in the study: California Test of Mental Maturity, Otis-Lennon Mental Ability Test, Lorge-Thorndike Intelligence Test, Henmon-Nelson Test of Mental Maturity, Kuhlmann-Anderson Intelligence Test, Differential Aptitude Test, Coop School and College Ability Test, Stanford-Binet Intelligence Scale, Wechsler Intelligence Scale for Children. These tests were taken at different grades and different ages for different individuals and were sometimes reported as raw scores, and sometimes as percentiles.

The following procedure was followed. To control for age effects, I classified individuals by the age at which they took the test. For some tests (CTMM, Otis-Lennon, DAT, Lorge-Thorndike) that had large samples I was able to use a fine-grained categorization; for others I was only able to generate median splits. I then chose the top 5% and the bottom 5% in each age range and accumulated them to give me a high and a low sample of individuals based upon each test. Four of the tests gave reasonable sample sizes (c. 40 people). I restricted further analyses to these. I computed correlations between all the subtests of the ASVAB for the high and low IQ groups based on each of the four IQ tests. These correlations were then corrected for restriction of range. The corrected correlations were then averaged across the four tests (a formal meta-analysis showed that some subtest pairs had high variance in the correlations remaining after correcting for sampling error). The pooled correlation matrices for the high and low IQ groups were then averaged (across the subtests of the ASVAB) and the mean correlations for the high and low IQ groups were compared.

The additional analysis of a comparative between-group LISREL analysis was also performed on these data. As there were no gender differences on these indicators of g and sample sizes were small, only data for the whole sample were analyzed.

Method III: The use of a sample based on the Otis-Lennon indicator of g.


The Otis-Lennon Mental Ability test was the test taken by the largest number of individuals. Accordingly, I chose the top and bottom 10% of persons taking this test, computed correlations among the subtests of the ASVAB (after correcting for range restriction), compared the mean correlations between the high and low IQ groups and, after also computing means and standard deviations for each subgroup, I subjected these covariances (based on uncorrected correlations) to a between-group confirmatory factor analysis using LISREL.

Sample


The original sample consisted of 12,686 young persons who were surveyed for the National Longitudinal Survey of Youth. The 1981 administration of the ASVAB missed some 772 subjects so that the subsequent analysis is based upon a sample size of 11,914. I selected for this analysis only those individuals who were 16 years of age or older at the time they completed the ASVAB so as to include a sample of sufficient maturity that g would have stabilized. This resulted in a sample of 8,575. In summary, for the first analysis, I stratified the sample into two large groupings based upon their scores on the Electronics Information scale. I chose the bottom 5% of the male sample and the bottom 5% of the female sample to represent the less intelligent groups; I chose the same proportion at the top of the curve to represent the higher IQ members of the sample. The actual numbers of subjects used in each group do not conform exactly to this ideal owing to some "lumpiness" in the distribution.

For my second analysis I created two groups based upon all the IQ data available from the respondents' schools: one high on one or more of the tests, the other low on one or more of the tests. A person was assigned to the low group if he or she was in the lowest 10% on the percentile scores for any of these tests, or in the lowest 10% of the raw scores on one or more of the tests. Similar decision rules were used to identify the high IQ group. Some people (n = 14) who scored high on one test and low on another were excluded from the analysis.

A third analysis was performed using the test taken by the largest number of survey respondents (n = 1191): the Otis-Lennon Mental Ability Test. Those in the highest and lowest 10% were selected for analysis.

Results


High and Low on Electronics information


In this analysis I separated the sample into two gender-based subgroups before intercorrelating the nine remaining ASVAB scales. In the top 5% of the men, the corrected average correlation was 0.54; in the bottom 5% of the men it was 0.75. The differences for females were more striking: for the top group, 0.39; for the bottom group, 0.85. I then took the original covariance matrices to check whether or not the same single factor solution fitted. This was accomplished by comparing two CFAs. In the first, the same structure of the model (single factor, each subscale free to load on the factor) was imposed, but the parameters were allowed to vary between groups. In the second, the parameters for each subscale were constrained to be the same for both the high and low groups. No significant difference in the fit of the models would support the existence of the same model at both ends of the g range. A significant difference would support the hypothesis advanced here that the structure of g differs at the two ends of the continuum. This is what I found: in the male subsample, the chi-square for the constrained model was 453.68 (df 63) and for the unconstrained model it was 327.62 (df 54); the difference in chi-square of 126.06 (df 9) was therefore statistically significant (p < .001). Similar results were found for the female sample (constrained 410.57, df 63; unconstrained 290.60, df 54; difference 119.97, df 9; p < .001).
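The nested-model comparison can be checked directly from the constrained and unconstrained chi-squares reported in the Results. The sketch below is illustrative only. It assumes a single-factor model identified by fixing the factor variance at 1 in each group, with loadings and unique variances free; this parameterization is an assumption, chosen because it reproduces the reported degrees of freedom. The helper names are invented, and the p-values use closed-form chi-square tail formulas rather than LISREL output.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate, integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    # Odd df: complementary error function plus a finite series
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2) * total

def single_factor_df(n_indicators, n_groups=2, equal_loadings=False):
    """Degrees of freedom for a multi-group single-factor model
    (factor variance fixed at 1; an assumed parameterization)."""
    moments = n_groups * n_indicators * (n_indicators + 1) // 2
    free = n_groups * 2 * n_indicators        # loadings + unique variances
    if equal_loadings:
        free -= (n_groups - 1) * n_indicators  # loadings shared across groups
    return moments - free

# (label, constrained chi-square, unconstrained chi-square, number of subscales)
comparisons = [
    ("Electronics Information, men", 453.68, 327.62, 9),
    ("Electronics Information, women", 410.57, 290.60, 9),
    ("Pooled IQ tests", 343.64, 324.00, 10),
    ("Otis-Lennon", 222.50, 199.08, 10),
]

results = {}
for label, constrained, unconstrained, k in comparisons:
    delta = constrained - unconstrained
    ddf = single_factor_df(k, equal_loadings=True) - single_factor_df(k)
    results[label] = (round(delta, 2), ddf, chi2_sf(delta, ddf))
    print(f"{label}: delta chi-square = {delta:.2f}, df = {ddf}, "
          f"p = {results[label][2]:.3g}")
```

Under this parameterization the nine-subscale models have 54 and 63 degrees of freedom and the ten-subscale models have 70 and 80, matching the values reported for the three analyses.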

High and Low groups on a variety of IQ tests.


In this analysis I have the advantage of combining a subset of data based upon people's scores on a number of independent IQ tests. This gives me the ability to use the full ASVAB (all ten subscales) in the analysis at the cost of having a severely limited sample size. The average corrected correlation for the high g group was 0.59; the correlation for the low g group was 0.71. The difference is not as dramatic as in the previous analysis, but the factor structures still differ significantly. The chi-square for the constrained model was 343.64 (df 80) and for the unconstrained model it was 324 (df 70); the difference in chi-square of 19.64 (df 10) was therefore statistically significant (p < .05).

High and Low on the Otis.


In this analysis I use a single IQ test to carry out the classification. This again gives me the ability to use the full ASVAB (all ten subscales) in the analysis and allows me to use subjects drawn from the extremes of a single test, ensuring comparability. As the sample size was only 1190, I used the top 10% and bottom 10% of the sample. At the top end of the Otis, the average correlation was 0.35; at the low end, the average correlation was 0.50. The matrices were significantly different. The chi-square for the constrained model was 222.50 (df 80) and for the unconstrained model it was 199.08 (df 70); once again the difference in chi-square of 23.42 (df 10) was statistically significant (p < .01).

Conclusion and Implications


The evidence reported in this paper suggests that the factor structure changes over the range of g. I have undertaken three analyses with different strengths and weaknesses: one with a good sized sample but a weak indicator of g for classifying the subjects, a second based on a mixture of indicators of g, and the third with a single test indicator of g but with a small sample size. Each analysis came up with the same results: the average correlations and the factor structure differ at the two ends of the continuum. The scales are more highly intercorrelated at the bottom end of the spectrum of g than at the top end.

The weakness of this study lies in the nature of the ASVAB. It is a relatively short test and does not discriminate well at the high end of the spectrum. This lack of discrimination has two effects that counterbalance each other. On the one hand, it gives a number of persons top scores on all tests, which would inflate the correlation in the same way as in my simulation (see footnote 2); on the other hand, the restricted range attenuates the relationships. My correction for restriction of range leaves the enhancing factor still operating, so the differences between high and low groups found here are likely to be underestimates. It might also be argued that the test will have lower reliability at the high end of the spectrum. Were this to be the case, this would reduce the average correlations between tests at the high end of the spectrum relative to those at the lower end of the spectrum. To equate correlations at the high and low end of the spectrum, the reliabilities of the tests at the high end would have to be 70% of the reliabilities of the same tests at the low end. A difference of this magnitude in such a well validated test seems implausible. Two additional programs of research are required. The first is to assess the generalizability of this finding. Detterman and Daniel (1989) point to scattered sightings of the phenomenon in the literature. Careful reanalysis of large data sets using a variety of standard tests is required. The second program is more applied: to explore predictive or job related validity. The conventional finding (Hunter, 1986) is that job tailored or job related tests do not predict better than g. We now need to explore whether or not there is differential validity between general and specific tests at high or low levels of g. At its lower range, we would anticipate that g would be a good predictor of job performance. However, at higher levels, it is expected that the individual facets would have better validity.
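The 70% figure in the reliability argument above can be reproduced under the classical attenuation formula, in which an observed correlation equals the true correlation times the square root of the product of the two tests' reliabilities. If every test's reliability in the high group is a fraction c of its reliability in the low group, and the true correlations are equal, the ratio of observed correlations is also c. The sketch below applies this to the average correlations reported in the Results. The paper does not say which analysis the 70% refers to, so the match with the Otis-Lennon figures is an inference, and the names used are illustrative.

```python
# (average correlation in high group, average correlation in low group),
# as reported in the Results section
analyses = {
    "Electronics Information, men": (0.54, 0.75),
    "Electronics Information, women": (0.39, 0.85),
    "Pooled IQ tests": (0.59, 0.71),
    "Otis-Lennon": (0.35, 0.50),
}

def required_reliability_ratio(r_high, r_low):
    """Ratio c of high-group to low-group reliability that would by itself
    account for r_high versus r_low, assuming r_obs = rho * sqrt(rel_x * rel_y)
    and the same proportional loss c on every test."""
    return r_high / r_low

ratios = {name: required_reliability_ratio(hi, lo)
          for name, (hi, lo) in analyses.items()}
for name, c in ratios.items():
    print(f"{name}: high-end reliability would need to be {c:.0%} of low-end")
```

On these assumptions the Otis-Lennon averages give a ratio of 0.70, and the Electronics Information analyses would require an even larger loss of reliability for women and a similar one for men.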

The implications for the understanding of intelligence are important. The possibility of a breakdown of g at higher levels of intelligence, even with a narrow range of tests (as in the ASVAB) means that we have to reexamine the nature of intelligence. Hunt (1995) has argued that these findings support an information processing view of intelligence. 15 There may be a single driving factor at the low end, but it may manifest in a variety of different ways at the top end. More work on exploring the behavior of g at different levels is required.


Author's Note


Vinay Kanetkar helped with the LISREL analyses; Kai Lamertz generated the correlation matrices based on various IQ tests. I thank them as well as Joel Baum, Howard Gardner, Linda Gottfredson, Hugh Gunz, Lloyd Humphreys, Diane Irvine, Arthur Jensen, Gary Latham, Andy Mitchell, Malcolm Ree, Frank Schmidt, and Jacob Siegel for their helpful comments on earlier drafts of this paper. Kathrine Evans added the html tags for the footnotes in this paper.

References



Davis, M. S. (1971). That's interesting! Philosophy of the Social Sciences, 1, 309-344.

Deary, I. J., & Pagliari, C. (1991). The strength of g at different levels of ability: Have Detterman and Daniel rediscovered Spearman's "Law of Diminishing Returns"? Intelligence, 15, 247-250.

Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are highest for low IQ groups. Intelligence, 13, 349-359.

Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York, NY: Basic Books.

Gottfredson, L. S. (1986). The societal consequences of the g factor in employment. Journal of Vocational Behavior, 29, 379-410.

Herrnstein, R. J., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in American Life. New York, NY: Free Press.

Horn, J. L. (1985). Remodelling old models of intelligence. In B. Wolman (Ed.), Handbook of Intelligence: Theories, measurements, and applications (pp. 267-300). New York: Wiley.

Hunt, E. (1995). The role of intelligence in modern society. American Scientist, 83, 356-368.

Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340-362.

Jensen, A. R. (1986). g: Artifact or reality? Journal of Vocational Behavior, 29, 301-331.

Larson, G. E., & Wolfe, J. H. (1994). Validity results for g from an expanded test base. Intelligence, 20, 15-25.

Nesselroade, J. R., & Thompson, W. W. (1995). Selection and related threats to group comparisons: An example comparing factorial structures of higher and lower abilities of adult twins. Psychological Bulletin, 117, 271-284.

Ree, M. J., & Carretta, T. R. (1994). Factor analysis of ASVAB: Confirming a Vernon-like structure. Educational and Psychological Measurement, 54, 457-461.

Spearman, C. E. (1927). The Abilities of Man. London, UK: Macmillan.

Sternberg, R. J. (1988). The Triarchic Mind: A New Theory of Human Intelligence. New York, NY: Penguin.

Terman, L. M. (1916). The Measurement of Intelligence. Boston, MA: Houghton-Mifflin.

Thurstone, L. L. (1938). Primary Mental Abilities (Psychometric Monographs I ed.). Chicago, IL: University of Chicago Press.

Vernon, P. E. (1969). Intelligence and Cultural Environment. London, UK: Methuen.


1 One can visualize this in three dimensions by imagining the various abilities as flowers arranged in a narrow vase -- at the bottom they are bound together tightly, at the top they spread out broadly!

2 Before undertaking an analysis with real data, the plausibility of this position was examined using simulated data. A set of ten orthogonal variables (with mean 100 and sd 15 -- to simulate the classic IQ scores) was generated for a set of 1000 records. As expected, the scales were uncorrelated (one correlation pair, by chance, reached a significance level of p = .07). A factor analysis of these correlations produced a set of ten factors, each explaining between 8.5% and 11.7% of the variance. I then added to the data set a group of twenty records that were at the bottom of each of the ten scales (twenty people with a mean of 60 and sd of 3). The picture changed dramatically: all ten scales were modestly correlated (between .10 and .19); the factor analysis produced a single factor with modest factor loadings (between .40 and .50) and a modest amount of variance explained (21.9%). Considering the original orthogonal construction of the data set, this is a surprising finding and one that encouraged me to undertake the subsequent analysis using real ability data.

3 Except at the very highest levels of g where persons will have high scores on all the subscales due to the mathematical construction of g as the sum of the subscale scores.

4 This is a 25 item knowledge test of physical and biological sciences. It seems to tap both verbal ability (Larson & Wolfe, 1994) and technical knowledge (Ree & Carretta, 1994).

5 This is a 30 item arithmetic word problem test. It taps one's numerical abilities.

6 This is a 35 item vocabulary test of synonyms or words embedded in questions. This test taps the individual's verbal abilities.

7 This is a 15 item test of reading comprehension based on short passages. This also taps the individual's verbal abilities.

8 This is a 50 item speeded addition, subtraction, multiplication, and division test using one or two digit numbers. It taps the speed of accurate performance. Numerical Operations and Coding speed are the only two timed tests in the battery.

9 This is an 84-item speeded test requiring the recognition of number strings arbitrarily associated with words in a table. Numerical Operations and Coding speed are the only two timed tests in the battery.

10 This is a 25 item knowledge test of automobiles, shop practices, tools, and tool use. This taps technical ability in this area.

11 This is a 25 item test of algebra, geometry, fractions, decimals, and exponents. This taps mathematical ability.

12 This is a 25-item test of mechanical and physical principles. It taps into one's acquired knowledge in this area.

13 This is a 20 item test about electronics, radio, and electrical principles. It taps the person's acquired knowledge in this area.

14 One reviewer asked the very pertinent question: would you like your intelligence assessed on your knowledge of electronics? The answer of course is "No" unless no other information is available, in which case I might reluctantly acquiesce on the basis that it was a better indicator than random assignment of an IQ score. Fortunately, the argument here will not rest solely on the use of Electronics Information as the only indicator of g. Other analyses will be based upon broader tests. This particular analysis has the advantage of large numbers. Future research should undertake an analysis in which a subset of highly loading items from each test is used to construct an instrument for categorizing the subjects into different levels of ability. The analysis can then be safely undertaken on the remaining items.

15 I found out late in the preparation of this paper that Hunt and his colleagues have also reanalysed the ASVAB, but no data are reported.