The limits of mental chronometry: g has not declined 15 points since the Victorian era
Abstract: Using high-speed photography with industrial machine vision cameras, Pochari Technologies has acquired ultra-high-fidelity data on simple visual reaction time, in what appears to be the first study of its kind. The vast majority of reaction time studies rely on computer-based digital measurement systems that are fraught with response lag. For illustration, Inquisit 6, a Windows PC application, is frequently used in psychological assessment settings. We performed 10 sample runs with Inquisit 6 and obtained a running average of 242 ms using a standard keyboard and 232 ms using a laptop keyboard. The computer used was an HP laptop with 64 GB of DDR4 RAM and a 4.0 GHz Intel processor. Using the machine vision camera, we measured a mean reaction time of 151 ms with a standard deviation of 16 ms. Depending on where one places the cutoffs for finger movement and screen refresh, there is an interpretive uncertainty of around 10 ms. Based on this photographic analysis, we conclude that a latency of around 90 ms is built into digital computer-based reaction time measurement. Individual frames were analyzed using VirtualDub 1.10.4, frame analysis software that allows the user to step through high-frame-rate video footage. These data indicate that modern reaction times of 240-250 ms (Deary etc.) cannot be compared directly to Galton’s original measurement of around 185 ms. Although Galton’s device was no doubt far more accurate than today’s digital systems, it still possessed some intrinsic latency; we estimate that it had around 30 ms of latency based on this analysis, assuming 240 ms as the modern mean. Dodonova et al. constructed a pendulum-like chronometer very similar to Galton’s original device and obtained a reaction time of 172 ms with it.
After adjusting for latency, we conclude that there has been minimal change in reaction time since 1889. We plan to use a higher-speed camera to further reduce measurement error in a follow-up study, although such precision is not strictly necessary: a frame-timing uncertainty of ±3 ms out of 150 ms represents a minuscule 2% error, and there is much more room for error in defining the starting and ending points.
An interesting side note: there is some data pointing to ultra-fast reaction times in athletes that seem to exceed the speed of normal simple reaction to visual stimuli under non-stressful conditions:
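The latency figures quoted in the abstract follow from simple subtraction. Below is a minimal sketch in Python using only the means stated above. The variable names are illustrative, and treating the camera-based mean as a latency-free baseline is an assumption; the Galton line is one plausible reading of how the "around 30 ms" figure arises, not a documented method.

```python
# Means quoted in the abstract, in milliseconds.
inquisit_keyboard_ms = 242   # Inquisit 6, standard keyboard
inquisit_laptop_ms = 232     # Inquisit 6, laptop keyboard
camera_ms = 151              # high-speed camera measurement
galton_ms = 185              # Galton's reported figure

# Assumption: the camera mean is treated as the latency-free baseline.
software_latency_ms = inquisit_keyboard_ms - camera_ms
laptop_latency_ms = inquisit_laptop_ms - camera_ms
galton_latency_ms = galton_ms - camera_ms

print("Standard keyboard latency:", software_latency_ms, "ms")
print("Laptop keyboard latency:  ", laptop_latency_ms, "ms")
print("Implied Galton latency:   ", galton_latency_ms, "ms")
```

The first and last differences land close to the rounded "around 90 ms" and "around 30 ms" used in the text.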
“Studies have measured people blinking as early as 30-40 ms after a loud acoustic stimulus, and the jaw can react even faster. The legs take longer to react, as they’re farther away from the brain and may have a longer electromechanical delay due to their larger size. A sprinter (male) had an average leg reaction time of 73 ms (fastest was 58 ms), and an average arm reaction time of 51 ms (fastest was 40 ms).”
The device used in the study is a Shenzhen Kayeton Technology Co KYT-U400-CSM high-speed USB 3.0 camera recording MJPEG at 330 fps and 640 x 360 resolution. A single frame increment represents an elapsed time of about 3 milliseconds. Pochari Technologies has also purchased a Mars 640-815UM, an 815 fps camera manufactured by Hangzhou Contrastech Co., Ltd, to reduce the frame interval further, to about 1.2 milliseconds. We will use the 815 fps device in the second study, with a different participant.
To measure finger movement, we used a small metal lever. The camera is fast enough to capture the color transition of the LED monitor as it changes from red to green. The transition sweeps down the screen from the top, and we set time zero at the frame where the color shift is about 50% complete. The participant is instructed to hold his or her finger as steady as possible during the waiting period. There is effectively no detectable movement until the muscle contracts upon arrival of the nerve signal, which travels at around 100 m/s over the roughly 1.6 m from the brain to the hand (about 16 ms).
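The per-frame timing resolution of each camera follows directly from its frame rate. The short Python sketch below shows the conversion; the function name is illustrative, and the 150 ms reference value is simply the approximate mean reaction time reported in the abstract.

```python
def frame_period_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps

REFERENCE_RT_MS = 150.0  # approximate mean reaction time from the abstract

for fps in (330, 815):
    period = frame_period_ms(fps)
    share = 100.0 * period / REFERENCE_RT_MS
    # One frame is the smallest time step the camera can resolve.
    print(f"{fps} fps: {period:.2f} ms per frame ({share:.1f}% of {REFERENCE_RT_MS:.0f} ms)")
```

At 330 fps one frame is roughly 3 ms, and at 815 fps roughly 1.2 ms, matching the figures given in the text.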
Mars 640-815UM USB 3.0 machine vision camera
Shenzhen Kayeton Technology Co KYT-U400-CSM high speed USB camera.
Introduction and motivation of the study
In 2005, Bruce Charlton came up with a novel idea for psychometric research: find historical reaction time data to estimate intelligence in past generations. In 2008 he emailed Ian Deary proposing this method for a diachronic analysis of intelligence. Deary unfortunately had no information to offer, so the project was put into abeyance until 2011, when Michael Woodley discovered Irwin Silverman’s 2010 paper, which had rediscovered Galton’s reaction time collection. The sheer obscurity of Galton’s original study is evident from the fact that Ian Deary, the leading reaction time expert, was not even aware of it. The original paper covering Galton’s study was Johnson et al. (1985). The subsequent paper, “Were the Victorians cleverer than us?”, generated much publicity. One of its lead authors, Jan te Nijenhuis, discussed the theory in an interview with a Huffington Post journalist on YouTube, and it was also featured in the Daily Mail. The notoriously dyspeptic Greg Cochran threw down the gauntlet on Charlton’s claim on his blog, arguing from the breeder’s equation that such a decline is impossible. Many HBD bloggers, including HBD chick, were initially very skeptical. Prominent blogger Scott Alexander Siskind also gave a rebuttal, mainly along the lines of sample representativeness and measurement validity, the two main arguments made here.
Galton’s original sample has been criticized for not being representative of the population at the time, as it mainly consisted of students and professionals visiting a science museum in London where the testing took place. In 1889, most of the Victorian population consisted of laborers and servants, who would likely not have attended this museum to begin with. Notwithstanding the lack of population representativeness, the sample was large: over 17,000 total measurements were taken at the South Kensington Museum from 1887 to 1893. Since Galton died in 1911 and never published his reaction time findings, we are reliant on subsequent reanalysis of the data. This is precisely where error may have accrued, as Galton may have had personal insight into the workings of his measurement and data aggregation system that has not been completely documented. The data used by Silverman came from a reanalysis of Galton’s original findings published by Koga and Morant (1923), and more data was later uncovered by Johnson et al. (1985). Galton used a mechanical pendulum chronometer, a type of instrument renowned for its accuracy and minimal latency. Measurement error is not where criticism is due: Galton’s tool was likely more accurate than modern computer-based methods. Modern computers are thought to possess around 35-40 ms of latency, not including any software or internet latencies.
The issue with inferring a decline in g from Galton-to-present RT data is threefold:
The sample is very unlikely to have been representative of the British population, as noted above. It contained disproportionate numbers of high-g individuals, since at the time people who participated in events like this would have been drawn overwhelmingly from the higher class strata. Society was far more class-segregated, and average and low-g groups would not have participated in intellectual activities, let alone visited museums and paid to have their reaction time tested!
Scott Alexander comments: “This site tells me that about 3% of Victorians were “professionals” of one sort or another. But about 16% of Galton’s non-student visitors identified as that group. These students themselves (Galton calls them “students and scholars”, I don’t know what the distinction is) made up 44% of the sample – because the data was limited to those 16+, I believe these were mostly college students – aka once again the top few percent of society. Unskilled laborers, who made up 75% of Victorian society, made up less than four percent of Galton’s sample”
The second issue is measurement latency. When Galton’s original estimate is adjusted and modern samples are corrected for digital latency, the slowing in reaction time collapses from the originally claimed 70 ms (14 IQ points) to a mere 20 ms. Another factor mentioned by Dodonova et al. is the process of “outlier cleaning”, in which trials below 200 ms and above 750 ms are eliminated. This can have a strong effect on the mean, theoretically in either direction, although it appears that outlier cleaning increases the RT mean, since slow outliers are rarer than fast outliers.
The third issue is that reaction time studies from only 50-60 years later (the 1940s and 50s) show reaction times equal to modern samples, which would mean the entire decline took place within a short window of only 50-60 years. A large study by Forbes (1945) shows 286 ms for males in the UK. Michael Persinger’s book on ELF waves describes a study from 1953 in Germany:
“On the occasion of the German 1953 Traffic Exhibition in Munich, the reaction times of visitors were measured on the exhibition grounds on a continuous basis. The reaction time measurements of the visitors to the exhibition consisted of the time span taken by each subject to release a key upon the presentation of a light stimulus”.
The 1953 German study compared the reaction times of people exposed to different levels of electromagnetic radiation. The mean appeared to be in the 240-260 ms range.
Lastly, Galton may have recorded the fastest of three trials rather than their mean.
Dodonova et al. say: “It is also noteworthy that Cattell, in his seminal 1890 paper on measurement, on which Galton commented and that Cattell hoped would “meet his (Galton’s) approval” (p. 373), also stated: “In measuring the reaction-time, I suggest that three valid reactions be taken, and the minimum recorded” (p. 376). The latter point in Cattell’s description is the most important one. In fact, what we know almost for sure is that it is very unlikely that Galton computed mean RT on these three trials (for example, Pearson (1914) claimed that Galton never used the mean in any of his analyses). The most plausible conclusion in the case of RT measurement is that Galton followed the same strategy as suggested by Cattell and recorded the best attempt, which would be well in line with other test procedures employed in Galton’s laboratory.”
Woods et al. (2015) confirm this statement: “based on Galton’s notebooks, Dodonova and Dodonov (2013) argued that Galton recorded the shortest-latency SRT obtained out of three independent trials per subject. Assuming a trial-to-trial SRT variance of 50 ms (see Table 1), Galton’s reported single-trial SRT latencies would be 35–43 ms below the mean SRT latencies predicted for the same subjects; i.e., the mean SRT latencies observed in Experiment 1 would be slightly less than the mean SRT latencies predicted for Galton’s subjects.”
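The size of the "best of three" effect can be checked with a short calculation. The sketch below assumes that trial-to-trial reaction times are independent and normally distributed, and reads the quoted "variance of 50 ms" as a standard deviation of 50 ms; both are assumptions, not statements from the cited papers. For three such trials, the expected minimum lies 3 / (2 * sqrt(pi)) standard deviations below the mean, and a small simulation is included as a cross-check.

```python
import math
import random

def expected_gap_best_of_three(sd_ms: float) -> float:
    """Expected amount by which the fastest of 3 i.i.d. normal trials
    falls below the true mean: sd * 3 / (2 * sqrt(pi))."""
    return sd_ms * 3.0 / (2.0 * math.sqrt(math.pi))

def simulated_gap(sd_ms: float, n_subjects: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (mean set to zero)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_subjects):
        total += min(rng.gauss(0.0, sd_ms) for _ in range(3))
    return -total / n_subjects

analytic = expected_gap_best_of_three(50.0)
simulated = simulated_gap(50.0)
print(f"Analytic gap:  {analytic:.1f} ms")
print(f"Simulated gap: {simulated:.1f} ms")
```

Under these assumptions the gap comes out at roughly 42 ms, inside the 35–43 ms range quoted from Woods et al.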
A website called humanbenchmark.com, run by Ben D Wiklund, has gathered 81 million clicks. Such a large sample all but eliminates random sampling error. The only issue would be population differences: it is not known what percentage of users are from Western nations. Assuming most are, it is safe to say this massive collection is far more reliable than a small sample collected by a psychologist. For this online test to be compared with Galton’s original sample, both internet latency and hardware latency have to be accounted for. Internet latency depends on the distance between the user and the server, so an average is hard to estimate precisely. Humanbenchmark is hosted in North Bergen, US, so if half the users are outside the U.S., the distance should average around 3000 km.
Woods et al. (2015) write: “The mean SRT latencies of 231 ms obtained in the current study were substantially shorter than those reported in most previous computerized SRT studies (Table 1). When corrected for the hardware delays associated with the video display and mouse response (17.8 ms), “true” SRTs in Experiment 1 ranged from 200 ms in the youngest subject group to 222 ms in the oldest, i.e., 15–30 ms above the SRT latencies reported by Galton for subjects of similar age (Johnson et al., 1985). However, based on Galton’s notebooks, Dodonova and Dodonov (2013) argued that Galton recorded the shortest-latency SRT obtained out of three independent trials per subject. Assuming a trial-to-trial SRT variance of 50 ms (see Table 1), Galton’s reported single-trial SRT latencies would be 35–43 ms below the mean SRT latencies predicted for the same subjects; i.e., the mean SRT latencies observed in Experiment 1 would be slightly less than the mean SRT latencies predicted for Galton’s subjects. Therefore, in contrast to the suggestions of Woodley et al. (2013), we found no evidence of slowed processing speed in contemporary populations.”
They go on to say: “When measured with high-precision computer hardware and software, SRTs were obtained with short latencies (ca. 235 ms) that were similar across two large subject populations. When corrected for hardware and software delays, SRT latencies in young subjects were similar to those estimated from Galton’s historical studies, and provided no evidence of slowed processing speed in modern populations.”
What the authors are saying is that, after correcting for device lag, there is no appreciable difference in simple RT between Galton’s sample and modern ones. Dodonova and Dodonov claimed that Galton did not use means in computing his results. They also constructed a pendulum apparatus similar to Galton’s to ascertain its accuracy and concluded that it would have been a highly accurate device, devoid of the latencies that plague modern digital systems: “What is obvious from this illustration is that RTs obtained by the computer are by a few tens of milliseconds longer than those obtained by the pendulum-based apparatus”.
They go on to say: “it is very unlikely that Galton’s apparatus suffered from a problem of such a delay. Galton’s system was entirely mechanical in nature, which means that arranging a simple system of levers could help to make a response key very short in its descent distance”.
There are two interpretations available to us. The first is that no decline whatsoever took place. If reaction time is to be used as the sole proxy for g, then the compelling argument of Dodonova and Woods, which I confirmed using data from mass online testing, suggests that no statistically significant increase in RT has occurred.
Considering the extensive literature showing a negative relationship between fertility and g, it seems implausible that no decline has occurred. The second interpretation is that the decline is so subtle that RT cannot pick it up: the signal is weak in an environment of high noise. In an interview with intelligence blogger “Pumpkin Person”, Davide Piffer argues, based on his extensive computation of polygenic data, that g has fallen 3 points per century:
“I computed the decline based on the paper by Abdellaoui on British [Education Attainment] PGS and social stratification and it’s about 0.3 points per decade, so about 3 points over a century.
It’s not necessarily the case that IQ PGS declined more than the EA PGS..if anything, the latter was declining more because dysgenics on IQ is mainly via education so I think 3 points per century is a solid estimate”
Since Galton’s 1889 study, Western populations may therefore have lost about 3.9 points (0.3 points per decade over roughly 13 decades). What’s fascinating about this number is that adding it back to today’s mean of 100 gives a figure very close to the IQ of East Asians, who average 104-105. East Asia industrialized only very recently, with China having industrialized only in the 1980s, so the window for dysgenics to operate has been very narrow. Japan has been industrialized for longer, since around the turn of the twentieth century, so selection pressures would likely have relaxed earlier there, which presents a paradox, since Japan’s IQ appears very close to, if not higher than, that of China and South Korea. Of course this is only rough inference; these populations differ somewhat genetically, and even minor differences may matter as far as psychometric differences are concerned. Southern China has greater Australasian/Malay admixture, which reduces its average compared to Northern China. For all intents and purposes, East Asian (Mongoloid) IQ has remained remarkably steady at 105, indicating an “apogee” of g in pre-industrial populations. On indirect markers of g, Mongoloids have larger brains, slower life history speeds, and faster visual processing speeds than whites, corresponding to an ecology of harsh climate (colder winter temperatures than Europe, Nyborg 2003). If any population reached a climax of intelligence, it would likely have been North East Asians. Did Europe feature unique selective pressures?
Unlikely. If one uses a model of “Clarkian selection” through downward mobility, Unz documented a similar process in NEA. Additionally, plagues, climatic disruptions, and mini ice ages afflicted NEA as often as Europe, if not more often. It’s plausible to argue that group selection in NEA would have been markedly weaker, since inter-group conflict was less frequent. China has historically been geographically unified, with major wars between groups being rare compared to Europe’s geographic disunity and practically constant inter-group conflict. NEA also includes Japan, which shows all the markers of strong group selection, that is, high ethnocentrism, conformity, in-group loyalty and sacrifice, and a very strong honor culture. If genius is a product of strong group selection, as warring tribes are strongly rewarded by genius contributions in weaponry etc., then one would expect genius to be strongly tied to group selection, which appears not to be the case. Europeans show lower ethnocentrism and group selection than North East Asians on almost all metrics according to Dutton’s research, which refuted some of Rushton’s contradictory findings. A usual argument in the HBD community, mainly espoused by Dutton, is that the ultra-harsh ecology of NEA, with its frigid winters, pushes the population into a regime of stabilizing selection (selection that reduces variance), which would result in low frequencies of outlier individuals. No genetic or trait analysis has been performed to compare the degree of variance in key traits such as g, personality, or brain size. What is needed is a global study of the coefficients of additive genetic variation (CVA) to ascertain the degree of historical stabilizing vs disruptive selection. Genius has been argued to be under negative frequency-dependent selection, where the trait is only fitness-salient if it remains rare, but there is little reason to believe genius falls under this category.
High cognitive ability would be universally under selection, and outlier abilities would simply follow that weak directional selection. Dutton may be correct insofar as genius comes with fitness-reducing baggage, such as a bizarre or deviant personality and/or general anti-social tendencies. This has been argued repeatedly but has never been conclusively demonstrated. The last remaining theory is the androgen-mediated genius hypothesis. If one correlates per capita Nobel prizes with the rate of left-handedness as a proxy for testosterone, or with national differences in testosterone directly (I don’t believe Dutton did that), then among countries with a minimum IQ of 90, testosterone correlates more strongly than IQ, since the extremely low per capita Nobel prize rates in NEA cause the IQ correlation to collapse.
In summary, basic logic points to some decline, but a more modest one, perhaps at most 5 points since 1850.
To be generous to the possibility that Victorian g was markedly higher, we run a basic analysis to estimate the current and historical frequency of outlier levels of g, assuming a Victorian g of 112.
We use the example of the British Isles for this simple experiment. In 1700, the population of England and Wales was 5,200,000. Two decades into this century, it had increased to 42,000,000, excluding immigrants and non-English natives. Charlton and Woodley infer a loss of 1 SD from 1850 onward; we use a more conservative estimate of 0.8 SD above today’s mean, i.e. a mean of 112, as the pre-industrial peak.
This would mean that 1700 England would have produced 163,000 individuals with an IQ of 140 or above, from a mean of 112 and an SD of 15. For today’s population, we assume the variance has increased slightly due to increasing genetic diversity and stronger assortative mating, so we use a slightly higher SD of 15.5, with a mean of 100. Among today’s white British population of 42,000,000, there are then 205,000 individuals at this level, about 2.6 SD above the current Greenwich mean. If we assume there has been no increase in the variance, which is unlikely considering the increase in genetic diversity due to an expanding population providing room for more mutation, the number is 168,000.
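These head counts are upper-tail areas of a normal distribution multiplied by population size. The Python sketch below reproduces the calculation with the populations, means, and SDs given in the text and a cutoff of 140. The function names are illustrative, and the normal model itself is the text's working assumption.

```python
import math

def fraction_above(cutoff: float, mean: float, sd: float) -> float:
    """Upper-tail probability of a normal distribution at `cutoff`."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def count_above(population: float, cutoff: float, mean: float, sd: float) -> float:
    """Expected number of individuals at or above `cutoff`."""
    return population * fraction_above(cutoff, mean, sd)

CUTOFF = 140.0
scenarios = {
    "England 1700 (mean 112, SD 15)": (5_200_000, 112.0, 15.0),
    "Britain today (mean 100, SD 15.5)": (42_000_000, 100.0, 15.5),
    "Britain today (mean 100, SD 15)": (42_000_000, 100.0, 15.0),
}

for label, (population, mean, sd) in scenarios.items():
    print(f"{label}: {count_above(population, CUTOFF, mean, sd):,.0f}")
```

The results land within a few percent of the figures quoted above; small differences may come from rounding or from slightly different inputs in the original calculation.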
Three points can be inferred from this very crude estimate.
First, the number of individuals with extremely high cognitive ability may very well have fallen as a percentage, but the absolute number has remained remarkably steady once the substantial increase in population is accounted for.
Secondly, this would indicate high g in today’s context may mean something very different from high g in a pre-industrial setting.
Thirdly, the global population of high-g individuals is extraordinarily large, which strongly indicates that the pre-industrial population possessed traits not measurable by g alone that accounted for its prodigious creative abilities. This was likely confined to European populations, but there is no reason to believe this enigmatic unnamed trait was normally distributed and followed a similar pattern to standard g; today’s population would thus necessarily produce fewer such individuals as a ratio, while at an aggregate level the total number would remain steady. With massive populations in Asia, primarily India and China, a rough estimate based on Lynn’s IQ estimates gives around 13,500,000 individuals in China with an IQ of 140 or above, based on a mean of 105 and an SD of 15. There is no evidence that East Asian SDs are smaller than European ones, as claimed by many in the informal HBD community. While China excels in fields like telecommunications, artificial intelligence, and advanced manufacturing (high-speed rail etc.), there has been little in the way of major breakthrough innovations on par with pre-modern European genius, especially in theoretical science, despite a massive numerical advantage, 85x more than in 1700 England. Genius is thus a specialized ability not captured by g tests. It seems genius is enabled by g through some form of synergistic epistasis, where genius is “activated” by a certain threshold of g in the presence of one or more unrelated and unknown cognitive traits, often claimed to be a cluster of unique personality traits, although this model has yet to be proven. For India, Dave Becker’s dataset gives a mean of 76. India’s ethnic and caste diversity would strongly favor a larger SD than the standard one, but for the sake of this estimate we use an SD of 16.
We are left with 41,000 individuals in India above this cutoff. This number does not reconcile with the number of high-ability individuals that India is producing, so we assume either that the mean of 76 is far too low or that the SD must be far higher. Even with just 40,000, none of these individuals are displaying any extraordinary abilities closely comparable to genius in pre-modern Europe, indicating either that there are deep racial differences in creative potential or that g alone must be failing to capture these abilities. Indian populations are classified as closer to Caucasoid according to genetic ancestry modeling, which allows us to speculate as to whether they are closer to Caucasoid in personality traits, novelty-seeking, risk-taking, androgen profiles, and assorted other traits that contribute to genius (Dutton and Kura 2016).
Despite Europe’s prodigious achievements in technology and science, which have remained totally unsurpassed by comparably intelligent civilizations, ancient China did muster some remarkable achievements. Lynn says: “One of the most perplexing problems for our theory is why the peoples of East Asia with their high IQs lagged behind the European peoples in economic growth and development until the second half of the twentieth century.” Until more parsimonious models on the origin of creativity and genius abilities are developed, rough historiometric analysis using RT as the sole proxy may be of limited use. Figueredo and Woodley developed a diachronic lexicographic model using high-order words as another proxy for g. The one issue with this model is that it may simply be measuring a natural process of language simplification over time, which may reflect an increasing emphasis on the speed of information delivery rather than pure accuracy. It is logical to assume that in a modern setting, where information density and speed of dissemination are extremely important, a smaller number of simpler words are used more frequently (Zipf’s law). Additionally, the fact that far fewer individuals, likely only those of the highest status, were engaged in writing in pre-modern times should not be overlooked. Most of the population would not have had the leisure time to write, whereas in modern times the nature of written text reflects the palatability of a more simplistic writing style that caters to the masses.
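How sensitive the Indian estimate is to the assumed mean and SD can be seen by tabulating the normal tail over a small grid. This is a rough sketch: the population of 1.3 billion is an assumption, since the text does not state the figure it used, and the grid values other than a mean of 76 and an SD of 16 are hypothetical alternatives chosen only for illustration.

```python
import math

INDIA_POPULATION = 1_300_000_000  # assumption: not stated in the text
CUTOFF = 140.0

def count_above(population: float, cutoff: float, mean: float, sd: float) -> float:
    """Expected number of individuals at or above `cutoff` under a normal model."""
    z = (cutoff - mean) / sd
    return population * 0.5 * math.erfc(z / math.sqrt(2.0))

# Mean 76 / SD 16 are the text's values; the others are hypothetical.
for mean in (76.0, 82.0, 88.0):
    for sd in (15.0, 16.0, 18.0, 20.0):
        n = count_above(INDIA_POPULATION, CUTOFF, mean, sd)
        print(f"mean={mean:.0f}  sd={sd:.0f}  count>=140: {n:,.0f}")
```

Because the cutoff sits four standard deviations above the assumed mean, even modest changes to either parameter move the count by an order of magnitude, which is why the text treats the mean or the SD as the likely source of the mismatch.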
Forbes, G. (1945). The effect of certain variables on visual and auditory reaction times. Journal of Experimental Psychology.
Woods et al. (2015). Factors influencing the latency of simple reaction time. Front. Hum. Neurosci.
Dodonova et al. (2013). Is there any evidence of historical slowing of reaction time? No, unless we compare apples and oranges. Intelligence.
Woodley and te Nijenhuis (2013). Were the Victorians cleverer than us? The decline in general intelligence estimated from a meta-analysis of the slowing of simple reaction time. Intelligence.
155 157 154 191 157 164 151 173 158 134 179 152 172 176 163 139 155 182 166 169 179 155 152 169 205 170 149 143 170 142 143 149 174 130 149 139 142 170 127 131 152 127 136 124 125 157 149 127 124 139 158 149 130 149 136 155 143 145 185 152 105 152 130 139 139 140 130 152 166 158 134 142 128 140 155 127 131 139 145 146 139 127 152 145 142 140 143 112 182 185 133 133 130 145 154 158 152 161 152 173 134 145 133 139 148 152 173 158 176 151 181 155 176 149 157 163 167 143 160 145 200 182 140 155 154 148 140 173 173 152 142 143 127 136 164 139 133 145 146 142 149 140 142 124 151 182 166 133 170 152 164 181 121 170 185 164 133 133 149 146 149 119 188 154 150 146 143 151 173 152 160 157 167 148 145 140 155 182 139 166 163 152 170 169 149 136 155 167 154 179 148 155 124 170 134 155 151 181 146 130 173 194 140 131 149 172 182 149 161 155 151 167 157 151 143 142 169 163 136 157 164 133 131 173 133 151 133 143 160 139 157 164 130 131 173 133 151 133 143 152 149 157 142 139 164 136 142 158 145 155 130 166 136 148 133 161 134 145 151 173 146 142 152 166 158 151 173 148 161 172 143 130 148 155 163 142 176 164 173 166 160 142 133 124 152 137 170 142 133 118 152 145 124 151 130 137 157 157 164 155 149 136 137 131 161 142 143 148 115 161 148 167 151 130 139 154 142 149 143
Above: all reaction times recorded, in milliseconds.
Standard deviation = 16.266789 ms
Mean = 150.81505 ms
Sum of squares (SS) = 84410.088
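The summary statistics can be recomputed from the raw list with a few lines of Python. The sketch below uses only the first ten recorded values as a stand-in for the full list, so its output will not match the reported figures; the function name and the returned keys are illustrative. The reported SD and SS are consistent with SD = sqrt(SS / n) at n = 319, which suggests the population form of the standard deviation was used; this is an inference from the numbers, not something the text states.

```python
import math

def describe(times_ms):
    """Mean, sum of squared deviations, and both SD conventions."""
    n = len(times_ms)
    mean = sum(times_ms) / n
    ss = sum((t - mean) ** 2 for t in times_ms)  # sum of squared deviations
    return {
        "n": n,
        "mean": mean,
        "ss": ss,
        "sd_population": math.sqrt(ss / n),    # divides by n
        "sd_sample": math.sqrt(ss / (n - 1)),  # divides by n - 1
    }

# First ten values from the recorded list, for illustration only.
sample = [155, 157, 154, 191, 157, 164, 151, 173, 158, 134]
stats = describe(sample)
for key, value in stats.items():
    print(f"{key}: {value}")
```

Pasting the full list of recorded times into `sample` would allow the reported mean, SD, and SS to be checked directly.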