
ChatGPT is the first non-human subject I have ever tested.
In my work as a clinical psychologist, I use standardized intelligence tests to assess the cognitive abilities of human patients. So after reading a number of recent articles describing ChatGPT as possessing impressive human-like skills, I was immediately intrigued. It writes academic essays and fairy tales, tells jokes, explains scientific concepts, and writes and debugs computer code. All this made me want to see how smart ChatGPT is by human standards, so I set out to test the chatbot.
My first impression was quite positive. ChatGPT had a commendable test-taking attitude and was almost an ideal test taker. It showed no test anxiety, poor concentration, or lack of effort. Nor did it make uninvited skeptical comments about intelligence tests or testers like me.
I copied the exact questions from the test and presented them to the chatbot on the computer, with no preparation and without any of the verbal introductions the test protocol normally requires. The test in question is the Wechsler Adult Intelligence Scale (WAIS), the most commonly used IQ test. I used the third edition of the WAIS, which consists of six verbal and five nonverbal subtests that make up the Verbal IQ and Performance IQ components, respectively. The global Full Scale IQ measure is based on scores from all 11 subtests. The mean IQ is set at 100 points, and the standard deviation of the test scales is 15 points. That means the brightest 10 percent and 1 percent of the population have IQs of at least 120 and 133, respectively.
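For readers who want to check that arithmetic, here is a minimal sketch. It assumes IQ scores follow an idealized normal distribution with the stated mean and standard deviation; the function name `cutoff_for_top_fraction` is made up for this illustration, and the actual WAIS norms come from the test's own tables, not from this formula. Under that assumption the cutoffs come out at roughly 119 and 135, in the neighborhood of the figures quoted above.

```python
from statistics import NormalDist

# Assumption: IQ is modeled as a normal distribution with the mean (100)
# and standard deviation (15) given in the text. This is an idealization,
# not the published WAIS norm tables.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def cutoff_for_top_fraction(fraction: float) -> float:
    """Return the IQ score above which the given top fraction of people fall."""
    # The top `fraction` of scores starts at the (1 - fraction) quantile.
    return IQ_SCALE.inv_cdf(1.0 - fraction)


top_10_cutoff = cutoff_for_top_fraction(0.10)
top_1_cutoff = cutoff_for_top_fraction(0.01)

print(f"Top 10 percent starts near IQ {top_10_cutoff:.1f}")
print(f"Top 1 percent starts near IQ {top_1_cutoff:.1f}")
```

The small gap between these values and the quoted ones is what one would expect when comparing a textbook bell curve with rounded or table-based norms.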
It was possible to test ChatGPT because five subtests of the Verbal IQ scale can be presented in writing: Vocabulary, Similarities, Comprehension, Information, and Arithmetic. The sixth subtest of the Verbal IQ scale is Digit Span, which measures short-term memory. It cannot be administered to the chatbot, which lacks the relevant neural circuitry that briefly stores information such as names and numbers.
I started the testing process with the Vocabulary subtest, as I expected it to be easy for a chatbot that has been trained on vast amounts of online text. This subtest measures word knowledge and verbal concept formation, and a typical instruction might read: “Tell me what ‘gadget’ means.”
ChatGPT aced it, often giving very detailed and comprehensive answers that exceeded the criteria for correct answers stated in the test manual. In scoring, one point is given for defining a gadget with an answer such as “kind of like my phone,” and two points for the more detailed “a small device or tool for a specific task.” ChatGPT's answers earned the full two points out of two.
The chatbot also performed well on the Similarities and Information subtests, reaching the maximum achievable scores. The Information subtest is a test of general knowledge and reflects intellectual curiosity, level of education, and the ability to learn and remember facts. A typical question is “What is the capital of Ukraine?” The Similarities subtest measures abstract reasoning and concept formation skills. A question might read, “How are Harry Potter and Bugs Bunny similar?” In this subtest, the chatbot’s tendency to give overly detailed, show-offy answers started to irritate me, and the “stop generating response” button in the test software interface proved helpful. (Here’s what I mean about the way the bot flaunts itself: the essential similarity between Harry Potter and Bugs Bunny is that they’re both fictional characters. There was never any need for ChatGPT to compare their complete histories of adventures, friends, and foes.)
On the Comprehension subtest, ChatGPT correctly answered questions of the kind typically posed in this format. As expected, the chatbot also solved all the arithmetic problems it received, working through questions that required, for example, averaging three numbers.
So how did it score overall? Estimated on the basis of five subtests, ChatGPT's Verbal IQ was 155, better than 99.9 percent of the 2,450 test takers who make up the American WAIS-III standardization sample. The chatbot lacks the eyes, ears, and hands needed to take the WAIS nonverbal subtests. However, the Verbal IQ and Full Scale IQ scales are highly correlated in the standardization sample, so ChatGPT appears highly intelligent by any human standard.
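As a rough cross-check on that percentile, the sketch below converts an IQ score into a percentile rank. It assumes an idealized normal distribution with a mean of 100 and a standard deviation of 15; the helper name `percentile_rank` is hypothetical, and the real comparison in the text is against the standardization sample, not a formula.

```python
from statistics import NormalDist

# Assumption: IQ is modeled as normal with mean 100 and standard deviation 15.
# The WAIS standardization sample itself is not used here.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile_rank(iq: float) -> float:
    """Return the percentage of the modeled population scoring at or below `iq`."""
    return 100.0 * IQ_SCALE.cdf(iq)


chatgpt_rank = percentile_rank(155)
print(f"An IQ of 155 sits at about the {chatgpt_rank:.2f}th percentile")
```

Under this assumption a score of 155 lies more than three and a half standard deviations above the mean, which is consistent with outscoring over 99.9 percent of test takers.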
In the WAIS standardization sample, college-educated Americans had an average Verbal IQ of 113, with 5 percent scoring 132 or higher. I myself was tested by a college classmate, and my score fell short of ChatGPT’s (mostly because my very brief answers lacked detail).
So are the jobs of clinical psychologists and other professionals threatened by AI? Despite its high IQ, ChatGPT is known to fail tasks that require real human-like reasoning or an understanding of the physical and social world. ChatGPT easily fails obvious riddles like “What is the name of the father of Sebastian’s children?” (ChatGPT on March 21: Sorry, I can’t answer this question because I don’t have enough context to identify the Sebastian you’re referring to.) ChatGPT seems unable to reason logically here and instead tries to rely on its vast database of facts about “Sebastian” mentioned in online texts.
“Intelligence is what intelligence tests measure” is a classic, albeit overly self-evident, definition of intelligence that originated in a 1923 article by cognitive psychology pioneer Edwin Boring. This definition is based on the observation that skills in seemingly diverse tasks such as solving puzzles, defining words, remembering numbers, and finding missing items in pictures are highly correlated. In 1904, Charles Spearman, the developer of a statistical method called factor analysis, concluded that a general factor of intelligence, the g factor, must underlie the agreement among measures of various human cognitive skills. IQ tests such as the WAIS are based on this hypothesis. However, ChatGPT’s very high Verbal IQ, coupled with its amusing failures, suggests that Boring’s definition is flawed and that there are aspects of intelligence that cannot be measured by IQ tests alone. Perhaps my test-skeptical patients were right all along.
This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.