
Technology research firm OpenAI has released an updated version of its text-generating artificial intelligence program, GPT-4, and demonstrated some of the language model's new abilities. GPT-4 not only produces more natural-sounding text and solves problems more accurately than its predecessor; it can also process images in addition to text. But the AI is still vulnerable to some of the same problems that plagued earlier GPT models: displaying bias, overstepping the guardrails intended to prevent it from saying offensive or dangerous things, and "hallucinating," or confidently fabricating falsehoods not found in its training data.
On Twitter, OpenAI CEO Sam Altman described the model as the company's "most capable and aligned" to date. ("Aligned" means it is designed to follow human ethics.) But "it is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it," he wrote in a tweet. At the time of this article's publication, a representative of OpenAI could not be reached for further comment.
Perhaps the most significant change is that GPT-4 is "multimodal," meaning it works with both text and images. Although it cannot output pictures (as generative AI models such as DALL-E and Stable Diffusion do), it can process and respond to the visual inputs it receives. Annette Vee, an associate professor of English at the University of Pittsburgh who studies the intersection of computation and writing, watched a demonstration in which the new model was asked to identify what was funny about a humorous image. Being able to do that requires understanding the context of the image. "It's understanding how an image is composed and why, and connecting that to social understandings of language," she says. "ChatGPT wasn't able to do that."
A device with the ability to analyze and describe images could be of great value to people who are blind or have low vision. For instance, a mobile app called Be My Eyes can describe the objects around a user, helping those with low or no vision interpret their surroundings. The app recently incorporated GPT-4 into a "virtual volunteer" that, according to a statement on OpenAI's website, "can generate the same level of context and understanding as a human volunteer."
But GPT-4's image analysis goes beyond simply describing pictures. In the same demonstration Vee watched, an OpenAI representative sketched a rough image of a website and fed the drawing to GPT-4. The model was then asked to write the code required to produce such a website. "It looked basically like the image. It was very simple, but it worked," says Jonathan May, a research associate professor at the University of Southern California. "So that was cool."
Even without its multimodal capability, the new program outperforms its predecessors at tasks that require reasoning and problem-solving. OpenAI says it ran both GPT-3.5 and GPT-4 through a variety of tests designed for humans, including a simulation of the bar exam for lawyers, the SAT and Advanced Placement tests for high school students, the GRE for college graduates and even a couple of sommelier exams. GPT-4 achieved human-level scores on many of these benchmarks and consistently outperformed its predecessor. Such broad problem-solving abilities could be useful for, say, managing a complex schedule, spotting errors in a block of code, explaining grammatical nuances to foreign-language learners or identifying security vulnerabilities.
Additionally, OpenAI claims that the new model can interpret and output longer blocks of text: more than 25,000 words at once. Although previous models were also used for long-form applications, they often lost track of what they were talking about. And the company touts the new model's "creativity," described as its ability to produce different kinds of artistic content in specific styles. In a demonstration comparing how the old and new models imitated the style of the writer Jorge Luis Borges in English translation, Vee noted that the more recent model produced a more accurate attempt. "You have to know enough about the context to judge that," she says. "An undergraduate might not understand why it's good, but I'm an English professor. I understand it from my own area of knowledge, and if it's impressive in my area of knowledge, then it's impressive."
May also tested the model's creativity himself. He tried the playful task of ordering it to create a "backronym" (an acronym reached by starting with the abbreviation and working backward). In this case, May asked for a cute name for his lab that would spell out "CUTE LAB NAME" and also accurately describe his field of research. GPT-3.5 failed to generate a relevant label, but GPT-4 succeeded. "It came up with 'Computational Understanding and Transformation of Expressive Language Analysis, Bridging NLP, Artificial intelligence And Machine Education,'" he says. "'Machine Education' is not great; the 'intelligence' part means there's an extra letter in there. But honestly, I've seen way worse." (For context, his lab's actual name is CUTE LAB NAME, or Center for Useful Techniques Enhancing Language Applications Based on Natural And Meaningful Evidence.) In another test, the model showed the limits of its creativity. When May asked it to write a specific kind of sonnet (he requested the form used by the Italian poet Petrarch), the model, unfamiliar with that poetic setup, defaulted to the sonnet form preferred by Shakespeare.
Of course, fixing this particular issue would be relatively easy: GPT-4 merely needs to learn an additional poetic form. In fact, when humans goad the model into failing in this way, they are aiding the program's development, because it can learn from everything unofficial testers enter into the system. Like its less fluent predecessors, GPT-4 was originally trained on large swaths of data, and this training was then refined by human testers. (GPT stands for generative pretrained transformer.) But OpenAI has kept quiet about just how it made GPT-4 better than GPT-3.5, the model that powers the company's popular ChatGPT chatbot. According to the paper published alongside the new model's release, "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." OpenAI's lack of transparency reflects the newly competitive generative AI environment, in which GPT-4 must vie with programs such as Google's Bard and Meta's LLaMA. The paper does suggest, however, that the company plans to eventually share such details with third parties.
These safety considerations matter because smarter chatbots have the ability to cause harm: without guardrails, they might provide a terrorist with instructions for building a bomb, churn out threatening messages for a harassment campaign or supply misinformation to a foreign agent attempting to sway an election. Although OpenAI has placed limits on what its GPT models are allowed to say in order to avoid such scenarios, determined testers have found ways around them. "These things are like bulls in a china shop. They're powerful, but they're reckless," scientist and author Gary Marcus told Scientific American shortly before GPT-4's release. "I don't think [version] four is going to change that."
And the more humanlike these bots become, the better they are at fooling people into thinking there is a sentient agent behind the computer screen. "Because it mimics [human reasoning] through language, we believe it, but under the hood, it is not reasoning in any way similar to the way that humans do," Vee warns. That illusion can make it easier to trust an AI agent's answers, which is a serious problem because there is still no guarantee that those responses are accurate. "Just because these models say something doesn't mean that what they're saying is [true]," May says. "There isn't a database of answers that these models are pulling from." Instead, systems like GPT-4 generate an answer one word at a time, with the most plausible next word informed by their training data, and that training data can become outdated. "I don't believe GPT-4 even knows that it's GPT-4," he says. "When I asked it, it said, 'No, no, there's no such thing as GPT-4. I'm GPT-3.'"
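May's point about word-by-word generation can be illustrated with a deliberately tiny sketch. The snippet below is not how GPT-4 works internally (OpenAI has not disclosed those details); it is a minimal bigram model in plain Python, with a toy corpus standing in for real training data, showing how such a system strings together statistically likely words rather than looking answers up in a database.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data; real models train on vast text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: for each word, how often does each possible next word follow it?
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(start: str, length: int) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    words = [start]
    for _ in range(length):
        followers = bigrams[words[-1]]
        if not followers:  # no known continuation: stop
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

print(generate("the", 4))  # produces fluent-looking text with no notion of truth
```

The output reads plausibly because each word frequently followed the previous one in the corpus; nothing checks whether the sentence is accurate, which is the same basic reason a far larger next-word predictor can "hallucinate."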
Now that the model has been released, many researchers and AI enthusiasts have a chance to probe GPT-4's strengths and weaknesses. Developers who want to use it in other applications can apply for access, and anyone who wants to "talk" with the program must subscribe to ChatGPT Plus. For $20 per month, this paid program lets users choose between talking with a chatbot that runs on GPT-3.5 and one that runs on GPT-4.
Such explorations will undoubtedly reveal more of GPT-4's potential applications and flaws. "The real question is, 'How are people going to feel about it two months from now, after the initial shock?'" Marcus says. "Part of my advice is to temper our initial enthusiasm by realizing that we have seen this movie before. If problems such as hallucination and a poor understanding of the physical and medical worlds still remain, its usefulness will stay somewhat limited, and we will have to be careful about what it is used for."