Even without the unlikely six-month moratorium on AI development, GPT-4 looks like it has the potential for a giant leap simply by taking a closer look at itself. Researchers had GPT-4 critique its own work, and its performance improved by as much as 30%.
Researchers Noah Shinn and Ashwin Gopinath write: "It is not every day that humans develop new techniques to achieve state-of-the-art standards using decision-making processes once thought to be unique to human intelligence. But that's exactly what we did."
The "Reflexion" technique builds on GPT-4's already impressive ability to perform a variety of tests, introducing "a framework that allows AI agents to emulate human-like self-reflection and evaluate their performance." In practice, it adds extra steps in which GPT-4 designs tests to critique its own answers, looks for errors and shortcomings, and then rewrites its solutions based on what it finds.
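The critique-and-retry loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `llm` and `evaluate` functions are stand-ins (here stubbed so the code runs offline) for a real model call and a real scoring step.

```python
def llm(prompt: str) -> str:
    # Placeholder standing in for a real model call (e.g. GPT-4).
    return "stub answer"

def evaluate(task: str, answer: str) -> bool:
    # Placeholder check; Reflexion-style agents use self-generated
    # tests or environment feedback to judge an attempt.
    return answer == "stub answer"

def reflexion(task: str, max_rounds: int = 3) -> str:
    memory: list[str] = []          # self-reflections carried between rounds
    answer = llm(task)
    for _ in range(max_rounds):
        if evaluate(task, answer):  # attempt passes -> stop early
            break
        # Ask the model to critique its own failed attempt ...
        reflection = llm(f"Task: {task}\nAttempt: {answer}\nWhat went wrong?")
        memory.append(reflection)
        # ... then retry with the accumulated self-feedback in context.
        answer = llm(f"Task: {task}\nPast reflections: {memory}\nTry again:")
    return answer
```

The key design point is the memory list: failed attempts are not simply discarded, but summarized as reflections that stay in the prompt for every subsequent retry.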
Northwestern University/MIT
The team applied the technique to several different performance tests. On HumanEval, a set of 164 Python programming problems the model had never seen before, GPT-4 scored a record 67%, but with the Reflexion technique its score jumped to a very impressive 88%.
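For code tasks like HumanEval, the critique step amounts to running the candidate program against unit tests and feeding the failures back to the model. The sketch below shows that checking step in isolation; the function name and the toy "add" example are illustrative, not from the paper.

```python
def run_tests(candidate, tests):
    """Return a list of failure descriptions for `candidate`.

    `tests` is a list of ((args...), expected_result) pairs; in a
    Reflexion-style agent these would be model-generated unit tests.
    """
    failures = []
    for inputs, expected in tests:
        try:
            got = candidate(*inputs)
        except Exception as exc:          # crashes count as failures too
            failures.append(f"{inputs}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{inputs}: expected {expected}, got {got}")
    return failures

# Example: a buggy first attempt at an addition problem.
buggy = lambda a, b: a - b
tests = [((1, 2), 3), ((0, 0), 0)]
print(run_tests(buggy, tests))  # the failing case feeds the critique step
```

An empty failure list ends the loop; a non-empty one becomes the material the model reflects on before rewriting its solution.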
The AlfWorld test challenges the AI's ability to make decisions and solve multi-step tasks by performing several allowable actions in a variety of interactive environments. Here, Reflexion improved GPT-4's performance from about 73% to a near-perfect 97%, failing on only 4 of 134 tasks.
In another test, called HotPotQA, the language model was given access to Wikipedia and 100 of 13,000 possible question-answer pairs that "challenge agents to parse content and reason over several supporting documents." On this test, plain GPT-4 scored just 34% accuracy, while GPT-4 with Reflexion managed a significantly better 54%.
Introspective LLM Agent
Equips LLM-based agents with:
– dynamic memory
– a self-reflective LLM
– a method to detect hallucinations

Challenges agents to learn from their mistakes
– evaluated on knowledge-intensive tasks
– better than the ReAct agent

Paper: https://t.co/URsJWbkwmj
— John Nay (@johnjnay) March 23, 2023
Increasingly, the solution to AI's problems appears to be more AI. In some ways this feels like a generative adversarial network, in which two AIs hone each other's skills: one tries to generate images that cannot be distinguished from "real" ones, while the other tries to tell the fakes from the real. But in this case, GPT is both writer and editor, working to improve its own output.
Very neat!
The paper is available on arXiv.
Source: Explanation of Nano Thinking by AI