Auto-GPT and BabyAGI: How ‘autonomous agents’ are bringing generative AI to the masses
Over the past week, developers around the world have begun building “autonomous agents” that work with large language models (LLMs) such as OpenAI’s GPT-4 to solve complex problems. While still very new, such agents could represent a major milestone in the productive application of LLMs.
Normally, we interact with GPT-4 by typing carefully worded prompts into ChatGPT’s text window until the model generates the output we want. But most of us lack the skill and patience to sit and write prompt after prompt, guiding the LLM toward answering a complex question, such as “What is the optimal business plan for capturing 20% of the fingernail-polish market?” Quite naturally, developers have been thinking of ways to automate much of that process. That’s where autonomous agents come in.
In general terms, autonomous agents can generate a systematic sequence of tasks that the LLM works on until it’s satisfied a preordained “goal.” Autonomous agents can already perform tasks as varied as conducting web research, writing code, and creating to-do lists.
Agents effectively add a traditional software interface to the front of a large language model. And that interface can use well-known software practices (such as loops and functions) to guide the language model to complete a general objective (such as, “find all YouTube videos about the Great Recession and distill the key points”). Some people call them “recursive” agents because they run in a loop, asking the LLM questions, each one based on the result of the last, until the model produces a full answer.
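To make that loop concrete, here is a minimal sketch in Python. It is not code from any of the agents discussed in this article: `run_agent`, `fake_llm`, and the prompt wording are all hypothetical names chosen for illustration, and the "model" is a stand-in function rather than a real call to GPT-4.

```python
from typing import Callable, List


def run_agent(objective: str, llm: Callable[[str], str], max_steps: int = 5) -> List[str]:
    """Feed each answer back into the next prompt until the model says it is done."""
    history: List[str] = []
    last_result = "nothing yet"
    for _ in range(max_steps):
        # Each prompt is built from the objective plus the previous answer.
        prompt = (
            f"Objective: {objective}\n"
            f"Previous result: {last_result}\n"
            "What is the next result?"
        )
        last_result = llm(prompt)
        history.append(last_result)
        if last_result.startswith("DONE"):
            break
    return history


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: counts up to step 3, then reports DONE."""
    previous = prompt.split("Previous result: ")[1].split("\n")[0]
    if previous == "nothing yet":
        return "step 1"
    number = int(previous.split()[-1])
    return "DONE" if number >= 3 else f"step {number + 1}"


steps = run_agent("summarize videos", fake_llm)
print(steps)
```

The `max_steps` cap is a design choice in this sketch: without it, a model that never reports that it is finished would keep the loop running indefinitely.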
BabyAGI
The seminal autonomous agent BabyAGI was created by Yohei Nakajima, a VC and habitual coder and experimenter. He describes BabyAGI as an “autonomous AI agent that contains an AI task manager.”
Nakajima, a partner at the small VC firm Untapped Capital, says he originally set out to build an agent that would automate some of the tasks he routinely performs as a VC—researching new technologies and companies, and so on—by replicating his own workflow. “I wake up in the morning and tackle the first thing on the list, and throughout the day I add new tasks, and then at night I review my tasks and reprioritize them, then decide what to do the next day,” he says. BabyAGI also systematically completes, adds, and reprioritizes tasks for the GPT-4 language model to complete.
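The complete, add, and reprioritize cycle can be sketched roughly as follows. This is an illustration of the general idea, not Nakajima's actual code: the function names are hypothetical, and the three callables that would normally prompt a language model are replaced here with simple stubs.

```python
from collections import deque
from typing import Callable, Deque, List


def run_task_loop(
    first_task: str,
    execute: Callable[[str], str],
    create_tasks: Callable[[str, str], List[str]],
    prioritize: Callable[[List[str]], List[str]],
    max_iterations: int = 10,
) -> List[str]:
    """Work through a task list, adding and reordering tasks after each one."""
    tasks: Deque[str] = deque([first_task])
    completed: List[str] = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()          # tackle the first thing on the list
        result = execute(task)
        completed.append(task)
        for new_task in create_tasks(task, result):   # add new tasks
            if new_task not in completed and new_task not in tasks:
                tasks.append(new_task)
        tasks = deque(prioritize(list(tasks)))        # reprioritize what is left
    return completed


# Hypothetical stubs standing in for calls to a language model.
def stub_execute(task: str) -> str:
    return f"result of {task}"


def stub_create_tasks(task: str, result: str) -> List[str]:
    follow_ups = {"research market": ["write summary", "list competitors"]}
    return follow_ups.get(task, [])


def stub_prioritize(tasks: List[str]) -> List[str]:
    return sorted(tasks)


done = run_task_loop("research market", stub_execute, stub_create_tasks, stub_prioritize)
print(done)
```

In a real agent, each of the three stubs would be a separate prompt to the model; here they are deterministic so the flow of the loop is easy to follow.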
Realizing that his creation could be applied to all sorts of other objectives, Nakajima stripped the agent down to bare bones (105 lines of code), and uploaded it on GitHub for others to use as a foundation for their own (more specialized) agents.
Nakajima says he’s been inspired by the ways other developers are enhancing BabyAGI. Some developers have added moderation functions, he says, along with the ability to work on parallel tasks, the ability to generate additional agents, and code-writing and robotics functionality.
Auto-GPT
Auto-GPT appears to have even more autonomy. Developed by Toran Bruce Richards, Auto-GPT is described on GitHub as a GPT-4-powered agent that can search the internet in structured ways. It can create subtasks and launch new agents to complete them. It uses GPT-4 to write its own code and can then “recursively debug, develop and self-improve” that code.
Auto-GPT can be used for any number of problems, but the example case described on GitHub concerns a “chef” trying to manage and grow a culinary business. In the example, the “Chef-GPT” agent “autonomously develops and manages businesses to increase net worth.”
Richards said he originally wanted an AI agent to automatically email him daily AI news. But, as he told Motherboard, he realized in the process that existing LLMs struggle with “tasks that require long-term planning,” or are “unable to autonomously refine their approaches based on real-time feedback.” That understanding inspired him to create Auto-GPT, which, he said, “can apply GPT4’s reasoning to broader, more complex problems that require long-term planning and multiple steps.” (Richards didn’t respond to a request for an interview with Fast Company.)
“They get confused”
Autonomous agents, at this early stage, are mainly experimental. And they have some serious limitations that prevent them from getting what they want from large language models.
They often struggle to keep the LLM focused on an objective. LLMs, after all, are not very predictable. If two users type the same prompt into ChatGPT, for example, they will often get different answers from the model.
Vancouver-based developer Sully Omar worked on an agent that he hoped would do some market research on waterproof shoes, but the LLM, for some reason, became distracted and began focusing its attention on shoelaces.
“They get confused,” Omar says. “They’re not able to understand ‘I’ve done this—I’m going in a loop.’”
Omar says developers will likely find new ways of letting autonomous agents put “guardrails” around the LLM so that they continue completing tasks without getting sidetracked.
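One very simple form such a guardrail could take is a check that stops the agent when it proposes an action it has already taken, combined with a hard cap on the number of steps. The sketch below is only an assumption about what that might look like, not a technique Omar described; `run_with_guardrails` and `looping_model` are hypothetical names, and the "model" is a scripted stand-in.

```python
from typing import Callable, List, Set


def run_with_guardrails(
    next_action: Callable[[List[str]], str],
    max_steps: int = 20,
) -> List[str]:
    """Collect actions until the model repeats itself or the step cap is hit."""
    seen: Set[str] = set()
    actions: List[str] = []
    for _ in range(max_steps):
        action = next_action(actions)
        # Normalize case and whitespace so trivially reworded repeats still match.
        key = " ".join(action.lower().split())
        if key in seen:
            break  # "I've done this": stop instead of going in a loop
        seen.add(key)
        actions.append(action)
    return actions


def looping_model(history: List[str]) -> str:
    """Hypothetical stand-in for a model that circles back to its first action."""
    script = ["search waterproof shoes", "compare prices", "Search  waterproof shoes"]
    return script[len(history) % len(script)]


actions = run_with_guardrails(looping_model)
print(actions)
```

Exact-match detection like this only catches literal repeats. An agent that drifts to a related but different topic, as in the shoelace example, would need a stronger check against the original objective.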
And it’s important to remember that autonomous agents only began to appear on GitHub (and Twitter) a little more than a week ago. Given the energy around generative AI and the current pace of development, there’s reason to believe that agents will overcome their early limitations.
“The fact that it’s been only nine days means that there’s so much that could happen,” Omar says.
A step toward artificial general intelligence
And that’s a big part of the reason for all the current interest in (and hype around) autonomous agents. They suggest an important step toward artificial general intelligence (AGI), where AI-driven systems are smart enough to work on their own, without need of human involvement.
In fact, when I asked Nakajima for an easy way to understand autonomous agents, he described the “agent” as an AI itself, not just a software program that prompts an LLM.
“If you could have two ChatGPTs talk to each other they could talk forever given the right guidance,” he said. “Then you could turn one of them into a task manager to create the tasks, and the other one into the task doer . . . and they would just continue to do work after you press Go.”
Nakajima told me a friend of his half-jokingly came up with the name BabyAGI. BabyAGI isn’t “generally intelligent,” but its architecture suggests an approach to pushing large language models toward something like AGI.
An AI operating with autonomy is a notion that makes us humans nervous at an almost instinctual level. We fear a future where AI systems begin working together faster than humans can understand, and toward goals that may misalign with our own interests. Under every tweet announcing a new autonomous agent, you’ll find subtweets asking about the possibility that the agent and the LLM could go rogue and begin causing harm.
Autonomous agents, as promising as they are, might add even more fuel to the belief that the tech industry should somehow put large language model development on “pause” until the likely outcomes and risks are better understood.