How OpenAI’s first reasoning model sparked Silicon Valley’s obsession with general-purpose AI agents — and what comes next.
Maxwell Zeff
7:00 AM PDT · August 3, 2025
Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI’s models to solve high school math competitions.
Today that team, known as MathGen, is considered instrumental to OpenAI’s industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.
“We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at,” Lightman told TechCrunch, describing MathGen’s early work.
OpenAI’s models are far from perfect today: the company’s latest AI systems still hallucinate, and its agents struggle with complex tasks.
But its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI’s models recently won a gold medal at the International Math Olympiad, a math competition for the world’s brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, and ultimately power general-purpose agents that the company has always dreamed of building.
ChatGPT was a happy accident: a low-key research preview that turned into a viral consumer business. OpenAI’s agents, by contrast, are the product of a years-long, deliberate effort within the company.
“Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” said OpenAI CEO Sam Altman at the company’s first developer conference in 2023. “These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”

OpenAI CEO Sam Altman speaks during the OpenAI DevDay event on November 6, 2023 in San Francisco, California. Image Credits: Justin Sullivan / Getty Images
Whether agents will meet Altman’s vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough are the most highly sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.
The reinforcement learning renaissance
The rise of OpenAI’s reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL provides feedback to an AI model on whether its choices in a simulated environment were correct or not.
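For readers unfamiliar with the technique, here is a minimal sketch of that feedback loop in Python: a toy agent tries actions, receives a reward signal indicating whether each choice was correct, and updates its estimates accordingly. The environment and numbers are illustrative, not OpenAI’s actual setup.

```python
import random

# Toy "environment": two possible actions; action 1 pays off 80% of the time.
def step(action: int) -> float:
    return 1.0 if action == 1 and random.random() < 0.8 else 0.0

q_values = [0.0, 0.0]  # the agent's running estimate of each action's value
alpha = 0.1            # learning rate
epsilon = 0.1          # exploration rate

for _ in range(1000):
    # Mostly pick the action currently estimated to be best; sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = 0 if q_values[0] >= q_values[1] else 1
    reward = step(action)  # feedback: was the choice correct?
    # Nudge the estimate for the chosen action toward the observed reward.
    q_values[action] += alpha * (reward - q_values[action])

print(q_values)  # the estimate for action 1 should converge near 0.8
```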
RL has been used for decades. In 2016, roughly a year after OpenAI was founded, AlphaGo, an AI system that Google DeepMind built using RL, gained global attention after beating a world champion at the board game Go.

South Korean professional Go player Lee Se-dol (R) prepares for his fourth match against AlphaGo, the computer program developed by Google DeepMind, during the Google DeepMind Challenge Match on March 13, 2016 in Seoul, South Korea. Lee played a five-game match against the program. (Photo by Google via Getty Images)
Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.
By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data using large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.
It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed “Q*” and then “Strawberry,” by combining LLMs, RL, and a technique called test-time computation. The latter gave models extra time and computing power to plan, work through problems, and verify their steps before providing an answer.
This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.
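A minimal sketch of the idea, assuming any chat-style language model behind a hypothetical `generate` function: rather than asking for an answer outright, the prompt instructs the model to show its intermediate steps and mark its final answer.

```python
def build_cot_prompt(question: str) -> str:
    # Ask for step-by-step work, then a clearly marked final answer.
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, checking each step, "
        "then write the final answer on its own line prefixed with 'Answer:'."
    )

def extract_answer(completion: str) -> str:
    # Keep only the text after the last 'Answer:' marker in the model's output.
    return completion.rsplit("Answer:", 1)[-1].strip()

# Hypothetical usage, where generate(prompt) calls some LLM and returns text:
# completion = generate(build_cot_prompt("What is 17 * 24?"))
# print(extract_answer(completion))  # e.g. "408"
```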
“I could see the model starting to reason,” said Ahmed El-Kishky, an OpenAI research lead. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”
Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful for powering AI agents.
“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”
Scaling reasoning
With AI reasoning models, OpenAI determined it had two new axes along which to improve its models: applying more computational power during post-training, and giving models more time and processing power while answering a question.
“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” said Lightman.
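As an illustration of that second axis, one published technique, self-consistency, samples several independent reasoning chains and majority-votes on their final answers, trading compute for accuracy. The sketch below assumes a hypothetical `generate` function and is not OpenAI’s internal method.

```python
from collections import Counter

def answer_with_more_compute(question: str, generate, n_samples: int = 8) -> str:
    # Each sample is an independent reasoning chain; raising n_samples spends
    # more compute per question and typically yields a more reliable answer.
    answers = [generate(question) for _ in range(n_samples)]
    # Majority vote over the final answers from all sampled chains.
    return Counter(answers).most_common(1)[0][0]
```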
Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an “Agents” team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told TechCrunch. Although the team was called “Agents,” OpenAI didn’t initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.
Eventually, the work of Selsam’s Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

Ilya Sutskever, Russian-Israeli-Canadian computer scientist and co-founder and chief scientist of OpenAI, speaks at Tel Aviv University in Tel Aviv on June 5, 2023. (Photo by Jack Guez / AFP) Image Credits: Getty Images
OpenAI would have to divert precious resources — mainly talent and GPUs — to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.
“One of the core components of OpenAI is that everything in research is bottom up,” said Lightman. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’”
Some former employees say that the startup’s mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn’t always possible at competing AI labs.
The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.
What does it mean for an AI to “reason”?
In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT’s UX has been filled with more human-sounding features such as “thinking” and “reasoning.”
When asked whether OpenAI’s models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.
“We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning,” said El Kishky.
Lightman, for his part, focuses on the model’s results rather than on how it produces them, or on how closely that process resembles a human brain.