In a recent article, Ars Technica's Benj Edwards explored some of the limitations of reasoning models trained with reinforcement learning.
For example, one study "revealed puzzling inconsistencies in how models fail. Claude 3.7 Sonnet could perform up to 100 correct moves in the Tower of Hanoi but failed after just five moves in a river crossing puzzle."
When a user asks a question, a retrieval-augmented generation (RAG) system performs a keyword- or vector-based search to retrieve the most relevant documents.
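To make that concrete, here is a minimal sketch of the vector-search half of that retrieval step. The embed function below is a toy stand-in for a real embedding model, and the in-memory document list stands in for the vector database a production system would use; both are illustrative assumptions, not a description of any particular RAG product.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: a hashed bag-of-words vector."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    query_vec = embed(query)
    scores = [float(embed(doc) @ query_vec) for doc in documents]
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

docs = [
    "The Tower of Hanoi is a classic recursive puzzle.",
    "Retrieval-augmented generation inserts documents into the prompt.",
    "River crossing puzzles involve moving items across a river.",
]
print(retrieve("How does retrieval-augmented generation work?", docs, k=1))
```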
RAG systems can make for compelling demos, but it's possible to develop much better information retrieval systems by allowing the model itself to choose search queries.
Such a model can stay on task across multiple rounds of searching and analysis.
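The article doesn't spell out an implementation, but the loop it describes can be sketched roughly as follows. Here llm and search are hypothetical stand-ins for a language-model API and a search backend; the point is only the shape of the loop, in which the model itself proposes each query, reads the results, and decides when it has enough to answer.

```python
def llm(prompt: str) -> str:
    """Stand-in for a language-model call; a real agent would call a model API here."""
    return "ANSWER: (model's final answer would go here)"

def search(query: str) -> str:
    """Stand-in for a web or document search returning result snippets."""
    return f"(search results for: {query})"

def research_agent(question: str, max_rounds: int = 5) -> str:
    notes = ""
    for _ in range(max_rounds):
        # Ask the model either for another search query or for a final answer.
        reply = llm(
            f"Question: {question}\nNotes so far:\n{notes}\n"
            "Respond with 'SEARCH: <query>' to gather more information, "
            "or 'ANSWER: <answer>' when you have enough."
        )
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        query = reply.removeprefix("SEARCH:").strip()
        notes += f"\n- {query}: {search(query)}"  # accumulate results across rounds
    return llm(f"Question: {question}\nNotes:\n{notes}\nGive your best answer.")

print(research_agent("What did the Tower of Hanoi study find?"))
```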
LLMs were terrible at this prior to 2024, as the examples of AutoGPT and BabyAGI demonstrated.
The same point applies to the other agentic applications I mentioned at the start of the article, such as coding and computer-use agents.
What these systems have in common is a capacity for iterated reasoning: they think, take an action, think about the result, take another action, and so forth.
Timothy B. Lee was on staff at Ars Technica from 2017 to 2021.