22 Oct
Demos of AI agents can seem stunning, but getting the technology to perform reliably in real life, without annoying or costly errors, can be a challenge. Current models can answer questions and converse with almost human-like skill, and they are the backbone of chatbots such as OpenAI’s ChatGPT and Google’s Gemini. They can also perform tasks on computers when given a simple command, either by accessing the computer screen and input devices such as a keyboard and trackpad, or through low-level software interfaces.

Anthropic says that Claude outperforms other AI agents on several key benchmarks including SWE-bench, which measures an agent's…