OpenAI Launches Operator to Help Users Automate Browser Tasks

OpenAI has released a research preview of Operator, a new AI agent that can control a web browser and perform actions on your behalf. The tool can interact with web pages by typing, clicking, and scrolling.

Operator is one of OpenAI’s first AI agents. The company claims it outperforms rival AI agents such as Google DeepMind’s Mariner, built on top of Gemini 2.0, and Anthropic’s Computer Use, a capability of the upgraded Claude 3.5 Sonnet.

So what exactly can Operator do? According to OpenAI, you can perform a wide variety of browser-related tasks with the tool. These include personal shopping, filling out forms, and travel booking. Businesses can program Operator for expense management, meeting scheduling, and data migration.

OpenAI’s Operator is powered by a new model called Computer-Using Agent (CUA). By integrating advanced reasoning and vision through reinforcement learning, CUA is trained to navigate and use graphical user interfaces (GUIs). This allows it to take screenshots to “see” the screen and “interact” using the computer’s mouse and keyboard functions. The tool doesn’t need any custom API integrations.
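The screenshot-then-act cycle described above can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions, not OpenAI's implementation: `FakeBrowser`, `toy_policy`, and `run_agent` are hypothetical names, the "screenshot" is a small dictionary rather than pixels, and the policy is a hard-coded stand-in for the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """One GUI action. All names in this sketch are hypothetical."""
    kind: str         # "click", "type", or "done"
    target: str = ""  # label of the on-screen element, if any
    text: str = ""    # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: one search box and one button."""
    query: str = ""
    submitted: bool = False
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; a dict keeps the sketch runnable.
        return {"query": self.query, "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        # Apply a keyboard or mouse action to the fake page.
        self.log.append(action.kind)
        if action.kind == "type" and action.target == "search_box":
            self.query += action.text
        elif action.kind == "click" and action.target == "search_button":
            self.submitted = True


def toy_policy(goal: str, screen: dict) -> Action:
    """Placeholder for the model: picks the next action from what it 'sees'."""
    if screen["submitted"]:
        return Action("done")
    if screen["query"] != goal:
        return Action("type", target="search_box", text=goal)
    return Action("click", target="search_button")


def run_agent(goal: str, browser: FakeBrowser, max_steps: int = 10) -> bool:
    """Observe, decide, act. Returns False if the step budget runs out."""
    for _ in range(max_steps):
        screen = browser.screenshot()
        action = toy_policy(goal, screen)
        if action.kind == "done":
            return True
        browser.execute(action)
    return False  # a real agent might hand control back to the user here
```

Calling `run_agent("flights to Paris", FakeBrowser())` types the query, clicks the button, and then stops once the policy sees the submitted page. Because the loop only ever reads the screen and issues mouse and keyboard actions, nothing in it depends on a site-specific API.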

While Operator is designed to overcome challenges or mistakes through self-correction, it can hand control back to the user if it gets stuck or needs assistance. OpenAI states that CUA is in its early stages and has limitations, but it still performed well on WebVoyager and WebArena, two of the more commonly used benchmarks for evaluating AI agents.

Operator is trained to ask the user to take over for tasks that require entering payment details, logging in, or solving CAPTCHAs. Much like working across multiple browser tabs, users can have Operator run multiple tasks simultaneously.

OpenAI admits that Operator “currently encounters challenges with complex interfaces like creating slideshows or managing calendars,” but it expects the tool to continue improving and evolving over time.

“To ensure a safe and iterative rollout, we are starting small,” shared OpenAI via a blog post introducing Operator. “Starting today, Operator is available to Pro users in the U.S. at operator.chatgpt.com. This research preview allows us to learn from our users and the broader ecosystem, refining and improving as we go. Our plan is to expand to Plus, Team, and Enterprise users and integrate these capabilities into ChatGPT in the future.”

“Early user feedback will play a vital role in enhancing its accuracy, reliability, and safety, helping us make Operator better for everyone.”

Operator is being released to a limited audience so the company can learn from usage, refine the tool’s capabilities, and address potential safety risks. According to OpenAI, Operator ensures user safety and control through multiple safeguards.

The tool asks for input at critical points. It enters Takeover Mode for inputting sensitive information, such as login details, and requires User Confirmation before finalizing significant actions, such as submitting an order. In addition, Operator is trained to decline certain high-stakes tasks, such as banking transactions, and on particularly sensitive sites it goes into Watch Mode, which requires the user to closely supervise its actions.
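One way to picture these checks is as a gate that every planned step passes through before it runs. This is a hedged illustration only: the `Decision` enum, the `gate` function, and the category sets are hypothetical and simply mirror the examples named in the text, not any published OpenAI interface.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"    # agent carries on by itself
    TAKEOVER = "takeover"  # user enters sensitive input themselves
    CONFIRM = "confirm"    # user must approve before the action runs
    DECLINE = "decline"    # task is refused outright


# Hypothetical category sets, chosen only to match the examples in the article.
SENSITIVE_INPUTS = {"login", "payment_details", "captcha"}
SIGNIFICANT_ACTIONS = {"submit_order"}
DECLINED_TASKS = {"bank_transfer"}


def gate(step: str) -> Decision:
    """Map a planned step to the safeguard that applies to it."""
    if step in DECLINED_TASKS:
        return Decision.DECLINE
    if step in SENSITIVE_INPUTS:
        return Decision.TAKEOVER
    if step in SIGNIFICANT_ACTIONS:
        return Decision.CONFIRM
    return Decision.PROCEED


for step in ["search_flights", "login", "submit_order", "bank_transfer"]:
    print(step, "->", gate(step).value)
```

Checking the refusal list first means a declined task can never slip through as a mere confirmation prompt, which is the ordering a cautious design would likely want.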

For data privacy, users can opt out of data usage for model training, delete browsing data, and log out of all sites with one click. To defend against malicious actors and adversarial websites, OpenAI says Operator’s safeguards are continuously updated against new threats through automated and human reviews.

OpenAI is already collaborating with a number of businesses to expand Operator’s user base and ecosystem. “Operator transforms AI from a passive tool to an active participant in the digital ecosystem,” shared OpenAI. “It will streamline tasks for users and bring the benefits of agents to companies that want innovative customer experiences and desire higher rates of conversion.”

“We’re collaborating with companies like DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, Uber, and others to ensure Operator addresses real-world needs while respecting established norms.”

Superintelligence and AGI (artificial general intelligence) have drawn growing attention over the last few weeks. Both concepts refer to advanced forms of AI. Superintelligence refers to an AI system that surpasses human intelligence across virtually all fields, while AGI is the concept of an AI capable of performing any intellectual task that a human can.

Earlier this year, OpenAI CEO Sam Altman shared via his personal blog that OpenAI knows how to build AGI, which is considered a holy grail in the world of machine learning (ML). Altman added that the company is now aiming beyond that and has set a course for superintelligence.

While AI agents come with their share of risks and uncertainties, tech giants are already heralding them as the next frontier in AI. The AI agent market could reach a valuation of $47.1 billion by 2030. The introduction of Operator is OpenAI’s first real shot at AGI.

Operator appears able to handle a lot of tasks, but only time will tell how practical and safe it truly is. In this initial phase, Operator shows promise in handling web-based activities, but skepticism remains about its real-world application. Critics argue that while Operator’s capabilities appear impressive, the true test lies in whether it can consistently perform these tasks without needing too much human intervention or posing risks to users.

By stp2y