Reading early coverage of OpenAI's Operator, the consensus seems to be that in its current form it is clunky and fragile, but that it shows exciting potential if it can deliver on promised improvements.
Kevin Roose for the New York Times:
In all, I found that using Operator was usually more trouble than it was worth. Most of what it did for me I could have done faster myself, with fewer headaches. Even when it worked, it asked for so many confirmations and reassurances before acting that I felt less like I had a virtual assistant and more like I was supervising the world’s most insecure intern.
This is, of course, early days for A.I. agents. A.I. products tend to improve from version to version, and it’s a good bet that the next iterations of Operator will be better. But in its current form, Operator is more an intriguing demo than a product I’d recommend using — and definitely not something most people need to spend $200 a month on.
Though I haven't ponied up the $200/mo to try out Operator for myself, these experiences aren't entirely shocking given what we saw when Anthropic demoed their experiments with Claude Computer Use. Sadly, the web can be a frustratingly hostile experience, and so I'm not surprised large language models aren't exactly great at navigating it.
But this is what these AI companies have to do. If users only ever wanted to do a distinctly narrow subset of actions, such as buying groceries, ordering takeout, and booking a restaurant reservation, then these companies could sign a couple of partnership deals with the likes of Instacart, DoorDash, and a few other key players and suddenly have access to much more reliable APIs with which to carry out users' requests. However, if the only things AI "Agents" could do were a small handful of pre-defined actions, AI companies would have a hard time justifying their stratospheric valuations (and equally eye-watering spending). And so, again, this is what these companies have to do, because they can't expect to have a nice and tidy API for the vast array of requests users might make.
To touch on the second hurdle these AI "Agents" face, Casey Newton of Platformer details a maddening example at the intersection of the fragility of the LLM-as-web-browser-driver approach and the challenging privacy implications of typing personal info into a web browser virtualized in a data center:
My most frustrating experience with Operator was my first one: trying to order groceries. “Help me buy groceries on Instacart,” I said, expecting it to ask me some basic questions. Where do I live? What store do I usually buy groceries from? What kinds of groceries do I want?
It didn’t ask me any of that. Instead, Operator opened Instacart in the browser tab and began searching for milk in grocery stores located in Des Moines, Iowa.
At that point, I told Operator to buy groceries from my local grocery store in San Francisco. Operator then tried to enter my local grocery store’s address as my delivery address.
After a surreal exchange in which I tried to explain how to use a computer to a computer, Operator asked for help. “It seems the location is still set to Des Moines, and I wasn't able to access the store,” it told me. “Do you have any specific suggestions or preferences for setting the location to San Francisco to find the store?”
At that point, I asked to take over. Operator handed me the reins, and I logged into my account and picked my usual grocery store. From there, I was able to add a few items into my cart by asking for them specifically. The process was painstaking and inefficient in a way that personally made me laugh but I imagine might drive others insane. In the end, adding six bananas, a 12-pack of seltzer, and a package of raspberries to a cart had taken me 15 minutes.
The experience revealed to me one of Operator’s key deficiencies: it can use a web browser, but it cannot use your web browser. This matters a lot, because your browser is already set up for you to use the web efficiently. You’re already logged in to the services you use most, and many of those services are further modified to reflect your personal preferences and make using them more efficient. Open a browser on a different computer, and every single time you’re starting from scratch.
So, not only is Operator having to fumble through a bunch of web UI components designed for humans, but it also needs to ask you for tons of personal details in order to provide more value to you. Again, this tension isn't exactly novel — the same trade-offs would exist if I were to hire an actual assistant. A human assistant would struggle to order me something if they didn't have my payment info, or to successfully get anything delivered to my house if they didn't know my address. Any assistant that doesn't really know that much about you is fundamentally limited in how useful it can be to you. In particular, I think Parker Ortolani did a great job of succinctly getting to the heart of why Operator feels so clunky and yet also why the promise is so compelling:
I would love to send my Operator out to clean up my inbox, to beautify presentations, to organize paywalled stories across sites, to post on all social media sites for me at once, to clean up storage space on my machines, to create watchlists on streaming platforms, and so on. The possibilities are endless, but the reality is that we need Operator to run locally or a formal API that securely interfaces with these services. I do not want a remote browser controlled by someone else to have carte blanche access to my most important accounts. Without granting Operator those credentials, it isn’t particularly useful.
This tension is where I think Google, and maybe especially Apple, are in a great position. Both already have tons of your personal information. No need to trust OpenAI with access to your Gmail account in order to clean up your inbox if you're using Google's AI[1]. Google and Apple already have the personal info necessary to make an AI Assistant valuable. But it's Apple that has another strategy credit that gives them an advantage in the form of App Intents.[2]
From Apple's documentation (emphasis mine):
By adopting the App Intents framework, you allow people to personalize their devices by instantly using your app’s functionality with:
- Interactions with Siri, including those that use the personal context awareness and action capabilities of Apple Intelligence.
- Spotlight suggestions and search.
- Actions and automations in the Shortcuts app.
- Hardware interactions that initiate app actions, like the Action button and squeeze gestures on Apple Pencil.
- Focus to allow people to reduce distractions.
For example, App Intents enables you to express your app’s actions, by offering an App Shortcut. People can then ask Siri to take those actions on their behalf, whether they’re in your app or elsewhere in the system. Use App Entities to expose content in your app to Spotlight and semantic indexing with Apple Intelligence. People can then ask Siri to retrieve information from your app, like asking Siri to pull up flight information from a travel app to share with a loved one.
If Apple's existing relationship with your personal info on your device solves the second problem for AI "operators," then App Intents solves the first. App Intents have the potential to solve the fragility of LLMs-as-assistants by giving Apple Intelligence a regularized framework into every app installed on your device, with the bonus upside that a chunk of the work is picked up by the third-party developer, not Apple. Instead of having to spin up a virtual web browser and click around the Instacart website, Apple Intelligence could invoke the Instacart app right on the user's device to add_to_cart: [milk, eggs, carrots, hummus]. And no need to worry about deliveries going to the wrong address or needing to trust the LLM provider with your personal info. Frankly, Apple doesn't even need your Instacart info in this hypothetical — they can simply assume you've already logged into the Instacart app.
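To make the idea concrete, here's a rough sketch of what such an action could look like built on Apple's App Intents framework. To be clear, this is purely illustrative — the `AddToCartIntent` name, its parameter, and the Instacart tie-in are all hypothetical, not anything Instacart actually ships:

```swift
import AppIntents

// Hypothetical sketch: how a grocery app might expose an "add to cart"
// action to Siri / Apple Intelligence via the App Intents framework.
struct AddToCartIntent: AppIntent {
    static var title: LocalizedStringResource = "Add Items to Cart"

    @Parameter(title: "Items")
    var items: [String]

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // The app handles the request on-device, using the user's
        // existing logged-in session — no credentials ever pass
        // through the LLM provider's servers.
        return .result(dialog: "Added \(items.count) items to your cart.")
    }
}
```

The appeal of this shape is exactly what's described above: the assistant calls a structured, typed action instead of fumbling through a web UI, and the app's own session handles identity, payment, and delivery address.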
To be clear, this is all still a vision of a glorious future, and there are risks and challenges to get from here to there. For one, Apple Intelligence has to actually get good (an area where I think Google's years of AI investment are likely to pay off). And Apple needs to convince third-party developers to invest in App Intents.[3] However, the more I see from leading AI companies productizing their breakthrough research, the more I'm convinced Apple and Google might end up being the ones with the moat that matters.
Presumably you already trust Google with your email data if you're using Gmail. ↩︎
It seems like Google has something similar called App Actions for Android, but I don't have as much familiarity with developer sentiment around Android and App Actions or how much Google is investing in it. ↩︎
Though I do think market competition has the potential to help out here. If DoorDash has robust App Intents support but Uber Eats does not, and Apple Intelligence is thus sending more orders to DoorDash as a result, perhaps that is then an incentive for Uber Eats to invest in it? ↩︎