Here’s how Apple can compete against OpenAI and Google

Siri hasn’t progressed in 13 years, and now there’s formidable competition. But there are levers Apple can pull to stay in the game.

BY Tiernan Ray

Users of Apple’s Siri would probably agree the talking assistant hasn’t progressed much since it was first introduced in October 2011. In fact, on certain days, Siri actually seems to have gotten less intelligent, more prone to misconstrue requests or to respond sluggishly. Certainly, Siri has not lived up to the loftiest expectations we all had for its evolution.

That’s a big problem for Apple, which led the pack in voice assistants but now faces formidable competition from OpenAI, Microsoft’s artificial intelligence partner, and from Alphabet’s Google. On Monday, OpenAI debuted GPT-4o, which has an impressive synthesized voice and the ability to process pictures and maintain a continuous stream of assistant tasks.

Google followed Tuesday with the latest version of its Gemini program, with similarly impressive voice abilities. It’s possible Siri could be sideswiped by Google Pixel phones using Gemini, but also by OpenAI on non-Apple devices, including a rumored OpenAI phone (or if Microsoft ever revives its failed Windows Phone).

The story is not over for Siri, however. Apple CEO Tim Cook has hinted at things to come next month at the company’s Worldwide Developers Conference (a virtual-only event). Although it has not led in the science of AI to date, Apple has made significant contributions to machine learning research, so it’s worth considering what the company might have up its sleeve.

On the simplest level, rumors have circulated that Apple will license the technology of either OpenAI or Google. GPT-4o or Gemini could be a drop-in upgrade for Siri, which would be a boost for all iPhone users given the sorry state of Siri.

As the number-one or number-two phone maker in any given quarter, vying with Samsung, which licenses Google’s Android operating system, Apple has a lot to offer users of on-device AI.

An assistant that can competently follow a sequence of instructions could be told to “choose a photo of my living room and email it to Nikki, with the message, ‘Check out the new sofa.’” It could then be corrected if it selected the wrong picture, and asked to choose another “with more of the sofa in the image.”

A modern Siri could also follow vague imperatives such as “find a time for me and Max to talk next week” or even, “Schedule these three tasks in the next week.” Such basic assistant functions are far beyond Siri’s present capabilities but could certainly be achieved by simply swapping in GPT-4o or Gemini.

Leaning into such multistep tasks (connect my photos, my calendar, my travel app) is an advantage for Apple given that it has the App Store, through which it can connect an AI assistant to third-party applications. That is an emerging area of AI known as “AI agents.” The idea is that the AI model is just a friendly interface to the functions of many other programs. Getting that done safely is a serious security challenge: malicious code must not be able to take over all the functions on a phone. That’s an area where Apple can innovate.
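
To make the security point concrete, here is a toy sketch, in Python, of how such an agent layer might gate what a model can do: the model only ever emits a structured request, and the phone executes it against an allow-list of app functions. Every name here (pick_photo, send_email, TOOL_REGISTRY) is hypothetical, not an actual Apple or OpenAI API.

```python
import json

def pick_photo(query: str) -> str:
    """Stand-in for a photo-library search; returns an identifier for the best match."""
    return f"photo-id-for:{query}"

def send_email(to: str, body: str, attachment: str) -> str:
    """Stand-in for the mail app's send function."""
    return f"emailed {to}: '{body}' with {attachment}"

# Only functions registered here can ever run; an unknown or hallucinated
# tool name is rejected before anything executes.
TOOL_REGISTRY = {"pick_photo": pick_photo, "send_email": send_email}

def run_agent_step(model_output: str) -> str:
    call = json.loads(model_output)  # e.g. {"tool": "pick_photo", "args": {...}}
    tool = TOOL_REGISTRY.get(call["tool"])
    if tool is None:
        return "refused: unknown tool"
    return tool(**call["args"])

# Given "email Nikki a photo of my living room," the model might first emit:
print(run_agent_step('{"tool": "pick_photo", "args": {"query": "living room sofa"}}'))
```

The allow-list is the whole point: whatever the model dreams up, only vetted functions with vetted permissions ever actually run on the device.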

There are other things to upgrade or improve in the device experience. Any photos used by GPT-4o or Gemini on an iPhone would come from the built-in camera, for example. Apple could be adept at using the camera as an AI companion, such as letting the assistant help the user pick the best frames when taking a multi-exposure “live” photo. Even better, “Tell me what’s wrong with this composition” is the kind of photography-for-dummies advice some people might want in real time, before they press the shutter button.

Better still would be a Siri assistant that one can simply address with the command, “Take a picture of the three of us and fix the backlighting,” and have it automatically bring up the camera and snap a group photo with some automatic exposure adjustment.

The same is true for lots of on-device data, such as searching through voicemail messages, which currently exist as rather mediocre text transcripts. Searching the contents of people’s vast photo libraries is something Google showed off, and Apple could use the technology just as well to power photo search. “Find all pictures of me and a friend outdoors” is the kind of standard query Siri can’t even approach today.

Apple has control of hundreds of millions of iMessage users, and in the past that has been an area of moderate innovation, such as “Memojis,” animated cartoons that embody the voice message a person records. Google has shown the ability to create a video clip of a person speaking from just a single photo.

Apple could do something similar to upgrade Memojis to a personal avatar that would look just like oneself, and have realistic motion and sound synchronization. Apple would have the benefit of capturing multiple exposures of the individual, at different angles, with the front-facing FaceTime camera in an initial setup phase. It’s an area where Apple has the ability to work with its hardware partners to make the lens and sensors as good as possible.

The next level up is grounding AI in customers’ data. Neural networks including GPT-4o and Gemini are notorious for what are known as “hallucinations,” where they seem to confidently assert falsehoods. One solution for that chaotic situation is to ground the programs in valid data.

Apple has an advantage in having tons of users’ data both on the device and in the cloud. If one of these assistants were connected to contacts, calendar, documents, and Web data, the program could ground its answers in authoritative sources with real context behind them. Queries such as, “Did anything I read recently in Fast Company match up with documents I’ve written, or with people in my contacts or my email?” represent an ambitious kind of deep dive, but nothing impossible if the integration of data is well crafted.
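
As a rough illustration of what grounding means in practice, here is a minimal sketch in which the assistant first pulls the user’s own documents that match a query and hands them to the model as context. The word-overlap scoring is deliberately naive (a real system would use embeddings), and nothing here reflects an actual Apple implementation.

```python
personal_docs = {
    "note-1": "Meeting with Nikki about the new sofa on Friday.",
    "email-7": "Fast Company piece on AI assistants worth rereading.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank the user's documents by naive word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        personal_docs.values(),
        key=lambda text: len(q & set(text.lower().split())),
        reverse=True,
    )[:k]

def grounded_prompt(query: str) -> str:
    """Prepend the retrieved personal context so the model answers from it, not from thin air."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this personal context:\n{context}\n\nQuestion: {query}"

print(grounded_prompt("Did anything I read recently mention the new sofa?"))
```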

An assistant that could, say, summarize all the e-books on my device and give me a brief overview would approach what Vannevar Bush described in his famous 1945 essay, “As We May Think,” as an extension of people’s memories.

Apple’s biometric, health, and fitness data could be used to create a fitness program that is genuinely personal rather than arbitrary. One could ask big questions such as, “Given my age, weight, height, heart rate, and diet, and my recent step-count and cycling history, what’s the best program of exercise for me for the next few months?” Borrowing a page from Microsoft, this becomes a kind of “Copilot” for health and fitness.

Moving beyond what can be done with data out of the box, Apple has an opportunity to pioneer a fascinating emerging realm of AI known as “on-device training.”

Neural networks such as GPT-4o and Gemini are developed during an initial phase in the lab known as training. The neural net is shown numerous examples, and its parameters are tweaked until it produces optimal answers. That training then becomes the basis of the neural network’s question answering. It is done using some of the most powerful computers on the planet, with far more computing power than an individual has on their phone.
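
For readers who want to see the mechanics, the sketch below (plain PyTorch, with a toy model and made-up data) shows the basic loop: feed the network examples, measure how wrong it is, and nudge its parameters to reduce the error. Frontier models do exactly this, just at vastly larger scale on data-center hardware.

```python
import torch
from torch import nn

# A toy network standing in for a much larger model.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Tiny synthetic dataset standing in for "numerous examples."
inputs = torch.randn(64, 4)
labels = (inputs.sum(dim=1) > 0).long()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)  # how wrong is the model right now?
    loss.backward()                        # work out how each parameter contributed to the error
    optimizer.step()                       # tweak the parameters to reduce it
```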

The problem with such AI training is that it is generic. It is built from tons of data scraped from masses of Web pages or drawn from various corpora of published books. As a result, it’s not so personal. If I want to write an essay, GPT-4o and Gemini are filled with examples of how Jane Austen and Paul Auster write, presumably, but they don’t know anything about my style as a writer.

Apple has an opportunity to deeply personalize how these neural nets function, at an individual level, which is something that’s never before been achieved, because these programs have never been trained on individuals’ data in an individual fashion. (OpenAI collects its users’ data for training, as, presumably, does Google, but because neither is closing the loop by producing individualized results, the end result is never very personal.)

 

One approach is to “fine-tune” what Apple might license from an OpenAI or a Google, by applying a little bit of extra data to the finished GPT-4o or Gemini so that it is refined to lean more toward individual preferences. Whether that is possible will depend on the terms of a private-label deal, if any, between Apple and OpenAI or Google. The two vendors may place restrictions that prevent Apple from modifying their neural nets, just as Google doesn’t let Apple modify the Google search-results algorithm that functions on the iPhone.
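
Whatever the contract allows, the mechanics of fine-tuning are simple enough to sketch: take an already-trained model, freeze most of it, and adjust only a small part on a handful of personal examples. The model below is a tiny stand-in, since neither OpenAI nor Google exposes GPT-4o or Gemini weights for this kind of modification.

```python
import torch
from torch import nn

# A small stand-in for a "finished" licensed model.
pretrained = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 3))

# Freeze the generic layers; only the last layer will adapt to the user.
for p in pretrained[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in pretrained.parameters() if p.requires_grad], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# "A little bit of extra data": a dozen made-up personal examples.
personal_x = torch.randn(12, 8)
personal_y = torch.randint(0, 3, (12,))

for _ in range(50):
    optimizer.zero_grad()
    loss_fn(pretrained(personal_x), personal_y).backward()
    optimizer.step()
```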

Apple could also go it alone. Apple’s most interesting research work to date (at least, what’s publicly disclosed) is to conduct some training on the client device itself. That is no mean feat because of the computing budget required.

Apple could take some of the many neural nets that are freely shared in source-code form, so-called open-source AI, which any party can modify, and train them while a person is walking around talking, typing, and snapping pics. Such programs can’t do all the things that GPT-4o and Gemini can do, but they might do more important, more personal things by being more focused.

Because it takes a lot of computing power to train neural nets, Apple could split the work between the “A-series” chips in an iPhone, say, and its own chips working in cloud data centers, where rumor has it Apple is developing greater AI processing power.
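
One hedged way to picture that division of labor is the pattern known as split learning, sketched below: the phone runs the early layers on the raw personal data and ships only intermediate activations onward, while the heavier layers run in the data center. This illustrates the general technique, not how Apple actually divides the work.

```python
import torch
from torch import nn

on_device = nn.Sequential(nn.Linear(16, 32), nn.ReLU())                   # runs on the phone
in_cloud = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))  # runs in the data center

x = torch.randn(8, 16)          # raw personal data stays on the device
activations = on_device(x)      # in a real system, only these activations cross the network
logits = in_cloud(activations)  # the data center finishes the forward pass

labels = torch.randint(0, 5, (8,))
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()                 # gradients flow back through both halves so each side can update
```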

What can you do if you train the neural net on a person’s constantly updated device data?

A simple example is to boost photo categorization by giving the neural net more context about what’s in the image. The photo you’re looking at contains not just “a cat” but your cat, recognized among the many others you have taken and presented to you as an instant album, much as Apple does today when it recognizes faces in portraits.

If you snap a pic of a painting while walking through an art gallery, your phone might recall connections between that artist and something you snapped in a museum last month.

Even rather goofy examples abound. “Create a picture that is a mash-up of my cat inside that Van Gogh painting I saw last week” is the kind of novelty that could initially be a viral feature of an enhanced Siri.

On a more sophisticated level, if I take five pictures of my sofa at different angles, the neural net can be trained to understand the physical, three-dimensional structure of my sofa in ways that GPT-4o and Gemini cannot, because their training data is more scattered and less specific. Such a locally trained AI model would start to have an understanding of objects in the physical world that could be tremendously valuable for anything from product categorization to home renovation.

A more interesting example is to have the Siri assistant approach something like “reasoning,” where it anticipates, for example, the way your trip planning works. If you ask Siri to “check out travel deals for me for next month,” the assistant might notice your calendar is chock-full of appointments and advise taking an itinerary that doesn’t conflict.

A yet more complex example is to pull information from the various social media apps one uses, to connect the dots between the things you’ve “liked” across Facebook, Pinterest, and X. These applications exist in silos, but they have something to say about the consumer goods you might buy or the political issues you might follow. Such signals could be used to train a neural net to direct the user to similar posts emerging on those services.

The next stage is what’s called in AI circles “federated learning,” where all of the hundreds of millions of users of iOS devices contribute some data which is then anonymized and aggregated to train a neural net.
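
The textbook version of this is called federated averaging: each device trains a copy of the model on its own data, and only the resulting weights, never the raw data, are sent back and combined. A miniature sketch, with simulated “phones” and random stand-in data:

```python
import copy
import torch
from torch import nn

def local_update(global_model, data, labels, steps=5, lr=0.05):
    """Train a private copy on one device's data; only the weights leave the device."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.CrossEntropyLoss()(model(data), labels).backward()
        opt.step()
    return model.state_dict()

global_model = nn.Linear(10, 2)

# Three simulated phones, each with data that never leaves the "device."
device_updates = [
    local_update(global_model, torch.randn(20, 10), torch.randint(0, 2, (20,)))
    for _ in range(3)
]

# Server step: average the parameters sent back from all devices.
averaged = {
    k: torch.stack([u[k] for u in device_updates]).mean(dim=0)
    for k in device_updates[0]
}
global_model.load_state_dict(averaged)
```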

The sofa I have trained Siri to understand in three dimensions can be compared to other people’s home furnishings. All of what I’m reading in e-books can be compared to anonymized summaries of what others are reading. Or social media posts that I have liked can be compared to analogous things people are perusing. The goal here is to meld what you prefer with some of the wisdom of the crowd.

That last point raises many thorny issues. It’s not certain how much access social media giants will allow Apple. And it remains to be seen how comfortable individual users might be when presented with comparisons between their own habits and interests and those of hundreds of millions of other people.

Clearly, though, there is a lot of opportunity for Apple in all the ways that personal devices can give focus to what has so far been very general use of “generative AI” to answer questions or craft images. It’s what is broadly being referred to as the “AI PC”: a device that attends less to the generic Internet and more to what interests people in their individual pursuits.

Apple has some advantage with control of hardware and software on the device, and a cloud computing business. It’s your ball, Tim Cook.


ABOUT THE AUTHOR

Tiernan Ray is editor of The Technology Letter and a senior contributing writer at ZDNET. His work has also been published in The New York Times, Fortune, Barron’s, and Bloomberg.


Fast Company
