New research shows that AI models are inconsistent and unreliable in answering questions about a company's finances or governance
Tuesday, January 13, 2026
As ‘agentic commerce’ gains ground, companies shouldn’t put too much faith in ‘GEO,’ one industry insider warns


Hello and welcome to Eye on AI. In this edition…Google launches the ability to make purchases directly from Google Search’s AI Mode and Gemini…Apple selects Google to power an upgraded Siri…Meta announces a new AI infrastructure team…researchers use AI to find new ways to edit genes.

It was another week with a lot of AI-related announcements. Among the bigger news items was Google’s launch of an e-commerce checkout feature available directly in Google Search’s AI Mode and its Gemini chatbot app. Among the first takers for the new feature is retail behemoth Walmart, so this is a big deal. Behind the scenes, the AI checkout is powered by a new “Universal Commerce Protocol” that should make it easier for retailers to support agentic AI sales. Google Cloud also announced a bunch of AI features to support agentic commerce for customers, including a new Gemini Enterprise for Customer Experience product that combines shopping and customer support (watch this space: the combination of those two previously separate functions could have big implications for the way many businesses are organized). Home Depot was one of the first announced customers for this new cloud product.

It’s still early days for agentic commerce, but already many companies are panicking about how they make sure their products and sites surface highly in what these AI agents might recommend to users. A nascent industry of companies has sprung up offering what are variously called “generative engine optimization” (GEO) or “generative-AI optimization” (GAIO) services. Some of these echo longstanding internet search optimization strategies, but with a few key differences. GEO seems, at least for now, somewhat harder to game than SEO. Chatbots and AI agents seem to care a lot about products that have received positive earned media attention from reputable news outlets (which should be a good thing for consumers—and for media organizations!) as well as those that rank highly in trusted customer review sites.

But the world of AI-mediated commerce presents big governance risks that many companies may not fully understand, according to Tim de Rosen, the founder of a company called AIVO Standard, which offers companies a method for generative AI optimization and also a way to track and hopefully govern what information AI agents are using.

The problem, de Rosen told me in a phone call last week, is that while various AI models tend to be consistent in how they characterize a brand’s product offerings—usually correctly reporting the nature of a product, its features, and how those features compare to competing products, as well as providing citations to the sources of that information—they are inconsistent and error-prone when asked questions that pertain to a company’s financial stability, governance, and technical certifications. Yet this information can play a significant role in major procurement decisions.

AI models are less reliable on financial and governance questions
In one example, AIVO Standard assessed how frontier AI models answered questions about Ramp, the fast-growing business expense management software company. AIVO Standard found that models could not reliably answer questions about Ramp’s cybersecurity certifications and governance standards. In some cases, de Rosen said, this was likely to subtly push enterprises towards procurement decisions involving larger, publicly traded, incumbent businesses, even in cases when a privately held upstart also met the same standards, simply because the AI models could not accurately answer questions about the younger, privately held company’s governance and financial suitability or cite sources for the information they did provide.

In another example, the company looked at what AI models said about the risk factors of rival weight loss drugs. It found that AI models did not simply list risk factors, but slipped into making recommendations and judgments about which drug was likely the “safer choice” for the patient. “The outputs were largely factual and measured, with disclaimers present, but they still shaped eligibility, risk perception, and preference,” de Rosen said.

AIVO Standard found that these problems held across all the leading AI models and a variety of different prompts, and that they persisted even when the models were asked to verify their answers. In fact, in some cases, the models would tend to double down on inaccurate information, insisting it was correct.

GEO is still more art than science
There are several implications. One, for all the companies selling GEO services, is that GEO may not work well across different aspects of brand information. Companies shouldn’t necessarily trust a marketing tech firm that says it can show them how their brand is showing up in chatbot responses, let alone believe that the marketing tech company has some magic formula for reliably shaping those AI responses. Prompt results may vary considerably, even from one minute to the next, depending on what type of brand information is being assessed. And there’s not much evidence yet on how exactly to steer chatbot responses for non-product information.

But the far bigger issue is that there is a moment in many agentic workflows, even those with a human in the loop, where AI-provided information becomes the basis for decision-making. And, as de Rosen says, currently most companies don’t really police the boundaries between information, judgment, and decision-making. They don’t have any way of keeping track of exactly what prompt was used, what the model returned in response, and exactly how this fed into the ultimate recommendation or decision. In regulated industries such as finance or health care, if something goes wrong, regulators are going to ask for exactly those details. And unless regulated enterprises implement systems for capturing all of this data, they are headed for trouble.
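
For readers who want a concrete picture of what capturing this data could look like, here is a minimal sketch in Python. It is an illustration only, not AIVO Standard's method or any regulator's required format. The `DecisionRecord` fields, the `append_record` helper, and the one-JSON-object-per-line log file are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """One auditable step: what was asked, what came back, what was decided.

    All field names are hypothetical and chosen for illustration.
    """
    model: str       # identifier of the AI model that was queried
    prompt: str      # the exact prompt that was sent
    response: str    # the exact text the model returned
    decision: str    # the recommendation or decision that followed
    decided_by: str  # for example "human" or "agent"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def digest(self) -> str:
        # Hash a canonical JSON form so later edits to a record are detectable.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_record(path: str, record: DecisionRecord) -> str:
    """Append the record as one JSON line and return its SHA-256 digest."""
    entry = asdict(record)
    entry["sha256"] = record.digest()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry["sha256"]
```

Each call to `append_record` adds one line to the log, so the prompt, the model's answer, and the resulting decision stay together and can be handed over as a unit if someone later asks how a decision was reached. A production system would also need access controls, retention rules, and tamper-resistant storage, none of which this sketch attempts.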

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

AI IN THE NEWS

Apple chooses Google’s AI for updated Siri. Apple signed a multi-year partnership with Google to power key AI features in its products, including a long-awaited Siri upgrade, the companies announced on Monday. The deal underscores Google’s resurgence in AI and helped push the market value of Google-parent Alphabet above the $4 trillion threshold. Apple said the agreement does not change its existing partnership with OpenAI, under which Siri currently hands off some queries to ChatGPT, though it remains unclear how the Google tie-up will shape Siri’s future AI integrations. The financial terms of the deal were not disclosed, although Bloomberg previously reported that Apple was considering paying Google as much as $1 billion per year to access its AI models for Siri.

Meta announces new AI infrastructure team, including former Trump advisor. The social media giant said it was creating a new top-level initiative called Meta Compute to secure tens, and eventually hundreds, of gigawatts of data center capacity. The effort is being led by Daniel Gross, a prominent AI tech executive and investor whom Meta had hired to help its Superintelligence Labs effort, and Santosh Janardhan, who is the company’s head of infrastructure. CEO Mark Zuckerberg said the way Meta builds and finances data centers will become a key strategic advantage, as the company pours money into facilities such as a $27 billion data center in Louisiana and nuclear-power partnerships to meet energy demand. Meta also named Dina Powell McCormick, who served in several key positions during the first Trump administration, as president and vice chair to help forge government partnerships and guide strategy, reporting directly to Zuckerberg. You can read more from the Wall Street Journal here.

Microsoft warns that DeepSeek is proving popular in emerging markets. Research published by Microsoft shows that U.S. AI companies are losing ground to Chinese rivals in emerging markets. The low cost of open models built in China, such as DeepSeek, is proving decisive in spurring adoption in places such as Ethiopia, Zimbabwe, and Turkmenistan. Microsoft president Brad Smith said Chinese open-source models now rival U.S. offerings on performance while undercutting them on price, helping China overtake the U.S. in global usage of “open” AI, especially across Africa and other parts of the global south. By contrast, U.S. firms like OpenAI, Google, and Anthropic have focused on closed, subscription-based models, raising concerns that without greater investment, the AI divide between rich and poor countries will widen, and that U.S. companies may ultimately see their growth limited to more developed markets. Read more from the Financial Times here.

Salesforce launches updated Slackbot powered by Anthropic’s Claude. Salesforce is rolling out an upgraded Slackbot for Business+ and Enterprise+ customers that uses generative AI to answer questions and surface information across Slack, Salesforce, and connected services like Google Drive and Confluence. The new Slackbot is powered primarily by Anthropic’s Claude model. The company says the AI assistant respects user permissions and is designed to reduce reliance on external tools such as ChatGPT by working directly inside Slack, which Salesforce acquired for $27.1 billion in 2021. The launch comes as investors remain skeptical about enterprise software firms’ ability to benefit from the AI boom, with Salesforce shares down sharply over the past year despite its push to get businesses to adopt its “Agentforce” AI agents. Read more from CNBC here.