Friday, October 10, 2025
OpenAI study suggests AI may be about to eclipse human expertise in real-world tasks


  • In today’s CEO Daily: Geoff Colvin on AI potentially eclipsing human expertise.
  • The big story: Israeli government approves Gaza deal as troops pull out.
  • The markets: Mostly down on news of China restrictions on rare-earth exports.
  • Plus: All the news and watercooler chat from Fortune.
Good morning. Rarely does a 29-page scholarly paper merit the attention of top-level executives, but every business leader should be familiar with a recent study from OpenAI. It’s the best description yet of how AI can handle real-world tasks, showing which AI models are excelling, and hinting at what it all means for humans in the years ahead. The paper can be heavy going, but you can get a masterful summary from our AI Editor, Jeremy Kahn.

For leaders, three points stand out:

The study is highly realistic. It examined 44 occupations and 1,320 specialized tasks required by those occupations. For example: the final testing step in manufacturing a cable spooling truck for underground mining operations. Appropriate professionals (average experience: 14 years) vetted the tasks, all of which are elements of actual work deliverables. Previous research has almost always focused on less realistic tests. The AI results were graded by expert humans who didn’t know if they were looking at work from AI or from an expert human professional.

The best models are already nearly as good as human industry experts. The study examined seven AI models from OpenAI, Google’s Gemini, xAI’s Grok, and Anthropic’s Claude. The clear winner was Claude Opus 4.1, which came within a few percentage points of reaching parity with human industry experts. The best models also completed tasks about 100 times faster and 100 times cheaper than the industry experts, though the comparisons ignore “the human oversight, iteration, and integration steps required in real workplace settings,” OpenAI says.

The models are improving at a galloping pace. For example, as OpenAI’s models improved, the percentage of their task outputs that were as good as or better than humans’ outputs more than tripled. If that rate continues (a big if), OpenAI’s models would be better at these real-world tasks than humans overall in a few months. At least some AI competitors could well be on similar trajectories.

The pace of change described in this new research may be the hardest challenge for business leaders. Consider the two-year cycle of Moore’s Law, which changed the world and inspired new corporate giants while dooming others. In retrospect, those were the days. John Chambers, who ran Cisco through the internet frenzy and its crash, said recently that 50% of executives “won’t have the skills to adjust to this new innovation economy driven by AI because they were trained to move at the speed of a five-year cycle as opposed to a 12-month cycle.” His warning to leaders is worth remembering: “With the speed the market is moving at now, you have to be able to reinvent yourself, which most CEOs and business leaders don’t know how to do—especially with AI.”—Geoff Colvin

Contact CEO Daily via Diane Brady at diane.brady@fortune.com

Top news

Israeli government approves Gaza deal as troops pull out

The IDF now has 24 hours to retreat to an agreed-upon line and Hamas has 72 hours to release all Israeli hostages. So far, events are going as planned and the mood is upbeat on both sides. Live coverage from the BBC here.

China places export controls on rare earth minerals

The new rules curb the supply chain for the semiconductors that are used in phones, computers, AI data centers, cars, solar panels, and other IT kit. China has a virtual monopoly in rare earths.

NY Attorney General indicted

Letitia James is charged with bank fraud and making false statements. The prosecution is part of President Trump’s retribution plan: It was James who secured a $367 million fine against Trump in a civil suit (the fine was later reversed).

Make Argentina Great Again

Yes, the U.S. is bailing out Argentina. Treasury Secretary Scott Bessent confirmed that the Treasury has bought pesos to support the government of Trump ally President Javier Milei. The U.S. is also providing a $20 billion swap line to Argentina. (A swap line allows central banks to exchange fixed amounts of currency on the understanding that the swap will be reversed later and interest will be paid on the repaid currency.)

Moody’s chief economist: roughly half of U.S. states are contracting economically

Moody’s Analytics chief economist Mark Zandi exclusively told Fortune that nearly half of U.S. states are seeing their economies contract, and only 16 are experiencing growth. Zandi also noted that lower-income households are “hanging on by their fingertips financially…and their world is going into recession pretty quickly.”

KPMG survey identifies quarter where AI sentiment changed

A new KPMG survey of 130 business leaders in companies making more than $1 billion annually found that the adoption of agentic AI technology has quadrupled in the past six months. A principal and aIQ program lead at the company told Fortune that the most recent quarter was when the “fear factor” surrounding the technology faded, leading to what she describes as “cognitive fatigue.”

Google restricts WFH to just 4 days per year

Google’s previous policy was to allow staff to work from anywhere for up to four weeks per year. The new rule says that a single WFH day will now count as an entire week. 

Federal workers will get back pay

U.S. House Speaker Mike Johnson said furloughed federal workers will get the wages they are owed once the shutdown ends.

The markets

S&P 500 futures were up 0.14% this morning. The index closed down 0.28% in its last session. STOXX Europe 600 was flat in early trading. The U.K.’s FTSE 100 was down 0.14% in early trading. Japan’s Nikkei 225 was down 1.01%. China’s CSI 300 was down 1.97%. The South Korea KOSPI was up 1.73%. India’s Nifty 50 was up 0.51% before the end of the session. Bitcoin held at $121.4K.

Around the watercooler
Fortune Media
40 Fulton Street, New York, NY, 10038, United States