Tech Brew // Morning Brew // Update
Rethinking AI benchmarks.

It’s Wednesday. Artificial intelligence: Does it measure up? Today we look at the tools to grade AI smarts.

Patrick Kulp, Jordyn Grzelewski, Annie Saunders

AI

A February paper from the European Commission may have marked one of the first times that the phrase “hella swag” has appeared in the lawmaking body’s policy research.

HellaSwag—the name is actually a mouthful of an acronym—is one of many go-to benchmarks that AI developers use to gauge the performance of their models. You’ve probably seen them listed like report cards in OpenAI or Google announcements.

These benchmarks influence how companies train and develop models. And with regulations like the European Union’s AI Act now law, policymakers are also turning to some of these measurement schemes to classify AI systems.

The only problem? Experts say many of the widely used benchmarks leave a lot to be desired. They might be easily gamed or outdated, or do a bad job of taking stock of a model’s actual skills. They may evaluate capabilities that are largely irrelevant in the way people actually use the AI. And lately, as the pace of AI development has accelerated, researchers are having an increasingly hard time devising tests that AI can’t quickly master.

“If you look at [common benchmarks] from a technical perspective, they’re actually not that good,” Anka Reuel, a graduate fellow at the Stanford Institute for Human-Centered AI, told Tech Brew. “It’s kind of like the Wild West when it comes to benchmarks and actually, [evaluation] design more broadly, which is a huge issue right now. Because as a community, we never really put a focus on how to design them.”

Keep reading here.—PK

FUTURE OF TRAVEL

Eliminating emissions from fossil fuel-burning cars, trucks, and SUVs is imperative to tackling the climate crisis.

One complementary solution is ditching large vehicles altogether—at least for the short trips that make up most people’s day-to-day travel.

That’s the premise behind Also, a micromobility startup that recently emerged from stealth with the backing of EV maker Rivian and $105 million in Series B funding from VC firm Eclipse.

Also’s ties with Rivian run deep. Rivian CEO and founder RJ Scaringe is one of Also’s co-founders, along with Chris Yu, who served as Rivian’s VP of future programs for over three years. Jiten Behl, a partner at Eclipse, is also a Rivian alum. The startup emerged from a skunkworks team at Rivian and is now taking that IP and venturing out on its own. Rivian holds a minority stake in Also.

“Where we started off was this joint observation/vision that the world, and particularly in the US at that point, was ready to do better than sitting in traffic, battling for parking spaces, etc., for the supermajority of trips that we all take in cars that are really short,” Yu told Tech Brew. “But the problem is there was nothing, and still isn’t anything, that existed that was a compelling user experience in the smaller-than-a-car EV space.”

Keep reading here.—JG

AI

What do Paul McCartney, New York Gov. Kathy Hochul, and Sam Altman have in common?

They are all among the many, many people who have proffered or co-signed opinions on what should be included in the Trump administration’s forthcoming strategy around AI.

The White House has vowed to craft a new AI Action Plan to replace former President Biden’s executive order, which was repealed in January, within roughly six months of that repeal. The administration invited public comment on the effort in February and received nearly 9,000 submissions before the deadline. Many of the groups and signatories behind those comments opted to share their letters publicly.

It seems that all corners of the business, culture, academic, and policy worlds have opinions, a testament to the breadth of impact AI policy stands to have.

Keep reading here.—PK

BITS AND BYTES

Stat: 3 in 5. That’s how many physicians are using AI technology, Healthcare Brew reported, citing data from an American Medical Association survey.

Quote: “Ultimately, like there’s sometimes a disconnect between what the financial markets see and what is happening in reality on the ground…what we see is just relentless, overwhelming demand out there.”—Brannin McBee, CoreWeave’s co-founder and chief development officer, to Brew Markets for a story about the AI company’s IPO last week

Read: White House Starlink internet is a security minefield, experts warn (IT Brew)

Copyright © 2025 Morning Brew Inc. All rights reserved.
22 W 19th St, 4th Floor, New York, NY 10011