David Pawlan
Co-Founder
Hey y’all,
From AI scientists to shady leaderboard tactics and mini models beating giants, this week’s AI news is full of shake-ups. Microsoft’s new reasoning models are surprisingly powerful (and phone-sized), Claude got a serious productivity upgrade, and decentralized models might just be the next big thing.
Let’s break it all down — fast.
🏆 LMArena’s leaderboard credibility questioned
A damning study from Cohere Labs, MIT, and Stanford suggests the popular LMArena leaderboard might be rigged in favor of tech giants like OpenAI and Google. Allegations include private testing, silent model removals, and biased sampling. LMArena denies wrongdoing, but the episode casts doubt on benchmark integrity — just as Llama 4 Maverick’s drama fades.
Why it matters: Leaderboards shape perception — and funding. If they’re gamed, the whole AI model race loses meaning.
🧠 Microsoft and Anthropic go small, go smart
Microsoft’s new Phi-4 models show that small can be mighty. The flagship 14B-parameter Phi-4-reasoning outpaces larger models like o1-mini and even holds up against DeepSeek's 671B titan. Meanwhile, Anthropic’s new Claude Integrations take the setup pain out of MCP, letting Claude plug into apps like Zapier or Square, and its upgraded Research mode can now dig through the web and connected apps for up to 45 minutes.
Why it matters: Power is shifting from bloated models to nimble, task-specific ones that run on your laptop or smartphone — no data center required.
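Want to kick the tires on one of these small models yourself? Here’s a minimal sketch using Hugging Face transformers. The model id ("microsoft/Phi-4-reasoning") and generation settings are assumptions on my part, so check the model card for the exact name and hardware requirements.

```python
# Minimal sketch: run a small reasoning model locally with Hugging Face transformers.
# The model id is assumed; a 14B model still wants a decent GPU or a quantized build,
# but nothing close to a data center.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumed id, check the Hugging Face hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Walk through your reasoning: is 97 prime?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Strip the prompt tokens and print only the model's reply
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```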
🔬 AI scientists enter the chat
FutureHouse launched “AI scientists” — agents that can review research, answer deep scientific questions, and in one case (hello, Phoenix), help you design new chemistry experiments from scratch. This push into public-facing research agents is backed by none other than Eric Schmidt.
Why it matters: It’s a glimpse into a future where AI doesn’t just summarize papers — it creates the next breakthroughs.
🌐 A new kind of AI model: decentralized and user-owned
Vana and Flower Labs are teaming up to build a “user-owned” large language model, Collective-1. The model is powered by volunteered compute and personal data, with the goal of reaching 100B parameters.
Why it matters: This decentralized approach could let smaller players compete with tech giants — and give users control over their data (finally).
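For a sense of how “volunteered compute” training works under the hood, here’s a toy sketch of federated averaging, the basic pattern behind frameworks like Flower: each participant trains on data that never leaves their machine and only shares weight updates, which a coordinator averages. This is a generic illustration, not Collective-1’s actual training recipe.

```python
# Toy federated averaging: volunteers train locally, a coordinator averages the results.
import numpy as np

def local_update(weights: np.ndarray, local_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Stand-in for a local training step: nudge weights toward the local data mean."""
    gradient = weights - local_data.mean(axis=0)
    return weights - lr * gradient

def federated_average(weight_list: list[np.ndarray]) -> np.ndarray:
    """Coordinator step: average the weights returned by all participants."""
    return np.mean(weight_list, axis=0)

# Three "volunteers", each with private data that stays on their own machine.
rng = np.random.default_rng(0)
global_weights = np.zeros(4)
volunteer_data = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]

for round_num in range(5):
    local_weights = [local_update(global_weights, data) for data in volunteer_data]
    global_weights = federated_average(local_weights)
    print(f"round {round_num}: {np.round(global_weights, 3)}")
```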
New this week:
🎶 Suno v4.5 debuts 8-minute AI songs and better genre control
📻 Australian radio ran an AI host for 6 months — no one noticed
🕵️ Google’s AMIE now reads medical images during diagnosis
🛰️ AI is now uncannily good at guessing where a photo was taken
Conduct Recursive Research Iterations
Prompt: Act as a recursive research optimizer. Analyze results, identify gaps, refine searches, and repeat until you hit max-quality insights.
Use it when you're deep in research mode and want to push past surface-level summaries.
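If you’d rather wire that pattern into a script than paste the prompt by hand, here’s a rough sketch of the loop: draft findings, have the model name its own gaps, refine, repeat. The SDK, model name, and iteration count are my assumptions, not part of the original prompt.

```python
# Rough sketch of the "recursive research optimizer" loop using the OpenAI Python SDK.
# Each pass feeds the previous findings back in, so the model critiques and extends
# its own work instead of restating the first summary.
from openai import OpenAI

client = OpenAI()

def recursive_research(question: str, max_iterations: int = 3) -> str:
    notes = ""
    for _ in range(max_iterations):
        prompt = (
            "Act as a recursive research optimizer.\n"
            f"Question: {question}\n"
            f"Current findings (may be empty):\n{notes}\n\n"
            "1. Summarize what is solidly established so far.\n"
            "2. Identify the biggest gaps or weak claims.\n"
            "3. Propose refined sub-questions, then answer them.\n"
            "Return the improved findings."
        )
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model name
            messages=[{"role": "user", "content": prompt}],
        )
        notes = response.choices[0].message.content
    return notes

print(recursive_research("What do we actually know about small reasoning models?"))
```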
Benchmark trust is breaking down, Microsoft and Anthropic are proving small can be powerful, and AI scientists are moving from theory to hands-on discovery. Meanwhile, decentralized models and Claude’s new integrations offer a peek at AI’s more open, connected future.
Catch you on the next iteration,
—David