AI scientists & more (May 2, 2025)

David Pawlan

Co-Founder

May 2, 2025

Hey y’all,

From AI scientists to shady leaderboard tactics and mini models beating giants, this week’s AI news is full of shake-ups. Microsoft’s new reasoning models are surprisingly powerful (and phone-sized), Claude got a serious productivity upgrade, and decentralized models might just be the next big thing.

Let’s break it all down — fast.

🧠 Model Mayhem & Benchmark Battles

🏆 LMArena’s leaderboard credibility questioned
A damning study from Cohere Labs, MIT, and Stanford suggests the popular LMArena leaderboard might be rigged in favor of tech giants like OpenAI and Google. Allegations include private testing, silent model removals, and biased sampling. LMArena denies wrongdoing, but the episode casts doubt on benchmark integrity — just as Llama 4 Maverick’s drama fades.


Why it matters: Leaderboards shape perception — and funding. If they’re gamed, the whole AI model race loses meaning.

🧠 Microsoft and Anthropic go small, go smart
Microsoft’s new Phi-4 models show that small can be mighty. The flagship 14B-parameter Phi-4-reasoning outpaces models like o1-mini and even holds up against DeepSeek’s 671B-parameter titan. Meanwhile, Anthropic’s new Claude Integrations hide much of the complexity of setting up MCP connections, letting Claude plug into apps like Zapier or Square, while its upgraded Research mode can spend up to 45 minutes pulling live data and web results.


Why it matters: Power is shifting from bloated models to nimble, task-specific ones that run on your laptop or smartphone — no data center required.

🔬 Agents, Apps, and Edge AI

🔬 AI scientists enter the chat
FutureHouse launched “AI scientists” — agents that can review research, answer deep scientific questions, and in one case (hello, Phoenix), help you design new chemistry experiments from scratch. This push into public-facing research agents is backed by none other than Eric Schmidt.

Why it matters: It’s a glimpse into a future where AI doesn’t just summarize papers — it creates the next breakthroughs.

🌐 A new kind of AI model: decentralized and user-owned
Vana and Flower Labs are teaming up to build a “user-owned” large language model, Collective-1. The model is powered by volunteered compute and personal data, with the goal of reaching 100B parameters.


Why it matters: This decentralized approach could let smaller players compete with tech giants — and give users control over their data (finally).

🧰 Tools to Try

New this week:

  • Scrybe: Helps you grow on LinkedIn in under 5 minutes a day
  • Freebeat: Instantly turn your ideas into viral music videos
  • Spring: Build tailored business apps instantly with AI
  • Guidde: Turn PDFs or screen recordings into quick how-to videos

⚡ Quick Hits

🎶 Suno v4.5 debuts 8-minute AI songs and better genre control

📻 Australian radio ran an AI host for 6 months — no one noticed

🕵️ Google’s AMIE now reads medical images during diagnosis

🛰️ AI is now uncannily good at guessing where a photo was taken

🔁 Prompt of the Day

Conduct Recursive Research Iterations
Prompt: Act as a recursive research optimizer. Analyze results, identify gaps, refine searches, and repeat until you hit max-quality insights.

Use it when you're deep in research mode and want to push past surface-level summaries.

TL;DR

Benchmark trust is breaking down, Microsoft and Anthropic are proving small can be powerful, and AI scientists are moving from theory to hands-on discovery. Meanwhile, decentralized models and Claude’s new integrations offer a peek at AI’s more open, connected future.

Catch you on the next iteration,
—David

