From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 in one of the most in...
Anastasios Angelopoulos discusses Arena's journey from a Berkeley basement project to a $100M company that processes tens of millions of AI model comparisons monthly. The conversation covers Arena's unique incubation by a16z's Anjney Midha, their response to the "Leaderboard Illusion" controversy, and why they chose to spin out as a company rather than remain academic. Key insights include their platform economics (funding all inference at market rates), the viral success of Google's "Nano Banana" model launch, and their principles for maintaining leaderboard integrity while expanding into new verticals like Code Arena and Expert Arena.
Anjney Midha at a16z incubated Arena by forming an entity and providing grants before the team committed to starting a company, allowing them to walk away at any time. The decision to spin out came from realizing that scaling Arena's mission of measuring frontier AI on real-world usage required company resources beyond what academic or nonprofit structures could provide.
Arena raised $100M primarily to fund inference costs and scaling. The platform processes 250M+ total conversations with tens of millions monthly, making it one of the largest consumer LLM platforms after ChatGPT. They pay standard enterprise rates for inference, not special discounts, and fund all model calls for free user access.
A major use of funds was migrating from Gradio to React/Next.js. While Gradio scaled Arena to 1M MAU, custom features like loading icons with notifications became difficult to implement. The team also needed a more common tech stack for hiring developers.
The "Leaderboard Illusion" paper from Cohere researchers claimed Arena did undisclosed private testing that created leaderboard inequities. Arena's response pointed out factual errors: they claimed 9% open-source sampling when it was actually 60%, and the prerelease testing (like "Nano Banana") was always public and community-loved, not hidden.
Google's "Nano Banana" image model launch on Arena became a global sensation that changed Google's market share and roadmap. The discussion reveals how multimodal models, especially image generation, are becoming economically critical for marketing and content creation use cases, despite initial skepticism about their strategic importance.
Arena's public leaderboard is treated as a "charity" and loss leader - models cannot pay to be listed, cannot pay for better scores, and cannot pay to be removed. Every released model gets a statistically sound score based on millions of real user votes, maintaining the platform's role as the industry's North Star.
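Arena's published methodology scores models from pairwise human preference votes using Bradley-Terry/Elo-style ratings. As a purely illustrative sketch (not Arena's actual implementation, and with made-up model names), an online Elo update over a stream of votes might look like:

```python
from collections import defaultdict

def elo_ratings(votes, k=4.0, base=1000.0):
    """Compute Elo-style ratings from pairwise preference votes.

    votes: iterable of (model_a, model_b, winner) tuples,
    where winner is "a", "b", or "tie".
    """
    ratings = defaultdict(lambda: base)
    for a, b, winner in votes:
        ra, rb = ratings[a], ratings[b]
        # Expected score of model a under the logistic (Elo) model.
        ea = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        # Actual score of model a for this vote.
        sa = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        ratings[a] = ra + k * (sa - ea)
        ratings[b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return dict(ratings)

# Hypothetical vote log: "model-x" wins 30 of 40 head-to-head votes.
votes = [("model-x", "model-y", "a")] * 30 + [("model-x", "model-y", "b")] * 10
r = elo_ratings(votes)
```

In practice Arena fits ratings over millions of votes (with confidence intervals) rather than a single sequential pass, but the core idea is the same: a model's score rises only when real users prefer its answers.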
Arena is expanding into vertical-specific evaluations: Expert Arena shows model performance across medicine, legal, finance, and creative fields, while Code Arena evaluates full agent harnesses, not just models. A video arena is planned for later in 2025, focused on categories where Arena has sufficient expert user density.
Arena views every user as earned daily in the competitive consumer market. Persistent chat history after sign-in was a major retention driver. The team focuses on providing continuous value rather than chasing ChatGPT's scale, recognizing that consumer success requires both execution and luck (like viral Nano Banana moments).
[State of Evals] LMArena's $100M Vision — Anastasios Angelopoulos, LMArena