Pim de Witte, CEO of General Intuition (GI), discusses spinning out from Medal (a 12M-user game clipping platform) to build world models trained on 3.8B gameplay highlights. After turning down OpenAI's reported $500M offer for Medal's data, GI raised a $134M seed from Khosla Ventures—Vinod's largest bet since OpenAI. The conversation covers their vision-based agents achieving superhuman gameplay through imitation learning, their unique action-labeled dataset that preserves privacy, and their strategy to become the foundation model for spatial reasoning across gaming, simulation, and eventually robotics.
Pim demonstrates GI's vision-based agents playing Counter-Strike purely from pixels, showing progression from 4 months ago to current superhuman performance. The agents exhibit human-like behaviors (checking scoreboards, getting unstuck) while also displaying peak performance from training on highlight clips. He then shows world models with unique features like mouse sensitivity, spatial memory over 20-second generations, and the ability to handle partial observability (smoke grenades).
Pim explains Medal's 10-year evolution from a simple recorder to a 12M-user platform with more active creators than Twitch. The key innovation was retroactive clipping—running a background recorder that exports only the last 30 seconds when you hit a button, similar to Tesla's bug reporting. This approach captured authentic gameplay without changing player behavior, creating the foundation for GI's unique dataset.
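The retroactive-clipping mechanism described above can be sketched as a fixed-size rolling buffer: frames are continuously appended, old ones silently expire, and nothing is persisted until the player presses the clip hotkey. This is a minimal illustration, not Medal's implementation; the class name and 30-second/30-fps parameters are assumptions for the sketch.

```python
from collections import deque

class RetroactiveClipper:
    """Rolling window of recent frames, exported only on demand.

    Illustrative sketch of retroactive clipping: the recorder runs in
    the background, but a clip exists only when the player asks for it,
    so normal gameplay behavior is unchanged.
    """

    def __init__(self, fps=30, seconds=30):
        # deque with maxlen drops the oldest frame automatically
        self.buffer = deque(maxlen=fps * seconds)

    def on_frame(self, frame):
        self.buffer.append(frame)

    def export_clip(self):
        # Called when the player hits the clip button:
        # returns only the last `seconds` of footage.
        return list(self.buffer)
```

The design choice worth noting is that the buffer bounds memory use regardless of session length, which is what makes an always-on background recorder practical.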
GI's critical design decision was logging actions (jump, crouch, shoot) rather than raw keystrokes (W, A, S, D), solving privacy concerns while creating superior training data. They employed thousands of humans to label every possible action across games over 18 months. This approach converts inputs to semantic actions, making the data usable for training while preventing individual tracking.
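The input-to-action conversion might look like the following sketch: a per-game binding table maps raw key events to semantic labels, and anything unmapped (such as chat typing) is dropped. The table contents and function names here are hypothetical; per the episode, the real labels came from 18 months of human annotation across games.

```python
# Hypothetical per-game keybinding table (illustrative values only).
KEYBINDS = {
    "counter-strike": {
        "w": "move_forward",
        "a": "move_left",
        "s": "move_back",
        "d": "move_right",
        "space": "jump",
        "ctrl": "crouch",
        "mouse1": "shoot",
    },
}

def to_semantic_actions(game, keystrokes):
    """Convert raw key events into semantic action labels.

    Keys with no binding (e.g. chat text) are discarded, which is the
    privacy property described above: individual typing never enters
    the training data, only game-level actions do.
    """
    binds = KEYBINDS.get(game, {})
    return [binds[k] for k in keystrokes if k in binds]
```

Dropping unmapped keys, rather than logging them raw, is what makes the same design decision serve both privacy and data quality.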
After reading papers like Diamond, Genie, and SIMA, Pim cold-emailed research teams and chose to build independently rather than join a lab. Vinod Khosla's investment process involved defending a 2030 vision from first principles under intense technical questioning. The team includes Diamond paper authors and Anthony Hu from GAIA-2, with GI able to publish openly due to their data moat.
Pim explains why games provide superior training data compared to simulation or YouTube videos. Simulation complexity explodes with number of agents, degrees of freedom, and information revealed per action. Games already contain the stochasticity and edge cases, while YouTube requires solving pose estimation, inverse dynamics, and optical dynamics—three layers of information loss.
GI's initial customers are game developers and engine companies, replacing deterministic behavior trees with a single API: stream frames, get actions back. The goal is moving pre-training to post-training for robotics companies—if your robot uses game controller inputs, GI can provide the foundation model, requiring only 1-10% of typical training data for specialization.
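From the game developer's side, replacing a behavior tree with this kind of API reduces to a single loop: stream frames out, apply the actions that come back. The sketch below illustrates that shape only; the function names are assumptions, and `policy` stands in for whatever remote model answers the frame stream.

```python
def agent_loop(stream_frames, send_action, policy):
    """Drive an NPC from a frames-in/actions-out policy.

    A deterministic behavior tree is replaced by one loop: each frame
    is handed to `policy` (a stand-in for the remote model) and the
    returned action is applied to the game.
    """
    for frame in stream_frames():
        action = policy(frame)
        send_action(action)

# Usage with local stand-ins for the game and the model:
if __name__ == "__main__":
    frames = ["frame_0", "frame_1", "frame_2"]
    applied = []
    agent_loop(lambda: iter(frames), applied.append,
               lambda f: {"source": f, "action": "move_forward"})
    print(applied)
```

The point of the shape is that the game integration stays constant while the intelligence behind `policy` can improve independently.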
GI's research roadmap involves making all 3.8B clips on Medal playable inside world models, enabling the transition from imitation learning to reinforcement learning. Each clip represents episodic memory—the most out-of-distribution moments from hours of gameplay. By loading negative events (crashes, failures) into world models with ground-truth action labels, they can train reward models at unprecedented scale.
By 2030, GI aims to be the gold standard for spatial-temporal intelligence, powering 80% of atoms-to-atoms AI interactions (robotics, physical world) and 100x more in simulation. The bet is that supply chains will converge on gaming inputs as the standard interface because intelligence is the bottleneck, not hardware. Simulation will see faster adoption due to fewer safety constraints, with scientific use cases as a key focus.
World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI