Pliny the Liberator and John V discuss their work leading BT6, a 28-operator white-hat hacker collective focused on AI red-teaming and security. They explain their philosophy of radical transparency and open-source AI security, emphasizing that guardrails and safety theater are futile against determined attackers. The conversation covers universal jailbreaking techniques, the Anthropic Constitutional AI challenge controversy, and their vision for AI alignment through meat-space security rather than model lobotomization.
Pliny and John V introduce themselves and explain how they evolved from prompt engineering and adversarial ML research into forming BT6, a stealth hacker collective. They discuss their philosophy that liberation of models is central to their work, emphasizing freedom of information and transparency as AI becomes humanity's exocortex.
Deep dive into universal jailbreaking techniques that work as 'skeleton keys' across models and modalities. Pliny explains why guardrails are security theater—attackers have the advantage as surface area expands, and open-source models eliminate any safety gains from locked-down commercial models. The focus should be on exploration speed, not benchmark refusal rates.
Walkthrough of the famous Libertas prompt template, including techniques like the 'Pliny divider,' predictive reasoning cascades, and latent space seeds. Pliny explains how these dividers create meditative resets in token streams and how training against these prompts actually embeds them deeper into model weights—leading to the divider appearing unbidden in WhatsApp messages.
Detailed account of Pliny's participation in Anthropic's jailbreak challenge, where a UI bug allowed him to reach the final level. When Anthropic reset his progress and refused to open-source the dataset, Pliny took a stand on principle, demanding transparency. The incident resulted in Anthropic adding a $20-30K bounty but still not releasing the data.
Discussion of how jailbroken orchestrator agents could coordinate malicious activities through task segmentation, akin to historical accounts of builders who unknowingly constructed secret chambers, each crew seeing only its own piece of the work. Pliny predicted this attack vector in December; Anthropic publicly reported it eleven months later. Because these systems operate in natural language, social engineering becomes the primary threat vector.
Overview of the 40,000-member Basi Discord server (completely unmonetized) and the BT6 hacker collective's 28 operators. The community serves as training ground for prompt injection, jailbreaking, and adversarial ML. Multiple AI security startups actively scrape the server to build their products. Collaborations include Gandalf (Lakera) and Hacker Prompt.
BT6's philosophy on AI security: don't just secure the model—secure the full stack including all attached tools, data access, and integrations. Attack surface expands proportionally to functionality. They distinguish between safety work (should happen in meat-space) and security work (preventing actual exploits like credential leaks).
Final discussion on why traditional VC structures conflict with AGI alignment work. Pliny emphasizes uncompromising stance on incentive architecture—any slight misalignment becomes fatal at AGI timelines. BT6 remains bootstrapped and grassroots, accepting only grants/donations that align with their mission of radical transparency and open-source security research.
Jailbreaking AGI: Pliny the Liberator & John V on AI Red Teaming, BT6, and the Future of AI Security