How I AI

“Nobody wanted to do this work”: How Emmy Award–winning filmmakers use AI to automate the tedious parts of documentaries

November 17, 2025 • 47m • 9,145 words

Description

Tim McAleer is a producer at Ken Burns’s Florentine Films who is responsible for the technology and processes that power their documentary production. Rather than using AI to generate creative content...

Summary

Tim McAleer, a producer at Ken Burns's Florentine Films, demonstrates how he built custom AI-powered tools to automate the tedious parts of documentary production. Rather than using AI for creative generation, Tim focuses on solving data management challenges: automatically extracting metadata from tens of thousands of archival images and videos, building iOS apps for field research, and creating OCR tools for historical documents. His approach shows how AI can eliminate manual data entry while maintaining journalistic accuracy through metadata guardrails and semantic search capabilities.

The Post-Production Data Management Problem

Tim explains the core challenge in documentary post-production: managing hundreds of hours of footage and tens of thousands of photos across multiple file types. For the Muhammad Ali series alone, the team gathered 20,000 still images and over 100 hours of footage. His goal was to automate the metadata entry that had been done by hand for years.

•Documentary shooting ratios are extremely high - an 8-hour show required 20,000+ stills and 100+ hours of footage
•Post-production involves managing images, archival footage, field footage, interviews, and transcripts across many file types
•Focus on AI as a tool for automation rather than content generation
•Manual data entry for metadata was the primary pain point to solve

Building the Initial AI Metadata Extraction System

Tim demonstrates live coding with Cursor and Claude to build a Python script that submits images to OpenAI for description. The breakthrough came when ChatGPT added image upload capability. He shows how to scrape embedded metadata from archival images and use it as guardrails to prevent AI hallucination, ensuring journalistic accuracy.

•Started with ChatGPT image upload feature - 'this thing can see' was a breakthrough moment
•Uses Superwhisper for voice-to-text to clean up vibe coding prompts
•Scrapes EXIF metadata from archival images (Library of Congress, etc.) to provide factual grounding
•Adding metadata as context dramatically improves accuracy - went from 'small rural main street' to 'main street of Cascade, Idaho, 1941 by Russell Lee'
•Used Claude models for coding and OpenAI for image analysis, settling on whichever worked first for each task
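
The metadata guardrail described above can be sketched in a few lines of Python. This is a minimal illustration, not Tim's actual script: the function name `build_description_prompt`, the field names, and the prompt wording are all assumptions. The example values come from the Cascade, Idaho photo mentioned in the summary. The sketch only builds the text prompt; sending it to an image model alongside the file is left out.

```python
def build_description_prompt(embedded_metadata: dict[str, str]) -> str:
    """Build a prompt that tells the model to treat embedded metadata as fact.

    Hypothetical helper: field names and wording are illustrative only.
    """
    # Keep only fields that actually carry a value.
    facts = {k: v.strip() for k, v in embedded_metadata.items() if v and v.strip()}

    lines = ["Describe the attached archival image for a documentary asset database."]
    if facts:
        lines.append("Treat the following embedded metadata as fact; do not contradict it:")
        lines.extend(f"- {field}: {value}" for field, value in sorted(facts.items()))
    else:
        lines.append("No embedded metadata was found; describe only what is visible.")
    lines.append("Do not guess names, places, or dates that are neither visible nor listed above.")
    return "\n".join(lines)


# Illustrative metadata, as it might be scraped from an archival file.
example = {
    "Caption": "Main street of Cascade, Idaho",
    "Date": "1941",
    "Creator": "Russell Lee",
    "Copyright": "",  # empty fields are dropped from the prompt
}
prompt = build_description_prompt(example)
print(prompt)
```

The point of the design is that the model never has to infer a place, date, or photographer: those arrive as stated facts, and the instruction at the end discourages it from inventing the rest.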

Scaling to a Production REST API System

Tim shows his evolved system: a REST API running on a Mac mini that processes every asset added to their database. The five-step 'autolog' process gathers file specs, copies files, parses metadata, scrapes URLs for additional context, and generates descriptions. For video, he samples frames every 5 seconds using cheap models, then sends consolidated data to reasoning models.

•Built REST API that database software calls via webhooks for automated processing
•Five-step autolog process: gather info, copy file, parse metadata, scrape web, generate description
•Video processing uses frame sampling at 5-second intervals with GPT-4o Nano for cost efficiency
•Sends frame captions plus audio transcripts (via Whisper) to reasoning models for comprehensive video description
•System processes all file types: images, video, audio, maintaining consistent metadata standards
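
The video step above (sample a frame every 5 seconds, caption each one cheaply, then hand captions plus transcript to a reasoning model) could look roughly like the sketch below. The function names and the layout of the consolidated text are assumptions, and the captions in the example are placeholders rather than real model output. Frame extraction and the model calls themselves are omitted.

```python
def sample_timestamps(duration_seconds: float, interval_seconds: float = 5.0) -> list[float]:
    """Return the timestamps (in seconds) at which a frame would be grabbed."""
    if duration_seconds <= 0 or interval_seconds <= 0:
        return []
    stamps = []
    index = 0
    # Multiply by an integer index instead of accumulating floats.
    while index * interval_seconds < duration_seconds:
        stamps.append(index * interval_seconds)
        index += 1
    return stamps


def format_timecode(seconds: float) -> str:
    """Format seconds as MM:SS for readability in the prompt."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def consolidate(frame_captions: dict[float, str], transcript: str) -> str:
    """Merge per-frame captions and the audio transcript into one text block."""
    lines = ["Frame captions (sampled):"]
    for stamp in sorted(frame_captions):
        lines.append(f"[{format_timecode(stamp)}] {frame_captions[stamp]}")
    lines.append("Audio transcript:")
    lines.append(transcript.strip() or "(no speech detected)")
    return "\n".join(lines)


# A 12-second clip yields frames at 0, 5, and 10 seconds.
stamps = sample_timestamps(12)
# Placeholder captions standing in for the cheap per-frame model output.
captions = {t: f"placeholder caption for frame at {format_timecode(t)}" for t in stamps}
print(consolidate(captions, "placeholder transcript text"))
```

Captioning frames separately keeps the expensive reasoning model working on a short text summary instead of raw video, which is where the cost saving comes from.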

Vector Embeddings for Semantic Search

Beyond readable metadata, Tim generates vector embeddings using CLIP for images and OpenAI text models for descriptions, then fuses them. This enables semantic discovery - searching for 'dog' finds 'puppy' - replacing exact text search. The 'Find Similar' feature uses reverse image search within their collection to discover visually similar assets.

•Dual embedding approach: CLIP for image thumbnails, OpenAI for text descriptions, then fusion
•Semantic search solves the exact-match problem (dog vs puppy) that plagued traditional databases
•Reverse image search within collection finds similar assets based on visual similarity
•Freed researchers from manual data entry to focus on gathering more archival material
•Muhammad Ali project could scale to 25,000 images because automation eliminated copy-paste work
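
The dual-embedding idea above can be illustrated with toy vectors. The summary does not say how the image and text embeddings are fused, so the weighted concatenation below is an assumption, as are the function names and the two-dimensional vectors (real CLIP and text embeddings have hundreds of dimensions or more). The sketch uses only the standard library.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(image_vec: list[float], text_vec: list[float], image_weight: float = 0.5) -> list[float]:
    """Fuse two embeddings by weighted concatenation (an assumed method).

    Square-root weights make the cosine similarity of two fused vectors equal
    to the weighted average of the image and text cosine similarities.
    """
    w_img = math.sqrt(image_weight)
    w_txt = math.sqrt(1.0 - image_weight)
    return [w_img * x for x in normalize(image_vec)] + [w_txt * x for x in normalize(text_vec)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(query: list[float], catalog: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the names of the top_k catalog entries closest to the query."""
    scored = sorted(((cosine(query, vec), name) for name, vec in catalog.items()), reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy catalog: hypothetical (image, text) embedding pairs per asset.
catalog = {
    "dog.jpg": fuse([0.9, 0.1], [0.8, 0.2]),
    "puppy.jpg": fuse([0.85, 0.15], [0.75, 0.25]),
    "boxing_ring.jpg": fuse([0.1, 0.9], [0.2, 0.8]),
}
query = fuse([0.9, 0.1], [0.8, 0.2])
print(find_similar(query, catalog, top_k=2))
```

Because ranking is by vector closeness rather than string match, the "puppy" asset ranks next to the "dog" query even though the words differ, which is the behavior described above.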

Flip Flop: iOS App for Field Research

Tim vibe-coded an iOS app called Flip Flop to solve field research chaos. Researchers photograph archive materials (front and back), but files get out of order. The app pairs fronts with backs, immediately transcribes text from the back, and embeds all metadata directly into the image file's EXIF data - making it portable across any system.

•Created PRD with ChatGPT during dog walk, built UI in one shot with Claude
•App captures front ('flip') and back ('flop') of archival images with structured file naming
•Immediately transcribes text from backs and embeds into EXIF metadata of original image
•Metadata travels with the file - usable in any app or computer that reads EXIF
•Colleagues returned from field with 1,400 images using the app - dramatically increased capture volume
•TAM is essentially 'two colleagues' - hyper-specific internal tooling
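
As a rough illustration of how structured file naming makes front/back pairing trivial, here is a sketch in Python (the app itself is an iOS app, so this is not its code). The naming scheme `<item>_flip.<ext>` and `<item>_flop.<ext>` is hypothetical, as are the function name and sample file names; the summary only says the names are structured.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: "<item>_flip.jpg" for fronts, "<item>_flop.jpg" for backs.
PATTERN = re.compile(r"^(?P<item>.+)_(?P<side>flip|flop)\.(?:jpe?g|png|tiff?)$", re.IGNORECASE)


def pair_fronts_and_backs(filenames):
    """Group files into (front, back) pairs keyed by item id.

    Returns (pairs, unmatched): pairs maps item id to (front, back_or_None);
    unmatched lists files that fit no pair (bad names, or backs with no front).
    """
    sides = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            unmatched.append(name)
        else:
            sides[match.group("item")][match.group("side").lower()] = name

    pairs = {}
    for item in sorted(sides):
        front = sides[item].get("flip")
        back = sides[item].get("flop")
        if front is None:
            unmatched.append(back)  # a back with no front cannot be paired
        else:
            pairs[item] = (front, back)
    return pairs, unmatched


# Files arriving out of order, as they might after a day in an archive.
files = ["box12_0001_flop.jpg", "box12_0002_flip.jpg", "box12_0001_flip.jpg", "IMG_4471.jpg"]
pairs, unmatched = pair_fronts_and_backs(files)
print(pairs)
print(unmatched)
```

Once the pairing is encoded in the name at capture time, file order no longer matters, and the transcribed text from each back can be attached to its front.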

OCR Party: Selective Document Transcription

Tim built OCR Party, a Mac menu bar app for selectively transcribing parts of historical documents. Instead of OCR-ing entire newspapers, editors crop just the relevant article. The app handles poor-quality images, creases, handwriting, and multiple languages - tasks traditional OCR engines fail at. It offers a choice between macOS Vision and an AI API, so users can pick the engine they trust.

•Swift/Xcode build for Mac menu bar - enables precise cropping of document sections
•AI OCR handles degraded images, handwriting, old typefaces that break traditional OCR
•Can infer missing text (black marks, creases) while maintaining fact-checking standards
•Dual mode: macOS Vision or AI API to build user trust and provide options
•Enables translation of historical documents (17th century cursive letters) to searchable English text
•Processes thousands of documents efficiently by focusing only on relevant sections

Learning Philosophy: Creative Tools as Mental Model

Tim's approach to learning AI tools mirrors his experience with creative software like Photoshop and Premiere. Both require navigating complex menus via YouTube and Reddit. He emphasizes that creative professionals are better suited for vibe coding than they realize - coding now feels more like creation than technical work.

•Parallel between learning Cursor/Claude and mastering Photoshop/Premiere - both require community learning
•Start from knowing what's possible, then find the path (which is faster than ever)
•Creative professionals have the right mindset for vibe coding despite initial intimidation
•Coding tools now activate the same creative feeling as design tools
•Learning comes from Cursor YouTube, Reddit, and the 'vibe coding people of the Internet'

AI in Film: Practical Tools vs. Generation Concerns

Tim distinguishes between AI for tooling (ready today) and creation (not professional-grade yet). He's cautious about generating fake archival footage in nonfiction - a journalistic integrity issue. While acknowledging job displacement concerns in commercial video, he advocates learning the tools regardless. Video generation works well for storyboarding without displacing final production.

•Current AI video generation not professional-grade - can't match shot quality or footage consistency
•Critical concern: generating fake archival footage in documentaries violates journalistic standards
•PBS has guidelines against AI-generated historical content - 'definitely not doing that'
•Job displacement most concerning in commercial video production, less in documentary
•Best defense: learn the tools whether you like the direction or not - knowledge is power
•Safe use cases: storyboarding, pre-visualization, proving concepts before expensive shoots
•Commercial and VFX work will get easier - 'it's coming one way or another'

Prompting Technique: Resume Work and Starting Fresh

Tim's voice-mode prompting strategy involves being polite to AI and using 'resume work' prompts when stuck. He asks the AI to summarize everything for another developer, which reveals misunderstandings. This summary shows where communication broke down, allowing him to prune and restart in a fresh chat rather than fighting the same conversation.

•Always polite to AI models - 'I'm gonna be nice to all the models'
•When stuck, request a 'resume work prompt' with everything another AI dev would need
•The summarization reveals where the AI misunderstood the request
•Prune the resume prompt and start fresh rather than continuing failed conversation
•Avoids 'beating your head against the wall for twenty minutes' in same chat
•Voice mode requires different approach than typing - must say prompts out loud
