I have a script that uses yt-dlp to get subtitles off a YouTube video and summarises the main points for me with a language model so that I don’t have to watch a 20 minute top10 list video that could’ve been a buzzfeed article.
The whole thing is fully vibe engineered too.







A 20€ plan of Claude Code for a month will teach you all of it.
It’s pretty much just scripting ffmpeg and feeding the screenshots to a local model via an API (or use a CLIP model)