What Exactly Does /watch Do?
/watch is an open-source tool built by a developer named Brad and released under the MIT license. Feed it a single YouTube link or a local screen-recording file, and it downloads the video with yt-dlp, extracts frames with ffmpeg, attaches captions, and hands all of it to Claude. Claude then answers as if it had actually watched the video.
Setup is light. ffmpeg and yt-dlp install themselves on first run, and pulling captions from public videos costs nothing extra.
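The article does not show /watch's internals, so the following is only a minimal sketch of how a download, frame-extraction, and captions pipeline like the one described could be assembled. The function names, the output filenames, the one-frame-per-second rate, and the choice of English captions are all assumptions made for illustration. The code only builds the yt-dlp and ffmpeg command lines; it does not run them.

```python
from pathlib import Path


def build_download_cmd(url: str, workdir: Path) -> list[str]:
    """yt-dlp invocation: fetch the video plus any available English captions."""
    return [
        "yt-dlp",
        "--write-subs",         # uploaded captions, if the video has them
        "--write-auto-subs",    # auto-generated captions as a fallback
        "--sub-langs", "en",    # assumption: English captions are wanted
        "--remux-video", "mp4",  # keep the container predictable for ffmpeg
        "-o", str(workdir / "video.%(ext)s"),
        url,
    ]


def build_frames_cmd(video: Path, workdir: Path, fps: float = 1.0) -> list[str]:
    """ffmpeg invocation: write `fps` JPEG frames per second of video."""
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps={fps}",
        str(workdir / "frame_%04d.jpg"),
    ]


def plan(source: str, workdir: Path) -> list[list[str]]:
    """Return the commands to run, skipping the download for local files."""
    if source.startswith(("http://", "https://")):
        video = workdir / "video.mp4"  # matches the -o template after remuxing
        return [build_download_cmd(source, workdir), build_frames_cmd(video, workdir)]
    return [build_frames_cmd(Path(source), workdir)]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`, provided yt-dlp and ffmpeg are on the PATH. The resulting frames and caption file would then be attached to the prompt sent to Claude, a step this sketch leaves out.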
Where Does This Actually Get Used?
Three recurring use cases: analyzing frame by frame how a high-performing video's first three seconds are constructed; finding exactly where a bug reproduces in a screen recording someone sent over; and pulling out the key points of a 20-minute video without watching the whole thing.
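These use cases call for different sampling densities: a three-second hook deserves many frames, while a 20-minute video only needs an occasional one. The article does not say how /watch chooses its frames, so the helper below is a hypothetical illustration of that trade-off, with the function name and the specific intervals chosen as assumptions.

```python
def sample_timestamps(start_s: int, end_s: int, interval_ms: int) -> list[float]:
    """Timestamps in seconds from start_s (inclusive) to end_s (exclusive)."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    # Integer milliseconds avoid floating-point drift in the step size.
    return [ms / 1000 for ms in range(start_s * 1000, end_s * 1000, interval_ms)]


# Dense sampling for the first three seconds: one frame every 100 ms.
hook = sample_timestamps(0, 3, 100)

# Sparse sampling for a 20-minute overview: one frame every 10 seconds.
overview = sample_timestamps(0, 20 * 60, 10_000)
```

Under these assumed intervals the hook yields 30 frames and the overview yields 120, which keeps both requests to a manageable number of images.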
What they share is that Claude doesn't lean on a title or summary. Because it checks the actual frames, it can answer with details that never even appear in the video's description.
Why Does a Tool This Small Matter for AX?
Good AX (AI transformation) rarely starts with a grand unified platform. More often it starts with one small tool that fills exactly one gap, something Claude couldn't do before. /watch is a textbook case: it hands Claude a sense it never had, the ability to see video.
What matters here isn't the skill of building a new feature from scratch. It's the skill of quickly recognizing a tool someone else has already built well and plugging it precisely into your own workflow. That's a pattern AX consulting keeps confirming: speed comes less from building anew than from the judgment to find and connect existing pieces.