Insights·2026-06-09

How Can Claude "Watch" a Video and Answer Questions About It?

Claude cannot play a video directly. Connected to the open-source tool /watch, though, it can take a YouTube link or a local screen recording, break the video into frames, attach captions, and answer from what actually appears on screen rather than guessing from a title or description.

What Exactly Does /watch Do?

/watch is an open-source tool built by a developer named Brad and released under the MIT license. Feed it a single YouTube link or a local screen-recording file, and it downloads the video with yt-dlp, extracts frames with ffmpeg, attaches captions, and hands all of it to Claude. Claude then answers as if it had actually watched the video.
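
To make that pipeline concrete, here is a minimal sketch of the two command lines such a tool might assemble. It is not /watch's actual code: the function names, output filenames, caption language, and one-frame-per-second sampling rate are all assumptions chosen for illustration. Only the yt-dlp and ffmpeg flags themselves are standard options of those tools.

```python
from pathlib import Path


def build_download_cmd(url: str, workdir: Path) -> list[str]:
    """Build a yt-dlp command that fetches the video plus English captions.

    Hypothetical helper: /watch may use different options.
    """
    return [
        "yt-dlp",
        "--write-subs",       # uploaded captions, if the video has them
        "--write-auto-subs",  # auto-generated captions as a fallback
        "--sub-langs", "en",  # assumed caption language
        "-o", str(workdir / "video.%(ext)s"),
        url,
    ]


def build_frames_cmd(video: Path, workdir: Path, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second as JPEGs.

    Hypothetical helper: the sampling rate is an assumption.
    """
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps={fps}",              # fps filter: keep N frames per second
        str(workdir / "frame_%04d.jpg"),  # frame_0001.jpg, frame_0002.jpg, ...
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once both tools are installed. The resulting frames and caption file are the material that would then be handed to Claude.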

Setup is light. ffmpeg and yt-dlp install themselves on first run, and pulling captions from public videos costs nothing extra.

Where Does This Actually Get Used?

Three recurring use cases: analyzing frame by frame how a high-performing video's first three seconds are constructed; finding exactly where a bug reproduces in a screen recording someone sent over; and pulling out the key points of a 20-minute video without watching the whole thing.

What they share is that Claude doesn't lean on a title or summary. Because it checks the actual frames, it can answer with details that never even appear in the video's description.
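
Both the first-three-seconds analysis and the bug hunt depend on knowing which moment of the video a given frame shows. The sketch below assumes frames are numbered from 1 and sampled at a constant rate, so frame n sits at roughly (n - 1) / fps seconds. The helper names are hypothetical, and the mapping is approximate rather than a description of how /watch tracks time.

```python
import math


def frame_time(index: int, fps: float) -> float:
    """Approximate timestamp in seconds of a 1-indexed sampled frame."""
    return (index - 1) / fps


def frames_in_window(start: float, end: float, fps: float) -> list[int]:
    """Indices of sampled frames whose timestamp t satisfies start <= t < end."""
    first = math.ceil(start * fps) + 1
    last = math.ceil(end * fps)
    return list(range(first, last + 1))


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.s, rounding to tenths before splitting."""
    tenths = round(seconds * 10)
    minutes, rest = divmod(tenths, 600)
    return f"{minutes:02d}:{rest / 10:04.1f}"
```

At 10 frames per second, `frames_in_window(0, 3, 10)` selects frames 1 through 30 for an opening-hook analysis. If a bug first shows up in frame 755 of a recording sampled at 1 frame per second, `format_timestamp(frame_time(755, 1.0))` gives the position to jump to in the original file.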

Why Does a Tool This Small Matter for AX?

Good AX (AI transformation) rarely starts with a grand unified platform. More often it starts with one small tool that fills exactly one gap — something Claude couldn't do. /watch is a textbook case: it hands Claude a sense it never had, the ability to see video.

What matters here isn't the skill of building a new feature from scratch. It's the skill of quickly recognizing a tool someone else has already built well and plugging it precisely into your own workflow. AX consulting keeps confirming this pattern: speed comes less from building something new and more from the judgment to find and connect existing pieces.

Source: GitHub