Every screenshot I take on a Mac gets named something like Screenshot 2026-03-24 at 11.00.40.png. That filename tells you exactly one useful thing — when it was taken. It tells you nothing about what’s in it.

My Screenshots folder had thousands of them. Finding a specific one meant either remembering roughly when I took it or opening files one by one. Neither is a great option.

The obvious solution

Shortcuts. Someone had already built a solid 118-action implementation, and I tried it. It mostly worked for text screenshots, but even when it worked, it was slow: 5 to 10 seconds per file. My workflow is take screenshot, go to the folder, share. The file was never ready when I got there.

Apple Intelligence as a bridge

Apple Intelligence can describe images. In theory, perfect for this. In practice: it queues the request, processes it in the background, and there’s no reliable hook into when it’s done. You can’t say “rename this file once Apple Intelligence has finished describing it.” You can only check after the fact, which means polling, which means more latency.

Throwing out the stack

Why does renaming a file require a 118-action Shortcut, Apple Intelligence, and a chat app running in the background? It doesn’t. It’s a bash script.

screenshot-rename.sh
# OCR via macOS Vision framework
ocrText=$(osascript -e "
  use framework \"Vision\"
  set request to current application's VNRecognizeTextRequest's alloc()'s init()
  return ocrResult
" 2>/dev/null)

# If text found, send to API for slug
if [ -n "$ocrText" ] && [ ${#ocrText} -gt 10 ]; then
  slug=$(call_api --text "$ocrText")
else
  # Resize to 300px, send to vision API
  sips -Z 300 "$inputFile" --out "$tmpImg"
  slug=$(call_api --image "$tmpImg")
fi

mv "$inputFile" "${datePrefix}-${slug}.png"

Hazelnut watches the Desktop. New screenshot appears, script runs, file renamed. Under a second for OCR, under two for vision. No Shortcuts, no Apple Intelligence queue, no chat app dependency.
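If you'd rather not depend on a folder-watching app, even a cron- or launchd-driven polling pass does the job. A minimal sketch of that idea, where rename_screenshot is a stand-in for the real script (not from the repo):

```shell
#!/bin/sh
# Minimal stand-in for the folder watcher: scan for unrenamed screenshots
# and hand each one to the rename script. Run from cron, launchd, or a loop.
WATCH_DIR="${WATCH_DIR:-$HOME/Desktop}"

rename_screenshot() {
  # placeholder: the real script would OCR, call the API, and mv the file
  echo "would rename: $1"
}

process_new() {
  for f in "$WATCH_DIR"/Screenshot*.png; do
    [ -e "$f" ] || continue   # glob matched nothing
    rename_screenshot "$f"
  done
}
```

The [ -e "$f" ] guard matters: if no screenshot matches, the glob stays literal and the loop would otherwise pass a nonexistent path through.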

How it works

The script runs two passes. First, it tries OCR using macOS’s native Vision framework via osascript. If there’s readable text in the screenshot — a UI, a terminal, a webpage — that text goes to the API and comes back as a slug like claude-code-settings-hooks.

If OCR comes back empty or too short to be meaningful, it falls back to vision: the image gets resized to 300px on its longest side via sips, then sent to the vision API as a base64 blob. The model describes what it sees and returns a slug.
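The choice between the two passes comes down to a length check on the OCR output. A sketch of that branch with call_api mocked out (the real helper sends text or a base64 image to the model):

```shell
#!/bin/sh
# Sketch of the two-pass branch: OCR text if long enough, else vision fallback.
# call_api is mocked here; the real script hits the model API.

call_api() {
  if [ "$1" = "--text" ]; then
    echo "slug-from-ocr-text"
  else
    echo "slug-from-vision"
  fi
}

choose_slug() {
  ocrText=$1
  # Pass 1: use OCR text when it is non-empty and longer than 10 chars
  if [ -n "$ocrText" ] && [ "${#ocrText}" -gt 10 ]; then
    call_api --text "$ocrText"
  else
    # Pass 2: fall back to the vision API on the resized thumbnail
    call_api --image "thumb.png"
  fi
}
```

The 10-character floor filters out OCR noise like a stray window title or a single button label, which would produce a worse slug than just looking at the image.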

The 300px finding

I originally sent full-resolution images to the vision API. Then I tried 512px, then 300px. The slugs are identical. A 300px thumbnail is enough for a vision model to describe the content of a typical screenshot. Resizing first cuts the API call from ~2s to under 1s on a good connection, and cuts token cost by roughly 90%.

Provider notes

The script works with any API that accepts a text prompt or base64 image and returns a string. I use Claude for both — claude-3-haiku-20240307 for OCR slugs (fast, cheap) and claude-3-5-sonnet-20241022 for vision (better descriptions). You can swap in any provider that fits the interface.
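For Claude specifically, the vision call is a single request to the Messages API with a base64 image content block. A hedged sketch of the payload and call (the payload shape follows the public Messages API; the helper name is mine, and the live curl is commented out since it needs a key):

```shell
#!/bin/sh
# Build a Messages API payload with a base64 PNG and the slug prompt.
build_payload() {
  b64=$1
  cat <<EOF
{
  "model": "claude-3-5-sonnet-20241022",
  "max_tokens": 50,
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image",
       "source": {"type": "base64", "media_type": "image/png", "data": "$b64"}},
      {"type": "text",
       "text": "Describe this screenshot in 4-6 words as a lowercase hyphenated slug. Return only the slug."}
    ]
  }]
}
EOF
}

# Live call (needs ANTHROPIC_API_KEY exported):
# build_payload "$(base64 < thumb.png | tr -d '\n')" \
#   | curl -s https://api.anthropic.com/v1/messages \
#       -H "x-api-key: $ANTHROPIC_API_KEY" \
#       -H "anthropic-version: 2023-06-01" \
#       -H "content-type: application/json" \
#       -d @-
```

The tr -d '\n' matters because GNU base64 wraps lines at 76 characters by default, which would break the JSON string.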

The prompt is short: Describe this screenshot in 4-6 words as a lowercase hyphenated slug. Return only the slug. Haiku follows it reliably. Sonnet always does.
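Even with a tight prompt, a model can occasionally return stray capitals or punctuation, so it is worth normalizing the reply before using it as a filename. A small defensive sketch (this normalization step is my addition, not necessarily in the repo):

```shell
#!/bin/sh
# Normalize a model reply into a safe lowercase hyphenated slug:
# lowercase everything, collapse non-alphanumeric runs to one hyphen,
# and trim leading/trailing hyphens.
sanitize_slug() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}
```

For example, sanitize_slug "Claude Code: Settings Hooks." yields claude-code-settings-hooks, so an imperfect reply still produces a usable filename.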

The repo

Everything is at github.com/usefulish/screenshot-renamer. The README has setup instructions for both Hazelnut and Hazel. The script is about 80 lines, no dependencies beyond curl and osascript.