YouTube Transcript Downloader

Extract clean, readable transcripts from YouTube videos, usually in seconds when captions already exist. Perfect for studying tutorials, converting talks into notes, or creating written content from video sources.

Essential for learners, researchers, and content creators who want to reference video content in text form without manual transcription.

Core Functionality

Two-mode operation:

Subtitle Download (fast) - Grabs existing captions
Whisper Transcription (slower, but works without captions) - Generates a transcript from the audio

Automatic deduplication removes the repetitive lines common in VTT subtitle formatting.

How It Works

Intelligent Fallback System

Primary Method:

Downloads existing subtitles (manual or auto-generated) via yt-dlp
Checks for multiple language options
Raw subtitle files keep cue timing (stripped in the final text output)
Fast (typically seconds)

Fallback Method:

Downloads audio with yt-dlp
Transcribes with OpenAI Whisper
Works on any video (even without captions)
Slower (minutes depending on video length)
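The fallback order above can be sketched in Python. This is a minimal illustration, not the skill's actual code: `get_transcript`, `fetch_subtitles`, and `run_whisper` are hypothetical names standing in for the yt-dlp and Whisper steps, passed in as callables so the control flow can be read and tested without network access.

```python
from typing import Callable, Optional, Tuple

# Hypothetical step signatures: fetch_subtitles returns caption text, or None when
# the video has no captions; run_whisper returns a transcript generated from audio.
SubtitleFetcher = Callable[[str], Optional[str]]
WhisperRunner = Callable[[str], str]


def get_transcript(
    url: str,
    fetch_subtitles: SubtitleFetcher,
    run_whisper: Optional[WhisperRunner] = None,
) -> Tuple[str, str]:
    """Return (method, text), trying subtitles first and Whisper second."""
    subtitles = fetch_subtitles(url)
    if subtitles:  # fast path: existing captions (empty text counts as missing)
        return "subtitles", subtitles
    if run_whisper is None:  # Whisper is optional; without it there is no fallback
        raise RuntimeError(f"No subtitles available for {url} and Whisper is not installed")
    return "whisper", run_whisper(url)  # slow path: transcribe the audio locally
```

Injecting the two steps keeps the decision logic separate from the tools, so the same function works whether the steps shell out to the CLIs or call the libraries directly.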

Post-Processing:

Removes duplicate lines from VTT format
Cleans up formatting artifacts
Creates readable paragraph structure
Names file using video title

Usage Examples

Basic download:

Download the transcript for https://youtube.com/watch?v=dQw4w9WgXcQ

Specify language:

Get the Spanish transcript from this video: [URL]

Batch processing:

Download transcripts for these tutorial videos:
- https://youtube.com/watch?v=tutorial1
- https://youtube.com/watch?v=tutorial2
- https://youtube.com/watch?v=tutorial3
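A batch run is essentially a loop that keeps going when one video fails. The sketch below assumes a hypothetical `process(url)` function that returns the path of the written transcript and raises an exception on failure; `download_batch` is likewise an illustrative name.

```python
from typing import Callable, Dict, List, Tuple


def download_batch(
    urls: List[str], process: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `process` on each URL; return (successes, failures) keyed by URL."""
    done: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for url in dict.fromkeys(urls):  # drop repeated URLs, keep original order
        try:
            done[url] = process(url)  # e.g. path of the transcript file
        except Exception as exc:  # one bad video should not abort the whole batch
            failed[url] = str(exc)
    return done, failed
```

Collecting failures instead of stopping lets you report, for example, which videos lacked captions and need the Whisper fallback.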

Output Format

Generated filename: [Video Title] - Transcript.txt
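Video titles often contain characters that are not allowed in filenames (for example `/` or `:`). A hypothetical `transcript_filename` helper, sketched under the assumption that replacing such characters with `-` and capping the length is acceptable, might look like this:

```python
import re

# Characters rejected by Windows and/or POSIX filesystems, plus control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def transcript_filename(title: str, max_len: int = 150) -> str:
    """Build '<Video Title> - Transcript.txt' from a raw video title."""
    safe = _UNSAFE.sub("-", title)  # replace characters that break paths
    safe = re.sub(r"\s+", " ", safe).strip(" .")  # collapse whitespace, trim dots
    safe = safe[:max_len].rstrip(" .") or "untitled"  # cap length, avoid empty names
    return f"{safe} - Transcript.txt"
```

The `max_len` cap (an assumed value) leaves room for the suffix within common 255-byte filename limits.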

Content structure:

[Clean paragraph-style transcript with duplicate lines removed]

Natural reading flow without VTT timestamp artifacts.
Preserves sentence structure and logical breaks.
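One simple way to get paragraph structure from caption lines is to join them into a single stream and start a new paragraph every few sentences. The sentence-splitting rule, the `to_paragraphs` name, and the `sentences_per_paragraph` default are assumptions for illustration; auto-generated captions often lack punctuation, in which case this sketch produces a single paragraph.

```python
import re
from typing import List


def to_paragraphs(lines: List[str], sentences_per_paragraph: int = 4) -> str:
    """Join caption lines and regroup them into blank-line-separated paragraphs."""
    text = " ".join(line.strip() for line in lines if line.strip())
    # Split after ., ! or ? followed by whitespace (a rough sentence boundary).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    paragraphs = [
        " ".join(sentences[i : i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```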

Installation

Required (core functionality):

BASH

# macOS
brew install yt-dlp

# Ubuntu/Debian
sudo apt install yt-dlp

# pip (cross-platform)
pip3 install yt-dlp

Optional (for videos without subtitles):

BASH

pip3 install openai-whisper
# Whisper also needs ffmpeg (e.g. brew install ffmpeg, or sudo apt install ffmpeg)

Whisper Model Sizes (approximate GPU memory required):

tiny - ~1GB, fastest but least accurate
base - ~1GB, good balance
small - ~2GB, better accuracy
medium - ~5GB, high quality
large - ~10GB, best quality (slow)

For most use cases, the base or small model offers a good balance of speed and accuracy.
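The memory figures above can drive model selection. A hypothetical `pick_whisper_model` helper, sketched here, returns the largest model that fits a given memory budget, capped at `small` by default to follow the recommendation above; the thresholds are the approximate figures from the list, not hard limits.

```python
from typing import Optional

# Approximate memory needed per model (GB), ordered smallest to largest.
WHISPER_MODEL_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_whisper_model(available_gb: float, cap: str = "small") -> Optional[str]:
    """Return the largest model that fits in memory, never above `cap`."""
    names = list(WHISPER_MODEL_GB)  # dicts preserve insertion order
    allowed = names[: names.index(cap) + 1]
    fitting = [m for m in allowed if WHISPER_MODEL_GB[m] <= available_gb]
    return fitting[-1] if fitting else None  # None: not enough memory for any model
```

Raising `cap` to `"medium"` or `"large"` opts into slower, higher-quality runs only when you ask for them.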

Technical Details

Subtitle Detection:

BASH

yt-dlp --list-subs [URL]

Shows available caption languages before download.
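The same information is available programmatically: yt-dlp's Python API returns an info dictionary whose `subtitles` (manual) and `automatic_captions` keys map language codes to available tracks. The sketch below prefers a manual track and falls back to an auto-generated one, roughly mirroring what yt-dlp does when both `--write-sub` and `--write-auto-sub` are given; `pick_subtitle_source` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional, Tuple


def pick_subtitle_source(info: Dict[str, Any], lang: str = "en") -> Optional[Tuple[str, str]]:
    """Return ("manual" | "auto", lang) for the best available track, or None."""
    if lang in (info.get("subtitles") or {}):
        return "manual", lang
    if lang in (info.get("automatic_captions") or {}):
        return "auto", lang
    return None  # no captions in this language: candidate for the Whisper fallback


# Fetching the info dict needs network access and the yt-dlp package, e.g.:
#   from yt_dlp import YoutubeDL
#   with YoutubeDL({"skip_download": True, "quiet": True}) as ydl:
#       info = ydl.extract_info(url, download=False)
```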

Download Process:

BASH

yt-dlp --write-sub --write-auto-sub --skip-download [URL]

Grabs manual subtitles when available, falling back to auto-generated ones, without downloading the video.
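yt-dlp names subtitle files after the output template plus the language code, so a run with `-o "%(id)s.%(ext)s"` typically produces files like `VIDEO_ID.en.vtt`. Assuming that naming, a hypothetical `find_subtitle_file` helper can locate the download afterwards, honoring a language preference order:

```python
from pathlib import Path
from typing import List, Optional


def find_subtitle_file(directory: str, video_id: str, prefer: Optional[List[str]] = None) -> Optional[Path]:
    """Return the downloaded VTT for `video_id`, honoring a language preference order."""
    for lang in prefer or []:
        exact = Path(directory) / f"{video_id}.{lang}.vtt"  # assumed <id>.<lang>.vtt
        if exact.exists():
            return exact
    # No preferred language found: fall back to any VTT for this video.
    candidates = sorted(Path(directory).glob(f"{video_id}.*.vtt"))
    return candidates[0] if candidates else None
```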

Whisper Fallback:

BASH

yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" [URL]
whisper audio.mp3 --model base

Extracts the audio track (requires ffmpeg) and transcribes it locally with Whisper.
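When scripting the fallback, fixing the output names avoids guessing what yt-dlp called the audio file. The sketch below builds the two commands as argument lists for `subprocess.run`: `-o` sets yt-dlp's output template, and `--output_format txt` with `--output_dir` asks the Whisper CLI for a plain-text file. The function names and the `audio` basename are illustrative choices, not part of either tool.

```python
import os
import subprocess
from typing import List


def whisper_fallback_cmds(url: str, model: str = "base", workdir: str = ".") -> List[List[str]]:
    """Build the yt-dlp and Whisper commands for the audio-transcription path."""
    audio_template = os.path.join(workdir, "audio.%(ext)s")  # yt-dlp fills in %(ext)s
    audio_file = os.path.join(workdir, "audio.mp3")  # name after --audio-format mp3
    return [
        ["yt-dlp", "-x", "--audio-format", "mp3", "-o", audio_template, url],
        ["whisper", audio_file, "--model", model,
         "--output_format", "txt", "--output_dir", workdir],
    ]


def run_whisper_fallback(url: str, model: str = "base", workdir: str = ".") -> str:
    """Run both steps (needs yt-dlp, ffmpeg and openai-whisper) and return the .txt path."""
    for cmd in whisper_fallback_cmds(url, model, workdir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
    return os.path.join(workdir, "audio.txt")  # Whisper names output after the input file
```

Passing argument lists (rather than one shell string) sidesteps quoting problems with URLs that contain `&` or `?`.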

Deduplication Algorithm

Problem: YouTube's auto-generated VTT captions repeat text across cues, because each cue re-displays the previous line while the next one scrolls in

00:00:01.000 --> 00:00:03.000
Welcome to this tutorial

00:00:03.000 --> 00:00:05.000
Welcome to this tutorial
Today we'll learn about React
Solution: Python script removes consecutive duplicates

Welcome to this tutorial
Today we'll learn about React
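A minimal version of that cleanup is sketched below; the skill's own script may differ, and `dedupe_vtt` is a hypothetical name. It skips the `WEBVTT` header block, cue timing lines, numeric cue identifiers, and blank lines, strips inline tags such as `<c>` that appear in YouTube auto-captions, and drops a line when it repeats the line kept just before it.

```python
import re
from typing import List

# Cue timing line, e.g. "00:00:01.000 --> 00:00:03.000" (hours and millis optional).
TIMING = re.compile(r"^\d+:\d{2}(?::\d{2})?(?:\.\d{3})?\s+-->")
TAG = re.compile(r"<[^>]+>")  # inline tags like <c>, </c> or <00:00:01.200>


def dedupe_vtt(vtt_text: str) -> List[str]:
    """Return caption text lines with markup removed and consecutive repeats dropped."""
    lines = vtt_text.splitlines()
    start = 0
    if lines and lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        # The header block (WEBVTT plus e.g. "Kind:"/"Language:") ends at the first blank line.
        start = lines.index("") + 1 if "" in lines else len(lines)
    kept: List[str] = []
    for raw in lines[start:]:
        line = TAG.sub("", raw).strip()
        if not line or TIMING.match(line) or line.isdigit():
            continue  # blank line, cue timing, or numeric cue identifier
        if kept and kept[-1] == line:
            continue  # rolling captions repeat the previous line
        kept.append(line)
    return kept
```

Only consecutive repeats are removed, so a phrase the speaker genuinely says twice at different points in the video survives.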

Best Practices

Do:

Use subtitle download when available (faster)
Choose appropriate Whisper model for speed/quality tradeoff
Review transcript for technical terms (AI may misinterpret jargon)
Respect video creators' copyright

Don't:

Download transcripts from copyrighted content for commercial redistribution
Assume 100% accuracy (especially with auto-generated subtitles)
Use the largest Whisper model unless quality is critical (it is very slow)
Skip checking if subtitles already exist before using Whisper

Common Use Cases

For Students:

Tutorial videos → Study notes
Lecture recordings → Reference material
Conference talks → Written summaries

For Researchers:

Interview videos → Analysis data
Documentary content → Citations
Expert talks → Quote extraction

For Content Creators:

Competitor analysis → Written breakdowns
Video scripts → Blog post foundations
Podcast episodes → Show notes

Language Support

Subtitle download supports:

All languages offered by YouTube creators
Auto-generated captions in major languages
Manual captions in creator-specified languages

Whisper transcription supports:

99+ languages with varying accuracy
Best performance on English
Automatic language detection
Translation into English with --task translate

Integration with Tapestry

Complete Learning Workflow

YouTube Transcript is part of the Tapestry ecosystem:

Standalone usage: Extract transcripts for any purpose

With Tapestry orchestration:

YouTube Transcript - Get clean text ← You are here
Ship-Learn-Next - Convert to action plan
Ship - Build something concrete

One command for full workflow:

tapestry https://youtube.com/watch?v=example

Performance

Subtitle download:

Speed: 5-15 seconds (network dependent)
Accuracy: Varies (auto-captions vs. manual)
Cost: Free

Whisper transcription:

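Speed (rough figures; varies widely with hardware, especially GPU vs. CPU):
- tiny model: 2-5x real-time
- base model: 1-2x real-time
- large model: about 0.5x real-time (slower than playback)
Accuracy: Generally high, improving with model size (medium and up are strongest)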
Cost: Free (runs locally)
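In these figures, "Nx real-time" means N minutes of audio are transcribed per minute of processing, so the wait is roughly the video length divided by N. A tiny worked example, using the rough factors listed above rather than measurements (`whisper_wait_minutes` is an illustrative name):

```python
def whisper_wait_minutes(video_minutes: float, realtime_factor: float) -> float:
    """Rough transcription time: audio length divided by the real-time factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return video_minutes / realtime_factor


# 30-minute talk: base model at 2x and 1x, then large model at 0.5x
print(whisper_wait_minutes(30, 2), whisper_wait_minutes(30, 1), whisper_wait_minutes(30, 0.5))
# prints: 15.0 30.0 60.0
```

So a half-hour talk can take anywhere from about 15 minutes (base) to an hour (large), which is why trying subtitles first pays off.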

Recommendation: Try subtitles first, use Whisper only when needed.

Troubleshooting

"No subtitles available" error:

Check yt-dlp --list-subs [URL] to verify
Install Whisper for fallback transcription
Some videos genuinely have no captions

Whisper fails with memory error:

Use smaller model (tiny or base)
Close other applications
Process shorter video segments

Duplicate lines not removed:

Check that Python is installed (needed for the deduplication script)
Manually clean with find/replace in text editor

Wrong language downloaded:

Specify language: yt-dlp --write-sub --write-auto-sub --sub-lang es --skip-download [URL]
List available languages: yt-dlp --list-subs [URL]

Advanced Options

Download specific subtitle format:

BASH

yt-dlp --write-sub --sub-format vtt/srt --skip-download [URL]
# vtt/srt = prefer VTT, fall back to SRT

Translate to English:

BASH

whisper audio.mp3 --model base --task translate

Keep timestamps:

BASH

yt-dlp --write-sub --skip-download [URL]
# Use raw VTT file with timestamps preserved
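If you want timestamps without the raw VTT noise, one option is to keep only each cue's start time next to its text. A sketch, assuming the cue layout shown in the deduplication example (a timing line followed by text lines) and a hypothetical `timestamped_lines` helper:

```python
import re
from typing import List

# Cue start time: optional hours, then mm:ss, optional milliseconds, then "-->".
CUE = re.compile(r"^(\d+:)?(\d{2}):(\d{2})(?:\.\d{3})?\s+-->")


def timestamped_lines(vtt_text: str) -> List[str]:
    """Prefix each new caption line with its cue start time as [hh:mm:ss]."""
    out: List[str] = []
    stamp, last = "", None
    for raw in vtt_text.splitlines():
        line = re.sub(r"<[^>]+>", "", raw).strip()  # drop inline tags
        m = CUE.match(line)
        if m:
            hours = int((m.group(1) or "0:").rstrip(":"))
            stamp = f"[{hours:02d}:{m.group(2)}:{m.group(3)}]"
            continue
        if not stamp or not line or line.isdigit() or line == last:
            continue  # header text, blank line, cue number, or rolling-caption repeat
        out.append(f"{stamp} {line}")
        last = line
    return out
```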

About This Skill

This skill was created by michalparkola as part of the Tapestry Skills for Claude Code collection.

Philosophy: Transform passive video watching into active learning by extracting transcripts that can be turned into action plans, study guides, or reference materials.

Tools used: yt-dlp (subtitle download), OpenAI Whisper (transcription), Python (text processing)

Downloads and processes YouTube transcripts for readability, with automatic deduplication and fallback to Whisper transcription for videos without subtitles.