video-skill-transcriber

Skill from JackMeds/Video-Skill-Transcriber

Video-Skill-Transcriber 🧠

The cure for your "Watch Later" backlog. Let AI binge-watch those thousands of saved videos for you, turning them into summaries and knowledge.


中文说明 (Chinese README)



The Problem: Information Overload

Have you ever looked at your YouTube "Watch Later" or Bilibili "Favorites" list and felt anxiety?

You've saved thousands of high-quality tutorials, lectures, and talks, thinking "I'll learn this later." But "later" never comes because watching video is time-consuming.

Video-Skill-Transcriber is the solution. It automatically batch-downloads and transcribes your backlog, converting hours of video into structured text that AI can digest in seconds.

Turn "Watch Later" into "Knowledge Acquired".

Features

| Feature | Description | Note |
| --- | --- | --- |
| Universal Download | Supports YouTube, Bilibili, TikTok, etc. | Powered by yt-dlp |
| Video Understanding | Gemini 1.5 Pro/Flash reads video directly | New (requires API key) |
| Multi-Engine ASR | Whisper (local), Qwen3 (Chinese-optimized), OpenAI API | Offline and online support |
| API Server | FastAPI interface for remote calls | New |
| Batch Pipeline | Auto-fetch "Watch Later" -> Download -> Transcribe | Core feature |
| Privacy First | Credentials and inference run 100% locally | Safe for private lists |
| Agent Ready | Standardized skill definition for Claude/GPT | Automates the process |

Installation

  1. Clone or Download ZIP:

    BASH
    git clone https://github.com/JackMeds/Video-Skill-Transcriber.git
    # Or download ZIP from Release page
    cd Video-Skill-Transcriber
    
  2. Install dependencies:

    BASH
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    

    (Requires FFmpeg to be installed.)

  3. Update: Run the self-update tool (works for both Git and ZIP installs):

    BASH
    python -m tools.update_skill
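
The prerequisites above (Python and FFmpeg) can be verified before the first run. The following is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper, and the minimum Python version of 3.8 is an assumption, since this README does not state one.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), which=shutil.which):
    """Return a list of problems; an empty list means the basics are in place.

    min_python is an assumed lower bound, not a documented requirement.
    `which` is injectable so the check can be exercised without FFmpeg.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ expected")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


# Report anything missing on this machine.
for problem in check_prerequisites():
    print("MISSING:", problem)
```

An empty result only means the executables were found; it does not confirm that the packages in requirements.txt are installed.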
    

Method 2: Install to Agent (e.g., OpenClaw)

To integrate this skill into an existing Agent environment:

BASH
python install.py --target /path/to/.agent/skills

This creates a symlink, ensuring your Agent always uses the latest code.
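
To illustrate why a symlink keeps the Agent on the latest code, here is a rough sketch of the idea. It is not the actual contents of install.py: `link_skill` and the link name `video-skill-transcriber` are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def link_skill(repo_dir, skills_dir, name="video-skill-transcriber"):
    """Symlink repo_dir into skills_dir/name and return the link path.

    The link name is hypothetical; install.py may choose a different one.
    """
    repo = Path(repo_dir).resolve()
    root = Path(skills_dir)
    root.mkdir(parents=True, exist_ok=True)
    link = root / name
    if link.is_symlink():
        link.unlink()  # replace a stale link instead of failing
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(repo, target_is_directory=True)
    return link


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "Video-Skill-Transcriber"
    repo.mkdir()
    link = link_skill(repo, Path(tmp) / "skills")
    # A file added to the repo after linking is visible through the link.
    (repo / "new_file.txt").write_text("added after linking")
    visible_through_link = (link / "new_file.txt").exists()
    print(visible_through_link)
```

Because the Agent's skills directory only holds a pointer to the checkout, updating the repository is enough; nothing needs to be copied again.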

  1. (Optional) Configure API: Copy .env.example to .env if you want to use Online Transcription.

Usage

1. General Download

BASH
python -m tools.download "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

2. Transcribe / Video Understanding

BASH
# Local Whisper (Default)
python -m tools.transcribe "output/video.m4a"

# Local Qwen3-ASR (Best for Chinese)
python -m tools.transcribe "output/video.m4a" -m Qwen/Qwen3-ASR-0.6B

# Multimodal AI (Gemini 1.5) - Reads video directly
python -m tools.transcribe "output/video.mp4" -m gemini

# Online API (Fastest)
python -m tools.transcribe "output/video.m4a" -m openai
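
When scripting several transcriptions, the engine choice can be reduced to a small lookup. The sketch below only builds the command lines shown above; the short engine names (`whisper`, `qwen3`) and both helper functions are hypothetical, while the `-m` values are taken from the commands in this section.

```python
import sys
from pathlib import Path

# Right-hand values are the -m arguments from the commands above.
# Left-hand keys are labels invented for this sketch.
ENGINES = {
    "whisper": None,  # default engine, so no -m flag is passed
    "qwen3": "Qwen/Qwen3-ASR-0.6B",
    "gemini": "gemini",
    "openai": "openai",
}


def transcribe_command(media_path, engine="whisper"):
    """Build the argv list for one `python -m tools.transcribe` call."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    cmd = [sys.executable, "-m", "tools.transcribe", str(media_path)]
    model = ENGINES[engine]
    if model is not None:
        cmd += ["-m", model]
    return cmd


def batch_commands(folder, engine="whisper", pattern="*.m4a"):
    """Build one command per matching file in folder, in sorted order."""
    return [transcribe_command(p, engine) for p in sorted(Path(folder).glob(pattern))]


# Print the command that would be run for the Qwen3 engine.
print(" ".join(transcribe_command("output/video.m4a", "qwen3")[1:]))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building the commands separately from running them makes it easy to preview a batch first.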

3. Start API Server

Allow remote Agents to use these tools via HTTP:

BASH
python -m tools.api_server
# Docs: http://localhost:8000/docs
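
This README does not list the server's endpoints, but FastAPI applications publish a machine-readable schema at `/openapi.json` by default, alongside the `/docs` page. Assuming the server keeps that default, a remote client could discover the routes as sketched below. The `/transcribe` and `/health` paths in the sample are made up for illustration and are not the project's real routes.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_openapi(base_url="http://localhost:8000", timeout=5):
    """Download the OpenAPI schema from a running server (FastAPI default path)."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_routes(spec):
    """Return sorted (METHOD, path, summary) tuples from an OpenAPI schema dict."""
    routes = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                routes.append((method.upper(), path, operation.get("summary", "")))
    return sorted(routes)


# Hypothetical schema fragment, used here so the sketch runs without a server.
sample_spec = {
    "paths": {
        "/transcribe": {"post": {"summary": "Transcribe"}, "parameters": []},
        "/health": {"get": {}},
    }
}
for method, path, summary in list_routes(sample_spec):
    print(method, path, summary)
```

With the server running, `list_routes(fetch_openapi())` would show whatever endpoints it actually exposes.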

Bilibili Workflow

We support both Public and Authenticated modes.

Mode 1: Public Access (Default)

For standard public videos, no login is required. Just use the download tool directly.

BASH
python -m tools.download "https://www.bilibili.com/video/BVxxx"

Mode 2: Authenticated (Advanced)

Login is required ONLY if you want to:

  1. Access your private "Watch Later" or "Favorites" lists.
  2. Download 1080P+ / Premium quality videos.

Steps:

  1. Login via QR Code:

    BASH
    python -m tools.auth
    

    (Session is saved locally to .user_session.json)

  2. Process Backlog: Once logged in, you can fetch your private lists:

    BASH
    # 1. Fetch Top 10 from Watch Later
    python -m tools.list --watch-later --limit 10
    
    # 2. Run the pipeline
    python -m tools.batch_run
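
The two backlog commands above can be chained from Python, with a guard for the saved session. This is a sketch under assumptions: `backlog_steps` and `run_backlog` are hypothetical helpers, and it presumes the script runs from the repository root, where `.user_session.json` is expected to live.

```python
import subprocess
import sys
from pathlib import Path

SESSION_FILE = Path(".user_session.json")  # written by `python -m tools.auth`


def backlog_steps(limit=10):
    """Return the two commands from the steps above as argv lists."""
    py = sys.executable
    return [
        [py, "-m", "tools.list", "--watch-later", "--limit", str(limit)],
        [py, "-m", "tools.batch_run"],
    ]


def run_backlog(limit=10, session_file=SESSION_FILE, runner=subprocess.run):
    """Run fetch then pipeline, refusing to start without a saved session."""
    if not Path(session_file).exists():
        raise FileNotFoundError("no saved session; run `python -m tools.auth` first")
    for cmd in backlog_steps(limit):
        runner(cmd, check=True)  # check=True stops at the first failing step


# Preview the commands without executing them.
for step in backlog_steps(limit=10):
    print(" ".join(step[1:]))
```

Stopping at the first failure matters here: running the pipeline after a failed fetch would process a stale or empty list.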
    

For AI Agents (Skills)

Give skills/VIDEO_SKILL.md to your AI Agent (Claude/ChatGPT). It will learn to use these tools autonomously.

License

MIT License