🎙️ Whisper Voice Analyzer
Audio Processing · Local Transcription · Multilingual
Let your LLM truly "hear" the human world. Built on OpenAI's open-source Whisper model engine, this plugin converts audio clips into high-quality text entirely offline on your local hardware.
OpenClaw Team
🚀 Quick Install
Run the following command in your terminal to install:
npx clawhub install openai-whisper
📊 Stats Overview
| ⭐ Stars | ☁️ Total Calls | 👥 Active Users | 🎯 Stable Version |
|---|---|---|---|
| 871 | 6.13M | 7,800 | v2.1.4 |
🎛️ How It Works
Unlike per-minute cloud speech-recognition services (such as Azure or AWS), this plugin runs entirely on local compute:
- 💻 True Edge-side Inference: Completely free from internet restrictions. Pull the `tiny`, `base`, or even `large` Whisper weights onto your device and decode audio with the host CPU / GPU — keeping meeting recordings and personal audio fully private.
- 🌐 99+ Language Support: Whether the speaker has heavily accented English or Chinese dialogue peppered with Japanese vocabulary, Whisper's generalization ability transcribes mixed-language phrases accurately.
- ⏱️ Auto-timestamping & SRT Attachment: Goes beyond plain text. When rich output is requested, it emits VTT / SRT timelines accurate to the millisecond — a ready-made pre-processing step for automated subtitle and video-slicing pipelines.
- 🧹 Format-tolerant Input Handling: Automatically skips silent segments in the input stream, and natively accepts mp3, wav, m4a, ogg, and other common formats without manual FFmpeg re-encoding.
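The millisecond-accurate SRT timeline mentioned above can be illustrated with a small sketch. Note that `srt_timestamp`, `segments_to_srt`, and the `(start, end, text)` segment shape are illustrative helpers, not the plugin's internal format:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a second offset as an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) triples as numbered SRT cues."""
    cues = []
    for i, (start, end, text) in enumerate(segments, 1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

print(segments_to_srt([(0.0, 2.5, "Hello."), (2.5, 5.04, "Welcome to the meeting.")]))
```

Each decoded segment becomes one numbered cue, which is exactly the structure subtitle editors and video slicers consume.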
🧭 Typical Use Cases
📝 Scenario 1: Ultimate Meeting Minutes Extractor
Integrated with internal workflows: after a three-hour international board meeting, simply drop the recorder's M4A file into a designated folder. The monitoring Agent mounts openai-whisper for full-speed decoding, then immediately calls the LLM to compress tens of thousands of words of chaotic dialogue into "Key Agenda Items" and "Who Spoke" Markdown tables, and pushes them to the entire company's Slack.
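The drop-folder flow described above can be sketched as a simple polling pass. Everything here is hypothetical glue code: `poll_once` and `handle` stand in for the real Agent, which would call openai-whisper, summarize via the LLM, and push to Slack:

```python
from pathlib import Path

AUDIO_EXTS = {".m4a", ".mp3", ".wav"}

def handle(path: Path) -> str:
    # Placeholder: the real pipeline would transcribe with openai-whisper,
    # compress the transcript via the LLM, and post the result to Slack.
    return f"queued {path.name}"

def poll_once(watch_dir: Path, seen: set) -> list:
    """One polling pass: pick up audio files not yet processed."""
    results = []
    for path in sorted(watch_dir.iterdir()):
        if path.suffix.lower() in AUDIO_EXTS and path not in seen:
            seen.add(path)
            results.append(handle(path))
    return results
```

Keeping a `seen` set means the watcher can run on a timer without re-transcribing files that are still sitting in the folder.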
🤖 Scenario 2: Retro Hardware Voice Assistant (Siri Killer)
Mount the ultra-lightweight `tiny.en` model on a Raspberry Pi or similar IoT device as an always-on listener. No typing needed at home — just speak into the microphone; the plugin instantly converts speech to text and hands it to the LLM intent processor, achieving silky-smooth "streaming auditory feedback" home voice control.
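The handoff from transcript to intent processor can be sketched as a tiny router. The `route_intent` function, its phrase rules, and the intent names are illustrative assumptions, not part of the plugin:

```python
def route_intent(transcript: str) -> str:
    """Map a transcribed utterance to a hypothetical smart-home intent."""
    text = transcript.lower()
    rules = [
        ("turn on the light", "lights.on"),
        ("turn off the light", "lights.off"),
        ("play music", "media.play"),
    ]
    for phrase, intent in rules:
        if phrase in text:
            return intent
    # Anything unmatched goes to the LLM intent processor.
    return "llm.fallback"
```

Cheap keyword matching keeps common commands off the LLM path entirely, which matters on a Raspberry Pi-class device.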
💻 Command Reference
After installation, you can let the AI invoke these commands autonomously through conversation, or run them manually from the CLI:
Speed transcription mode — use the default `base` model to transcribe Chinese audio:
clawhub execute openai-whisper file="./meeting_01.mp3" language="zh"
Cross-language translation — have the model not just transcribe but translate the raw audio directly into English:
clawhub execute openai-whisper file="./french_interview.wav" task="translate"
Professional subtitles — output timestamped SRT subtitles using the `large-v3` model:
clawhub execute openai-whisper file="./podcast_raw.m4a" output_format="srt" model="large-v3"
🛡️ Requirements & Performance
- 🔧 Required Toolchain: This is a hardcore AI model module. Before running it, your host system must have `ffmpeg` (for underlying audio decoding) and a working `python3` (to drive Whisper's native inference pipeline).
- 💻 Hardware Constraints: Running the top-tier `large` model on a thin laptop without GPU/CUDA acceleration may take as long as, or longer than, the recording itself. Low-spec machines should default to the `base` or `small` weights.
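The hardware guidance above can be captured in a small model-picker sketch. The memory figures are rough public estimates for the Whisper model sizes, not official requirements, and `pick_model` itself is a hypothetical helper:

```python
# Approximate memory needed per Whisper model size, in GB (rough estimates).
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
SIZE_ORDER = ["tiny", "base", "small", "medium", "large"]

def pick_model(free_memory_gb: float, has_gpu: bool) -> str:
    """Pick the largest model that fits in memory. Without GPU/CUDA,
    cap at 'small' so transcription stays faster than the recording."""
    ceiling = "large" if has_gpu else "small"
    candidates = SIZE_ORDER[: SIZE_ORDER.index(ceiling) + 1]
    best = "tiny"
    for name in candidates:
        if MODEL_MEMORY_GB[name] <= free_memory_gb:
            best = name
    return best
```

On a 16 GB GPU box this selects `large`; on a thin CPU-only laptop it stops at `small` regardless of how much RAM is free.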
© 2026 OpenClaw. All rights reserved.
