🎙️ Whisper Voice Analyzer
Audio Processing · Local Transcription · Multilingual
Let your LLM truly "hear" the human world. Built on OpenAI's open-source Whisper model engine, this plugin converts audio clips into high-quality text entirely offline on your local hardware.
OpenClaw Team
🚀 Quick Install
Run the following command in your terminal to install:
npx clawhub install openai-whisper
📊 Stats Overview
| ⭐ Stars | ☁️ Total Calls | 👥 Active Users | 🎯 Stable Version |
|---|---|---|---|
| 871 | 6.13M | 7,800 | v2.1.4 |
🎛️ How It Works
Unlike per-minute cloud speech-recognition services (such as Azure or AWS), this plugin runs entirely on local compute:
- 💻 True Edge-side Inference: Completely free from internet restrictions. Pull the `tiny`, `base`, or even `large` Whisper weights onto your device and decode audio with the host CPU / GPU — keeping meeting recordings and personal audio fully private.
- 🌐 99+ Language Support: Whether the speaker has heavily accented English or Chinese dialogue peppered with Japanese vocabulary, Whisper's generalization ability transcribes mixed-language phrases accurately.
- ⏱️ Auto-timestamping & SRT Attachment: Goes beyond plain text. When rich output is requested, it emits VTT / SRT timelines accurate to the millisecond — a ready-made pre-processing step for automated subtitle and video-slicing pipelines.
- 🧹 Format-tolerant Input Handling: Automatically skips silent segments in the input stream, and natively accepts mp3, wav, m4a, ogg, and other common formats without manual FFmpeg re-encoding.
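The millisecond-accurate SRT timeline mentioned above can be illustrated with a small sketch. Note that `srt_timestamp`, `segments_to_srt`, and the `(start, end, text)` segment shape are illustrative helpers, not the plugin's internal format:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a second offset as an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) triples as numbered SRT cues."""
    cues = []
    for i, (start, end, text) in enumerate(segments, 1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)

print(segments_to_srt([(0.0, 2.5, "Hello."), (2.5, 5.04, "Welcome to the meeting.")]))
```

Each decoded segment becomes one numbered cue, which is exactly the structure subtitle editors and video slicers consume.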
🧭 Typical Use Cases
📝 Scenario 1: Ultimate Meeting Minutes Extractor
Integrated with internal workflows: after a three-hour international board meeting, simply drop the recorder's M4A file into a designated folder. The monitoring Agent mounts openai-whisper for full-speed decoding, then immediately calls the LLM to compress tens of thousands of words of chaotic dialogue into "Key Agenda Items" and "Who Spoke" Markdown tables, and pushes them to the entire company's Slack.
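The drop-folder flow described above can be sketched as a simple polling pass. Everything here is hypothetical glue code: `poll_once` and `handle` stand in for the real Agent, which would call openai-whisper, summarize via the LLM, and push to Slack:

```python
from pathlib import Path

AUDIO_EXTS = {".m4a", ".mp3", ".wav"}

def handle(path: Path) -> str:
    # Placeholder: the real pipeline would transcribe with openai-whisper,
    # compress the transcript via the LLM, and post the result to Slack.
    return f"queued {path.name}"

def poll_once(watch_dir: Path, seen: set) -> list:
    """One polling pass: pick up audio files not yet processed."""
    results = []
    for path in sorted(watch_dir.iterdir()):
        if path.suffix.lower() in AUDIO_EXTS and path not in seen:
            seen.add(path)
            results.append(handle(path))
    return results
```

Keeping a `seen` set means the watcher can run on a timer without re-transcribing files that are still sitting in the folder.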
🤖 Scenario 2: Retro Hardware Voice Assistant (Siri Killer)
Mount the ultra-lightweight `tiny.en` model on a Raspberry Pi or similar IoT device as an always-on listener. No typing needed at home — just speak into the microphone; the plugin instantly converts speech to text and hands it to the LLM intent processor, achieving silky-smooth "streaming auditory feedback" home voice control.
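The handoff from transcript to intent processor can be sketched as a tiny router. The `route_intent` function, its phrase rules, and the intent names are illustrative assumptions, not part of the plugin:

```python
def route_intent(transcript: str) -> str:
    """Map a transcribed utterance to a hypothetical smart-home intent."""
    text = transcript.lower()
    rules = [
        ("turn on the light", "lights.on"),
        ("turn off the light", "lights.off"),
        ("play music", "media.play"),
    ]
    for phrase, intent in rules:
        if phrase in text:
            return intent
    # Anything unmatched goes to the LLM intent processor.
    return "llm.fallback"
```

Cheap keyword matching keeps common commands off the LLM path entirely, which matters on a Raspberry Pi-class device.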
💻 Command Reference
After installation, you can let the AI invoke these commands autonomously through conversation, or run them manually from the CLI:
Speed transcription mode — use the default `base` model to transcribe Chinese audio:
clawhub execute openai-whisper file="./meeting_01.mp3" language="zh"
Cross-language translation — have the model not just transcribe but translate the raw audio directly into English:
clawhub execute openai-whisper file="./french_interview.wav" task="translate"
Professional subtitles — output timestamped SRT subtitles using the `large-v3` model:
clawhub execute openai-whisper file="./podcast_raw.m4a" output_format="srt" model="large-v3"
🛡️ Requirements & Performance
- 🔧 Required Toolchain: This is a hardcore AI model module. Before running it, your host system must have `ffmpeg` (for underlying audio decoding) and a working `python3` (to drive Whisper's native inference pipeline).
- 💻 Hardware Constraints: Running the top-tier `large` model on a thin laptop without GPU/CUDA acceleration may take as long as, or longer than, the recording itself. Low-spec machines should default to the `base` or `small` weights.
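The hardware guidance above can be captured in a small model-picker sketch. The memory figures are rough public estimates for the Whisper model sizes, not official requirements, and `pick_model` itself is a hypothetical helper:

```python
# Approximate memory needed per Whisper model size, in GB (rough estimates).
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
SIZE_ORDER = ["tiny", "base", "small", "medium", "large"]

def pick_model(free_memory_gb: float, has_gpu: bool) -> str:
    """Pick the largest model that fits in memory. Without GPU/CUDA,
    cap at 'small' so transcription stays faster than the recording."""
    ceiling = "large" if has_gpu else "small"
    candidates = SIZE_ORDER[: SIZE_ORDER.index(ceiling) + 1]
    best = "tiny"
    for name in candidates:
        if MODEL_MEMORY_GB[name] <= free_memory_gb:
            best = name
    return best
```

On a 16 GB GPU box this selects `large`; on a thin CPU-only laptop it stops at `small` regardless of how much RAM is free.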
© 2026 OpenClaw. All rights reserved.
