Model Directory Profile

OpenAI Whisper GGML

16 variants

Specifications

Size 31 MB (Very Small Q5) to 2.88 GB (Very Big)
Architecture Transformer Encoder-Decoder
Latency 1-3s for average dictation
Language 99+ languages

Developer / Creator

OpenAI (original weights), GGML / whisper.cpp community (quantized files)

Download Source

Verified Repository Source

Hugging Face Hub (via tapWhisper downloader)

Open Model Repository (ggerganov/whisper.cpp)
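For readers who want to fetch a model file by hand, the sketch below builds a download URL using the Hugging Face Hub `resolve` URL pattern against the ggerganov/whisper.cpp repository named above. The helper name `ggml_download_url`, the `main` revision, and the example file name `ggml-base.bin` are assumptions for illustration; the tapWhisper downloader may resolve files differently.

```python
from urllib.parse import quote

# Repository id taken from the "Open Model Repository" entry above.
REPO_ID = "ggerganov/whisper.cpp"


def ggml_download_url(filename: str, revision: str = "main") -> str:
    """Build a Hugging Face Hub download URL for one model file.

    Hypothetical helper: the file name and revision are supplied by the
    caller and are not checked against the repository contents.
    """
    if not filename.endswith(".bin"):
        raise ValueError("expected a GGML .bin file name")
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision)}/{quote(filename)}"
    )


# Example file name following the whisper.cpp naming style (assumed).
print(ggml_download_url("ggml-base.bin"))
```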

Model Overview

Whisper is OpenAI's general-purpose speech recognition model. In tapWhisper, Whisper models run offline through whisper.cpp (GGML format) with Metal GPU acceleration on Apple Silicon. Users can download any of the variants listed below, from Very Small to Very Big, in settings. Whisper covers 99+ languages and supports custom vocabulary prompting, which biases recognition toward terms the user supplies.
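To make custom vocabulary prompting concrete, here is a minimal sketch that assembles a command line for the whisper.cpp command-line tool, passing the vocabulary as an initial prompt. It assumes a locally built `whisper-cli` binary and its `-m`, `-f`, `-l`, and `--prompt` options; the function name, file names, and vocabulary are hypothetical, and tapWhisper itself most likely calls the library directly rather than the CLI.

```python
import shlex
from typing import List


def build_whisper_command(
    model_path: str,
    audio_path: str,
    vocabulary: List[str],
    language: str = "auto",
    binary: str = "whisper-cli",  # assumed binary name from a recent whisper.cpp build
) -> List[str]:
    """Assemble a whisper.cpp command line (hypothetical helper).

    The vocabulary is joined into one string and passed as the initial
    prompt, which nudges decoding toward those spellings.
    """
    cmd = [binary, "-m", model_path, "-f", audio_path, "-l", language]
    prompt = ", ".join(vocabulary)
    if prompt:
        cmd += ["--prompt", prompt]
    return cmd


cmd = build_whisper_command(
    "ggml-base.bin",            # placeholder model file
    "dictation.wav",            # placeholder audio file
    ["tapWhisper", "GGML", "Metal"],
    language="en",
)
# Print a shell-safe version; run it with subprocess.run(cmd) if the binary exists.
print(shlex.join(cmd))
```

The command is only built and printed here, so the sketch runs without whisper.cpp installed.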

Available Model Variants

Model Name | File Size | RAM Usage | Format/Quant | Languages | Description
Whisper Very Small | 74 MB | 180 MB | Float16 (Full) | Multilingual | Fastest transcription with lower accuracy. Suited to quick tests.
Whisper Very Small Q5 | 31 MB | 110 MB | Q5_1 (Quantized) | Multilingual | Smallest quantized Whisper option. Ultra-low storage requirement.
Whisper Small | 141 MB | 300 MB | Float16 (Full) | Multilingual | Balanced base model with decent accuracy for simple everyday sentences.
Whisper Small Q5 | 57 MB | 180 MB | Q5_1 (Quantized) | Multilingual | Quantized Whisper base model. Optimized memory and storage usage.
Whisper Medium ⭐ | 547 MB | 900 MB | Q5_0 (Quantized) | Multilingual | Best speed-to-quality ratio. Recommended as the default offline model.
Whisper Very Small (English) | 74 MB | 180 MB | Float16 (Full) | English | Fastest English-only dictation model. Low resource consumption.
Whisper Very Small Q5 (English) | 31 MB | 110 MB | Q5_1 (Quantized) | English | Quantized English-only tiny model. Extremely lightweight.
Whisper Small (English) | 141 MB | 300 MB | Float16 (Full) | English | English-only base model for everyday dictation.
Whisper Small Q5 (English) | 57 MB | 180 MB | Q5_1 (Quantized) | English | Quantized English-only base model. High efficiency.
Whisper Standard | 465 MB | 850 MB | Float16 (Full) | Multilingual | Standard model. Offers solid recognition accuracy for multiple languages.
Whisper Standard Q5 | 181 MB | 450 MB | Q5_1 (Quantized) | Multilingual | Quantized Whisper small model. Excellent balance of size and fidelity.
Whisper Standard (English) | 465 MB | 850 MB | Float16 (Full) | English | Standard English-only model. Ideal for clean English speech dictation.
Whisper Standard Q5 (English) | 181 MB | 450 MB | Q5_1 (Quantized) | English | Quantized English-only standard model. High memory efficiency.
Whisper Large (legacy) | 1.43 GB | 2.2 GB | Float16 (Full) | Multilingual | Older large model with broad language coverage. High accuracy, heavy footprint.
Whisper Medium HQ | 1.51 GB | 2.3 GB | Float16 (Full) | Multilingual | High-quality medium model (Turbo architecture). Outstanding accuracy.
Whisper Very Big | 2.88 GB | 4.2 GB | Float16 (Full) | Multilingual | Maximum general accuracy. Large download and slower processing.
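
Choosing a variant from the table comes down to filtering by language and available memory. The sketch below encodes the file size and RAM figures listed above and picks the largest variant that fits a RAM budget. Using file size as a stand-in for accuracy, treating 1 GB as 1000 MB, and the names `Variant` and `pick_variant` are assumptions made for illustration, not tapWhisper's own selection logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    file_mb: float
    ram_mb: float
    english_only: bool


# Figures copied from the variant table (GB converted as 1 GB = 1000 MB).
VARIANTS = [
    Variant("Whisper Very Small", 74, 180, False),
    Variant("Whisper Very Small Q5", 31, 110, False),
    Variant("Whisper Small", 141, 300, False),
    Variant("Whisper Small Q5", 57, 180, False),
    Variant("Whisper Medium", 547, 900, False),
    Variant("Whisper Very Small (English)", 74, 180, True),
    Variant("Whisper Very Small Q5 (English)", 31, 110, True),
    Variant("Whisper Small (English)", 141, 300, True),
    Variant("Whisper Small Q5 (English)", 57, 180, True),
    Variant("Whisper Standard", 465, 850, False),
    Variant("Whisper Standard Q5", 181, 450, False),
    Variant("Whisper Standard (English)", 465, 850, True),
    Variant("Whisper Standard Q5 (English)", 181, 450, True),
    Variant("Whisper Large (legacy)", 1430, 2200, False),
    Variant("Whisper Medium HQ", 1510, 2300, False),
    Variant("Whisper Very Big", 2880, 4200, False),
]


def pick_variant(ram_budget_mb: float, english_only: bool = False) -> Optional[Variant]:
    """Return the largest-file variant whose listed RAM usage fits the budget.

    File size is used as a rough proxy for accuracy (an assumption).
    Returns None when nothing fits.
    """
    candidates = [
        v for v in VARIANTS
        if v.english_only == english_only and v.ram_mb <= ram_budget_mb
    ]
    return max(candidates, key=lambda v: v.file_mb, default=None)


choice = pick_variant(1000)
print(choice.name if choice else "no variant fits")
```

With a 1000 MB budget this heuristic lands on Whisper Medium, the variant the table marks as the recommended default.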