Specifications
| Specification | Details |
|---|---|
| Size | 75 MB (Tiny) to 1.5 GB (Large) |
| Architecture | Transformer Encoder-Decoder |
| Latency | 1-3 s for average dictation |
| Language | 99+ languages |
| Developer / Creator | OpenAI (original weights), GGML / whisper.cpp community (quantized files) |
| Download Source | Verified repository source: Hugging Face Hub (via tapWhisper downloader), Open Model Repository (ggerganov/whisper.cpp) |

Model Overview
Whisper is OpenAI's general-purpose speech recognition model. In tapWhisper, Whisper models run offline through whisper.cpp (GGML format) with full Metal GPU acceleration on Apple Silicon. Users can download models of different sizes from settings, from Very Small up to Very Big. The models offer strong multilingual accuracy across 99+ languages and support custom vocabulary prompting.
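
As a rough illustration of custom vocabulary prompting, the sketch below builds a command line for the upstream whisper.cpp CLI, which accepts an initial prompt through `--prompt`. How tapWhisper passes vocabulary to whisper.cpp internally is not documented here, so the helper name `build_whisper_command`, the binary name `whisper-cli` (older builds call it `main`), and the file paths are assumptions for illustration only.

```python
from typing import List, Sequence


def build_whisper_command(
    model_path: str,
    audio_path: str,
    language: str = "auto",
    vocabulary: Sequence[str] = (),
    binary: str = "whisper-cli",  # assumption: name of the whisper.cpp CLI binary
) -> List[str]:
    """Return an argv list for the whisper.cpp CLI (hypothetical helper)."""
    # -m selects the GGML model file, -f the input audio, -l the spoken language.
    cmd = [binary, "-m", model_path, "-f", audio_path, "-l", language]
    if vocabulary:
        # The initial prompt biases decoding toward the listed terms.
        cmd += ["--prompt", ", ".join(vocabulary)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute a downloaded model and a real recording.
    argv = build_whisper_command(
        "models/model.bin", "dictation.wav", vocabulary=["tapWhisper", "GGML"]
    )
    print(" ".join(argv))
```

The list can be handed to `subprocess.run` once a whisper.cpp build and a model file are available locally. The prompt only nudges recognition toward the given terms; it does not guarantee they appear in the transcript.
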
Available Model Variants
| Model Name | File Size | RAM Usage | Format/Quant | Languages | Description |
|---|---|---|---|---|---|
| Whisper Very Small | 74 MB | 180 MB | Float16 (Full) | Multilingual | Fastest transcription speed, lower accuracy. Ideal for fast test queries. |
| Whisper Very Small Q5 | 31 MB | 110 MB | Q5_1 (Quantized) | Multilingual | Smallest quantized Whisper option. Ultra-low storage requirement. |
| Whisper Small | 141 MB | 300 MB | Float16 (Full) | Multilingual | Balanced base model with decent accuracy for simple everyday sentences. |
| Whisper Small Q5 | 57 MB | 180 MB | Q5_1 (Quantized) | Multilingual | Quantized Whisper base model. Optimized memory and storage usage. |
| Whisper Medium ⭐ | 547 MB | 900 MB | Q5_0 (Quantized) | Multilingual | Best speed-to-quality ratio. Recommended as the default offline model. |
| Whisper Very Small (English) | 74 MB | 180 MB | Float16 (Full) | English | Fastest English-only dictation model. Low resource consumption. |
| Whisper Very Small Q5 (English) | 31 MB | 110 MB | Q5_1 (Quantized) | English | Quantized English-only tiny model. Extremely lightweight. |
| Whisper Small (English) | 141 MB | 300 MB | Float16 (Full) | English | English-only base model for everyday dictation. |
| Whisper Small Q5 (English) | 57 MB | 180 MB | Q5_1 (Quantized) | English | Quantized English-only base model. High efficiency. |
| Whisper Standard | 465 MB | 850 MB | Float16 (Full) | Multilingual | Standard model. Offers solid recognition accuracy for multiple languages. |
| Whisper Standard Q5 | 181 MB | 450 MB | Q5_1 (Quantized) | Multilingual | Quantized Whisper small model. Excellent balance of size and fidelity. |
| Whisper Standard (English) | 465 MB | 850 MB | Float16 (Full) | English | Standard English-only model. Ideal for clean English speech dictation. |
| Whisper Standard Q5 (English) | 181 MB | 450 MB | Q5_1 (Quantized) | English | Quantized English-only standard model. High memory efficiency. |
| Whisper Large (legacy) | 1.43 GB | 2.2 GB | Float16 (Full) | Multilingual | Older large model with broad language coverage. High accuracy, heavy footprint. |
| Whisper Medium HQ | 1.51 GB | 2.3 GB | Float16 (Full) | Multilingual | High-quality medium model (Turbo architecture). Outstanding accuracy. |
| Whisper Very Big | 2.88 GB | 4.2 GB | Float16 (Full) | Multilingual | Maximum general accuracy. Heavy download, slower processing. |
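
One way to use the table is to pick a variant that fits a given memory budget. The sketch below is a minimal example, not tapWhisper's actual selection logic: it copies the RAM Usage and Languages columns from the table (treating 1 GB as 1000 MB, which is an assumption) and returns the variant with the largest RAM footprint that still fits, using footprint as a crude stand-in for accuracy. The names `Variant` and `pick_variant` are hypothetical.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    ram_mb: int      # "RAM Usage" column, with 1 GB taken as 1000 MB (assumption)
    languages: str   # "Languages" column


# Rows copied from the table above.
VARIANTS = [
    Variant("Whisper Very Small", 180, "Multilingual"),
    Variant("Whisper Very Small Q5", 110, "Multilingual"),
    Variant("Whisper Small", 300, "Multilingual"),
    Variant("Whisper Small Q5", 180, "Multilingual"),
    Variant("Whisper Medium", 900, "Multilingual"),
    Variant("Whisper Very Small (English)", 180, "English"),
    Variant("Whisper Very Small Q5 (English)", 110, "English"),
    Variant("Whisper Small (English)", 300, "English"),
    Variant("Whisper Small Q5 (English)", 180, "English"),
    Variant("Whisper Standard", 850, "Multilingual"),
    Variant("Whisper Standard Q5", 450, "Multilingual"),
    Variant("Whisper Standard (English)", 850, "English"),
    Variant("Whisper Standard Q5 (English)", 450, "English"),
    Variant("Whisper Large (legacy)", 2200, "Multilingual"),
    Variant("Whisper Medium HQ", 2300, "Multilingual"),
    Variant("Whisper Very Big", 4200, "Multilingual"),
]


def pick_variant(ram_budget_mb: int, languages: str = "Multilingual") -> Optional[Variant]:
    """Return the largest-footprint variant within the budget, or None if none fits."""
    fitting = [
        v for v in VARIANTS
        if v.languages == languages and v.ram_mb <= ram_budget_mb
    ]
    return max(fitting, key=lambda v: v.ram_mb, default=None)


if __name__ == "__main__":
    choice = pick_variant(1000)
    print(choice.name if choice else "no variant fits")
```

With a 1000 MB budget and multilingual support, this selects the Whisper Medium row, which matches the table's recommended default. RAM footprint is only a proxy: a quantized variant can use less memory than a smaller full-precision one, so a real selector would also weigh the Format/Quant column.
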