Specifications
Size
397 MB (0.6B) to 2.5 GB (4B)
Architecture
GGUF LLM (Q4_K_M)
Latency
Fast (80-120 tok/s on Apple Silicon)
Function
Formatting & grammar cleanup
Developer / Creator
Alibaba Group / llama.cpp community
Download Source
Verified Repository Source
Hugging Face Hub (via tapWhisper downloader)
Open Model Repository (Qwen/Qwen2.5)Model Overview
Qwen 3 is a family of lightweight, high-performance language models (0.6B to 4B parameters) in GGUF format used for local text formatting. In tapWhisper, selecting STT + LLM formatting starts a persistent, localhost-only llama.cpp server. Qwen formats and cleans up raw speech output: adding punctuation, correcting grammar, and formatting code on-device.
Available Model Variants
| Model Name | File Size | RAM Usage | Format/Quant | Languages | Description |
|---|---|---|---|---|---|
| Apple Built-in Cleanup | 0 MB | 0 MB | System API | English | Built-in local text cleanup for basic grammar and spacing correction. |
| Small (Qwen 3 0.6B) ⭐ | 378 MB | 650 MB | Q4_K_M (GGUF) | Multilingual | Default recommended formatter. Lightning-fast grammar, punctuation, and coding layout. |
| Medium (Qwen 3 1.7B) | 1.03 GB | 1.5 GB | Q4_K_M (GGUF) | Multilingual | Enhanced local language parsing. Handles structural text reorganization. |
| Large (Qwen 3 4B) | 2.33 GB | 3.2 GB | Q4_K_M (GGUF) | Multilingual | Highest-accuracy offline text formatter. Requires a powerful Mac (8GB+ RAM). |