
OfflineGPT
On-device AI for Android — chat, image generation, voice, TTS, and more. No cloud backend, no subscription.
iOS version: Offline AI Studio on the App Store
What's new
Recent additions in v2.9.x
NPU image generation
Stable Diffusion now runs on the Qualcomm Hexagon NPU on supported Snapdragon devices, with dedicated "NPU" model variants in the image model manager.
Text-to-speech
Answers can be read aloud using the system voice with zero setup, or with the optional Supertonic 3 neural engine, which supports streaming TTS during generation.
Gemma 4 E2B
Google's multimodal on-device model: text, image, and audio understanding in one conversation. Runs via LiteRT-LM with optional web search.
What OfflineGPT can do
Android 8.0+, 64-bit ARM. Models run on-device — CPU, GPU (Vulkan/OpenCL), or Snapdragon NPU.
Offline by default
Chats, generations, and voice input stay on your device. No cloud AI backend, no ads, no tracking.
Gemma 4 E2B — multimodal
Google's on-device model understands text, images, and audio in one conversation. Optional web search when you want it.
Thinking models
Qwen3 (0.6B – 4B) and SmolLM3-3B show step-by-step reasoning before answering. Good for problems that need a second look.
Vision & image generation
Describe images with InternVL or SmolVLM2. Generate art with Stable Diffusion (Anything V5, AbsoluteReality, ChilloutMix, CuteYukiMix).
Snapdragon NPU acceleration
Image generation runs on the Hexagon NPU on supported Snapdragon devices — faster, cooler, less battery drain.
Whisper voice input
Dictate prompts in any language Whisper supports, with on-device speech recognition. A compact model ships with the app; a higher-quality one is an optional download.
Read aloud / TTS
Answers are read back via system voice or the optional Supertonic 3 neural engine — 31 languages, 10 voices, streaming during generation.
AI writing tools
Summarize, fix grammar, adjust tone, simplify, and draft social posts. Pro adds document processing, email drafting, and Idea Blast brainstorming.
Code Lab
Prototype HTML, CSS, and JavaScript with a built-in live preview and an AI coding assistant alongside.
Offline OCR (Pro)
Extract text from photos and PDFs on-device using ML Kit — no cloud OCR.
Custom AI tools (Pro)
Build your own tools with a custom name, icon, system prompt, and optional file knowledge base. Pin them to the home screen.
Optional API mode
Connect your own keys for OpenAI, Gemini, DeepSeek, Groq, Mistral, Grok, or a custom endpoint. Fully optional — local inference works without any key.
On-device model catalog
Download sizes are approximate. Custom GGUF imports also supported.
Text models (GGUF · llama.cpp)
LFM2.5 1.2B
LiquidAI
Llama-3.2-1B
Meta
Gemma-2B
Google
Qwen2.5-3B
Alibaba
Rocket-3B
Community
Thinking models (GGUF · llama.cpp)
Qwen3 0.6B Thinking
Alibaba
Qwen3 1.7B Thinking
Alibaba
Qwen3 4B Thinking
Alibaba
SmolLM3-3B
Hugging Face
Multimodal — text + image + audio (LiteRT)
Gemma-4-E2B-it
Google (LiteRT)
Vision models — image understanding (MNN)
InternVL2.5 1B
OpenGVLab (MNN)
SmolVLM2 2.2B
Hugging Face (MNN)
Image generation (Stable Diffusion · MNN + NPU)
Anything V5
Anime — CPU + NPU
AbsoluteReality
Photorealistic — CPU + NPU
ChilloutMix
Photorealistic — CPU + NPU
CuteYukiMix
Anime — CPU + NPU
Voice input & text-to-speech
Whisper base (bundled)
whisper.cpp
Whisper enhanced (optional)
whisper.cpp
Supertonic 3 (optional)
ONNX · 31 languages · 10 voices
Free vs Pro
Pro is a one-time purchase on Google Play — no subscription, no recurring charge.
| Feature | Free | Pro |
|---|---|---|
| On-device text models installed | Max 2 | Unlimited |
| On-device image models installed | Max 1 | Unlimited |
| Custom GGUF imports | Max 1 | Unlimited |
| Context memory | Up to 2,048 tokens | Up to 32,768 tokens |
| Custom system prompt | Locked | Full edit + saved prompts |
| Prompt templates | Max 5 | Unlimited |
| Themes | 9 free themes | All 21 themes |
| Offline OCR in chat | Locked | Included |
| Custom AI tools | Locked | Create & manage |
| Pro tools (Docs, Email, Brainstorm) | Locked | Included |
| API key mode | Available | Available |
| Ads | None | None |
| Purchase type | — | One-time, no subscription |
Price shown in the Play Store — varies by region.
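For readers who think in code, the limits in the table above can be modeled as a small lookup. This is only an illustrative sketch in Python: the names `TierLimits`, `LIMITS`, and `can_install_text_model` are hypothetical and are not part of OfflineGPT, which does not document any such API. Only the numbers come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierLimits:
    # None means "unlimited" in this sketch.
    max_text_models: Optional[int]
    max_image_models: Optional[int]
    max_gguf_imports: Optional[int]
    max_context_tokens: int
    max_prompt_templates: Optional[int]
    themes: int


# Values copied from the Free vs Pro table; the structure itself is hypothetical.
LIMITS = {
    "free": TierLimits(2, 1, 1, 2048, 5, 9),
    "pro": TierLimits(None, None, None, 32768, None, 21),
}


def can_install_text_model(tier: str, installed: int) -> bool:
    """Return True if one more text model fits under the tier's cap."""
    cap = LIMITS[tier].max_text_models
    return cap is None or installed < cap
```

Under these assumptions, a free user with two text models installed would need to remove one (or upgrade) before adding a third, while a Pro user is never capped.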
Frequently asked questions
Do I need internet?
To download models and app updates, yes. Once downloaded, local chat, image generation, and voice input work fully offline. The optional Gemma 4 web search and API mode require a connection whenever they are used.
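The answer above amounts to a simple split between features that need a connection and features that do not. A minimal Python sketch of that split follows; the feature names and the `requires_connection` helper are hypothetical labels chosen for illustration, not identifiers from the app.

```python
# Hypothetical feature names; the grouping follows the FAQ answer above.
NEEDS_NETWORK = {"model_download", "app_update", "gemma_web_search", "api_mode"}
WORKS_OFFLINE = {"local_chat", "image_generation", "voice_input"}


def requires_connection(feature: str) -> bool:
    """Return True if the named feature needs internet access when used."""
    if feature in NEEDS_NETWORK:
        return True
    if feature in WORKS_OFFLINE:
        return False
    # Unknown features are rejected rather than guessed at.
    raise ValueError(f"unknown feature: {feature}")
```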
Is OfflineGPT free?
Yes. The free tier covers local chat, image generation, voice input, and basic writing tools. Pro is a one-time Google Play purchase — no subscription.
What is Gemma 4 E2B?
Google's multimodal on-device model — it understands text, images, and audio attachments in a single conversation. It runs via LiteRT-LM, not llama.cpp, and supports an optional web search tool you can toggle on or off.
Is there an iPhone or iPad version?
Yes. On the App Store the app is listed as Offline AI Studio — same offline-first philosophy, built for iOS with Core ML.
How does image generation work?
Stable Diffusion models (Anything V5, AbsoluteReality, ChilloutMix, CuteYukiMix) run locally via MNN. On supported Snapdragon devices, NPU variants run on the Hexagon NPU for faster generation.
How does voice input work?
A compact Whisper model ships bundled with the app. Grant microphone access, tap the mic, and dictate. Audio is transcribed on-device — nothing leaves your phone.
What is text-to-speech?
Answers can be read aloud automatically or on tap. The system voice works with zero setup. Supertonic 3 is an optional ~380 MB download with 31 languages and 10 neural voices, and supports streaming TTS during generation.
How does privacy work?
By default, prompts and chat history stay on your device. Data only leaves your phone when you explicitly use optional features: model downloads, Gemma 4 web search, or cloud API mode (your keys, your provider). See the full privacy policy for details.