OfflineGPT

On-device AI for Android — chat, image generation, voice, TTS, and more. No cloud backend, no subscription.

iOS version: Offline AI Studio on the App Store

What's new

Recent additions in v2.9.x

v2.9.1

NPU image generation

Stable Diffusion now runs on the Qualcomm Hexagon NPU on supported Snapdragon devices, with dedicated "NPU" model variants in the image model manager.

v2.9.0

Text-to-speech

Answers can be read aloud — via system voice zero setup, or the optional Supertonic 3 neural engine with streaming TTS during generation.

New model

Gemma 4 E2B

Google's multimodal on-device model: text, image, and audio understanding in one conversation. Runs via LiteRT-LM with optional web search.

What OfflineGPT can do

Android 8.0+, 64-bit ARM. Models run on-device — CPU, GPU (Vulkan/OpenCL), or Snapdragon NPU.

Offline by default

Chats, generations, and voice input stay on your device. No cloud AI backend, no ads, no tracking.

Gemma 4 E2B — multimodal

Google's on-device model understands text, images, and audio in one conversation. Optional web search when you want it.

Thinking models

Qwen3 (0.6B – 4B) and SmolLM3-3B show step-by-step reasoning before answering. Good for problems that need a second look.

Vision & image generation

Describe images with InternVL or SmolVLM2. Generate art with Stable Diffusion (Anything V5, AbsoluteReality, ChilloutMix, CuteYukiMix).

Snapdragon NPU acceleration

Image generation runs on the Hexagon NPU on supported Snapdragon devices — faster, cooler, less battery drain.

Whisper voice input

Dictate prompts in any language with on-device speech recognition. A compact model ships with the app; a higher-quality one is an optional download.

Read aloud / TTS

Answers are read back via system voice or the optional Supertonic 3 neural engine — 31 languages, 10 voices, streaming during generation.

AI writing tools

Summarize, grammar, tone, simplify, social posts. Pro adds document processing, email drafting, and Idea Blast brainstorming.

Code Lab

Prototype HTML, CSS, and JavaScript with a built-in live preview and an AI coding assistant alongside.

Offline OCR (Pro)

Extract text from photos and PDFs on-device using ML Kit — no cloud OCR.

Custom AI tools (Pro)

Build your own tools with a custom name, icon, system prompt, and optional file knowledge base. Pin them to the home screen.

Optional API mode

Connect your own keys for OpenAI, Gemini, DeepSeek, Groq, Mistral, Grok, or a custom endpoint. Fully optional — local inference works without any key.

On-device model catalog

Download sizes are approximate. Custom GGUF imports also supported.

Text models (GGUF · llama.cpp)

LFM2.5 1.2B

LiquidAI

~731 MB

Llama-3.2-1B

Thinking models (GGUF · llama.cpp)

Qwen3 0.6B Thinking

Alibaba

~640 MB

Qwen3 1.7B Thinking

Alibaba

~1.8 GB

Qwen3 4B Thinking

Alibaba

~2.6 GB

SmolLM3-3B

Hugging Face

~1.9 GB

Multimodal — text + image + audio (LiteRT)

Gemma-4-E2B-it

Google (LiteRT)

~2.58 GB

Vision models — image understanding (MNN)

InternVL2.5 1B

OpenGVLab (MNN)

4 GB RAM min.

SmolVLM2 2.2B

Hugging Face (MNN)

6 GB RAM min.

Image generation (Stable Diffusion · MNN + NPU)

Anything V5

Anime — CPU + NPU

~1.25 GB

AbsoluteReality

Photorealistic — CPU + NPU

~1.25 GB

ChilloutMix

Photorealistic — CPU + NPU

~1.25 GB

CuteYukiMix

Anime — CPU + NPU

~1.25 GB

Voice input & text-to-speech

Whisper base (bundled)

whisper.cpp

~70 MB

Whisper enhanced (optional)

whisper.cpp

~140 MB

Supertonic 3 (optional)

ONNX · 31 languages · 10 voices

~380 MB

Free vs Pro

Pro is a one-time purchase on Google Play — no subscription, no recurring charge.

Feature	Free	Pro
On-device text models installed	Max 2	Unlimited
On-device image models installed	Max 1	Unlimited
Custom GGUF imports	Max 1	Unlimited
Context memory	Up to 2,048 tokens	Up to 32,768 tokens
Custom system prompt	Locked	Full edit + saved prompts
Prompt templates	Max 5	Unlimited
Themes	9 free themes	All 21 themes
Offline OCR in chat	Locked	Included
Custom AI tools	Locked	Create & manage
Pro tools (Docs, Email, Brainstorm)	Locked	Included
API key mode	Available	Available
Ads	None	None
Purchase type	—	One-time, no subscription

Price shown in the Play Store — varies by region.

Frequently asked questions

Do I need internet?

To download models and app updates, yes. Once downloaded, local chat, image generation, and voice input work fully offline. Optional: Gemma 4 web search and API mode require a connection when used.

Is OfflineGPT free?

Yes. The free tier covers local chat, image generation, voice input, and basic writing tools. Pro is a one-time Google Play purchase — no subscription.

What is Gemma 4 E2B?

Google's multimodal on-device model — it understands text, images, and audio attachments in a single conversation. It runs via LiteRT-LM, not llama.cpp, and supports an optional web search tool you can toggle on or off.

Is there an iPhone or iPad version?

Yes. On the App Store the app is listed as Offline AI Studio — same offline-first philosophy, built for iOS with Core ML.

How does image generation work?

Stable Diffusion models (Anything V5, AbsoluteReality, ChilloutMix, CuteYukiMix) run locally via MNN. On supported Snapdragon devices, NPU variants run on the Hexagon DSP for faster generation.

How does voice input work?

A compact Whisper model ships bundled with the app. Grant microphone access, tap the mic, and dictate. Audio is transcribed on-device — nothing leaves your phone.

What is text-to-speech?

Answers can be read aloud automatically or on tap. The system voice works with zero setup. Supertonic 3 is an optional ~380 MB download with 31 languages and 10 neural voices, and supports streaming TTS during generation.

How does privacy work?

By default, prompts and chat history stay on your device. Data only leaves your phone when you explicitly use optional features: model downloads, Gemma 4 web search, or cloud API mode (your keys, your provider). Full privacy policy.

Get OfflineGPT

Free download on Google Play · Android 8.0+