Runs 100% on your machine

Powerful AI that never phones home.

Chaty loads any .gguf model and runs it locally — chat, a private knowledge base, multi-round Deep Research, and voice. No account, no cloud, no telemetry. Your conversations stay on your disk.

Free & open source · All releases ↗

Runs Llama 3 · Gemma 3 / 4 · Qwen 3 / 3.5 / 3.6 · any GGUF

What's inside

A full AI workstation, contained on your device.

Everything a cloud chatbot does — and several things it can't — without a single byte leaving your computer.

Local · Private · GPU

Any model. All local. On your GPU.

Load any .gguf and talk to it: the tokenizer and chat template come straight from the file. Nothing is uploaded, nothing is logged to a server. Cross-vendor Vulkan on Windows and Metal on Apple Silicon offload the model to your GPU, auto-tuned to your VRAM with a graceful CPU fallback.

  • Persistent context + KV-cache reuse for fast multi-turn
  • Auto-fit context window with non-destructive summarization
  • Full sampling controls & saveable prompt presets
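
For the curious, KV-cache reuse in a multi-turn chat usually comes down to a prefix comparison: if the new prompt starts with the same tokens the cache already holds, only the tail needs to be evaluated. The sketch below illustrates that idea in Python. It is not Chaty's actual code; the function names and the plain integer token IDs are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def reusable_prefix_len(cached: Sequence[int], new: Sequence[int]) -> int:
    """Count how many leading tokens the cached and new prompts share."""
    shared = 0
    for old_tok, new_tok in zip(cached, new):
        if old_tok != new_tok:
            break
        shared += 1
    return shared


def plan_evaluation(cached: Sequence[int], new: Sequence[int]) -> Tuple[int, List[int]]:
    """Return (tokens kept from the cache, tokens that still need evaluating)."""
    keep = reusable_prefix_len(cached, new)
    return keep, list(new[keep:])


# Hypothetical token IDs: the second turn extends the first one.
turn_one = [1, 15, 27, 42]
turn_two = [1, 15, 27, 42, 8, 99]
kept, todo = plan_evaluation(turn_one, turn_two)
print(kept, todo)
```

When a turn only appends to the conversation, everything already processed is kept and just the new tokens are evaluated, which is why follow-up replies can start faster than the first one.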

Knowledge base · RAG

Answers grounded in your documents.

Drop in PDFs, Markdown, code, even scanned images — Chaty chunks and embeds them locally with multilingual bge-m3 vectors. When the knowledge base is on, the model answers only from what it retrieved, and says so plainly when something isn't covered.

  • Hybrid dense + keyword retrieval, fused and diversified
  • Inline citations with hover-preview of the source passage
  • Per-document scope — pick exactly which files are searched
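
One common way to fuse a dense ranking with a keyword ranking is reciprocal rank fusion (RRF). This page does not say which fusion method Chaty uses, so treat the sketch below as an illustration of the general technique, with hypothetical document IDs and the conventional constant k = 60. The diversification step is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers.
dense_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([dense_hits, keyword_hits]))  # ['b', 'a', 'd', 'c']
```

Chunks that both retrievers agree on rise to the top, while a chunk found by only one retriever still stays in the list.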

Deep Research

From a question to a cited report.

Hand Chaty a topic and watch it work: it plans queries, runs several rounds of web search interleaved with its own reasoning about what's still missing, then synthesizes a structured long-form report. References list only the sources it actually cited, and you can export the whole thing to PDF or Markdown.

  • Topic-anchored so results never drift off subject
  • Free, key-less search — resilient multi-provider chain
  • Live progress: planning → searching → reasoning → writing
  • One-click export to PDF or Markdown
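
Listing only the sources a report actually cites can be sketched as a small post-processing pass: scan the text for citation markers, keep the matching sources, and renumber them in order of first appearance. The `[n]` marker style and the function name below are assumptions for illustration, not a description of Chaty's internals.

```python
import re
from typing import List, Tuple

CITATION = re.compile(r"\[(\d+)\]")


def keep_cited_sources(report: str, sources: List[str]) -> Tuple[str, List[str]]:
    """Drop uncited sources and renumber [n] markers by first appearance."""
    order: List[int] = []
    for match in CITATION.finditer(report):
        number = int(match.group(1))
        if 1 <= number <= len(sources) and number not in order:
            order.append(number)

    renumber = {old: new for new, old in enumerate(order, start=1)}

    def swap(match: "re.Match[str]") -> str:
        number = int(match.group(1))
        # Markers that point at no known source are left untouched.
        return f"[{renumber[number]}]" if number in renumber else match.group(0)

    references = [sources[old - 1] for old in order]
    return CITATION.sub(swap, report), references


# Hypothetical collected sources; the second one is never cited.
sources = ["Source A", "Source B", "Source C"]
text = "GGUF is a model file format [3]. It is read by llama.cpp [1][3]."
print(keep_cited_sources(text, sources))
```

Here the uncited second source is dropped, and the remaining two are renumbered so the reference list starts at 1.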

Live mode

Talk to it. Hands-free.

Hold a real conversation out loud. An animated orb listens, thinks and answers — local speech-to-text in, local text-to-speech out. Silence auto-sends your turn; replies are read back sentence by sentence. Eleven voices, all on CPU, never touching the model's VRAM.

  • Continuous, hands-free voice conversation
  • Silence auto-send & sentence-by-sentence read-aloud
  • Runs fully offline — zero GPU cost
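
Silence auto-send can be pictured as a simple end-of-turn detector: once speech has been heard, a long enough run of quiet audio frames ends the turn. The sketch below uses a plain RMS energy threshold. The threshold, the frame count, and the function names are illustrative assumptions, and a real voice pipeline may well use a more robust detector.

```python
import math
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in -1.0 .. 1.0)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def end_of_turn_index(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,
    silent_frames_needed: int = 3,
) -> Optional[int]:
    """Return the index of the frame that completes the closing silence, or None."""
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= silent_frames_needed:
                return index
    return None


loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.0, 0.0, 0.0, 0.0]
print(end_of_turn_index([quiet, loud, loud, quiet, quiet, quiet, quiet]))  # 5
```

Leading silence is ignored, so the turn is only sent after the speaker has actually said something and then paused.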

Deep-dive podcast

Your documents, as a two-host show.

NotebookLM-style audio: turn a knowledge base into a natural English conversation between two hosts, written by the model and read aloud by alternating female and male voices. Watch progress and time remaining, lock the app while it works, then export the finished episode as a WAV.

  • Two distinct hosts, grounded in your own sources
  • Live progress with estimated time remaining
  • Cancel anytime · export the audio to WAV

Rich chat canvas

It renders what the model makes.

A foldable think panel that follows the reasoning as it streams. KaTeX math, Mermaid diagrams, syntax-highlighted code with per-block copy, and a live HTML preview that even runs single-file web games. Full-text search and Markdown / JSON export across every conversation.

  • Mermaid & KaTeX inline
  • Playable HTML preview
  • Light · dark · system
  • Tray, hotkey · EN / 简体中文

Privacy is the architecture

Your data never leaves your device.

This isn't a setting you toggle — it's how Chaty is built. The model, your chats, your documents, and your knowledge base all live in local storage on your computer. There's no account to create and no server to trust.

The only time the network is touched is when you ask for it: optional web search and Deep Research, or a one-time download of a model or the optional voice and embedding files. Turn those off and Chaty is fully offline.

Platforms

Native on the hardware you already own.

macOS

Chip: Apple Silicon (M-series)
GPU: Metal · offload-all on unified memory
Package: Signed .dmg

Windows

Arch: x64 · Windows 10 / 11
GPU: Vulkan · cross-vendor, auto-tuned
Package: Per-user installer · no admin

Under the hood

Engine: llama.cpp · Rust
Shell: Tauri 2 · React
Storage: Local SQLite + vector store

Bring the model home.

Free, open source, and yours to run forever. Pick your platform — the download tracks the latest release.

macOS is ad-hoc signed (not notarized). First launch needs one Gatekeeper step — see the FAQ.

Questions

Good to know.

Is Chaty really free?

Yes — Chaty is free and open source. You bring your own GGUF models (downloadable in-app from Hugging Face), and everything runs on your own machine. There's no subscription and no account.

Does it work fully offline?

Once you've downloaded a model, yes. The network is only used when you explicitly enable web search / Deep Research, or to fetch a model or the optional voice and embedding files. Disable those and Chaty never touches the internet.

Which models can I run?

Any .gguf file. There's first-class handling for Llama 3, Gemma 3 / 4, and Qwen 3 / 3.5 / 3.6 (including thinking control), plus a robust template fallback chain so unusual community models still chat.
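
If you want to check that a download really is a GGUF file before loading it, the fixed-size header is easy to read by hand. In GGUF version 2 and later it is the 4-byte magic `GGUF`, a little-endian 32-bit version, then 64-bit tensor and metadata counts. The metadata entries that follow are where keys such as `tokenizer.chat_template` live. The Python sketch below parses only that 24-byte header; the function and class names are made up for illustration.

```python
import struct
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the fixed-size GGUF header (format version 2 or later)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic bytes)")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    if version < 2:
        # Version 1 stored 32-bit counts; this sketch does not handle it.
        raise ValueError("GGUF version 1 is not supported by this sketch")
    return GGUFHeader(version, tensor_count, kv_count)


# Synthetic header for demonstration: version 3, 291 tensors, 24 metadata entries.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
print(parse_gguf_header(sample))
```

To try it on a real model, read the first 24 bytes of the file in binary mode and pass them to `parse_gguf_header`.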

What are the hardware requirements?

A 64-bit Windows 10/11 PC or an Apple-Silicon Mac. Smaller quantized models run comfortably in 8 GB of RAM; larger models want more. Chaty auto-tunes GPU offload to your VRAM and refuses models that can't physically fit, so it won't freeze your system.
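
As a rough picture of what tuning offload to your VRAM involves, here is a back-of-the-envelope sketch in Python. It assumes every layer is about the same size, that a fixed reserve is set aside for the KV cache and buffers, and that whatever does not fit on the GPU must fit in free RAM. These simplifications and all names are assumptions for illustration; Chaty's real heuristics are not described on this page.

```python
GIB = 1024 ** 3


def layers_to_offload(
    model_bytes: int, n_layers: int, free_vram_bytes: int, reserve_bytes: int
) -> int:
    """Estimate how many equally sized layers fit in VRAM after the reserve."""
    if n_layers <= 0:
        raise ValueError("n_layers must be positive")
    budget = free_vram_bytes - reserve_bytes
    if budget <= 0:
        return 0  # nothing fits on the GPU: run entirely on the CPU
    per_layer = model_bytes / n_layers
    return min(n_layers, int(budget // per_layer))


def can_load(
    model_bytes: int,
    n_layers: int,
    free_vram_bytes: int,
    free_ram_bytes: int,
    reserve_bytes: int,
) -> bool:
    """Refuse the model if the layers left on the CPU would not fit in free RAM."""
    gpu_layers = layers_to_offload(model_bytes, n_layers, free_vram_bytes, reserve_bytes)
    cpu_bytes = model_bytes * (n_layers - gpu_layers) / n_layers
    return cpu_bytes <= free_ram_bytes


# Hypothetical 4 GiB model with 32 layers, 3 GiB of free VRAM, 1 GiB reserved.
print(layers_to_offload(4 * GIB, 32, 3 * GIB, 1 * GIB))  # 16
```

With these numbers, half of the layers go to the GPU and the rest stay on the CPU, which is the kind of split a graceful fallback implies.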

On macOS it says the app "can't be verified." What do I do?

Chaty is ad-hoc signed but not notarized (there's no paid Apple Developer account behind it). Clear the download quarantine once in Terminal:

xattr -dr com.apple.quarantine /Applications/Chaty.app

Then open it normally — or right-click the app, choose Open, and confirm via System Settings → Privacy & Security → Open Anyway.

Where are my conversations stored?

In a local SQLite database in your user app-data folder. Nothing is synced anywhere. Delete the app data and it's gone — it was only ever on your disk.