Tacita

Gemma‑4‑Tacita‑E4B · LiteRT‑LM

The on‑device brain of Tacita — a private assistant that runs entirely on your phone.


What this is

A QLoRA fine‑tune of Gemma 4 E4B (≈4B effective, Google's per‑layer‑embedding line) that bakes Tacita's per‑turn behaviors directly into the weights — so the on‑device model needs far fewer serial "director" calls and answers faster, fully offline.

Instead of orchestrating 4–5 separate model passes per turn (preamble → search plan → relevance → answer), Gemma‑4‑Tacita does them inline.

Tier    Model                 Target
light   Gemma‑4‑Tacita‑E2B    mobile, low‑end
pro     Gemma‑4‑Tacita‑E4B    mobile, standard (you are here)

What's baked in

  • 🗣️ Inline preamble + native thinking — a short route‑declaration preamble, then enable_thinking honored natively (off / normal / extended); emits nothing when thinking is off (fixes the stock‑Gemma llama.cpp #21338 class of bugs).
  • 🔎 End‑to‑end web search — decides when to search, carries a multilingual query plan inside the tool call ({queries:[{q,lang}]}), grounds answers in the snippets, and retries off‑topic results with sharper keywords — all in the weights.
  • 🤝 Honesty — refuses live‑data questions it can't answer (no tools) without inventing numbers.
  • 🪪 Tacita identity — knows it is Tacita, built on Gemma 4, running privately on‑device.
  • 🎭 Multi‑persona roleplay (gated) — director plan + strict single‑speaker turns (no cross‑character bleed).
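
As a rough illustration of the query plan carried in the web-search tool call, the sketch below parses a payload shaped like {queries:[{q,lang}]} into typed entries. Only that shape comes from this card. The helper name parse_query_plan, the PlannedQuery type, and the assumption that the arguments arrive as a JSON string are hypothetical and may not match what the Tacita runtime does.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedQuery:
    q: str     # search keywords for one query
    lang: str  # language tag for that query, e.g. "en"


def parse_query_plan(arguments_json: str) -> list[PlannedQuery]:
    """Parse tool-call arguments shaped like {"queries": [{"q": ..., "lang": ...}]}.

    Hypothetical helper: assumes the arguments are delivered as a JSON string.
    Entries missing either field are skipped rather than guessed.
    """
    payload = json.loads(arguments_json)
    plan = []
    for item in payload.get("queries", []):
        q = str(item.get("q", "")).strip()
        lang = str(item.get("lang", "")).strip()
        if q and lang:
            plan.append(PlannedQuery(q=q, lang=lang))
    return plan
```

A caller could then issue one search per PlannedQuery and pass the returned snippets back to the model for grounding.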

Capability metadata

The bundle is stamped with tacita.* metadata so the Tacita runtime knows which behaviors are inline and bypasses the matching director call:

tacita.model         = gemma-4-tacita
tacita.variant       = E4B
tacita.tier          = mobile-pro
tacita.capabilities  = [preamble_inline, thinking_native, search_plan_inline,
                        search_relevance_inline, honesty_no_tool, identity, ...]

A stock runtime that ignores this metadata still runs the model normally.
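
A minimal sketch of how a runtime might use this metadata to decide which director calls are still needed. It assumes the metadata has already been read into a plain dict of strings and that the capability list is stored in the bracketed form shown above. The stage names and the STAGE_TO_CAPABILITY mapping are hypothetical, chosen only to mirror the passes named earlier in this card.

```python
# Hypothetical mapping: director stage -> capability that makes it redundant.
STAGE_TO_CAPABILITY = {
    "preamble": "preamble_inline",
    "search_plan": "search_plan_inline",
    "relevance": "search_relevance_inline",
}


def parse_capabilities(raw: str) -> set[str]:
    """Parse a bracketed list such as "[preamble_inline, thinking_native]"."""
    inner = raw.strip().strip("[]")
    return {name.strip() for name in inner.split(",") if name.strip()}


def director_stages_to_run(metadata: dict[str, str]) -> list[str]:
    """Return the director stages not covered by an inline capability."""
    caps = parse_capabilities(metadata.get("tacita.capabilities", ""))
    return [stage for stage, cap in STAGE_TO_CAPABILITY.items() if cap not in caps]
```

With no tacita.* metadata present, this sketch returns every stage, so a model without the stamp would still go through the full director pipeline.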

Format & speed

LiteRT‑LM (Google AI Edge) with Multi‑Token Prediction — ~2× faster decode than GGUF on mobile GPUs. Runs via flutter_gemma / flutter_litert_lm (dart:ffi, GPU/NPU delegate). A GGUF build for desktop/llama.cpp lives at Gemma‑4‑Tacita‑E4B‑GGUF.

License

The base Gemma 4 is Apache 2.0 (Google removed the Gemma Terms of Use for the Gemma 4 family). This derivative inherits Apache 2.0.

Provenance

Training labels are adapted from permissively‑licensed open datasets plus an open‑weights teacher, never from frontier‑API model outputs.


Built with ❤️ for on‑device privacy · TacitaModels