How to use TacitaModels/Gemma-4-Tacita-E4B-litert-lm with LiteRT-LM:
```shell
# LiteRT-LM runs on various platforms (Android, iOS, Windows, Linux, macOS, IoT, Web/WASM)
# and supports many APIs (C++, Python, Kotlin, Swift, JavaScript, Flutter).
# For platform-specific integration guides, please refer to the official developer website:
# https://ai.google.dev/edge/litert-lm

# To try LiteRT-LM, the easiest way is to use our CLI tool.

# 1. Install the LiteRT-LM CLI tool:
pip install litert-lm

# 2. Download and run this model locally:
# See: https://ai.google.dev/edge/litert-lm/cli
litert-lm run \
  --from-huggingface-repo=TacitaModels/Gemma-4-Tacita-E4B-litert-lm \
  model.litertlm \
  --prompt="Write me a poem"
```
Gemma‑4‑Tacita‑E4B · LiteRT‑LM
The on‑device brain of Tacita — a private assistant that runs entirely on your phone.
What this is
A QLoRA fine‑tune of Gemma 4 E4B (≈4B effective, Google's per‑layer‑embedding line) that bakes Tacita's per‑turn behaviors directly into the weights — so the on‑device model needs far fewer serial "director" calls and answers faster, fully offline.
Instead of orchestrating 4–5 separate model passes per turn (preamble → search plan → relevance → answer), Gemma‑4‑Tacita does them inline.
| Tier | Model | Target |
|---|---|---|
| light | Gemma‑4‑Tacita‑E2B | mobile, low‑end |
| pro | Gemma‑4‑Tacita‑E4B ← you are here | mobile, standard |
What's baked in
- 🗣️ Inline preamble + native thinking: a short route‑declaration preamble, then `enable_thinking` honored natively (off / normal / extended); emits nothing when thinking is off (fixes the stock‑Gemma llama.cpp #21338 class of bugs).
- 🔎 End‑to‑end web search: decides when to search, carries a multilingual query plan inside the tool call (`{queries:[{q,lang}]}`), grounds answers in the snippets, and retries off‑topic results with sharper keywords, all in the weights.
- 🤝 Honesty: refuses live‑data questions it can't answer (no tools) without inventing numbers.
- 🪪 Tacita identity: knows it is Tacita, built on Gemma 4, running privately on‑device.
- 🎭 Multi‑persona roleplay (gated): director plan + strict single‑speaker turns (no cross‑character bleed).
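The query plan shape mentioned in the web search item, `{queries:[{q,lang}]}`, can be illustrated with a small validator. This is a minimal sketch, assuming the tool-call arguments arrive as a JSON string with quoted keys; the `QueryItem` type and the `parse_query_plan` function are hypothetical names for illustration and are not part of Tacita or LiteRT-LM.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryItem:
    q: str     # search query text
    lang: str  # language tag for the query (exact tag format is an assumption)


def parse_query_plan(raw: str) -> list[QueryItem]:
    """Parse a JSON string shaped like {"queries": [{"q": ..., "lang": ...}]}.

    Raises ValueError if the payload does not match that shape.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"query plan is not valid JSON: {exc}") from exc

    queries = payload.get("queries") if isinstance(payload, dict) else None
    if not isinstance(queries, list) or not queries:
        raise ValueError("query plan must contain a non-empty 'queries' list")

    items = []
    for entry in queries:
        if not isinstance(entry, dict):
            raise ValueError("each query must be an object")
        q, lang = entry.get("q"), entry.get("lang")
        if not isinstance(q, str) or not q.strip():
            raise ValueError("each query needs a non-empty string 'q'")
        if not isinstance(lang, str) or not lang.strip():
            raise ValueError("each query needs a non-empty string 'lang'")
        items.append(QueryItem(q=q.strip(), lang=lang.strip()))
    return items
```

A caller could then issue one search per `QueryItem`, so that a single tool call fans out into queries in several languages.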
Capability metadata
The bundle is stamped with tacita.* metadata so the Tacita runtime knows which behaviors are inline and bypasses the matching director call:
tacita.model = gemma-4-tacita
tacita.variant = E4B
tacita.tier = mobile-pro
tacita.capabilities = [preamble_inline, thinking_native, search_plan_inline,
search_relevance_inline, honesty_no_tool, identity, ...]
A stock runtime that ignores this metadata still runs the model normally.
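The gating described above can be sketched as follows. This is an illustrative sketch only, assuming the `tacita.*` metadata has already been read into a dictionary of strings; how the real Tacita runtime reads the bundle is not shown in this card. The pass names, the `PASS_TO_CAPABILITY` mapping, and both function names are hypothetical.

```python
# Hypothetical mapping from a director pass to the capability that replaces it.
# Pass names follow the pipeline described in this card; the mapping itself is
# an assumption made for illustration.
PASS_TO_CAPABILITY = {
    "preamble": "preamble_inline",
    "search_plan": "search_plan_inline",
    "relevance": "search_relevance_inline",
}


def parse_capabilities(value: str) -> set[str]:
    """Parse a value like "[a, b, c]" into a set of capability names."""
    inner = value.strip().removeprefix("[").removesuffix("]")
    return {name.strip() for name in inner.split(",") if name.strip()}


def director_passes_to_run(metadata: dict[str, str]) -> list[str]:
    """Return the director passes that still need a separate model call."""
    caps = parse_capabilities(metadata.get("tacita.capabilities", ""))
    return [name for name, cap in PASS_TO_CAPABILITY.items() if cap not in caps]
```

With no `tacita.capabilities` key present, every pass is returned, which mirrors the fallback of a runtime that orchestrates all director calls itself.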
Format & speed
LiteRT‑LM (Google AI Edge) with Multi‑Token Prediction — ~2× faster decode than GGUF on mobile GPUs. Runs via flutter_gemma / flutter_litert_lm (dart:ffi, GPU/NPU delegate). A GGUF build for desktop/llama.cpp lives at Gemma‑4‑Tacita‑E4B‑GGUF.
License
The base Gemma 4 is Apache 2.0 (Google removed the Gemma Terms of Use for the Gemma 4 family). This derivative inherits Apache 2.0.
Provenance
Training labels come from adapted, permissively licensed open datasets and an open‑weights teacher, never from frontier‑API model outputs.