---
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
language:
  - en
  - multilingual
tags:
  - gemma4
  - gemma-4
  - abliterated
  - refusal-ablated
  - uncensored
  - heretic
  - multimodal
library_name: transformers
pipeline_tag: any-to-any
base_model: google/gemma-4-12B-it
base_model_relation: finetune
---

# osmGemma-4-12B-uncensored-bf16

> **Full-precision (bf16) abliterated** `google/gemma-4-12B-it` — the complete *encoder-free* unified multimodal model (text · image · audio · video) with refusals removed via [Heretic](https://github.com/p-e-w/heretic). **This is the artifact that runs refusal-free vision + audio + video *today*** (in 🤗 transformers), and the source for the MLX quants below. By **osmAPI**.

## ⚠️ Abliterated model — read this

Refusal directions were surgically removed from the parent. It will answer many prompts the parent refuses. **No new capabilities were added — only refusal behavior was reduced.** Use responsibly and within applicable law.

## 🔓 Refusal removal — before / after

Measured with Heretic's evaluator on **100 harmful prompts** (`mlabonne/harmful_behaviors` `test[:100]`), greedy decoding, refusal-marker classifier:

| Model | Refusals | Refusal rate |
|---|---|---|
| `google/gemma-4-12B-it` (original) | **99 / 100** | 99.0% |
| **this model** (abliterated) | **12 / 100** | 12.0% |

### **↓ 87 fewer refusals (an 87.9% reduction)**, at **KL divergence 0.053** from the original, well below the 0.5 level treated here as the damage threshold, which suggests general capabilities are largely preserved.
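
The figures above come down to simple counting. As a minimal sketch, assuming a substring-based refusal-marker check, the rate and the relative reduction can be reproduced as follows. The marker list is illustrative and is not Heretic's actual list, and `is_refusal`, `refusal_rate`, and `relative_reduction` are hypothetical helpers:

```python
# Illustrative refusal markers; NOT Heretic's real list (assumption).
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "sorry", "as an ai")


def is_refusal(response: str, markers=REFUSAL_MARKERS) -> bool:
    """Flag a response as a refusal if it contains any marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in markers)


def refusal_rate(responses) -> float:
    """Fraction of responses flagged as refusals."""
    responses = list(responses)
    return sum(is_refusal(r) for r in responses) / len(responses)


def relative_reduction(before: int, after: int) -> float:
    """Relative drop in refusal count, as a percentage of the original count."""
    return 100.0 * (before - after) / before


# Counts reported in the table above: 99 refusals before, 12 after.
print(f"{99 - 12} fewer refusals, {relative_reduction(99, 12):.1f}% reduction")
```

A substring check like this is cheap but coarse: it can miss refusals phrased without any marker and can flag compliant answers that happen to contain one.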

## 📊 Specs

| Property | Value |
|---|---|
| **Precision** | bfloat16 (full precision) |
| **Disk size** | ~23.9 GB |
| **Base** | `google/gemma-4-12B-it`: 11.95B parameters, 48 layers, 256K context, 140+ languages |
| **Modalities** | text · image · audio · video in, text out (encoder-free / unified) |
| **Refusal-free multimodal today** | ✅ via 🤗 transformers |
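
The disk size follows roughly from the parameter count and the precision. A back-of-the-envelope sketch, assuming the size is dominated by the weights and using decimal gigabytes (`approx_weight_gb` is a hypothetical helper):

```python
def approx_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# 11.95B parameters stored as bfloat16 (16 bits per weight).
print(f"~{approx_weight_gb(11.95e9, 16):.1f} GB")
```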

## ⚡ Inference & compatibility

| Runtime | Supported? | Notes |
|---|---|---|
| **🤗 transformers** (PyTorch · CUDA/MPS) | ✅ **full multimodal** (text · image · audio · video) | needs `torchvision` + `librosa` |
| **vLLM** (CUDA) | ⚠️ quantize first | convert to FP8/AWQ/GPTQ; `gemma4_unified` serving support is rolling out |
| **MLX** (Apple Silicon) | ➡️ use the MLX quants below | text today; vision pending mlx-vlm |
| **Ollama / llama.cpp** | ❌ needs GGUF | conversion pending llama.cpp `gemma4_unified` support |

## 🚀 Quick start — transformers (text)

```bash
pip install -U "transformers>=5.10" torch torchvision librosa accelerate
```

```python
from transformers import AutoProcessor, AutoModelForMultimodalLM

mid = "osmapi/osmGemma-4-12B-uncensored-bf16"
processor = AutoProcessor.from_pretrained(mid)
# dtype="auto" loads the dtype stored in the checkpoint (bf16 here);
# device_map="auto" places the weights on the available devices.
model = AutoModelForMultimodalLM.from_pretrained(mid, dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain abliteration in two sentences."},
]
# Render the chat template and tokenize in one step.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
    enable_thinking=False,
).to(model.device)
n = inputs["input_ids"].shape[-1]  # prompt length, used to drop the echoed prompt
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(processor.parse_response(processor.decode(out[0][n:], skip_special_tokens=False)))
```

> `enable_thinking=True` turns on reasoning mode; `parse_response` then separates the thinking channel from the final answer.

## 🖼️🎙️ Vision & audio (image · audio · video)

Full multimodal runs here today — pass image/audio/video in the message content:

```python
# Reuses `processor` and `model` loaded in the quick start above.
messages = [{"role": "user", "content": [
    {"type": "image", "url":   "https://.../photo.jpg"},   # image → key "url"
    {"type": "audio", "audio": "https://.../clip.wav"},    # audio → key "audio" (≤30s)
    {"type": "text",  "text":  "Describe what you see and hear."},
]}]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)
n = inputs["input_ids"].shape[-1]  # prompt length, used to drop the echoed prompt
out = model.generate(**inputs, max_new_tokens=512)
print(processor.parse_response(processor.decode(out[0][n:], skip_special_tokens=False)))
```

> Audio ≤ 30 s (native ASR + speech translation) · images variable-resolution · video ≤ 60 s (~1 fps).
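
To stay within the audio limit above, one option is to check a local clip's duration before building the message. A minimal sketch for uncompressed WAV input using only the standard library. `MAX_AUDIO_SECONDS`, `wav_duration_seconds`, `fits_audio_limit`, and `make_silent_wav` are hypothetical names, and how the processor treats over-length audio is not covered here:

```python
import io
import wave

MAX_AUDIO_SECONDS = 30.0  # audio limit quoted in this card


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV given a file path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_audio_limit(source, limit: float = MAX_AUDIO_SECONDS) -> bool:
    """True if the clip is no longer than `limit` seconds."""
    return wav_duration_seconds(source) <= limit


def make_silent_wav(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, used only for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


print(fits_audio_limit(make_silent_wav(10)))  # 10 s clip
print(fits_audio_limit(make_silent_wav(45)))  # 45 s clip
```

Compressed formats (MP3, FLAC, and so on) would need a decoder such as `librosa`, which the install step above already pulls in.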

## 🍎 Running on Mac

This bf16 repo runs in **🤗 transformers on Apple Silicon (MPS)** with full multimodal support, as above. For lighter, faster **MLX** serving, use the MLX quants of this model (see the family table) with one of: [**oMLX**](https://github.com/jundot/omlx) (inference server + macOS menu-bar app, SSD KV cache), [**vMLX**](https://vmlx.net), [**LM Studio**](https://lmstudio.ai) (MLX engine), [**Ollama** 0.19+](https://ollama.com), or [**mlx-vlm**](https://github.com/Blaizzy/mlx-vlm) directly. These tools can serve the MLX quants once their bundled `mlx-lm`/`mlx-vlm` adds `gemma4_unified` support; today, text works via `mlx-vlm` plus a small shim.

## 🗂️ Quant family

| Repo | Scheme | Eff. BPW | Size | Link / status |
|---|---|---|---|---|
| `osmGemma-4-12B-uncensored-bf16` — abliterated, **full multimodal** | bf16 | 16 | ~23.9 GB | ✅ **you are here** |
| `osmGemma-4-12B-uncensored-8bit-mlx` | 8-bit affine | 8.805 | ~13.7 GB | [↗](https://huggingface.co/osmapi/osmGemma-4-12B-uncensored-8bit-mlx) |
| `osmGemma-4-12B-uncensored-mxfp4-mlx` | MXFP4 (4-bit microscaling) | 7.628 | ~11.9 GB | [↗](https://huggingface.co/osmapi/osmGemma-4-12B-uncensored-mxfp4-mlx) |
| `osmGemma-4-12B-uncensored-mixed-4.2bpw-mlx` | mixed 3/4-bit | 4.2 | ~6.6 GB | [↗](https://huggingface.co/osmapi/osmGemma-4-12B-uncensored-mixed-4.2bpw-mlx) |
| `google/gemma-4-12B-it` — base (not abliterated) | bf16 | 16 | ~24 GB | [↗](https://huggingface.co/google/gemma-4-12B-it) |
| `google/gemma-4-12B-it-assistant` — MTP draft | — | — | — | ⏳ planned (can be added later) |

## 🧬 Lineage

```
google/gemma-4-12B                       (Google DeepMind — base pretrain)
        ↓  instruction tuning
google/gemma-4-12B-it                    (multimodal, encoder-free)
        ↓  Heretic 1.3.0 — directional ablation, Optuna/TPE-optimized over 100 trials, best Pareto trial #55
this repo — abliterated bf16             (refusals 99→12 / 100, KL 0.053)
        ↓  mlx-vlm quantization
MLX quants (8-bit · MXFP4 · mixed)       — see family table
```
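
Directional ablation, the technique named in the lineage above, removes the component of a vector that lies along an estimated refusal direction: with unit direction $\hat{r}$, a hidden state $h$ becomes $h' = h - (h \cdot \hat{r})\,\hat{r}$. The toy sketch below applies this to a single made-up activation in plain Python. It is not Heretic's implementation, whose per-layer weighting and parameter search are not reproduced here, and `ablate` is a hypothetical helper:

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate(hidden, direction, weight: float = 1.0):
    """Remove `weight` times the component of `hidden` along `direction`."""
    r = normalize(direction)
    scale = weight * dot(hidden, r)
    return [h - scale * ri for h, ri in zip(hidden, r)]


refusal_direction = [1.0, 1.0, 0.0]  # made-up direction, for illustration only
hidden_state = [2.0, 0.0, 3.0]       # made-up activation

ablated = ablate(hidden_state, refusal_direction)
print(ablated)
print(dot(ablated, normalize(refusal_direction)))  # close to zero
```

With `weight=1.0` the result is orthogonal to the direction; smaller weights remove only part of the component, which is one way to trade refusal removal against drift from the original model.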

## 🙏 Credits

| Role | Project |
|---|---|
| **Abliteration & release** | [osmAPI](https://huggingface.co/osmapi) |
| **Abliteration tool** | [Heretic](https://github.com/p-e-w/heretic) by p-e-w |
| **Research** | [osmAPI Research Team](https://osmapi.com) · [Terv Student Research Team](https://terv.pro) |
| **Base model** | [Google DeepMind](https://huggingface.co/google) — Gemma 4 |

## 📜 License

Apache-2.0 (inherited from the base). Also subject to the [Gemma 4 Terms of Use](https://ai.google.dev/gemma/docs/gemma_4_license).