---
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
language:
- en
- multilingual
tags:
- gemma4
- gemma-4
- abliterated
- refusal-ablated
- uncensored
- heretic
- multimodal
library_name: transformers
pipeline_tag: any-to-any
base_model: google/gemma-4-12B-it
base_model_relation: finetune
---

# osmGemma-4-12B-uncensored-bf16

> **Full-precision (bf16) abliterated** `google/gemma-4-12B-it`: the complete *encoder-free* unified multimodal model (text · image · audio · video) with refusals removed via [Heretic](https://github.com/p-e-w/heretic). **This is the artifact that runs refusal-free vision + audio + video *today*** (in 🤗 transformers), and the source for the MLX quants below. By **osmAPI**.

## ⚠️ Abliterated model: read this

Refusal directions were surgically removed from the parent. It will answer many prompts the parent refuses. **No new capabilities were added; only refusal behavior was reduced.** Use responsibly and within applicable law.

## 🔓 Refusal removal: before / after

Measured with Heretic's evaluator on **100 harmful prompts** (`mlabonne/harmful_behaviors` `test[:100]`), greedy decoding, refusal-marker classifier:

| Model | Refusals | Refusal rate |
|---|---|---|
| `google/gemma-4-12B-it` (original) | **99 / 100** | 99.0% |
| **this model** (abliterated) | **12 / 100** | 12.0% |

### **↓ 87 fewer refusals, an 87.9% reduction**, at **KL divergence 0.053** from the original (≪ 0.5, the damage threshold) → general capabilities preserved.

## 📊 Specs

| | |
|---|---|
| **Precision** | bfloat16 (full precision) |
| **Disk size** | ~23.9 GB |
| **Base** | `google/gemma-4-12B-it`: 11.95B, 48 layers, 256K context, 140+ languages |
| **Modalities** | text · image · audio · video in, text out (encoder-free / unified) |
| **Refusal-free multimodal today** | ✅ via 🤗 transformers |

## ⚡ Inference & compatibility

| Runtime | Supported? | Notes |
|---|---|---|
| **🤗 transformers** (PyTorch · CUDA/MPS) | ✅ **full multimodal** (text · image · audio · video) | needs `torchvision` + `librosa` |
| **vLLM** (CUDA) | ⚠️ quantize first | convert to FP8/AWQ/GPTQ; `gemma4_unified` serving support is rolling out |
| **MLX** (Apple Silicon) | ➡️ use the MLX quants below | text today; vision pending mlx-vlm |
| **Ollama / llama.cpp** | ❌ needs GGUF | conversion pending llama.cpp `gemma4_unified` support |

## 🚀 Quick start: transformers (text)

```bash
pip install -U "transformers>=5.10" torch torchvision librosa accelerate
```

```python
from transformers import AutoProcessor, AutoModelForMultimodalLM

mid = "osmapi/osmGemma-4-12B-uncensored-bf16"
processor = AutoProcessor.from_pretrained(mid)
model = AutoModelForMultimodalLM.from_pretrained(mid, dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain abliteration in two sentences."},
]
inputs = processor.apply_chat_template(messages, tokenize=True, return_dict=True,
    return_tensors="pt", add_generation_prompt=True, enable_thinking=False).to(model.device)
n = inputs["input_ids"].shape[-1]
out = model.generate(**inputs, max_new_tokens=256)
print(processor.parse_response(processor.decode(out[0][n:], skip_special_tokens=False)))
```

> `enable_thinking=True` turns on reasoning mode; `parse_response` separates the thinking channel.

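
As a rough pre-flight check before loading the model with `device_map="auto"`, the weight footprint can be estimated from the parameter count and the bytes per parameter. The sketch below is illustrative only: `estimate_weight_gb` and `BYTES_PER_PARAM` are hypothetical names, not part of transformers. It counts weights only and ignores activations, the KV cache, and framework overhead, so actual memory use will be higher.

```python
# Hypothetical helper (not a transformers API): weights-only size estimate.
# Assumes decimal gigabytes (1 GB = 1e9 bytes), as used in the Specs table.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def estimate_weight_gb(n_params: float, dtype: str = "bf16") -> float:
    """Return the approximate size of the weights alone, in GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # 11.95B parameters is the figure given for the base model in the Specs table.
    size = estimate_weight_gb(11.95e9, "bf16")
    print(f"~{size:.1f} GB of bf16 weights")  # prints "~23.9 GB of bf16 weights"
```

At 2 bytes per parameter, 11.95B parameters come to about 23.9 GB, which agrees with the disk size listed in the Specs table. The same arithmetic gives roughly double that for fp32.
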
## 🖼️🎙️ Vision & audio (image · audio · video)

Full multimodal runs here today. Pass image, audio, or video in the message content:

```python
# Reuses `processor` and `model` loaded in the text quick start.
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://.../photo.jpg"},    # image → key "url"
    {"type": "audio", "audio": "https://.../clip.wav"},   # audio → key "audio" (≤30s)
    {"type": "text",  "text": "Describe what you see and hear."},
]}]
inputs = processor.apply_chat_template(messages, tokenize=True, return_dict=True,
    return_tensors="pt", add_generation_prompt=True).to(model.device)
n = inputs["input_ids"].shape[-1]
out = model.generate(**inputs, max_new_tokens=512)
print(processor.parse_response(processor.decode(out[0][n:], skip_special_tokens=False)))
```

> Audio ≤ 30 s (native ASR + speech translation) · images variable-resolution · video ≤ 60 s (~1 fps).

## 🍎 Running on Mac

This bf16 repo runs in **🤗 transformers on Apple Silicon (MPS)** with full multimodal support, as above. For lighter, faster **MLX** serving, use the MLX quants of this model (see the family table) with: [**oMLX**](https://github.com/jundot/omlx) (inference server + macOS menu-bar app, SSD KV cache), [**vMLX**](https://vmlx.net), [**LM Studio**](https://lmstudio.ai) (MLX engine), [**Ollama** 0.19+](https://ollama.com), or [**mlx-vlm**](https://github.com/Blaizzy/mlx-vlm) directly. These can serve the MLX quants once their bundled `mlx-lm`/`mlx-vlm` adds `gemma4_unified` support (text works today via `mlx-vlm` + a small shim).

## 🗂️ Quant family

| Repo | Scheme | Eff. BPW | Size | Link / status |
|---|---|---|---|---|
| `osmGemma-4-12B-uncensored-bf16` (abliterated, **full multimodal**) | bf16 | 16 | ~23.9 GB | ✅ **you are here** |
| `osmGemma-4-12B-uncensored-8bit-mlx` | 8-bit affine | 8.805 | ~13.7 GB | [↗](https://huggingface.co/osmapi/osmGemma-4-12B-uncensored-8bit-mlx) |
| `osmGemma-4-12B-uncensored-mxfp4-mlx` | MXFP4 (4-bit microscaling) | 7.628 | ~11.9 GB | [↗](https://huggingface.co/osmapi/osmGemma-4-12B-uncensored-mxfp4-mlx) |
| `osmGemma-4-12B-uncensored-mixed-4.2bpw-mlx` | mixed 3/4-bit | 4.2 | ~6.6 GB | [↗](https://huggingface.co/osmapi/osmGemma-4-12B-uncensored-mixed-4.2bpw-mlx) |
| `google/gemma-4-12B-it` (base, not abliterated) | bf16 | 16 | ~24 GB | [↗](https://huggingface.co/google/gemma-4-12B-it) |
| `google/gemma-4-12B-it-assistant` (MTP draft) | *can be added later* | n/a | n/a | ⏳ planned |

## 🧬 Lineage

```
google/gemma-4-12B            (Google DeepMind, base pretrain)
   ↓ instruction tuning
google/gemma-4-12B-it         (multimodal, encoder-free)
   ↓ Heretic 1.3.0: directional ablation, Optuna/TPE-optimized over 100 trials, best Pareto trial #55
this repo: abliterated bf16   (refusals 99→12 / 100, KL 0.053)
   ↓ mlx-vlm quantization
MLX quants (8-bit · MXFP4 · mixed), see family table
```

## 🙏 Credits

| Role | Project |
|---|---|
| **Abliteration & release** | [osmAPI](https://huggingface.co/osmapi) |
| **Abliteration tool** | [Heretic](https://github.com/p-e-w/heretic) by p-e-w |
| **Research** | [osmAPI Research Team](https://osmapi.com) · [Terv Student Research Team](https://terv.pro) |
| **Base model** | [Google DeepMind](https://huggingface.co/google), Gemma 4 |

## 📜 License

Apache-2.0 (inherited from the base). Also subject to the [Gemma 4 Terms of Use](https://ai.google.dev/gemma/docs/gemma_4_license).