Run DiffusionGemma with Ollama
Ollama support is expected once llama.cpp PR #24427 is merged. Until then, here is a bridge workaround for running DiffusionGemma through Ollama today.
Current Ollama Support Status
Direct Ollama support is not available yet. DiffusionGemma requires a custom diffusion decoding pipeline that isn't merged into llama.cpp mainline. The key blocker is PR #24427, which adds diffusion model support to llama.cpp and has not been merged. Once it is, Ollama (which builds on llama.cpp) can integrate it.
Timeline: weeks to months. The PR is under active review as of June 2026. Follow the PR for updates.
Workaround: Run DiffusionGemma Through Ollama Today
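While waiting for official support, you can try to bridge DiffusionGemma into Ollama by building the llama.cpp diffusion branch and importing a GGUF through a custom Modelfile. Keep in mind that a Modelfile only points Ollama at a GGUF file and sets parameters. Ollama loads the model with its own bundled runtime, so the import works only if that runtime recognizes the diffusion architecture.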
Step 1: Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download from https://ollama.com/download
# or use WSL2 with the Linux install above
Step 2: Build llama.cpp with Diffusion Support
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Checkout the diffusion PR branch
git fetch origin pull/24427/head:diffusion-support
git checkout diffusion-support
# Configure, then build (drop -DGGML_CUDA=ON if you have no NVIDIA GPU)
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
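Before moving on, it can help to confirm that the build actually produced binaries. The helper below is a minimal sketch: `check_build_bins` is a hypothetical name, and the binary names you pass in (for example `llama-cli` and `llama-server`, the usual mainline names) are assumptions, since the PR branch may name its tools differently.

```shell
# Report which expected binaries exist and are executable in a build's bin directory.
# Usage: check_build_bins <bin_dir> <name>...
# Returns non-zero if any of the named binaries is missing.
check_build_bins() {
  bin_dir=$1
  shift
  missing=0
  for name in "$@"; do
    if [ -x "$bin_dir/$name" ]; then
      echo "ok: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Pass the path to the `bin` directory inside your build folder, followed by the binaries you expect to find there.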
Step 3: Download a GGUF Model
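# Check /gguf/ for available GGUF downloads (see our GGUF guide for download links)
# Keep the downloaded .gguf file next to your Modelfile: the FROM line in Step 4
# uses a relative path, and `ollama create` copies the weights into ~/.ollama/models/ itself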
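A quick sanity check on the download is to confirm the file really is a GGUF: every GGUF file begins with the four ASCII bytes `GGUF`. The function name `is_gguf` is hypothetical, and this is only a sketch. Passing the check shows the container format is right, not that the file carries the diffusion architecture metadata.

```shell
# Succeed if the given path is a regular file that starts with the GGUF magic bytes.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}
```

For example, `is_gguf ./diffusiongemma-26b-q4_k_m.gguf && echo "looks like GGUF"` catches truncated downloads or HTML error pages saved under a `.gguf` name.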
Step 4: Create a Modelfile
# Modelfile for DiffusionGemma
FROM ./diffusiongemma-26b-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_predict 256
# Diffusion-specific parameters (not recognized by stock Ollama; these assume a build with diffusion support)
PARAMETER diffusion_steps 8
PARAMETER block_size 256
TEMPLATE """{{ .Prompt }}"""
Step 5: Create and Run the Model
ollama create diffusiongemma -f Modelfile
ollama run diffusiongemma
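Once the model is created, you can also query it over Ollama's local REST API instead of the interactive prompt. The sketch below assumes the server is listening on the default `http://localhost:11434` and that the model is named `diffusiongemma` as above. The helper names `generate_payload` and `ask_ollama` are hypothetical.

```shell
# Build a JSON body for Ollama's /api/generate endpoint.
# Escapes backslashes and double quotes; newlines in the prompt are not handled.
generate_payload() {
  model=$1
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false}' "$model" "$prompt"
}

# Send a single non-streaming request to a local Ollama server (default port 11434).
ask_ollama() {
  curl -s http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

Calling `ask_ollama diffusiongemma "Explain diffusion decoding in one sentence"` prints the JSON response. With `"stream":false`, the server returns a single object rather than a stream of chunks.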
Troubleshooting
"model format not supported" error
This means your GGUF file doesn't include the diffusion architecture flag. Make sure you're using a GGUF converted from the diffusion branch of llama.cpp, not the mainline release.
OOM (Out of Memory) on 16GB GPU
Use Q4_0 or Q4_K_M quantization. The 26B model at Q4 needs ~16GB of VRAM, which leaves a 16GB card with essentially no headroom, so close other GPU processes first.
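The ~16GB figure can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly parameters times bits per weight, divided by 8. The sketch below assumes all 26B parameters are stored at the quantized width. Q4_0 stores 4.5 bits per weight; the 4.85 used for Q4_K_M is an approximate average and an assumption here. The estimate covers weights only and ignores runtime buffers.

```shell
# Estimate weight memory in GB (10^9 bytes):
# parameters (in billions) * bits per weight / 8.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 26 4.5    # Q4_0: prints 14.6
weights_gb 26 4.85   # Q4_K_M (approximate bits per weight): prints 15.8
```

Both results sit close to the 16GB limit before any working memory is counted, which is why freeing VRAM held by other processes matters on a 16GB card.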
Slow generation on CPU-only
DiffusionGemma's 4x speed advantage only applies with GPU acceleration. On CPU-only, performance will be comparable to or slower than autoregressive models of similar size.
Ollama says "model not found" after create
# Verify the model was created
ollama list
# If it is missing, check that the Modelfile path is correct and create it again
ollama create diffusiongemma -f ./Modelfile
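To script this check, you can filter the output of `ollama list` for the model name. The helper below is a sketch with a hypothetical name, `has_model`. It assumes the usual `ollama list` layout: a header row, then one model per line with the name (plus an optional `:tag`) in the first column.

```shell
# Read `ollama list` output on stdin; succeed if the named model is listed.
# Matches the NAME column with or without a ":tag" suffix and skips the header row.
has_model() {
  awk -v want="$1" '
    NR > 1 { split($1, parts, ":"); if ($1 == want || parts[1] == want) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

Typical usage is `ollama list | has_model diffusiongemma || echo "model missing, rerun ollama create"`.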