<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llama-Server on AI Engineer Guide</title><link>https://aiengineerguide.com/tags/llama-server/</link><description>Recent content in Llama-Server on AI Engineer Guide</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright © 2024, Nesin Technologies LLP.</copyright><lastBuildDate>Sun, 05 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://aiengineerguide.com/tags/llama-server/feed.xml" rel="self" type="application/rss+xml"/><item><title>How to run Gemma 4 locally with llama-server and access it via API</title><link>https://aiengineerguide.com/til/google-gemma-4-locally-with-llama-server-api/</link><pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate><guid>https://aiengineerguide.com/til/google-gemma-4-locally-with-llama-server-api/</guid><description>&lt;p>We&amp;rsquo;ll be using &lt;a href="https://github.com/ggml-org/llama.cpp">llama-server&lt;/a> to serve the &lt;a href="https://deepmind.google/models/gemma/gemma-4/">Gemma 4&lt;/a>&lt;/p>
&lt;h2 id="depedency">Dependency&lt;/h2>
&lt;p>If you don&amp;rsquo;t have llama-server yet, you can install it with Homebrew (macOS/Linux); see the &lt;a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/install.md">docs&lt;/a> for other installation methods.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>brew install llama.cpp
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then we can download and serve the model with a single command.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>llama-server -hf ggml-org/gemma-4-26b-a4b-it-GGUF:Q4_K_M
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Make sure to pick a model variant that fits the available GPU/memory on your machine.&lt;/p>
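&lt;p>Once the server is up, it exposes an OpenAI-compatible API, so we can test it with a quick curl request (this sketch assumes the default address of &lt;code>localhost:8080&lt;/code>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>curl http://localhost:8080/v1/chat/completions \
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>  -H &amp;#34;Content-Type: application/json&amp;#34; \
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>  -d &amp;#39;{&amp;#34;messages&amp;#34;: [{&amp;#34;role&amp;#34;: &amp;#34;user&amp;#34;, &amp;#34;content&amp;#34;: &amp;#34;Hello!&amp;#34;}]}&amp;#39;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The response comes back in the standard chat completions format, which means any OpenAI-compatible client library should also work if you point its base URL at the local server.&lt;/p>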
&lt;p>That&amp;rsquo;s pretty much it. By default, llama-server serves the model on port 8080.&lt;/p></description></item></channel></rss>