
How to run LLMs locally with Docker

To run AI models locally, I generally use Ollama.

But TIL you can actually run them with Docker itself.

How to get started?

This feature comes enabled by default if you're on Docker Desktop 4.40+ for macOS on Apple silicon.

If not, you can enable it by running the following command:

docker desktop enable model-runner
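Assuming your Docker Desktop version ships the status subcommand, you can confirm the runner is up with:

docker model status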

How to use?

Just pull the model and use it. Model tags follow the pattern:

{model}:{parameters}-{quantization}
docker model pull ai/smollm2:360M-Q4_K_M
docker model run ai/smollm2:360M-Q4_K_M "Why is the sky blue?"
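Recent versions of the CLI should also let you list the models you've pulled:

docker model list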

How to use it with an OpenAI-compatible SDK?

Just set the base URL to:

http://localhost:12434/engines/v1

And set the model to whatever model you're running. In the above case, that's ai/smollm2:360M-Q4_K_M.
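Here's a minimal sketch with the official openai Python package, assuming the endpoint above is reachable from your host (the api_key value is just a placeholder, since the local runner shouldn't check it):

from openai import OpenAI

# Point the client at the local Docker Model Runner endpoint
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="docker",  # placeholder; any non-empty string should do
)

# Use the same model tag you pulled earlier
response = client.chat.completions.create(
    model="ai/smollm2:360M-Q4_K_M",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)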

Here is a good video by Travis Media about it.


Happy local LLM!

