Ollama: Install and Run Local LLMs

Ollama

Ollama is a local runtime for Large Language Models (LLMs) that lets you run AI models directly on your own computer.
You can interact with Ollama through its graphical interface, from the command line, or by making HTTP API calls from code.

Installation
https://ollama.com/download

 

What model to choose?

Here are some questions that you can ask when picking a model:

  • What is my goal?
  • What hardware do I have?
  • Do I want more speed or more intelligence?
  • Do I need multimodality (images)?
  • Do I need strong programming capabilities?
  • Do I want to operate 100% offline with full privacy?
  • Do I want the option to fine-tune the model in the future?

Available Models
https://ollama.com/search

 

Installing and Running a Model

For example, to install the gemma3:4b model:

ollama pull gemma3:4b

Once the model is installed, you can run it with:

ollama run gemma3:4b

 

Other Useful Commands

List all installed models:

ollama list

Remove a model:

ollama rm gemma3:4b
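
The same information is also available over Ollama's local HTTP API, which is handy for scripting. A minimal sketch equivalent to `ollama list`, assuming the server is running on its default port 11434:

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama port

def model_names(tags_json):
    """Extract model names from an /api/tags response payload."""
    return [m["name"] for m in tags_json.get("models", [])]

def list_models():
    """Return the names of locally installed models, like `ollama list`."""
    response = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
    response.raise_for_status()
    return model_names(response.json())
```

With the server running, `list_models()` returns a list such as `["gemma3:4b"]`.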

 

Example

import requests

def generate(prompt):
    # Send a single prompt to the local Ollama server (default port 11434).
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gemma3:4b",
            "prompt": prompt,
            "stream": False,  # return the full response as one JSON object
        },
        timeout=120,
    )
    response.raise_for_status()

    # The generated text is in the "response" field.
    data = response.json()
    return data["response"]

response = generate("Write a short poem about the ocean.")
print(response)
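
For multi-turn conversations, the server also exposes a chat endpoint that accepts a message history instead of a single prompt. A minimal sketch along the same lines, assuming the default port and the gemma3:4b model pulled as above:

```python
import requests

def build_chat_payload(model, messages):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {"model": model, "messages": messages, "stream": False}

def chat(messages, model="gemma3:4b"):
    """Send a message history and return the assistant's reply text."""
    response = requests.post(
        "http://localhost:11434/api/chat",
        json=build_chat_payload(model, messages),
        timeout=120,
    )
    response.raise_for_status()
    # The non-streaming response carries the reply in data["message"]["content"].
    return response.json()["message"]["content"]

# Example (requires a running Ollama server):
# reply = chat([{"role": "user", "content": "Name one ocean."}])
# print(reply)
```

Each call sends the full history, so appending the assistant's reply to `messages` before the next call keeps the conversation's context.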


 

Output

Running the script prints the model's generated poem; the exact text varies from run to run.

13 Nov. 2025 | Last Updated: 03 Dec. 2025 | jaimedcsilva

Related
  • Using the OpenAI API with Python
  • Audio to Text with Python