Testing a Local LLM API with Ollama and Ngrok

Test a local LLM API with Ollama and temporarily access it from outside your network using Ngrok.


This tutorial shows how to run a local language model with Ollama and expose its API temporarily using Ngrok.

This can be useful when you want to test a local AI model from another device, a mobile app, a web application, or an IoT project without deploying anything to a cloud server.

 

What This Setup Is Useful For

  • Testing a local LLM API from another device.
  • Calling a local AI model from a mobile app or web app.
  • Creating quick demos without deploying a server.
  • Experimenting with AI integrations in personal projects.
  • Testing API calls before building a more permanent backend.

 

What This Setup Is Not For

This setup is not intended to be used as a permanent public API. It is mainly useful for temporary testing, experiments, and development.

If you want to expose an AI API permanently, you should use proper authentication, rate limiting, logging, access control, and a secure backend.

 

Important Security Note

When you expose a local API through Ngrok, anyone with the public forwarding URL may be able to send requests to it while the tunnel is active.

This means they could use your local machine resources, send prompts to your model, or overload your computer with requests.

Only keep the tunnel open while testing, do not share the URL publicly, and close Ngrok when you no longer need it.

 

Ollama

Ollama is a local runtime for large language models (LLMs). It lets you download and run AI models directly on your own computer.

Once Ollama is running, it provides a local API at:

http://localhost:11434/api

You can interact with Ollama from the command line, from its local API, or from your own applications.
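
As a quick check that the local API is reachable, the sketch below asks Ollama which models are installed. It assumes the default address shown above and the /api/tags endpoint, and it uses only the Python standard library. The helper names fetch_tags and model_names are illustrative and are not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_BASE_URL = "http://localhost:11434"


def fetch_tags(base_url=OLLAMA_BASE_URL, timeout=10):
    """Return the parsed JSON body of GET /api/tags (the installed-model list)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_names(tags):
    """Extract the model names from an /api/tags response body."""
    return [model["name"] for model in tags.get("models", [])]
```

With Ollama running, print(model_names(fetch_tags())) should list the models you have pulled. If the call raises a connection error, the Ollama server is probably not running.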

Installation
https://ollama.com/download

Available Models
https://ollama.com/search

 

Installing and Running a Model

In this example, I am using translategemma:4b, but you can replace it with another model available in Ollama.

ollama pull translategemma:4b

Alternatively, ollama run downloads the model if it is not already present and then starts an interactive session:

ollama run translategemma:4b

Make sure the model is working locally before trying to expose it with Ngrok.

 

Testing Ollama Locally

Before using Ngrok, test the Ollama API locally.

curl http://localhost:11434/api/generate -d '{
  "model": "translategemma:4b",
  "prompt": "Translate from English to Portuguese (Portugal): Hello, how are you?",
  "stream": false
}'

If everything is working, Ollama should return a JSON object whose response field contains the generated text.

The "stream": false option makes Ollama return the response as a single JSON object instead of streaming multiple response chunks.
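
For comparison, with "stream": true Ollama sends one JSON object per line, each carrying a fragment of the text in its response field, with the last one marked "done": true. The following is a minimal sketch of how a client might reassemble those lines. The sample chunks are made up for illustration and are not captured model output, and join_stream_lines is a hypothetical helper name.

```python
import json


def join_stream_lines(lines):
    """Concatenate the "response" fragments from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk is marked with "done": true
    return "".join(parts)


# Illustrative chunks (not real model output).
sample = [
    '{"response": "Olá, ", "done": false}',
    '{"response": "como estás?", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream_lines(sample))
```

Streaming is mostly useful for interactive interfaces that show text as it is generated. For simple scripts, "stream": false keeps the client code shorter.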

 

Ngrok

Ngrok creates a secure tunnel between the internet and a service running on your local machine. In this case, it allows external requests to reach the local Ollama API while the tunnel is active.

Install Ngrok and sign in to your Ngrok account, then copy the authentication command from the dashboard and run it:

ngrok config add-authtoken **************************************************

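Then start a tunnel to the Ollama API port. Ollama may reject requests whose Host header is not localhost (typically with a 403 response), so rewrite the header when starting the tunnel:

ngrok http 11434 --host-header="localhost:11434"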

Ngrok will generate a temporary forwarding URL similar to:

https://your-random-url.ngrok-free.app

You can now use this URL to access your local Ollama API from outside your machine.

For example, the remote Ollama endpoint becomes:

https://your-random-url.ngrok-free.app/api/generate
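
Rather than copying the forwarding URL by hand, a script can read it from the Ngrok agent's local inspection API. This sketch assumes the agent's web interface is on its default address, http://127.0.0.1:4040, and that /api/tunnels returns a tunnels list whose entries have a public_url field. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: the Ngrok agent's local web interface uses its default address.
NGROK_AGENT_API = "http://127.0.0.1:4040/api/tunnels"


def fetch_tunnels(api_url=NGROK_AGENT_API, timeout=5):
    """Return the parsed JSON description of the agent's active tunnels."""
    with urllib.request.urlopen(api_url, timeout=timeout) as resp:
        return json.load(resp)


def generate_endpoint(tunnels_payload):
    """Return the public /api/generate URL of the first HTTPS tunnel, or None."""
    for tunnel in tunnels_payload.get("tunnels", []):
        public_url = tunnel.get("public_url", "")
        if public_url.startswith("https://"):
            return public_url.rstrip("/") + "/api/generate"
    return None
```

While the tunnel is active, generate_endpoint(fetch_tunnels()) should return the remote endpoint. This must run on the same machine as Ngrok, since the inspection API is only served locally.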

 

Python Script

This script uses the requests library (pip install requests). Make sure Ollama is running and that the Ngrok tunnel is active before running it.

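import requests

# Replace with the forwarding URL printed by Ngrok, followed by /api/generate.
url = "https://your-random-url.ngrok-free.app/api/generate"

payload = {
    "model": "translategemma:4b",
    "prompt": "Translate from English to Portuguese (Portugal). Return only the translation.\n\nHello, how are you?",
    "stream": False  # return one JSON object instead of a stream of chunks
}

try:
    # Generation can be slow on modest hardware, hence the generous timeout.
    response = requests.post(url, json=payload, timeout=120)

    if response.status_code == 200:
        data = response.json()
        print(data["response"])  # the generated text
    else:
        # Show the status code and body to help diagnose tunnel or model errors.
        print("Error:", response.status_code)
        print(response.text)

except requests.exceptions.RequestException as e:
    # Covers connection failures, timeouts, and other transport-level errors.
    print("Request failed:", e)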

This script sends a prompt to your local model through the public Ngrok URL. The model runs on your computer, but the request can come from another device or application.
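
If you want to call the model from inside a larger application, the same request can be wrapped in a function. The sketch below is one possible shape, not the only one. It uses the standard library instead of requests so it has no extra dependency, and the names post_json, translate, and the send parameter are hypothetical choices made for this example.

```python
import json
import urllib.request


def post_json(url, payload, timeout=120):
    """POST a JSON body and return the decoded JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def translate(text, base_url, model="translategemma:4b", send=post_json):
    """Ask the model for a Portuguese (Portugal) translation of `text`."""
    payload = {
        "model": model,
        "prompt": (
            "Translate from English to Portuguese (Portugal). "
            "Return only the translation.\n\n" + text
        ),
        "stream": False,
    }
    # `send` is injectable so the function can be exercised without a live tunnel.
    reply = send(base_url.rstrip("/") + "/api/generate", payload)
    return reply["response"].strip()
```

A call such as translate("Hello, how are you?", "https://your-random-url.ngrok-free.app") sends the request through the tunnel. Passing a stand-in for send lets you test the surrounding application code while Ollama and Ngrok are offline.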

 

Possible Use Cases

  • Testing a local translation API.
  • Connecting a mobile app to a local LLM.
  • Testing a chatbot interface without cloud deployment.
  • Calling a local model from an ESP32, ESP8266, or Raspberry Pi project.
  • Creating a quick prototype before building a proper backend.

 

Limitations

  • The Ngrok URL is temporary unless you configure a reserved domain.
  • Performance depends on your computer and the model size.
  • Large models may be slow without a capable GPU.
  • The API has no authentication in this basic example.
  • Anyone with the tunnel URL may be able to send requests while it is active.

 

Final Thoughts

This is not the best way to deploy a production AI service, but it is a simple and useful way to test a local LLM API from outside your computer.

Ollama makes it easy to run models locally, and Ngrok makes it easy to create a temporary public URL for testing. Together, they are useful for experiments, prototypes, and quick integrations.

For real applications, use a proper backend with authentication, request limits, logging, and secure deployment.