Deploy AI at Home and Use It Everywhere for Free

Before we start

In recent days, a new trend has emerged in the world of AI models: DeepSeek, which is changing the way we think about large language models. Everyone is trying it out, and many are downloading and deploying it locally.

However, there is one flaw in self-deploying an AI model locally: we can only use it on the computer where the model runs, or at most share it within the same local network, which typically sits behind a NAT. If we really want to take advantage of it, we need to make it accessible on the Internet.

Usually, making a service behind a NAT accessible on the Internet requires some kind of reverse proxy, which in turn usually requires a cloud server (such as an ECS instance), and that costs money. But the whole point of deploying our own AI models is that it's free and we keep total control of our privacy; otherwise, we would have simply subscribed to an online AI provider.

A solution

So I developed a web-based reverse proxy, webrp, which can be deployed on Deno Deploy (free of charge) to connect to a local AI model run by Ollama. I've been using it for a whole week now, from my laptop at home, from the workstation in the office, and from my cellphone.

Today I'm going to share it and explain how anyone can use it to bypass NAT and reverse-proxy an AI model deployed at home, so we can use it everywhere at no cost at all.

The rest of this article requires some knowledge of Node.js and Deno, since the program is written in TypeScript for the Node.js/Deno platform and the proxy server runs on Deno Deploy.

Prepare

First, of course, install Ollama, then pull and run a model:

# for macOS
brew install ollama
# for Linux
curl -fsSL https://ollama.com/install.sh | sh
# For Windows, go to https://ollama.com/download/windows and follow the instructions.

# Then pull and run a model, say:
ollama run deepseek-r1
ShellScript

Wait for these commands to complete, and we’ll have an AI model running locally.
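Before moving on, it's worth making a quick sanity check that Ollama is reachable. This is just a sketch assuming the default Ollama port 11434 and the deepseek-r1 model pulled above:

# The root endpoint should answer with "Ollama is running".
curl http://localhost:11434

# Optionally, send a prompt through the REST API to confirm the model works.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
ShellScript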

Then go to GitHub and fork webrp; this will give you your own copy of the program.

Deploy the proxy server

Now go to Deno Deploy, sign in with your GitHub account, and create a new project from the newly forked webrp repository, setting the entrypoint to server/main.ts.

It's that simple; the server should be up in a few minutes.

Having a problem?

If you haven't used GitHub Actions before, your deployment will likely fail or hang, because Deno Deploy won't be able to trigger the deployment workflow on GitHub automatically. To solve this, go to your GitHub repository and switch to the Actions page; GitHub will ask you to enable GitHub Actions if you haven't enabled it before.

Then recreate the project in Deno Deploy; the auto-deployment workflow should now work properly.

We can then open the URL of the new project in the browser. It should respond with No proxy client; that's because we haven't started the proxy client yet, which we will do next.
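If you prefer the terminal over the browser, the same check can be done with curl. The URL below is just my example deployment, so substitute your own project's URL:

curl https://ayonli-webrp.deno.dev
# Expected at this stage: "No proxy client"
ShellScript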

Start the proxy client

On the computer where Ollama is running, clone the GitHub repository, then add a .env file to configure the client program, like this:

git clone https://github.com/ayonli/webrp # You can replace this with your own repo URL.
cd webrp
vim .env
ShellScript

In the Vim editor, add these settings:

CLIENT_ID=mac@home # A unique id of your computer
REMOTE_URL=https://ayonli-webrp.deno.dev # Replace with your own URL
LOCAL_URL=http://localhost:11434 # This is the default Ollama API URL
INI

Then start the client with one of these commands:

deno task client # make sure Deno is installed
# or
npm install # make sure Node.js is installed
npm run client
ShellScript

A message Connected to the server should be printed once the connection is established.

Now refresh the web page; we should see Ollama is running, indicating the proxy is properly set up.
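We can also verify the whole chain from any device on the Internet with curl. Again, replace the example URL with your own deployment; the /api/tags endpoint is Ollama's standard model-listing API, which should now be reachable through the proxy:

# The root route should now be proxied to Ollama's health check.
curl https://ayonli-webrp.deno.dev
# Expected: "Ollama is running"

# List the models served behind the proxy.
curl https://ayonli-webrp.deno.dev/api/tags
ShellScript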

Use the AI everywhere

For computers

On computers, I recommend the Page Assist plugin for Google Chrome, as Chrome synchronizes plugins and their settings across devices, so we can use our AI model everywhere without setting it up again every time we change devices. More importantly, Page Assist allows us to customize the Ollama settings with additional headers, which means we can add an authentication step before API calls are proxied to Ollama.

We're likely to face a 403 Forbidden error when using Page Assist for the first time: it's a browser plugin that runs on localhost and calls a remote URL, which triggers the browser's CORS policy, and Ollama forbids cross-origin API calls by default. To solve this, open the Page Assist Settings, switch to Ollama Settings, expand the Advance Ollama URL Configuration, enable Custom Origin URL, and set the Origin URL to http://localhost:11434.

This fixes the CORS problem, and now our AI model is ready to chat with.

For cellphones

I only have an iPhone, so for iOS I recommend the Enchanted app, which also allows us to set additional authorization information when calling the Ollama APIs. Android phones may have similar apps; feel free to explore and share them in the comments.

Secure the service

Server-side

As mentioned above, if our Ollama clients support additional authentication settings, we should use them and configure the proxy server to reject API calls from anyone other than ourselves. Running an AI model can consume a lot of our computer's resources, and we wouldn't want anyone else calling our Ollama service, whether intentionally or accidentally.

To configure the server, go back to Deno Deploy, open the proxy server project, switch to the Settings page, and add these settings under the Environment Variables section:

CONN_TOKEN=your_private_token # For the client connection; we'll talk about it later.
AUTH_TOKEN=your_private_token # For API calls
AUTH_RULE=^\/api\/ # Only the /api/ endpoints will require authentication.
                   # This is important: don't require authentication for
                   # other routes, otherwise the health checker of Ollama
                   # clients will stop working.
ShellScript

Once saved, Deno Deploy will restart our server automatically.
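To confirm the tokens are in effect, we can make a couple of manual calls. This is only a sketch using the example URL and token from above; the exact Authorization format the server expects (with or without a Bearer prefix) is documented in the webrp README, so adjust accordingly:

# Without a token, /api/ routes should now be rejected.
curl https://ayonli-webrp.deno.dev/api/tags

# With the token, the call is proxied to Ollama as before.
curl -H "Authorization: Bearer your_private_token" \
  https://ayonli-webrp.deno.dev/api/tags

# The root route is not matched by AUTH_RULE, so health checks still work.
curl https://ayonli-webrp.deno.dev
ShellScript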

Ollama clients

Now that the server requires authentication to function, we need to set up our Ollama clients accordingly.

In Page Assist, open Settings, switch to Ollama Settings, expand the Advance Ollama URL Configuration, and add a header Authorization with your private token.

In Enchanted, open Settings and fill in the Bearer Token field with your private token.

Proxy client

In addition to the authentication settings for Ollama API calls, we also set a CONN_TOKEN on the server. This setting restricts which proxy clients can connect to our server and become one of its upstream services; we certainly don't want other people connecting to our server and fooling around with our service.

We need to update the client's .env file to conform to this new setting: just add a CONN_TOKEN with the same value as the server's and restart the client, as shown below.
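For example, on the machine running the proxy client (editing .env in any editor works just as well):

cd webrp
# The value must match the server's CONN_TOKEN.
echo 'CONN_TOKEN=your_private_token' >> .env
# Restart the client.
deno task client # or: npm run client
ShellScript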

Now our proxy server is secure enough for real-life usage.

For more information?

The webrp repository contains additional information that is not covered in this article; don't forget to read its README if you run into any problems when setting up the proxy.
