Ollama is a convenient platform for local development with open-source AI models.
Why should you use open-source AI?
Today, we have access to powerful Large Language Models, such as GPT-4o or Claude 3.5 Sonnet.
But they come with 4 major problems:
Data Privacy. When “talking” to GPT-4o, you always send your data to OpenAI’s servers. For most companies, this is the #1 reason NOT to use AI.
Cost. The best-performing LLMs are expensive, especially for high-volume applications.
Dependency. Using GPT-4 or Claude means you rely on OpenAI or Anthropic. Most businesses prefer independence.
Limited Customization. Every business has unique needs and problems. Custom solutions are crucial for many. But customizing the biggest models is possible only through Prompt Engineering.
Let’s compare this to open-source models:
Full Privacy. We run open-source models locally, which means we don’t send the data anywhere. They can even work offline!
Lower Cost. You can use many “local” models for free. You pay for more powerful ones, but they’re much cheaper than GPT-4.
Independence & Control. You’ve got full control over the model. Once you download it to your computer, you “own” it.
Customization. You can fine-tune, re-train, and modify open-source LLMs to fit your specific needs.
But of course, open-source LLMs have their own limitations:
Worse Performance. The reasoning and general performance of open-source LLMs still lag behind top proprietary models like GPT-4o.
Integration Challenges. Integrating them requires more expertise and effort.
Hardware Costs. LLMs require high computational power. To run them for high-volume applications, you need your own GPUs.
Running Local Llama 3 with Ollama
All you need:
Download Ollama on your local system.
Download one of the local models on your computer using Ollama. For example, if I want to use Llama 3, I need to open the terminal and run:
$ ollama run llama3
If it’s the first time you’re using the model, Ollama will download it first.
Because the model has 8B parameters, this will take a while.
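If you prefer to download the model without starting an interactive chat session, you can pull it first and then check which models are installed locally:
$ ollama pull llama3
$ ollama list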
Once you download the model, you can use it through the Ollama Python library.
To install the library, run the following command:
$ pip install ollama
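As a quick sanity check, you can list your local models from Python; the library’s list() function mirrors the ollama list CLI command:

import ollama

## Prints the locally downloaded models; "llama3" should be among them
print(ollama.list())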
And with these steps, you’re ready to run the code from this article.
In this article, we’ll explore how to use open-source Large Language Models (LLMs) with Ollama.
We’ll go through the following topics:
Using open-source models with Ollama.
The importance of the system prompt.
Streaming responses with Ollama.
The practical applications of the LLM temperature.
The usage and limitations of the max tokens parameter.
Replicating “creative” responses with the seed parameter.
Getting a Simple Response
Now it’s time to test our model. Let’s ask a simple question to see how it works.
import ollama

model = "llama3"

response = ollama.chat(
    model=model,
    messages=[
        {"role": "user", "content": "What's the capital of Poland?"}
    ]
)

print(response["message"]["content"])
## Prints: The capital of Poland is Warsaw (Polish: Warszawa).
Let’s break down this code. We used:
- import ollama to load the Ollama Python library
- model = "llama3" to define the model we want to use
- ollama.chat() to get the response, with 2 parameters:
  - model that we defined before
  - messages where we keep the list of messages

To get the response, we dig into the response object for ["message"]["content"].
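One thing worth noting: each ollama.chat() call is stateless, so messages can carry the whole conversation history when you want a follow-up question to have context. Here’s a minimal sketch (the hard-coded assistant reply just stands in for the model’s earlier answer):

import ollama

## Each call is stateless: to continue a conversation, resend the history
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "user", "content": "What's the capital of Poland?"},
        {"role": "assistant", "content": "The capital of Poland is Warsaw."},
        {"role": "user", "content": "What river does it lie on?"},
    ]
)

print(response["message"]["content"])
## Should mention the Vistula river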