Sunday, September 15, 2024

Run Llama 3 on your laptop

Ollama is a convenient platform for running open-source AI models locally.

Why should you use open-source AI?

Today, we have access to powerful Large Language Models, such as GPT-4o or Claude 3.5 Sonnet.

But they come with 4 major problems:

1. Data Privacy. When “talking” to GPT-4, you always send your data to OpenAI’s servers. For most companies, this is the #1 reason NOT to use AI.
2. Cost. The best-performing LLMs are expensive, especially for high-volume applications.
3. Dependency. Using GPT-4 or Claude means you rely on OpenAI or Anthropic. Most businesses prefer independence.
4. Limited Customization. Every business has unique needs and problems. Custom solutions are crucial for many, but the biggest models can be customized only through Prompt Engineering.

Let’s compare this with open-source models:

Full Privacy. We run open-source models locally, which means we don’t send the data anywhere. They can even work offline!
Lower Cost. You can use many “local” models for free. You pay for the more powerful ones, but they’re much cheaper than GPT-4.
Independence & Control. You’ve got full control over the model. Once you download it to your computer, you “own” it.
Customization. You can fine-tune, re-train, and modify open-source LLMs to fit your specific needs.

But of course, open-source LLMs have their own limitations: 

Worse Performance. The reasoning and general performance of open-source LLMs still lag behind GPT-4.
Integration Challenges. Integrating them requires more expertise and effort.
Hardware Costs. LLMs require high computational power. To run them for high-volume applications, you need your own GPUs.

Running Llama 3 locally with Ollama

All you need:
1. Download Ollama on your local system.
2. Download one of the local models using Ollama. For example, if I want to use Llama 3, I need to open the terminal and run:

$ ollama run llama3 

If it’s the first time you’re using the model, Ollama will download it first. (You can also pre-download a model without starting a chat session by running “ollama pull llama3”.) Because the model has 8B parameters, the download will take a while.

Once you’ve downloaded the model, you can use it through the Ollama Python library. To install it, run:

$ pip install ollama

And with these steps, you’re ready to run the code from this article. 
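
To verify your setup, you can ask your local Ollama server which models it has downloaded. This is just a quick sanity check; it assumes the Ollama app (the local server) is running in the background:

import ollama

# Quick sanity check: list the models available on the local Ollama server.
# Requires the Ollama server to be running in the background.
print(ollama.list())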

In this article, we’ll explore how to use open-source Large Language Models (LLMs) with Ollama. We’ll go through the following topics:

- Using open-source models with Ollama
- The importance of the system prompt
- Streaming responses with Ollama
- The practical applications of the LLM temperature
- The usage and limitations of the max tokens parameter
- Replicating “creative” responses with the seed parameter

Getting a Simple Response

import ollama

model = "llama3"
response = ollama.chat(
    model=model,
    messages=[
        {"role": "user", "content": "What's the capital of Poland?"}
    ]
)
print(response["message"]["content"])

## Prints: The capital of Poland is Warsaw (Polish: Warszawa).
  1. import ollama to use the Ollama Python library
  2. model = "llama3" to define the model we want to use
  3. ollama.chat() to get the response. We used 2 parameters:
    - model that we defined before
    - messages where we keep the list of messages
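
Because messages is a list, you can also pass a system prompt and earlier turns of the conversation in the same call. Here’s a minimal sketch using the standard Ollama chat roles (we’ll look at the system prompt in more detail later; the follow-up question here is just an illustration):

import ollama

# "messages" holds the whole conversation: an optional system prompt,
# followed by alternating user/assistant turns.
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": "Answer in one short sentence."},
        {"role": "user", "content": "What's the capital of Poland?"},
        {"role": "assistant", "content": "The capital of Poland is Warsaw."},
        {"role": "user", "content": "And what's its population?"}
    ]
)
print(response["message"]["content"])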