Getting Started with Local Language Models: A Tutorial

[This article was first published on Posts | Joshua Cook, and kindly contributed to R-bloggers].

With so much hype around LLMs (e.g. ChatGPT), I’ve been playing around with various models in the hope that when I come up with a use case, I will have the skill set to actually build the tool.
For privacy and usability reasons, I’m particularly interested in running these models locally, especially since I have a fancy MacBook Pro with Apple Silicon that can execute inference on these giant models relatively quickly (usually just a couple of seconds).
With yesterday’s release of a new version of Code Llama, I figured it could be helpful to put together a short post on how to get started playing with these models so others can join in on the fun.

The following tutorial will show you how to:

get set up with Ollama,
create a Python virtual environment,
and run a simple Python script that interacts with the model using LangChain, with an explanation of how it works.

Setting up Ollama

Ollama is the tool that downloads the models and serves them locally.
Another popular option is Hugging Face, but I have found using Ollama to be very easy and fast.

There are multiple installation options.
The first is to just download the application from the Ollama website, https://ollama.ai/download, but this comes with an app icon and status bar icon that I really don’t need cluttering up my workspace.
Instead, I opted to install it with Homebrew, a popular package manager for macOS:

brew install ollama

With Ollama installed, you just need to start the server to interact with it.

ollama serve

The Ollama server will run in this terminal, so you’ll need to open another to continue with the tutorial.
You’ll need to start up the server anytime you want to interact with Ollama (e.g. downloading a new model, running inference).
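
If you are not sure whether the server is up, a quick check from Python can help. The following is a minimal sketch, assuming Ollama is listening on its default address, http://localhost:11434; the function name ollama_is_running is hypothetical and not part of any Ollama library.

```python
"""Check whether a local Ollama server is reachable (a rough sketch)."""

import urllib.error
import urllib.request


def ollama_is_running(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> bool:
    """Return True if an HTTP server answers at base_url with status 200.

    The default address is an assumption; adjust it if your server
    listens somewhere else.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama server is reachable.")
    else:
        print("No server found; try running `ollama serve` in another terminal.")
```

The check only confirms that something answers at that address, so it is a convenience rather than a guarantee that inference will work.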

We can now interact with Ollama, including downloading models with the pull command.
The available models are listed in the Ollama model library.
Some models come in several versions, either larger or tuned for specific use cases.
Here, we’ll download the Python fine-tuned version of Code Llama.
Note that there are also larger versions of this model that may produce better output.

ollama pull codellama:python

That’s it!
We now have Ollama running and ready to execute inference on the latest Python Code Llama model.
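
To confirm the download from Python, one option is to ask the server which models it has locally. This is a sketch under assumptions: it relies on the server's /api/tags endpoint returning JSON with a "models" list whose entries carry a "name" field, and the helper names below are hypothetical.

```python
"""List the models a local Ollama server has downloaded (a rough sketch)."""

import json
import urllib.request


def extract_model_names(tags_payload: dict) -> list[str]:
    """Pull the model names out of a parsed /api/tags response.

    Assumes the payload looks like {"models": [{"name": "..."}, ...]}.
    """
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Query the server (default address is an assumption) for local models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return extract_model_names(json.load(response))


# Usage, with `ollama serve` running in another terminal:
#     print(list_local_models())
```

If codellama:python appears in the returned list, the pull succeeded.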

Setting up the Python virtual environment

This is a routine process, not specific to LLMs, but I figured I’d include it here for those unfamiliar.
Below, I create a Python virtual environment, activate it, and then install the necessary LangChain libraries from PyPI.

python -m venv .env
source .env/bin/activate
pip install --upgrade pip
pip install langchain langchain-community

The above commands use the default version of Python installed on your system.
To exercise more control over the version of Python, I use pyenv, though this is a bit more complicated and I won’t cover it here.
It is worth mentioning, though, for those with a bit more experience.
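
A quick way to confirm that the virtual environment is active is to compare sys.prefix with sys.base_prefix, which differ inside a venv. The snippet below is a small sketch; the function name in_virtual_environment is just an illustrative choice.

```python
"""Report which Python is running and whether a venv is active."""

import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version.split()[0]}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```

Run it after source .env/bin/activate; the executable path should point inside the .env directory.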

Interacting with Code Llama using LangChain

“LangChain is a framework for developing applications powered by language models.”
It is a powerful tool for interacting with LLMs – scaling from very simple to highly complex use cases and easily swapping out LLM backends.
I’m still learning how to use its more advanced features, but LangChain is very easy to get started with.
The documentation has plenty of examples and is a great place to start learning more about the tool.

Here, I’ll provide the code for a simple Python script using LangChain to interact with the Python Code Llama model downloaded above.
I hope this offers a starting point for those who wish to explore these models but are overwhelmed by the myriad options available.

Note that you need to have the Ollama server running in the background, either by executing ollama serve in another terminal or by leaving it running from the previous step.

Below is the code for those who want to take it and run.
Following it, I have more information about what it is actually doing.

"""Demonstration of using the Python Code Llama LLM."""

from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser


def main() -> None:
    # Wrap the user's request in instructions for the model.
    prompt = PromptTemplate.from_template(
        "You are a Python programmer who writes simple and concise code. Complete the"
        " following code using type hints in function definitions:"
        "\n\n# {input}"
    )
    # Use the model pulled earlier with `ollama pull codellama:python`.
    llm = Ollama(model="codellama:python")
    # Convert the model's output to a plain string.
    output_parser = StrOutputParser()

    chain = prompt | llm | output_parser

    response = chain.invoke(
        {"input": "Request a wikipedia article and pull out the tables."}
    )
    print(response)


if __name__ == "__main__":
    main()

If the above code is copied to a file app.py, this script can be executed with the following:

python app.py

There are three sections to this script:

the import statements that load the relevant LangChain libraries,
the main() function that executes the demo (described in detail below),
and the if statement that executes the main() function if this file is run as a script.

The main() function holds the actual code for interacting with the LLM.
It starts by creating prompt, a LangChain PromptTemplate that takes the input from the user and wraps it in some instructions for the LLM before it is passed to the model.
The LLM object is then created, specifying the model with the same name we used to download it earlier in the tutorial.
The last component is just a simple output parser that converts the model’s output to a string for easy printing.
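
To see what the model actually receives, it can help to render the template by hand. The sketch below uses plain str.format rather than LangChain, assuming the template's default placeholder style behaves like Python format strings for a simple case such as this one; render_prompt is a hypothetical helper.

```python
"""Render the tutorial's prompt template without LangChain (a sketch)."""

TEMPLATE = (
    "You are a Python programmer who writes simple and concise code. Complete the"
    " following code using type hints in function definitions:"
    "\n\n# {input}"
)


def render_prompt(user_input: str) -> str:
    """Insert the user's request into the {input} placeholder."""
    return TEMPLATE.format(input=user_input)


print(render_prompt("Request a wikipedia article and pull out the tables."))
```

The request ends up as a Python comment at the end of the prompt, which nudges a code-completion model to continue with code that fulfils it.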

These three components are then combined into a chain using the pipe (|) operator that LangChain has overloaded to support its clever chaining syntax.
The chain’s invoke() method is then executed to pass a request to the LLM.
Note that a dictionary is passed whose key matches the {input} placeholder in the prompt template.
The text passed as “input” will be inserted into the template and the result will then be sent to the LLM.
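
For readers curious how a pipe operator can build a chain, here is a toy illustration of overloading | with Python's __or__ method. It is not LangChain's implementation; the Step class and the stand-in components are hypothetical and exist only to show the idea.

```python
"""Toy example of chaining steps with an overloaded | operator."""

from typing import Any, Callable


class Step:
    """A single pipeline step; NOT LangChain's actual implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Hypothetical stand-ins for the prompt, the model, and the output parser.
prompt = Step(lambda inputs: f"# {inputs['input']}")
fake_llm = Step(lambda text: text + "\ndef solution() -> None: ...")
output_parser = Step(str.strip)

chain = prompt | fake_llm | output_parser
print(chain.invoke({"input": "Say hello."}))
```

Each | returns a new Step, so the chain is itself a step whose invoke() runs the pieces from left to right.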

Conclusion

That’s pretty much it.
These few lines of simple code can get you up and running with an LLM on your local machine!
I hope this has provided you with some guidance for getting started and was relatively easy to follow.
I would recommend getting the demo running and then perhaps playing with some variables such as:

experimenting with different prompts,
trying different types of tasks, such as having the model inspect code for bugs or write tests,
comparing the results from different models, such as larger Code Llama options or the general vs. Python-specific models, swapping in a ChatGPT backend, or even finding a use case for multi-modal models (e.g. llava).
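
One way to organize such comparisons is a small loop that sends the same input to several models and collects the responses. The sketch below keeps the model call behind a function argument so it runs without a server; compare_models and fake_run_model are hypothetical names, and the commented runner shows how the tutorial's chain might be plugged in.

```python
"""Send one request to several models and collect the responses (a sketch)."""

from typing import Callable, Iterable


def compare_models(
    model_names: Iterable[str],
    user_input: str,
    run_model: Callable[[str, str], str],
) -> dict[str, str]:
    """Return a mapping of model name to that model's response."""
    results = {}
    for name in model_names:
        results[name] = run_model(name, user_input)
    return results


def fake_run_model(model_name: str, user_input: str) -> str:
    """Placeholder runner so the example works without a running server."""
    return f"[{model_name}] would answer: {user_input}"


# A real runner could reuse the pieces from the script above, for example:
#     def run_with_ollama(model_name: str, user_input: str) -> str:
#         chain = prompt | Ollama(model=model_name) | StrOutputParser()
#         return chain.invoke({"input": user_input})

responses = compare_models(
    ["codellama:python", "codellama:13b-python"],
    "Request a wikipedia article and pull out the tables.",
    fake_run_model,
)
for name, response in responses.items():
    print(f"{name}: {response}")
```

Each model has to be downloaded with ollama pull before a real runner can use it.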

For fun, here is the output from running the above code on the codellama:13b-python model and input “Download a wikipedia article on marmots and extract any tables it contains.”

# Save each table as a separate csv file in your working directory.

from bs4 import BeautifulSoup
import requests, re
def get_marmot_article(url):
    r = requests.get(url)
    html_content = r.text
    soup = BeautifulSoup(html_content,'lxml')
    return soup


def find_tables_in_soup(soup):
    tables = []
    for table in soup.findAll('table'):
        if 'class' in table.attrs:
            if table['class'] == 'wikitable':
                tables.append(table)
    return tables

def extract_from_tables(tables, filename):
    dfs = []
    for i, t in enumerate(tables):
        headers =
        rows = [row.text.split('\n')[0].strip()
                if len(row.text.split('\n')) >=2 else ''
                for row in t.findAll('tr', recursive=False)][1:]

        data = list(zip(* + ))

        dfs.append(pd.DataFrame(data, columns=['Species','Fur color', 'Range']))
        dfs.to_csv('marmot_{}.csv'.format(i), index=False)
    return dfs

Continue reading: Quickstart for playing with LLMs locally

The Future Potential of Local Large Language Models (LLMs): An Analysis

The rise of LLMs in software development and machine learning has drawn significant attention recently. Such models, ChatGPT for example, have many applications and growing relevance in diverse fields. In this follow-up analysis, we look at the long-term implications and future developments around local LLMs, using models such as Code Llama and tools such as Ollama as reference points.

The Evolution of Large Language Models

Large language models have developed significantly over the last few years. With the ability to run them locally on hardware like Apple Silicon, their potential has only increased.

Increased focus on privacy and usability

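As more organizations and individual developers recognize the importance of privacy and usability, LLM solutions that can be run locally are becoming increasingly popular. Because a locally run model keeps prompts and data on the user's own machine instead of sending them to a cloud service, it offers stronger confidentiality than cloud-based tools. It also avoids network round trips, although response time still depends on the local hardware.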

Potential for increased adoption

Given the simple setup process and the availability of beginner-friendly tools and models like Ollama and Code Llama, local LLMs have strong potential for wider adoption among newcomers to machine learning. The inference speed of hardware like Apple Silicon could also be a major selling point down the line.

Future Developments to Watch For

Advancements in local model execution: As hardware manufacturers focus more on chips built for large-scale AI workloads, we can expect local inference to become considerably faster and more efficient than it is today.
Breadth of applications: As developers spend more time experimenting with LLMs and as their capabilities continue to grow, we could expect to see these models branching out into niche and specific use-cases beyond what’s currently imaginable.
Improvements in LLM Frameworks: Tools like LangChain, which allow for user-friendly interaction with LLMs, can be expected to see advancements in terms of their functionality and ease of use, further propelling adoption among beginners.

Actionable Advice for Developers

Experiment: Actively experimenting with these tools should be a priority for developers and students interested in machine learning. A hands-on experience with developing solutions using LLMs can prove highly advantageous as their relevance continues to grow.
Stay updated: As with any other field within technology, staying up to date with the latest developments and advancements is key. Regularly check for updates and new releases from projects like Code Llama and Ollama.
Play with coding tasks: Coding tasks such as inspecting code for bugs or writing tests can be done using LLMs. This can help beginners improve their skill sets and grasp the practical applications of LLMs.
Explore different models: Do not limit your knowledge to just one model. Take time to compare results from different models and eventually even swap backends to get a full understanding of the capabilities of each.

LLMs offer a wide range of potential applications and present a burgeoning field of study for software developers and machine learning enthusiasts. By taking a hands-on approach and plunging into the world of LLMs now, developers can future-proof their skillsets and stay at the forefront of technological innovation.
