Practical session: introduction to LLMs

This is the first practical session accompanying the LLM introduction course at the IDESSAI 2024 summer school.

Contact: christophe.cerisara@loria.fr

Choice of library

There exist several libraries to manipulate LLMs. The one you should choose mainly depends on two factors: your target task and your available hardware. Below is a small selection of libraries grouped by task. Of course, there exist several other powerful libraries for LLMs, but these are the most common ones as of August 2024. In the following, we will focus on the ollama library.

Inference libraries

Finetuning libraries

Pretraining libraries

Practical session: ollama

ollama is designed to make it easy to try out open-source LLMs locally. With just one or a few commands, it downloads a quantized LLM and launches an OpenAI-compatible server, which you may interact with using one of the many available ChatGPT-compatible clients. (Personal note: my preferred client is quite geeky, pure Linux command line: charm.sh mods.) Ollama also provides simple command-line scripts to immediately start chatting with the LLM, without any server. As of August 2024, it is one of the preferred ways to quickly start using an LLM.
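
As an illustration, here is a minimal sketch (not part of the original session) of how the OpenAI-compatible server started by ollama can be queried from Python with the openai client; it assumes the server is running locally on ollama's default port 11434:

from openai import OpenAI

# ollama exposes an OpenAI-compatible endpoint under /v1 on its default port
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # the client requires a key, but ollama ignores its value
)

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)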

Q1: Use Llama3.1 locally with ollama

Run one of the following commands in a terminal:

ollama run llama3.1   # downloads the quantized model on first use, then opens an interactive chat
ollama run gemma2:2b  # same, with a smaller model that fits on limited hardware
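
Inside the interactive chat, you can type /bye to quit.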

Q2: Use Llama3.1 to interact with tools

Any LLM is limited to the knowledge it has been trained on (and thus to its training cutoff date), and it can only interact through text. A major trend in mid-2024 is to let LLMs interact with external tools, such as a calculator, a web search engine, a Python script execution sandbox… The underlying principle is to finetune the LLM to generate a special structured text format, in which the LLM writes the ID of some external tool and its arguments. The program that is calling the LLM can then interpret this structured format and execute the call to the specified external tool. We can then continue our conversation with the LLM by feeding it the answer from the tool.
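
To make this concrete, here is an illustrative sketch of the structured message that a tool-calling LLM may emit instead of a plain text answer. The field names follow the ollama Python API used below; the values themselves are made up:

# Hypothetical tool-call message, as found in response['message'] after ollama.chat
tool_call_message = {
    'role': 'assistant',
    'content': '',
    'tool_calls': [{
        'function': {
            'name': 'getnews',               # the ID of the external tool to call
            'arguments': {'country': 'usa'}, # the arguments chosen by the LLM
        },
    }],
}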

One important part is missing: before doing all this, you must give ollama the list of available external tools. This is done by installing the ollama pip library, which enables you to call ollama from Python and to define one Python method per tool.
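
Assuming a standard Python environment, the needed libraries can be installed with pip (requests is used by the example below):

pip install ollama requests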

Important: When listing the tools/Python methods for ollama, it's important to clearly describe in plain English what each method does, as well as each of its arguments, because the LLM decides whether to call a given tool based on this description!

Let's now put it into practice:

import ollama
import requests
import json

messages = [{'role': 'user', 'content': 'What is the main news right now in the USA?'}]

# Map country names to the two-letter codes expected by the news API
COUNTRY_CODES = {
    'france': 'fr',
    'india': 'in',
    'usa': 'us',
    'australia': 'au',
    'russia': 'ru',
    'united kingdom': 'gb',
}

def getnews(country):
    """Return the top headline (title and content) for the given country."""
    code = COUNTRY_CODES.get(country.lower().strip())
    if code is None:
        print("unknown country", country)
        code = 'fr'  # fall back to French news
    url = "https://saurav.tech/NewsAPI/top-headlines/category/general/" + code + ".json"
    print("calling the tool")
    response = requests.get(url)
    print("tool result", response.text)
    print("\n" * 5)

    # Keep only the title and content of the first article
    news = json.loads(response.text)
    headline = news['articles'][0]['title'] + ": " + news['articles'][0]['content']
    print("extracted news", headline, "\n" * 3)
    return headline

def main():
    response = ollama.chat(
        model='llama3.1',
        messages=messages,
        tools=[
          {
            'type': 'function',
            'function': {
              'name': 'getnews',
              'description': 'Get recent news from a country',
              'parameters': {
                'type': 'object',
                'properties': {
                    'country': {
                        'type': 'string',
                        'description': 'The name of the country',
                        },
                },
                'required': ['country'],
              },
            },
          },
        ],
    )

    # Add the model's response to the conversation history
    messages.append(response['message'])
    print("first answer",response['message'])

    # Check if the model decided to use the provided function
    if not response['message'].get('tool_calls'):
        print("The model didn't use the function. Its response was:")
        print(response['message']['content'])
        return

    # Process the function calls made by the model (we returned above if there were none)
    available_functions = {
        'getnews': getnews,
    }
    for tool in response['message']['tool_calls']:
        function_to_call = available_functions[tool['function']['name']]
        function_response = function_to_call(tool['function']['arguments']['country'])
        # Add the tool's answer to the conversation
        messages.append(
            {
                'role': 'tool',
                'content': function_response,
            }
        )

    # Second API call: Get final response from the model
    final_response = ollama.chat(model='llama3.1', messages=messages)
    print(final_response['message']['content'])

if __name__ == '__main__':
    main()

# adapted from https://github.com/ollama/ollama-python/blob/main/examples/tools/main.py
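
Note that to run this script, the local ollama server must be running (it usually starts automatically with ollama; otherwise launch it with ollama serve), and the llama3.1 model must have been downloaded beforehand, for instance with the ollama run llama3.1 command from Q1.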