There exist several libraries for working with LLMs. Which one you should choose mainly depends on two factors: your target task and your available hardware. You can find below a small selection of libraries grouped by task. Of course, several other powerful LLM libraries exist, but these are the most common ones as of August 2024. In what follows, we will focus on the ollama library.
ollama is designed to make it easy to try out open-source LLMs locally. With just one or a few commands, it downloads a quantized LLM and launches an OpenAI-compatible server, which you can interact with using one of the many available ChatGPT-compatible clients. (Personal note: my preferred client is a pure Linux command-line tool, for CLI aficionados: charm.sh mods.) Ollama also provides simple command-line scripts to immediately start chatting with the LLM, without any server. As of August 2024, it is one of the preferred ways to quickly start using an LLM.
ollama run qwen2.5
Note: if you have less than 4 GB of RAM, you may want to use a smaller model.
If nothing works, you can run these commands on the free Google Colab tier.
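To illustrate the server mode mentioned above: while ollama is running, it also exposes an OpenAI-compatible HTTP endpoint, by default on port 11434, so you can query it from any HTTP client. Here is a minimal sketch in Python using requests (the prompt is just an example):

import requests

# Query ollama's OpenAI-compatible chat endpoint (default port 11434)
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "qwen2.5",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])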
Exercise: ask for a summary of some text file by prepending the string "Summarize this text:" to the actual file content (a possible solution is sketched below).
Note that the result is very sensitive to the prompt you use: try with various prompts (be creative!) and observe the differences.
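Here is one possible solution, sketched with the ollama Python package (installed with pip install ollama); the file name article.txt is just a placeholder for your own text file:

import ollama

# Read the file and prepend the summarization instruction to its content
with open('article.txt') as f:
    prompt = "Summarize this text: " + f.read()

result = ollama.generate(model='qwen2.5', prompt=prompt)
print(result['response'])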
Any LLM is limited to the knowledge it has been trained on (and hence to its training cutoff date), and it can only interact through text. A major trend in mid-2024 is to let LLMs interact with external tools, such as a calculator, a web search engine, or a Python script execution sandbox. The underlying principle is to finetune the LLM to generate a special structured text format in which the LLM writes the ID of some external tool and its arguments. The program that is calling the LLM can then parse this structured format and execute the call to the specified external tool. Finally, we can continue the conversation with the LLM by feeding it the tool's answer.
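Concretely, with ollama (used below), the model's reply to a tool-enabled request is not plain text but a structured message; its shape is roughly the following (illustrative, matching what the code below parses):

# An assistant message requesting a tool call, instead of answering in text:
{'role': 'assistant', 'content': '',
 'tool_calls': [{'function': {'name': 'getnews',
                              'arguments': {'country': 'USA'}}}]}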
One important part: before doing all this, you must give ollama a list of the available external tools. This is done by installing the ollama pip library, which enables you to call ollama from Python and to define one Python function for each tool.
Important: when declaring the tools/Python functions to ollama, clearly describe in plain English what each function does, as well as each of its arguments, because the LLM decides whether to call a given tool based on these descriptions!
Let's now put it into practice:
import ollama
import requests
import json

messages = [{'role': 'user', 'content': 'What is the main news right now in the USA?'}]

# Map country names to the 2-letter codes expected by the news API
COUNTRY_CODES = {
    'france': 'fr',
    'india': 'in',
    'usa': 'us',
    'australia': 'au',
    'russia': 'ru',
    'united kingdom': 'gb',
}

def getnews(country):
    """Fetch the current top headline for a given country."""
    country = country.lower().strip()
    code = COUNTRY_CODES.get(country)
    if code is None:
        print("unknown country", country)
        code = 'fr'  # arbitrary fallback
    url = "https://saurav.tech/NewsAPI/top-headlines/category/general/" + code + ".json"
    print("calling the news tool")
    response = requests.get(url)
    print("tool result", response.text, "\n" * 5)
    news = json.loads(response.text)
    # Keep only the title and content of the first article
    headline = news['articles'][0]['title'] + ": " + news['articles'][0]['content']
    print("extracted news:", headline, "\n" * 3)
    return headline
def main():
    # First API call: send the conversation and the tool schema to the model
    response = ollama.chat(
        model='qwen2.5',
        messages=messages,
        tools=[
            {
                'type': 'function',
                'function': {
                    'name': 'getnews',
                    'description': 'Get recent news from a country',
                    'parameters': {
                        'type': 'object',
                        'properties': {
                            'country': {
                                'type': 'string',
                                'description': 'The name of the country',
                            },
                        },
                        'required': ['country'],
                    },
                },
            },
        ],
    )

    # Add the model's response to the conversation history
    messages.append(response['message'])
    print("first answer", response['message'])

    # Check whether the model decided to use the provided function
    if not response['message'].get('tool_calls'):
        print("The model didn't use the function. Its response was:")
        print(response['message']['content'])
        return

    # Process the function calls made by the model
    available_functions = {
        'getnews': getnews,
    }
    for tool in response['message']['tool_calls']:
        function_to_call = available_functions[tool['function']['name']]
        function_response = function_to_call(tool['function']['arguments']['country'])
        # Add the tool's response to the conversation
        messages.append(
            {
                'role': 'tool',
                'content': function_response,
            }
        )

    # Second API call: get the final answer, now grounded in the tool's output
    final_response = ollama.chat(model='qwen2.5', messages=messages)
    print(final_response['message']['content'])

main()
# adapted from https://github.com/ollama/ollama-python/blob/main/examples/tools/main.py
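To run this script, first install the Python dependencies and pull the model:

pip install ollama requests
ollama pull qwen2.5

You should then see the raw tool call chosen by the model, the headline fetched from the news API, and finally a natural-language answer that incorporates it.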