slides

Working with LLM

Challenges and solutions

Christophe Cerisara

CNRS, LORIA, Synalp team

LLM: post-Machine Learning area?

Limitations of ML approaches:
- Only understand vector inputs
- Unable to learn from 2 examples
LLMs solve these limitations:
- Understand English
- Thanks to “reasoning”, can learn from 2 examples

Ex: last letter concatenation

(from Denny Zhou, Google)

Elon Musk	nk
Bill Gates	ls

Obvious for humans with 2 examples
ML approach:
- enc-dec trained on tons of labeled data

Qwen2.5-7b:

Perform last letter concatenation, as shown in these two examples. Words: Elon Musk Answer: nk Words: Bill Gates Answer: ls Words: Barack Obama Answer:

[…] So, the concatenation would be ka

Requires more advanced prompting strategies with older models: CoT, analogical prompting…

Part 1: inside the LLM…

GPU
Data
Transformer
- No bottleneck of information

2017: the transformer

Reason over layer steps
Semi-Turing machine
Learns to learn (2nd order-GD, TD)
Reason over time steps

Scaling laws

The more data you train on
- the more the LLM knows about
- the better the LLM generalizes
scaling law = power law = \(y(x) = ax^{-\gamma} +b\)
\(y(x) =\) test loss
\(\gamma\) = slope

Baidu paper 2017

Scaling laws for Neural LM 2020

Open-AI 2020

RL, protein, chemistry…

Chinchilla paper 2022

GPT3 2020: inc. model capacity
Chinchilla 2022: inc. data

Google 2022: paper1, paper2 Flops, Upstream (pretraining), Downstream (acc on 17 tasks), Params

Emerging capabilities

Scaling laws exist in Machine Learning for a long time (cf. Paper on learning curves)
But it’s the first time they result from emerging capabilities!

Anthropic paper 2022

shows that the scaling law results from combination of emerging capabilities

Jason Wei has exhibited 137 emerging capabilities:

Emergence of structures

Training \(\rightarrow\) multiple phase transitions
When representations becomes structured, then generalization occurs:

Why do we observe phases during training?
- NYU paper 2023
- Because of competing sub-networks: dense for memorization and another sparse for generalization

Part 2: outside the LLM

Life cycle of LLM

Open-source community

Extremely important for LLMs:
- Main contributors in: pretraining, finetuning, model merging, dissemination, efficiency, evaluation
- “We have no moat” (Google, 2023)

Remaining challenges

LLM energy cost
- Training vs. using locally vs. LLMaaS
- LLM pruning, compression, distillation…
Integrating LLMs into systems
- RAG, LLM agents, tools using, function calling…

Join the discussion: cerisara@loria.fr, @cerisara@mastodon.online