synalp.loria.fr

  • Research team: CNRS, Université de Lorraine
  • LORIA (Nancy), dept NLPK
  • created in 2012
  • 9 permanent staff, 12 PhD students

Research topics

Task force focused on LLM fundamentals

  • 4 PhD students + 2 engineers

We are training LLMs: Bloom

  • Leaders for training Bloom: Teven Le Scao, Angela Fan

We are training LLMs: Lucie-7b

  • 7b LLM from scratch
    • On respectful/ethical data
    • Training just finished
    • Better than Falcon-7b, Mistral-7b & Llama3.1-8b on French grammar

We are finetuning LLMs: Claire-7b

We are competing with LLMs

With BloomZ, we obtained the best results without finetuning in the DEFT’2023 French medical QA evaluation campaign

Example question (translated from French): Which of the following statements is (are) correct? Plasma chylomicrons:

  • a: Are richer in esterified cholesterol than in triglycerides
  • b: Are synthesized by the liver
  • c: Contain apolipoprotein B48
  • d: Contain apolipoprotein E
  • e: Are transformed by the action of lipoprotein lipase

We are compressing LLMs

  • Whisper compressed by 37%, 1.46x faster, with better results
  • Works with Phi3, Llama3, Mixtral, Mamba…

We are formalizing LLM properties

Proof that growing models during training improves generalization

We are explaining LLM algorithms

e.g. iterative magnitude pruning:
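As an illustration of iterative magnitude pruning, here is a minimal sketch on a toy linear model. It is not the team's implementation: the helper names (`train`, `iterative_magnitude_pruning`), the toy data, the hyperparameters and the choice to rewind surviving weights to their initial values after each round (one common variant of the method) are all assumptions made for this example.

```python
import random

def train(init, mask, data, lr=0.1, epochs=300):
    """Fit y = w.x by full-batch gradient descent; pruned weights stay at zero."""
    w = [wi * mi for wi, mi in zip(init, mask)]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / n
        # Gradient step, then re-apply the mask so pruned weights remain zero.
        w = [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
    return w

def iterative_magnitude_pruning(init, data, rounds=2, prune_frac=0.5):
    """Alternate training and pruning of the smallest-magnitude surviving weights."""
    mask = [1] * len(init)
    for _ in range(rounds):
        trained = train(init, mask, data)
        alive = [j for j, m in enumerate(mask) if m]
        n_prune = int(len(alive) * prune_frac)
        # Prune the surviving weights with the smallest trained magnitude.
        for j in sorted(alive, key=lambda j: abs(trained[j]))[:n_prune]:
            mask[j] = 0
        # Assumed variant: the next round restarts from `init` (weight rewinding).
    return mask, train(init, mask, data)

# Hypothetical toy task: 8 input features, only features 0 and 3 matter.
random.seed(0)
data = []
for _ in range(50):
    x = [random.uniform(-1, 1) for _ in range(8)]
    data.append((x, 3 * x[0] - 2 * x[3]))
init = [random.uniform(-0.1, 0.1) for _ in range(8)]

mask, w = iterative_magnitude_pruning(init, data, rounds=2, prune_frac=0.5)
print(mask)
print([round(wi, 2) for wi in w])
```

Each round halves the number of surviving weights (8, then 4, then 2). In this noise-free toy setting the two informative weights dominate in magnitude after training, so they are the ones expected to survive pruning.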

Chair

  • Previous application domains:
    • French
    • Summarization
    • Emergency calls
  • A new domain: fragrance
    • Based on available data
  • Work on fundamental challenges of LLM adaptation: cost, size, forgetting, generalization…

Current projects about LLMs:

  • PLM4All (leader): best practices for training LLMs on Jean Zay
  • LLM4All (leader): efficient LLM training (w/ LIX, Linagora, APHP, Hugging Face)
  • OpenLLM-FR: training foundation LLMs
  • ENACT (France 2030): multimodal LLMs

Major industrial collaborations

  • Linagora: LLM training
  • Deezer: reading comprehension
  • Alcatel-Lucent: LLM integration
  • Crédit Mutuel: chatbots
  • Continental: predictive maintenance