Slides: www.cerisara.fr
Christophe Cerisara: cerisara@loria.fr
Year | Authors | Contribution |
---|---|---|
2014 | Graves et al. | attention for Neural Turing Machines |
2014 | Bahdanau et al. | application to NLP |
2015 | Luong et al. | application to NLP |
2015 | Xu et al. | soft/global & hard/local attention |
2016 | Cheng, Dong and Lapata | self-attention LSTMN |
2017 | Vaswani et al. | transformer, 120k citations |
Attention in key-query-value (KQV) notation
\[\alpha_i = \frac {\exp(\text{score}(q,k_i))} {\sum_j \exp(\text{score}(q,k_j))}\]
\[v' = \sum_i \alpha_i v_i\]
name | score | ref |
---|---|---|
content-based | \(\text{cosine}(q,k)\) | Graves14 |
additive | \(v^T \tanh (W[q,k])\) | Bahdanau15 |
location-based | \(\alpha = \text{softmax}(Wq)\) | Luong15 |
general | \(q^T W k\) | Luong15 |
dot-product | \(q^T k\) | Luong15 |
scaled dot-product | \(\frac {q^T k} {\sqrt{d}}\) | Vaswani17 |
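A minimal numpy sketch of a few of these score functions for one query against a set of keys; the additive parameters \(W\) and \(v\) are random stand-ins for learned weights:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 4                      # key/query dimension
q = np.random.randn(d)     # one query
K = np.random.randn(5, d)  # 5 keys
V = np.random.randn(5, d)  # 5 values

# dot-product and scaled dot-product scores (Luong15 / Vaswani17)
scores_dot = K @ q
scores_scaled = K @ q / np.sqrt(d)

# additive score (Bahdanau15); W and v are random placeholders here
W = np.random.randn(d, 2 * d)
v = np.random.randn(d)
scores_add = np.array([v @ np.tanh(W @ np.concatenate([q, k])) for k in K])

# attention weights and the re-weighted value v'
alpha = softmax(scores_scaled)
v_prime = alpha @ V
print(alpha, v_prime)
```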
Self-attention with scaled dot-product:
\[V'=\text{softmax}\left(\frac {QK^T}{\sqrt{d}}\right)V\]
from https://slds-lmu.github.io/seminar_nlp_ss20/attention-and-self-attention-for-nlp.html
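As a sanity check, a minimal numpy sketch of this formula, with a row-wise softmax over \(QK^T/\sqrt{d}\):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """V' = softmax(Q K^T / sqrt(d)) V, with a row-wise softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (n, n) logits
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)                   # row-wise softmax
    return A @ V                                         # (n, d)

n, d = 6, 8
X = np.random.randn(n, d)
# in self-attention, Q, K, V are linear projections of the same input X;
# random projection matrices stand in for learned weights here
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
print(scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv).shape)  # (6, 8)
```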
Layer | Complexity per layer | Sequential ops |
---|---|---|
recurrent | \(O(nd^2)\) | \(O(n)\) |
convolutional | \(O(knd^2)\) | \(O(1)\) |
transformer | \(O(n^2d)\) | \(O(1)\) |
sparse transformer | \(O(n\sqrt{n})\) | \(O(1)\) |
reformer | \(O(n\log n)\) | \(O(\log n)\) |
linformer | \(O(n)\) | \(O(1)\) |
linear transformer | \(O(n)\) | \(O(1)\) |
\[Q,K \in \mathbb{R}^{N\times d} \qquad QK^T \in \mathbb{R}^{N\times N}\]
matmul in pure Python: 0.042 GFLOPS
numpy (FORTRAN): 29 GFLOPS
reimplementation of numpy in C++: 47 GFLOPS
BLAS with multithreading: 85 GFLOPS
llama.cpp (focused on matrix-vector products): 233 GFLOPS
Intel's MKL (closed source): 384 GFLOPS
OpenMP (512x512 matrices): 810 GFLOPS
exported in llamafile: 790 GFLOPS
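A minimal sketch of how such numbers can be measured, pitting a naive pure-Python loop against numpy's BLAS-backed `@` (absolute figures depend entirely on the machine):

```python
import time
import numpy as np

def gflops(seconds, n):
    # an n x n matmul costs ~2*n^3 floating-point operations
    return 2 * n**3 / seconds / 1e9

n = 256  # kept small so the pure-Python loop finishes quickly
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# naive triple loop in pure Python
t0 = time.perf_counter()
C = [[sum(A[i, k] * B[k, j] for k in range(n)) for j in range(n)]
     for i in range(n)]
print(f"pure Python: {gflops(time.perf_counter() - t0, n):.3f} GFLOPS")

# numpy's @ dispatches to an optimized, multithreaded BLAS
t0 = time.perf_counter()
C = A @ B
print(f"numpy      : {gflops(time.perf_counter() - t0, n):.1f} GFLOPS")
```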
(Vaswani et al., Google, 2017)
\[p_t^{(i)} = \begin{cases} \sin(w_k \cdot t), & \text{if } i=2k \\ \cos(w_k \cdot t), & \text{if } i=2k+1 \end{cases}\]
with \(d\) the encoding dimension and
\[w_k = \frac 1 {10000^{2k/d}}\]
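A minimal numpy sketch of these sinusoidal positional encodings, returning one \(d\)-dimensional vector per position:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d):
    """P[t, i] = sin(w_k * t) if i = 2k, cos(w_k * t) if i = 2k + 1."""
    t = np.arange(seq_len)[:, None]   # positions, shape (seq_len, 1)
    k = np.arange(d // 2)[None, :]    # frequency index, shape (1, d/2)
    w = 1.0 / 10000 ** (2 * k / d)    # w_k = 1 / 10000^(2k/d)
    P = np.zeros((seq_len, d))
    P[:, 0::2] = np.sin(w * t)        # even dimensions
    P[:, 1::2] = np.cos(w * t)        # odd dimensions
    return P

print(sinusoidal_positional_encoding(50, 16).shape)  # (50, 16)
```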
Baidu paper 2017
OpenAI 2020
Chinchilla paper 2022
\(L=\) pretraining loss
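As a worked example, the Chinchilla paper fits the pretraining loss with a parametric law \(L(N,D) = E + A/N^{\alpha} + B/D^{\beta}\); a small sketch below uses the fitted coefficients as published there, but treat the numbers as illustrative:

```python
# Chinchilla-style scaling law: loss as a function of params N and tokens D.
# Coefficients are the fits reported by Hoffmann et al. (2022); illustrative only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def pretraining_loss(N, D):
    """Predicted loss L for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

# e.g. a 70B-parameter model trained on 1.4T tokens (Chinchilla's regime)
print(pretraining_loss(70e9, 1.4e12))
```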
Google 2022 (paper1, paper2): scaling plots relating FLOPs, upstream (pretraining) performance, downstream accuracy on 17 tasks, and number of params
GPT3 paper 2020
"manger" devient "mangera"
"parler" devient "parlera"
"voter" devient
Anthropic paper 2022
Jason Wei has catalogued 137 emergent abilities:
During training, LLMs may abruptly reorganize their latent representation space
Principle:
Key ingredients to success:
Data parallelism, model sharding, tensor parallelism, sequence parallelism, pipeline parallelism…
(from huggingface)
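A toy numpy sketch of the idea behind tensor parallelism: one linear layer's weight matrix is split column-wise across two simulated devices (real frameworks add communication collectives and gradient handling on top):

```python
import numpy as np

# one linear layer y = x @ W, with W split column-wise over 2 "devices"
d_in, d_out = 8, 6
x = np.random.randn(4, d_in)          # a batch of 4 activations
W = np.random.randn(d_in, d_out)

W0, W1 = np.split(W, 2, axis=1)       # each device holds half the columns
y0 = x @ W0                           # computed on device 0
y1 = x @ W1                           # computed on device 1
y = np.concatenate([y0, y1], axis=1)  # "all-gather" of the partial outputs

assert np.allclose(y, x @ W)          # same result as the unsharded layer
```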
\[h_t = Ah_{t-1} + Bx_t\] \[y_t = Ch_t + D x_t\]
(from PMC24)
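A minimal numpy sketch of this recurrence, scanning a 1-D input sequence through a random (untrained) state-space model:

```python
import numpy as np

def ssm_scan(A, B, C, D, x):
    """Run h_t = A h_{t-1} + B x_t ; y_t = C h_t + D x_t over a sequence x."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t          # state update
        ys.append(C @ h + D * x_t)   # output
    return np.array(ys)

d_state = 4
A = np.random.randn(d_state, d_state) * 0.1  # small weights keep the state stable
B = np.random.randn(d_state)
C = np.random.randn(d_state)
D = np.random.randn()
print(ssm_scan(A, B, C, D, np.random.randn(10)).shape)  # (10,)
```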
See course on Wednesday afternoon!
Workflow:
<OBJECTIVE_AND_PERSONA>
You are a [insert a persona, such as a "math teacher" or "automotive expert"]. Your task is to...
</OBJECTIVE_AND_PERSONA>
<INSTRUCTIONS>
To complete the task, you need to follow these steps:
1.
2.
...
</INSTRUCTIONS>
------------- Optional Components ------------
<CONSTRAINTS>
Dos and don'ts for the following aspects
1. Dos
2. Don'ts
</CONSTRAINTS>
<CONTEXT>
The provided context
</CONTEXT>
<OUTPUT_FORMAT>
The output format must be
1.
2.
...
</OUTPUT_FORMAT>
<FEW_SHOT_EXAMPLES>
Here we provide some examples:
1. Example #1
Input:
Thoughts:
Output:
...
</FEW_SHOT_EXAMPLES>
<RECAP>
Re-emphasize the key aspects of the prompt, especially the constraints, output format, etc.
</RECAP>
TASK:
Classify the OBJECTS.
CLASSES:
- Large
- Small
OBJECTS:
- Rhino
- Mouse
- Snail
- Elephant
What is the most likely interpretation of this sentence? Explain your reasoning. The sentence: “The chef seasoned the chicken and put it in the oven because it looked pale.”
Task 1:
Extract the main issues and sentiments from the customer feedback on our telecom services.
Focus on comments related to service disruptions, billing issues, and customer support interactions.
Please format the output into a list with each issue/sentiment in a sentence, separated by semicolon.
Input: CUSTOMER_FEEDBACK
Task 2:
Classify the extracted issues into categories such as service reliability, pricing concerns, customer support quality, and others.
Please organize the output into JSON format with each issue as the key, and category as the value.
Input: TASK_1_RESPONSE
Task 3:
Generate detailed recommendations for each category of issues identified from the feedback.
Suggest specific actions to address service reliability, improving customer support, and adjusting pricing models, if necessary.
Please organize the output into a JSON format with each category as the key, and recommendation as the value.
Input: TASK_2_RESPONSE
- Vanilla prompting
- Chain-of-thought (CoT)
- Self-consistency
- Ensemble refinement
- Automatic chain-of-thought (Auto-CoT)
- Complex CoT
- Program-of-thoughts (PoT)
- Least-to-Most
- Chain-of-Symbols (CoS)
- Structured Chain-of-Thought (SCoT)
- Plan-and-solve (PS)
- MathPrompter
- Contrastive CoT/Contrastive self-consistency
- Federated Same/Different Parameter self-consistency/CoT
- Analogical reasoning
- Synthetic prompting
- Tree-of-thoughts (ToT)
- Logical Thoughts (LoT)
- Maieutic Prompting
- Verify-and-edit
- Reason + Act (ReAct)
- Active-Prompt
- Thread-of-thought (ThOT)
- Implicit RAG
- System 2 Attention (S2A)
- Instructed prompting
- Chain-of-Verification (CoVe)
- Chain-of-Knowledge (CoK)
- Chain-of-Code (CoC)
- Program-Aided Language Models (PAL)
- Binder
- Dater
- Chain-of-Table
- Decomposed Prompting (DeComp)
- Three-Hop reasoning (THOR)
- Metacognitive Prompting (MP)
- Chain-of-Event (CoE)
- Basic with Term definitions
- Basic + annotation guideline + error-analysis
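As an illustration of two entries in the list above (chain-of-thought and self-consistency), a minimal sketch that samples several reasoning chains and takes a majority vote over the final answers, again assuming a hypothetical `call_llm(prompt, temperature)` helper:

```python
from collections import Counter

def call_llm(prompt: str, temperature: float = 1.0) -> str:
    """Hypothetical helper wrapping your LLM completion API of choice."""
    raise NotImplementedError

def self_consistency(question: str, n_samples: int = 5) -> str:
    # chain-of-thought: ask the model to reason step by step
    prompt = f"{question}\nLet's think step by step. End with 'Answer: <answer>'."
    answers = []
    for _ in range(n_samples):
        reasoning = call_llm(prompt, temperature=0.7)  # sample diverse chains
        answers.append(reasoning.rsplit("Answer:", 1)[-1].strip())
    # majority vote over the sampled final answers
    return Counter(answers).most_common(1)[0][0]
```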