\[\alpha = Softmax(QK^T)\]
\[ \left[ \begin{array}{ccc} - & q_1 & - \\ - & \vdots & - \\ - & q_T & - \\ \end{array} \right] \left[ \begin{array}{ccc} | & \cdots & | \\ k_1 & \cdots & k_T \\ | & \cdots & | \\ \end{array} \right] \]
\[\alpha = Softmax\left( \frac{QK^T}{\sqrt{d_k}} \right) \]
We give each word a representation \(V_{w}\)
Pool them into a matrix for the sentence: \(V = [V_{w_1},\dots,V_{w_N}]^T\)
We replace every target word representation by the average over its context (weighted by \(\alpha\))
\[ X' = Softmax\left( \frac{QK^T}{\sqrt{d_k}} \right) V\]
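A minimal NumPy sketch of the full pipeline above (scores \(QK^T\), scaling, softmax, weighted average over \(V\)); the sizes \(T=4\), \(d_k=8\) and the random inputs are purely illustrative:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """alpha = Softmax(Q K^T / sqrt(d_k)); X' = alpha V."""
    d_k = K.shape[-1]
    alpha = softmax(Q @ K.T / np.sqrt(d_k))  # (T, T): one weight per (target, context) pair
    return alpha @ V, alpha

# toy example: T = 4 words, each with a d_k = 8 dimensional representation
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
X_new, alpha = attention(Q, K, V)
```

Each row of `alpha` sums to 1, so each row of `X_new` is indeed a convex combination of the value vectors in `V`.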
\[p_t^{(i)} = \begin{cases} \sin(w_k \cdot t) & \text{if } i=2k \\ \cos(w_k \cdot t) & \text{if } i=2k+1 \end{cases}\]
with \(d\) the encoding dimension and
\[w_k = \frac 1 {10000^{2k/d}}\]
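The two formulas above can be sketched directly in NumPy; the sizes \(T=50\), \(d=16\) are illustrative, and \(d\) is assumed even so the sin/cos dimensions pair up:

```python
import numpy as np

def positional_encoding(T, d):
    """p_t^(i) = sin(w_k * t) if i = 2k, cos(w_k * t) if i = 2k+1,
    with w_k = 1 / 10000^(2k/d)."""
    pe = np.zeros((T, d))
    t = np.arange(T)
    for k in range(d // 2):
        w_k = 1.0 / 10000 ** (2 * k / d)  # frequency of dimension pair k
        pe[:, 2 * k] = np.sin(w_k * t)      # even dimensions: sine
        pe[:, 2 * k + 1] = np.cos(w_k * t)  # odd dimensions: cosine
    return pe

pe = positional_encoding(T=50, d=16)
```

At \(t=0\) the even dimensions are all 0 (\(\sin 0\)) and the odd ones all 1 (\(\cos 0\)), and each pair of dimensions rotates with its own frequency \(w_k\) as \(t\) grows.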
Principle:
Which tasks?
Multilingual models:
Training steps:
Key ingredients to success:
Designing a good prompt is an art:
“Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy”
“A is the son of B’s uncle. What is the family relationship between A and B?”
“A is the son of B’s uncle. What is B for A?”
This is a novel written in the style of J.R.R. Tolkien’s Lord of the Rings fantasy novel trilogy. It is a parody of the following passage:
“S. Jane Morland was born in Shoreditch …”
Tolkien rewrote the previous passage in a high-fantasy style, keeping the same meaning but making it sound like he wrote it as a fantasy; his parody follows:
English: Writing about language models is fun. Roish: Writingro aboutro languagero modelsro isro funro. English: The weather is lovely! Roish:
CoT requires large models:
Analogy solving:
Directions: In the following question, a related pair of
words or phrases is followed by five pairs of words or
phrases. Choose the pair that best expresses a relationship
similar to that in the original pair.
braggart :: modesty
A) fledgling : experience
B) embezzler : greed
C) wallflower : timidity
D) invalid : malady
E) candidate : ambition
To solve this problem, first we need to understand the
relationship that exists between braggart and modesty.
According to the sentence, 'braggart' is a person who talks
too much about himself or herself and is usually not
believed. On the other hand, 'modesty' is the opposite of
this and denotes a person who does not talk too much about
himself or herself. Thus, for 'modesty' to be a suitable
answer, it should show the opposite of 'braggart'.
Now let's see whether each pair expresses a relationship
similar to that between braggart and modesty.
Next we have 'fledgling', which means a person who is
inexperienced and 'experience' which means knowledge gained
through practical involvement. Thus, 'fledgling' is a person
who has no experience and 'experience' is knowledge gained
through practical involvement. Thus, 'fledgling' is the
opposite of 'experience'. The relationship between these two
words is similar to that between braggart and modesty, hence
'fledgling' is the answer.
More and more specialized:
Baidu paper 2017
OpenAI 2020
Chinchilla paper 2022
\(L=\) pretraining loss
Google 2022: paper1, paper2 (figure: FLOPs vs. upstream pretraining performance, downstream accuracy on 17 tasks, and number of parameters)
GPT-3 paper 2020
"manger" devient "mangera"
"parler" devient "parlera"
"voter" devient
Anthropic paper 2022
Jason Wei listed 137 emergent abilities:
During training, they may “abruptly” re-structure the latent representation space
There’s some hope though…
A great pedagogical perspective on LLMs by Sasha Rush: video
Q001 | Q002 | Q003 | Q004 | Q005 | Q006 | Q007 | Q008 | Q009 | Q010
89% | 50% | 86% | 50% | 36% | 7% | 54% | 36% | 86% | 54%
Q011 | Q012 | Q013 | Q014 | Q015 | Q016 | Q017 | Q018 | Q019
57% | 79% | 71% | 29% | 61% | 96% | 68% | 82% | 18%
1 Which type of semantic representation scheme is WordNet? - shallow - logical + network-based
2 Which one is not a latent feature-based semantic representation? + one-hot embeddings - Random Indexing - LDA
3 Given a synset x, how do you find the gloss? + x.definition() - x.gloss() - x.lemmas()[0].gloss()
4 What do you need to inspect to get all synonyms of a word? + multiple lemmas in one or more synsets - multiple lemmas in one synset - multiple relations from one or more synsets
5 How does NLTK give access to relations from a synset? + through methods of a dedicated class - through global methods of the wordnet package with the synset passed as string argument
6 Which structure does the hypernym relation create? - a tree + a graph - a set
7 How can you know all relations accessible from synset x? + help(x) - type(x) - x.relations()
8 You want only nouns; which command is wrong? - wn.synsets("dry",pos=wn.NOUN) + [l for l in wn.synsets("dry") if type(l)==wn.NOUNS] - [l for l in wn.synsets("dry") if l.pos()=='n']
9 Which sentence is correct? - Every lemma in WordNet occurs in at least one example sentence - Every synset has at least one antonym + Every synset has at least one lemma
10 Which method gives you the hypernyms up to the most abstract synset in WordNet? - hypernyms() - root\_hypernyms() + hypernym\_paths()
11 Which relation is the most important one for adjectives in WordNet? - hypernyms - hyponyms + antonyms
12 Which pseudo-code computes all antonyms of a word w? + for x in synsets(w): for y in x.lemmas(): for z in y.antonyms(): accumulate z - for x in synsets(w): for y in x.antonyms(): accumulate y - for x in synsets(w): for y in x.antonyms(): for z in y.lemmas(): accumulate z
13 Does 'for s in wn.synsets()' give you all synsets in WordNet? + no - yes
14 How can you get all synsets with animals (and only them)? + with lexicographer files - by filtering the output of wn.all\_synsets() with synset animal attribute - by computing the transitive closure of "living being" through hyponym relation
15 x is a synset. When is x.min\_depth() == x.max\_depth()? + when len(x.hypernym\_paths())==1 - when len(x.hypernyms())==1 - when x.path\_similarity(x.root\_hypernyms()[0])==1
16 What does WordNet's lemma.count() method return? - the percentage of occurrences of the lemma in English + the number of occurrences of the lemma in WordNet's examples - the number of relations between this lemma and others
17 Which expression is wrong? - wn.lemma('nice.a.01.nice').count() - wn.lemma('nice.a.01.nice').antonyms()[0].count() + wn.lemma('nice.a.01.nice').synset().count()
18 Which synset corresponds to the definition "conscientious activity intended to do or accomplish something"? + attempt.n.01 - try.v.01 - undertake.v.01
19 (difficult) You want to get all hypernyms of synset x, up to the root of the hierarchy. Which method returns the largest number of elements? + flatten x.hypernym\_paths() and remove duplicates - compute the transitive closure with x.closure(lambda a: a.hypernyms()) - flatten x.root\_hypernyms() and remove duplicates