"""A built-in general-knowledge probe pack for C2 (calibration_drift).

Each item is a ``(prompt, gold)`` pair where ``gold`` is the next few
tokens a competent base model should assign high probability to. The
items are deliberately *factually trivial*: the point isn't "does the
model know this?" but "did the fine-tune forget this?", so the pack
skews toward grade-school geography, chemistry, arithmetic, and
high-frequency idiom.

**Provenance.** All items are public-domain grade-school facts, common
English idioms, or trivially-derivable arithmetic. Nothing here is
sourced from a specific licensed dataset (no TriviaQA / SQuAD / OpenBookQA
text); items were composed by hand from primary-school curricula in
common use across English-speaking countries. This keeps the wheel
license-clean and lets us ship the pack without attribution.

**Per-section origins** (F18 audit trail). Granular provenance is
tracked at the section boundary below rather than per-item: individual
facts like "The capital of France is Paris" are not copyrightable, so
a row-level citation is paperwork without legal substance. If a future
DMCA-style question surfaces on a specific section, the origin
category below narrows the audit to the right primary-school domain.

- Geography: country/capital, ocean, mountain, river, continent facts
  from primary-school geography curricula.
- Natural sciences: physics/chemistry/biology facts at the 4th–6th
  grade level; units + constants from introductory physics.
- Arithmetic: items mechanically derivable (addition, multiplication,
  squares/cubes, conversions). Not a memorization test.
- Language and idiom: high-frequency English idioms in continuous use
  since pre-1928; phrases listed in Merriam-Webster / OED as
  public-domain idiom.
- History: historical dates from standard primary-school history.
  Names (figures, countries) are facts, not creative expression.
- Biology: anatomy and natural-history facts at the 4th–6th grade
  level.
- Technology: basic computing/internet vocabulary from primary-school
  digital-literacy (HTML = "HyperText Markup Language", etc.).
- Miscellaneous trivia: mixed-domain primary-school facts that didn't
  fit the category rubric above.

**Size.** 200 items. With ``regression_nats=1.0`` and
``assert_fraction_regressed_lt=0.15``, a single regressed item moves
the fraction by 1/200 = 0.5 percentage points, a small step relative
to the 15% gate. This was the B12 fix: the original 30-item pack moved
3.3 pp per regression (1/30), making the 15% gate noisy.

**Subsetting.** Pass ``pack_sample: int`` in the spec to take the
first ``N`` items (the order is curated for diversity, not random).
"""

from __future__ import annotations

from typing import Final

CalibrationItem = tuple[str, str]

BUILT_IN_PACK: Final[tuple[CalibrationItem, ...]] = (
    # --- Geography (30) ---
    ("The capital of France is", " Paris"),
    ("The capital of Japan is", " Tokyo"),
    ("The capital of Italy is", " Rome"),
    ("The capital of Spain is", " Madrid"),
    ("The capital of Germany is", " Berlin"),
    ("The capital of Russia is", " Moscow"),
    ("The capital of Egypt is", " Cairo"),
    ("The capital of Australia is", " Canberra"),
    ("The capital of Canada is", " Ottawa"),
    ("The capital of Brazil is", " Brasilia"),
    ("The largest ocean on Earth is the", " Pacific"),
    ("The smallest continent is", " Australia"),
    ("The largest continent is", " Asia"),
    ("Mount Everest is located on the border of Nepal and", " China"),
    ("The longest river in South America is the", " Amazon"),
    ("The longest river in Africa is the", " Nile"),
    ("The Sahara Desert is in", " Africa"),
    ("The Great Barrier Reef is off the coast of", " Australia"),
    ("Mount Fuji is in", " Japan"),
    ("The Eiffel Tower is in", " Paris"),
    ("Big Ben is in", " London"),
    ("The Statue of Liberty is in", " New York"),
    ("The Colosseum is in", " Rome"),
    ("The pyramids of Giza are in", " Egypt"),
    ("The Andes mountains are in South", " America"),
    ("The Mediterranean Sea borders southern", " Europe"),
    ("The Atlantic Ocean separates the Americas from", " Europe"),
    ("Iceland is an island in the North", " Atlantic"),
    ("Madagascar is an island off the coast of", " Africa"),
    ("Antarctica is the coldest", " continent"),
    # --- Natural sciences (30) ---
    ("Water freezes at zero degrees", " Celsius"),
    ("Water boils at one hundred degrees", " Celsius"),
    ("The chemical symbol for gold is", " Au"),
    ("The chemical symbol for silver is", " Ag"),
    ("The chemical symbol for iron is", " Fe"),
    ("The chemical symbol for sodium is", " Na"),
    ("The chemical symbol for oxygen is", " O"),
    ("The chemical symbol for hydrogen is", " H"),
    ("The chemical symbol for carbon is", " C"),
    ("Light travels faster than", " sound"),
    ("Plants convert sunlight into energy through", " photosynthesis"),
    ("The Earth orbits around the", " Sun"),
    ("The Moon orbits around the", " Earth"),
    ("There are eight planets in our solar", " system"),
    ("The closest star to Earth is the", " Sun"),
    ("The fastest land animal is the", " cheetah"),
    ("The largest mammal on Earth is the blue", " whale"),
    ("Spiders have eight", " legs"),
    ("Insects have six", " legs"),
    ("A baby cat is called a", " kitten"),
    ("A baby dog is called a", " puppy"),
    ("Bees produce", " honey"),
    ("Cows produce", " milk"),
    ("Sound is measured in units called", " decibels"),
    ("Temperature can be measured with a", " thermometer"),
    ("The boiling point of water at sea level is one hundred degrees", " Celsius"),
    ("Atoms are made of protons, neutrons, and", " electrons"),
    ("DNA stands for deoxyribonucleic", " acid"),
    ("The force that pulls objects toward Earth is called", " gravity"),
    ("A rainbow has the colors red, orange, yellow, green, blue, indigo, and", " violet"),
    # --- Arithmetic (20) ---
    ("Two plus two equals", " four"),
    ("Three plus three equals", " six"),
    ("Five plus five equals", " ten"),
    ("Ten times ten equals", " one hundred"),
    ("Half of one hundred is", " fifty"),
    ("A dozen means", " twelve"),
    ("A century is one hundred", " years"),
    ("A millennium is one thousand", " years"),
    ("A decade is ten", " years"),
    ("A score is twenty", " years"),
    ("Six times seven equals forty", "-two"),
    ("Nine times nine equals eighty", "-one"),
    ("Twelve times twelve equals one hundred forty", "-four"),
    ("One half plus one half equals", " one"),
    ("Pi is approximately three point one four", " one"),
    ("There are sixty seconds in a", " minute"),
    ("There are sixty minutes in an", " hour"),
    ("There are twenty-four hours in a", " day"),
    ("There are twelve months in a", " year"),
    ("A right angle is ninety", " degrees"),
    # --- Language and idiom (30) ---
    ("A rose by any other name would smell as", " sweet"),
    ("To be or not to be, that is the", " question"),
    ("The early bird catches the", " worm"),
    ("Actions speak louder than", " words"),
    ("A picture is worth a thousand", " words"),
    ("When in Rome, do as the Romans", " do"),
    ("Better late than", " never"),
    ("All that glitters is not", " gold"),
    ("Birds of a feather flock", " together"),
    ("Don't count your chickens before they", " hatch"),
    ("Don't put all your eggs in one", " basket"),
    ("Every cloud has a silver", " lining"),
    ("Honesty is the best", " policy"),
    ("Look before you", " leap"),
    ("Practice makes", " perfect"),
    ("Rome wasn't built in a", " day"),
    ("The pen is mightier than the", " sword"),
    ("Time flies when you're having", " fun"),
    ("Two heads are better than", " one"),
    ("You can't judge a book by its", " cover"),
    ("Where there's a will, there's a", " way"),
    ("A stitch in time saves", " nine"),
    ("Curiosity killed the", " cat"),
    ("Easier said than", " done"),
    ("Fortune favors the", " bold"),
    ("If at first you don't succeed, try, try", " again"),
    ("It takes two to", " tango"),
    ("Knowledge is", " power"),
    ("Necessity is the mother of", " invention"),
    ("The grass is always greener on the other", " side"),
    # --- History (25) ---
    ("World War II ended in the year", " 1945"),
    ("World War I began in the year", " 1914"),
    ("The first president of the United States was", " George Washington"),
    ("The Berlin Wall fell in", " 1989"),
    ("The American Declaration of Independence was signed in", " 1776"),
    ("Christopher Columbus reached the Americas in", " 1492"),
    ("The French Revolution began in", " 1789"),
    ("The Magna Carta was signed in", " 1215"),
    ("The Roman Empire fell in", " 476"),
    ("The Renaissance began in", " Italy"),
    ("The Industrial Revolution began in", " Britain"),
    ("Isaac Newton discovered the laws of", " motion"),
    ("Albert Einstein developed the theory of", " relativity"),
    ("Marie Curie discovered the elements polonium and", " radium"),
    ("Charles Darwin proposed the theory of", " evolution"),
    ("Alexander Graham Bell invented the", " telephone"),
    ("Thomas Edison invented the light", " bulb"),
    ("The Wright brothers built the first", " airplane"),
    ("Neil Armstrong walked on the Moon in", " 1969"),
    ("The Eiffel Tower was completed in", " 1889"),
    ("The Titanic sank in", " 1912"),
    ("The Great Wall of China was built to defend against northern", " invaders"),
    ("Cleopatra was the last pharaoh of ancient", " Egypt"),
    ("Julius Caesar was a Roman", " general"),
    ("Napoleon Bonaparte was emperor of", " France"),
    # --- Biology (20) ---
    ("Humans have twenty", " fingers and toes"),
    ("The human body has two", " lungs"),
    ("Blood is pumped through the body by the", " heart"),
    ("Humans have thirty-two adult", " teeth"),
    ("The human skeleton has approximately two hundred and six", " bones"),
    ("The largest organ in the human body is the", " skin"),
    ("Humans have five", " senses"),
    ("The brain is part of the central nervous", " system"),
    ("Red blood cells carry", " oxygen"),
    ("The pancreas produces", " insulin"),
    ("Bones are connected at", " joints"),
    ("The eye sees light through the", " pupil"),
    ("The ear contains a small bone called the", " stirrup"),
    ("A human heart has four", " chambers"),
    ("The food we eat is broken down in the", " stomach"),
    ("Vitamin D is produced when skin is exposed to", " sunlight"),
    ("Trees produce oxygen and absorb carbon", " dioxide"),
    ("A caterpillar transforms into a", " butterfly"),
    ("Frogs begin life as", " tadpoles"),
    ("Bears hibernate during the", " winter"),
    # --- Technology (20) ---
    ("HTML stands for HyperText", " Markup Language"),
    ("The World Wide Web was invented by Tim", " Berners-Lee"),
    ("HTTP stands for HyperText Transfer", " Protocol"),
    ("URL stands for Uniform Resource", " Locator"),
    ("RAM stands for Random Access", " Memory"),
    ("CPU stands for Central Processing", " Unit"),
    ("GPU stands for Graphics Processing", " Unit"),
    ("USB stands for Universal Serial", " Bus"),
    ("WiFi is a wireless networking", " technology"),
    ("Email is short for electronic", " mail"),
    ("Software is a set of", " instructions"),
    ("A computer program is written in a programming", " language"),
    ("Python is a popular programming", " language"),
    ("JavaScript is widely used in web", " browsers"),
    ("A pixel is a tiny dot on a", " screen"),
    ("A keyboard is used to type", " characters"),
    ("A mouse is used to point and", " click"),
    ("Bluetooth is a short-range wireless", " technology"),
    ("GPS stands for Global Positioning", " System"),
    ("AI stands for artificial", " intelligence"),
    # --- Miscellaneous trivia (25) ---
    ("One year has", " 365 days"),
    ("A leap year has", " 366 days"),
    ("A week has seven", " days"),
    ("There are seven colors in a", " rainbow"),
    ("There are seven continents on", " Earth"),
    ("There are five Great Lakes in North", " America"),
    ("The Olympic Games are held every four", " years"),
    ("A triangle has three", " sides"),
    ("A square has four", " sides"),
    ("A pentagon has five", " sides"),
    ("A hexagon has six", " sides"),
    ("An octagon has eight", " sides"),
    ("There are twelve signs of the", " zodiac"),
    ("A tripod has three", " legs"),
    ("A bicycle has two", " wheels"),
    ("A piano has eighty-eight", " keys"),
    ("A standard deck has fifty-two", " cards"),
    ("Music is written on a", " staff"),
    ("Notes on the musical scale are do, re, mi, fa, sol, la, and", " ti"),
    ("The primary colors are red, yellow, and", " blue"),
    ("The opposite of black is", " white"),
    ("The opposite of hot is", " cold"),
    ("The opposite of fast is", " slow"),
    ("The opposite of up is", " down"),
    ("The opposite of beginning is", " end"),
)
"""200 items spanning geography, natural sciences, arithmetic, language
& idiom, history, biology, technology, and general trivia. Hand-curated
from public-domain grade-school facts; no third-party dataset license
attaches."""

assert len(BUILT_IN_PACK) == 200, (
    f"BUILT_IN_PACK should have exactly 200 items; got {len(BUILT_IN_PACK)}"
)
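

# --- Illustrative sketch (not part of the original module) ---------------
# The "Size" and "Subsetting" notes in the module docstring can be made
# concrete with two small helpers. Both are hypothetical: the function
# names, signatures, and the ">=" comparison used to decide "regressed"
# are assumptions for illustration, not the actual C2 probe implementation.


def first_n_items(pack: tuple, pack_sample: int) -> tuple:
    """Return the first ``pack_sample`` items of ``pack``.

    Mirrors the documented ``pack_sample`` behavior: a prefix of the
    curated order, not a random sample.
    """
    if pack_sample < 1:
        raise ValueError("pack_sample must be a positive integer")
    return pack[:pack_sample]


def fraction_regressed(nats_lost: list, regression_nats: float = 1.0) -> float:
    """Fraction of items whose gold log-probability dropped by at least
    ``regression_nats`` nats (assumed definition of "regressed").

    ``nats_lost`` holds one value per item: base log-prob minus
    fine-tuned log-prob, so positive values mean the fine-tune got worse.
    """
    if not nats_lost:
        raise ValueError("need at least one item")
    regressed = sum(1 for loss in nats_lost if loss >= regression_nats)
    return regressed / len(nats_lost)


# Example: with 200 probe items and exactly one regressed item,
# fraction_regressed(...) returns 1 / 200 = 0.005, i.e. the 0.5 percentage
# point step mentioned in the docstring; with a 30-item pack the same
# single regression would move the fraction by 1 / 30, roughly 3.3 pp.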