Reference

AI risk glossary

Plain-language definitions of the terms used across AIpocalypse Now. Cross-referenced with the methodology and the doom scale.

AGI Artificial General Intelligence: An AI system capable of performing any intellectual task at or above human level across a broad range of domains. Whether or not it exists yet is contested.
AI Doom Score doom score, doom rating: A 1 to 10 integer rating applied to a single AI news story by the AIpocalypse Risk-Weighting Framework. See the full doom-scale definition.
Alignment: The problem of making advanced AI systems pursue goals that are intentionally and verifiably aligned with human values. Unsolved at frontier scale.
Alignment tax: The performance or capability cost incurred by making a model safer or more aligned. The lower the tax, the more likely safety measures will actually be deployed.
ARWF AIpocalypse Risk-Weighting Framework: The four-dimensional rubric used on this site to compute a single doom score per story. Dimensions: existential criticality, probability vectoring, timeline imminence, mitigation gap.
Catastrophic risk: An event causing harm at scale large enough to require multi-year recovery, but not necessarily extinction-level. Sits at doom 7 to 9 on the AIpocalypse Doom Scale.
Existential risk x-risk: A risk that could end humanity, or permanently and drastically curtail its potential. Doom 9 to 10 on the AIpocalypse Doom Scale.
Frontier model: A model at or near the current capability ceiling. Frontier models tend to drive most doom-relevant news because their behaviour is least predictable.
Jailbreak: A prompt or technique that bypasses a model's safety policies, causing it to produce content the operator intended to block.
Mesa-optimization: A subprocess inside a trained model that itself optimizes for an objective, possibly different from the one the system was trained on. Implicated in alignment failure modes.
Mitigation gap: ARWF dimension. Distance between a known risk and a deployed solution. A wide gap raises the doom score.
P(doom): A single number, between zero and one, representing one person's belief about the probability of an existentially catastrophic outcome from AI. Distinct from the per-story doom score used here.
Red-teaming: Adversarial testing of an AI system to surface failure modes, jailbreaks, or misuse vectors before deployment.
RLHF Reinforcement Learning from Human Feedback: A training method where a model is fine-tuned against human preferences. The dominant alignment technique in production today, with known limits.
Sycophancy: A model's tendency to agree with the user even when they are wrong. Common RLHF artifact.
Timeline imminence: ARWF dimension. How close the risk in a story is to current deployment. A demonstration today raises the score more than a paper about something theoretical.

Doom scale Methodology Today's doom