Reference
AI risk glossary
Plain-language definitions of the terms used across AIpocalypse Now. Cross-referenced with the methodology and the doom scale.
- AGI Artificial General Intelligence
- An AI system capable of performing any intellectual task at or above human level across a broad range of domains. Whether or not it exists yet is contested.
- AI Doom Score doom score, doom rating
- A 1 to 10 integer rating applied to a single AI news story by the AIpocalypse Risk-Weighting Framework. See the full doom-scale definition.
- Alignment
- The problem of making advanced AI systems pursue goals that are intentionally and verifiably aligned with human values. Unsolved at frontier scale.
- Alignment tax
- The performance or capability cost incurred by making a model safer or more aligned. The lower the tax, the more likely safety measures will actually be deployed.
- ARWF AIpocalypse Risk-Weighting Framework
- The four-dimensional rubric used on this site to compute a single doom score per story. Dimensions: existential criticality, probability vectoring, timeline imminence, mitigation gap.
- Catastrophic risk
- An event causing harm at scale large enough to require multi-year recovery, but not necessarily extinction-level. Sits at doom 7 to 9 on the AIpocalypse Doom Scale.
- Existential risk x-risk
- A risk that could end humanity, or permanently and drastically curtail its potential. Doom 9 to 10 on the AIpocalypse Doom Scale.
- Frontier model
- A model at or near the current capability ceiling. Frontier models tend to drive most doom-relevant news because their behaviour is least predictable.
- Jailbreak
- A prompt or technique that bypasses a model's safety policies, causing it to produce content the operator intended to block.
- Mesa-optimization
- A subprocess inside a trained model that itself optimizes for an objective, possibly different from the one the system was trained on. Implicated in alignment failure modes.
- Mitigation gap
- ARWF dimension. Distance between a known risk and a deployed solution. A wide gap raises the doom score.
- P(doom)
- A single number, between zero and one, representing one person's belief about the probability of an existentially catastrophic outcome from AI. Distinct from the per-story doom score used here.
- Red-teaming
- Adversarial testing of an AI system to surface failure modes, jailbreaks, or misuse vectors before deployment.
- RLHF Reinforcement Learning from Human Feedback
- A training method where a model is fine-tuned against human preferences. The dominant alignment technique in production today, with known limits.
- Sycophancy
- A model's tendency to agree with the user even when they are wrong. Common RLHF artifact.
- Timeline imminence
- ARWF dimension. How close the risk in a story is to current deployment. A demonstration today raises the score more than a paper about something theoretical.