AI safety takes

Sitemap - 2024 - AI safety takes

September/October 2024 safety news: Jailbreaks on robots, Breaking unlearning, Forecasting evals

July/August 2024 safety news: Tamper resistance, Fluent jailbreaks, Scaling limits

May/June 2024 safety news: Out-of-context reasoning, Sparse autoencoders, Interpreting CLIP

March/April 2024 safety news: Latent training, Emergent abilities, Instruction hierarchy

January/February 2024 safety news: Sleeper agents, In-context reward hacking, Universal neurons

© 2025 Daniel Paleka