Daniel Paleka's Newsletter

Sitemap - 2024 - Daniel Paleka's Newsletter

September/October 2024 safety news: Jailbreaks on robots, Breaking unlearning, Forecasting evals

July/August 2024 safety news: Tamper resistance, Fluent jailbreaks, Scaling limits

May/June 2024 safety news: Out-of-context reasoning, Sparse autoencoders, Interpreting CLIP

March/April 2024 safety news: Latent training, Emergent abilities, Instruction hierarchy

January/February 2024 safety news: Sleeper agents, In-context reward hacking, Universal neurons

© 2026 Daniel Paleka