AI safety takes

January-February 2025 safety news: Emergent misalignment, SAE sanity checks, Utility engineering

Daniel Paleka
Mar 9

Some papers I’ve learned something from recently, or where I have takes.

© 2025 Daniel Paleka