

September/October 2023 safety news: Sparse autoencoders, A is B is not B is A, Image hijacks
newsletter.danielpaleka.com

August 2023 safety news: Universal attacks, Influence functions, Problems with RLHF

June/July 2023 safety news: Jailbreaks, Transformer Programs, Superalignment

May 2023 safety news: Emergence, Activation engineering, GPT-4 explains GPT-2 neurons

April 2023 safety news: Supervising AIs improving AIs, LLM memorization, OpinionQA

March 2023 safety news: Natural selection of AIs, Waluigis, Anthropic agenda

AI safety takes
I read way too many papers.
© 2023 Daniel Paleka