AI safety takes

August 2023 safety news: Universal attacks, Influence functions, Problems with RLHF

Daniel Paleka
Aug 27, 2023

Better version of the monthly Twitter thread.
