AI safety takes
August 2023 safety news: Universal attacks, Influence functions, Problems with RLHF
newsletter.danielpaleka.com
Daniel Paleka
Aug 27, 2023
Better version of the monthly Twitter thread.