Better version of the monthly Twitter thread. Will be back to the regular release frequency next month. Thanks to Charbel-Raphaël Segerie for the review. Are aligned neural networks adversarially aligned? Prompt injection is when LLMs do not follow the initial prompt due to adversarial instructions in the later part of the input.
June/July 2023 safety news: Jailbreaks, Transformer Programs, Superalignment
June/July 2023 safety news: Jailbreaks…
June/July 2023 safety news: Jailbreaks, Transformer Programs, Superalignment
Better version of the monthly Twitter thread. Will be back to the regular release frequency next month. Thanks to Charbel-Raphaël Segerie for the review. Are aligned neural networks adversarially aligned? Prompt injection is when LLMs do not follow the initial prompt due to adversarial instructions in the later part of the input.