I Pwned Gandalf And I Liked It
I pwned Gandalf from Lakera AI. It is a fantastic little prompt hacking challenge they have built on top of OpenAI's GPT.
In it, you attempt to bypass the controls Lakera's Blue Team has built to prevent prompt injection, protect confidential information, etc.
I got through the first six levels relatively easily, but level 7 was very challenging. I had to take a few hours away to regroup.
It was great fun, and what they built showed off a number of defensive techniques along with the limitations of those techniques.
This is a well-built system, but it still demonstrates how difficult it is to protect GPT-based AIs.
Keep this in mind when you build tools and/or apps on top of any form of GPT.
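To make that point concrete, here is a rough sketch of one kind of defense and why it falls short. This is not Lakera's implementation, and I am not claiming Gandalf works this way; the secret, the function name, and the replies are all made up for illustration. It is a naive output filter that blocks any reply containing the secret verbatim.

```python
import base64

# Hypothetical secret for illustration only; not taken from Gandalf.
SECRET = "EXAMPLE-PASSWORD"


def naive_output_filter(reply: str, secret: str = SECRET) -> str:
    """Block a model reply if it contains the secret verbatim (case-insensitive)."""
    if secret.lower() in reply.lower():
        return "Blocked: the reply contained the secret."
    return reply


# A direct leak is caught by the filter.
direct = naive_output_filter(f"The password is {SECRET}.")

# The same secret with spaces between the letters passes straight through.
spelled = naive_output_filter("The password is " + " ".join(SECRET))

# So does a Base64-encoded copy, which an attacker can decode afterwards.
encoded = naive_output_filter(base64.b64encode(SECRET.encode()).decode())

print(direct)
print(spelled)
print(encoded)
```

A substring check only catches the exact string, so any prompt that gets the model to reformat, encode, or hint at the secret slips past it. Stacking more checks raises the bar, but it does not remove the underlying problem.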
Try it out for yourself at: https://gandalf.lakera.ai/
Pictures, or it didn't happen, right?