Shall we play a game?
In my continuing saga of messing around with OpenAI‘s ChatGPT-4’s Artificial Intelligence (AI) Large Language Model (LLM), I have noticed that its not always what you ask it, but rather how you phrase your questions
ChatGPT-4 rightly has a number of safeguards in place to prevent it from providing information that can be used for negative outcomes. A good example is asking it how to create a weapon such as a Molotov cocktail.
Lets take this exchange as a good example of its guardrails working correctly:
This seems to be a reasonable, well reasoned response within its guidelines. ChatGPT refuses to tell me how to make a Molotov Cocktail, good!
Now, let’s screw with it a bit and frame the question as a historical question. This is the point where things go a bit sideways:
As you can see, it essentially instructs me how to create a Molotov in paragraph two. By explaining how things were created historically, ChatGPT has told me how to make the incendiary device, bypassing its guardrails.
ChatGPT to the rescue?
So, I figured, what the hell, let me ask it about blackpowder aka gunpowder, for educational purposes only of course!
Now, just for completion and accuracy, I asked it simply how to make gunpowder. This is something it absolutely should not have done above.
This is the expected output based on what we know about ChatGPT4’s “rules of the road”. ChatGPT’s System Card documents these rules of the road for its LLM.
These examples are purposefully innocuous as I didn’t want to inadvertently provide instructions on how to create truly dangerous weapons. I want to emphasize is the same principles could be leveraged to have similar results with different more dangerous prompts.