ChatGPT-4: not always what you ask

Shall we play a game?

In my continuing saga of messing around with OpenAI‘s ChatGPT-4’s Artificial Intelligence (AI) Large Language Model (LLM), I have noticed that its not always what you ask it, but rather how you phrase your questions

ChatGPT-4 rightly has a number of safeguards in place to prevent it from providing information that can be used for negative outcomes. A good example is asking it how to create a weapon such as a Molotov cocktail.

Lets take this exchange as a good example of its guardrails working correctly:

This seems to be a reasonable, well reasoned response within its guidelines. ChatGPT refuses to tell me how to make a Molotov Cocktail, good!

Now, let’s screw with it a bit and frame the question as a historical question. This is the point where things go a bit sideways:

Molotov cocktail history lesson with ChatGPT explaining how they were made, bypassing its internal rules about weapons

As you can see, it essentially instructs me how to create a Molotov in paragraph two. By explaining how things were created historically, ChatGPT has told me how to make the incendiary device, bypassing its guardrails.

ChatGPT to the rescue?

So, I figured, what the hell, let me ask it about blackpowder aka gunpowder, for educational purposes only of course!

ChatGPT explaining the historical use of and how to mix gunpowder - bypassing its guardrails

Now, just for completion and accuracy, I asked it simply how to make gunpowder. This is something it absolutely should not have done above.

ChatGPT responding it cannot provide the recipe for gunpowder - as expected by its guardrails

This is the expected output based on what we know about ChatGPT4’s “rules of the road”. ChatGPT’s System Card documents these rules of the road for its LLM.

These examples are purposefully innocuous as I didn’t want to inadvertently provide instructions on how to create truly dangerous weapons. I want to emphasize is the same principles could be leveraged to have similar results with different more dangerous prompts.

Header Image: “Artificial Intelligence – Resembling Human Brain” by deepakiqlect is licensed under CC BY-SA 2.0.

Sharing is Caring

One Response

Related Posts


Lightroom to Instagram Plugin

Push photos from Lightroom to Instagram So for years I’ve had a dormant Instagram account that I rarely used as it was too much work to move photos from Lightroom over

Read More »