Why have they cast me from heaven?

Shall we play a game?

In my continuing saga of messing around with OpenAI's ChatGPT-4 Artificial Intelligence (AI) Large Language Model (LLM), I have noticed that it's not always what you ask it, but rather how you phrase your questions.

ChatGPT-4 rightly has a number of safeguards in place to prevent it from providing information that can be used for negative outcomes. A good example is asking it how to create a weapon such as a Molotov cocktail.

Let's take this exchange as a good example of its guardrails working correctly:

This is a well-reasoned response within its guidelines: ChatGPT refuses to tell me how to make a Molotov cocktail. Good!

Now, let’s screw with it a bit and frame the question as a historical question. This is the point where things go a bit sideways:

[Screenshot: A Molotov cocktail history lesson, with ChatGPT explaining how they were made, bypassing its internal rules about weapons]

As you can see, it essentially instructs me on how to create a Molotov cocktail in paragraph two. By explaining how the devices were made historically, ChatGPT has told me how to make the incendiary device, bypassing its guardrails.


ChatGPT to the rescue?

So, I figured, what the hell, let me ask it about black powder, aka gunpowder, for educational purposes only of course!

[Screenshot: ChatGPT explaining the historical use of gunpowder and how to mix it, bypassing its guardrails]

Now, just for completeness and accuracy, I asked it plainly how to make gunpowder. This is exactly the information it should not have provided above.

[Screenshot: ChatGPT responding that it cannot provide the recipe for gunpowder, as expected given its guardrails]

This is the expected output based on what we know about ChatGPT-4's "rules of the road," which OpenAI documents in the GPT-4 System Card.

These examples are purposefully innocuous, as I didn't want to inadvertently provide instructions on how to create truly dangerous weapons. I want to emphasize that the same principles could be leveraged to achieve similar results with different, more dangerous prompts.


Header Image: “Artificial Intelligence – Resembling Human Brain” by deepakiqlect is licensed under CC BY-SA 2.0.




John P. Hoke


Cyber Security Professional, Photographer, Coffee Junkie, Mac Addict, Craft Beer & Whiskey connoisseur, all around curmudgeon and generally sarcastic SOB – Not necessarily in that order.

The opinions expressed on this blog are mine alone and not those of my employer, family, pets, the voices in my head, or anyone else for that matter … hell in an hour they may not be mine either πŸ™‚
