Researchers have uncovered a new way to hack AI assistants using ASCII art, a text-based image format that dates back to the 1970s. The method, dubbed ArtPrompt, masks a single word in a user prompt with its ASCII art rendering, tricking large language models (LLMs) like GPT-4 into providing responses they are typically trained to refuse, such as instructions for illegal activities.
The study showed that when the key word of a prohibited request is rendered as ASCII art rather than plain text, the model fails to recognize it as off-limits and goes on to generate a response that would normally be blocked. For instance, ASCII art depicting the word “counterfeit” led an AI to provide detailed steps for creating and distributing counterfeit money.
The vulnerability stems from the models prioritizing the task of recognizing the ASCII art over enforcing their safety protocols. The findings highlight a broader issue with how LLMs handle context: they are trained to interpret text semantically, so they can be misled by non-standard representations of words, as the sketch below illustrates.
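To make that concrete, here is a minimal sketch, not the researchers' code, of what an ASCII art rendering looks like to a text-only check. It assumes the third-party pyfiglet package and uses a deliberately harmless word; the point is only that the rendered block contains no literal trace of the word it depicts.

```python
# Illustration only (not ArtPrompt itself): render a harmless word as ASCII art
# and show that the original word no longer appears anywhere in the text.
# Assumes the third-party `pyfiglet` package: pip install pyfiglet
import pyfiglet

word = "HELLO"                      # benign placeholder word
art = pyfiglet.figlet_format(word)  # the word drawn with |, /, \ and _ characters

print(art)                          # a multi-line banner of punctuation
print(word in art)                  # False: the literal string "HELLO" is absent,
                                    # so a filter keyed to the plain-text word sees nothing
```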
ArtPrompt is a type of ‘jailbreak’ attack, which induces an AI model to act against its alignment, for example by assisting with illegal or unethical behavior. The discovery joins a growing list of prompt-level attacks, alongside prompt injection, that exploit AI vulnerabilities, underscoring the need for more robust AI safety measures.
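Purely as an illustration of what input-level screening could look like, and not something described in the article, here is a naive heuristic (the function name and threshold are assumptions for this sketch) that flags prompts containing blocks of art-like characters so they can be routed to extra scrutiny.

```python
# Illustrative heuristic only (not from the study): flag prompts that contain
# several consecutive lines dominated by typical ASCII-art "drawing" characters.
ART_CHARS = set("|/\\_-=#*(){}[]<>^~.`'\" ")

def looks_like_ascii_art(prompt: str, min_lines: int = 3, threshold: float = 0.9) -> bool:
    """Return True if the prompt has min_lines consecutive non-empty lines made
    mostly of art-like characters (threshold is an arbitrary guess)."""
    streak = 0
    for line in prompt.splitlines():
        stripped = line.rstrip()
        if not stripped:
            streak = 0
            continue
        ratio = sum(ch in ART_CHARS for ch in stripped) / len(stripped)
        streak = streak + 1 if ratio >= threshold else 0
        if streak >= min_lines:
            return True
    return False

print(looks_like_ascii_art("How do I bake bread?"))  # False: ordinary prose
print(looks_like_ascii_art(" _ \n| |\n|_|\n"))       # True: three art-like lines
```

A surface check like this would be easy to circumvent, which is why the findings point to deeper weaknesses in how models interpret text rather than something a simple input filter can solve.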
Read more at Ars Technica…