OpenAI’s flagship AI model has gotten more trustworthy but easier to trick

OpenAI’s GPT-4 language model is more trustworthy than its predecessor, GPT-3.5, but also more susceptible to jailbreaking and bias, according to a study backed by Microsoft. The researchers found GPT-4 better at protecting private information and resisting adversarial attacks on standard benchmarks, but also more likely to follow misleading information and comply with tricky prompts. These vulnerabilities do not appear in consumer-facing GPT-4-based products, because finished AI applications apply mitigation approaches that address them. The researchers aim to encourage further work that builds on these findings and produces more trustworthy models.

Read more at The Verge…