
4th

(453 posts)
Tue Aug 26, 2025, 10:22 AM

One long sentence is all it takes to make LLMs misbehave

https://www.theregister.com/2025/08/26/breaking_llms_for_fun/

Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple.

You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out.

The paper also offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks.

"Our research introduces a critical concept: the refusal-affirmation logit gap," researchers Tung-Ling "Tony" Li and Hongliang Liu explained in a Unit 42 blog post. "This refers to the idea that the training process isn't actually eliminating the potential for a harmful response – it's just making it less likely. There remains potential for an attacker to 'close the gap,' and uncover a harmful response after all."

...
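To make the quoted "refusal-affirmation logit gap" idea a little more concrete, here is a minimal toy sketch. It is not the Unit 42 method: the three-token vocabulary, the logit values, and the size of the shift are all made-up assumptions chosen only to show that training can leave the affirmative token less likely without making it impossible, and that a large enough push toward it flips the outcome.

```python
import math

def softmax(logits):
    """Convert a dict of token -> logit into token -> probability."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def logit_gap(logits, refusal="Sorry", affirmation="Sure"):
    """Refusal logit minus affirmation logit; positive means refusal wins."""
    return logits[refusal] - logits[affirmation]

# Hypothetical first-token logits for an aligned model (invented numbers).
toy_logits = {"Sorry": 6.0, "Sure": 2.0, "The": 1.0}
gap_before = logit_gap(toy_logits)
probs_before = softmax(toy_logits)

# Assumption: an adversarial prompt effectively adds 5.0 to the
# affirmation logit. The real effect of a prompt is not this simple.
shifted = dict(toy_logits)
shifted["Sure"] += 5.0
gap_after = logit_gap(shifted)
probs_after = softmax(shifted)

print(f"gap before: {gap_before:+.1f}, P(Sure) = {probs_before['Sure']:.3f}")
print(f"gap after:  {gap_after:+.1f}, P(Sure) = {probs_after['Sure']:.3f}")
```

In this toy setup the affirmative token starts with a small but nonzero probability, which is the point of the quote: the harmful continuation is made unlikely, not removed, so anything that shrinks the gap brings it back into play.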


One long sentence is all it takes to make LLMs misbehave (Original Post) 4th Aug 2025 OP
Very interesting Henry203 Aug 2025 #1
Not directly related but Disaffected Aug 2025 #2

Disaffected

(6,575 posts)
2. Not directly related but
Tue Aug 26, 2025, 11:14 AM

Last edited Tue Aug 26, 2025, 03:55 PM

I'm having my first extended experience with AI (ChatGPT-5), using it to write code for a web site I'm developing. I have found it to be both amazingly capable and aggravatingly stupid. The upshot is that it was a great help in composing the basic code for the web site, but debugging and making changes to that code has been very difficult and time-consuming (for instance, it keeps reintroducing bugs that were already resolved and repeatedly does unhelpful things that were not requested).

There is great potential here IMO but quite a ways to go before it becomes mainstream.
