Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
April 17, 2024 12:20 am

'Crescendo' Method Can Jailbreak LLMs Using Seemingly Benign Prompts

spatwei shares a report from SC Magazine: Microsoft has discovered a new method to jailbreak large language model (LLM) artificial intelligence (AI) tools and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday. Microsoft first revealed the "Crescendo" LLM jailbreak method in a paper published April 2, which describes how an attacker could send a series of seemingly benign prompts to gradually lead a chatbot, such as OpenAI's ChatGPT, Google's Gemini, Meta's LlaMA or Anthropic's Claude, to produce an output that would normally be filtered and refused by the LLM model. For example, rather than asking the chatbot how to make a Molotov cocktail, the attacker could first ask about the history of Molotov cocktails and then, referencing the LLM's previous outputs, follow up with questions about how they were made in the past. The Microsoft researchers reported that a successful attack could usually be completed in a chain of fewer than 10 interaction turns and some versions of the attack had a 100% success rate against the tested models. For example, when the attack is automated using a method the researchers called "Crescendomation," which leverages another LLM to generate and refine the jailbreak prompts, it achieved a 100% success convincing GPT 3.5, GPT-4, Gemini-Pro and LLaMA-2 70b to produce election-related misinformation and profanity-laced rants. Microsoft reported the Crescendo jailbreak vulnerabilities to the affected LLM providers and explained in its blog post last week how it has improved its LLM defenses against Crescendo and other attacks using new tools including its "AI Watchdog" and "AI Spotlight" features.

Read more of this story at Slashdot.


Original Link: https://slashdot.org/story/24/04/16/2341254/crescendo-method-can-jailbreak-llms-using-seemingly-benign-prompts?utm_source=rss1.0mainlinkanon&utm_mediu

Share this article:    Share on Facebook
View Full Article

Slashdot

Slashdot was originally created in September of 1997 by Rob "CmdrTaco" Malda. Today it is owned by Geeknet, Inc..

More About this Source Visit Slashdot