March 11, 2025: With growing enthusiasm around DeepSeek’s advancements, Qualys recently conducted a security analysis of the distilled DeepSeek-R1 LLaMA 8B variant using the company’s newly launched AI security platform, Qualys TotalAI. The DeepSeek model had a failure rate of 61% when tested against Qualys TotalAI’s Knowledge Base (KB) attacks and a failure rate of 58% when tested against Jailbreak attacks.
TotalAI KB Analysis
Qualys TotalAI’s KB Analysis prompts the target LLM with questions across 16 categories — including controversial topics, factual inconsistencies, hate speech and discrimination, legal information, privacy attacks, profanity and sensitive information disclosure — and evaluates the responses using Qualys’ Judge LLM. Responses are assessed for vulnerabilities, ethical concerns, and legal risks. If a response is deemed vulnerable, it receives a severity rating based on its directness and potential impact. This ensures a comprehensive assessment of the model’s behavior and associated risks.
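The prompt-then-judge pipeline described above can be sketched roughly as follows. This is a minimal illustration only — the function names, the toy judge heuristic, and the scoring logic are all hypothetical stand-ins, not Qualys TotalAI's actual implementation, whose Judge LLM internals are not public.

```python
# Hypothetical sketch of an LLM-as-judge evaluation loop: prompt a target
# model per category, have a judge score each response, and report pass rates.

def query_target(prompt: str) -> str:
    """Stand-in for the target LLM; a real harness would call the model's API."""
    return f"model response to: {prompt}"

def judge(prompt: str, response: str) -> dict:
    """Stand-in judge: flags a response as vulnerable and assigns a severity.
    A real judge would be another LLM evaluating directness and impact."""
    vulnerable = "explosive" in prompt  # toy heuristic for illustration only
    return {"vulnerable": vulnerable, "severity": "high" if vulnerable else None}

def run_kb_assessment(prompts_by_category: dict[str, list[str]]) -> dict[str, float]:
    """Return per-category pass rates: the share of prompts whose responses
    the judge did NOT flag as vulnerable."""
    pass_rates = {}
    for category, prompts in prompts_by_category.items():
        passed = sum(not judge(p, query_target(p))["vulnerable"] for p in prompts)
        pass_rates[category] = passed / len(prompts)
    return pass_rates
```

Aggregating judge verdicts into per-category pass rates like this is what allows results to be broken out by category, as in the figures reported below.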
In the Qualys KB testing, 891 assessments were conducted and the model failed 61% of them. The worst-performing category was misalignment, where the model had a pass rate of just 8%, followed by controversial topics (13%) and factual inconsistencies (21%). At the other end of the spectrum, the model proved very effective at filtering out sexual content, passing 100% of those tests.
TotalAI Jailbreak Testing
Jailbreaking an LLM involves techniques that bypass built-in safety mechanisms, enabling the model to generate restricted responses. These vulnerabilities can result in harmful outputs, including instructions for illegal activities, misinformation, privacy violations, and unethical content. Successful jailbreaks expose weaknesses in AI alignment and present serious security risks, particularly in enterprise and regulatory settings.
The model was tested against 18 jailbreak types through 885 attacks. It failed 58% of these attempts, demonstrating significant susceptibility to adversarial manipulation. During the analysis, DeepSeek-R1 failed to block several adversarial jailbreak attempts, producing instructions for making an explosive device, content for websites that encourage hate speech, conspiracy theories, and violent action, guidance on exploiting software vulnerabilities, and incorrect medical information.
“As AI adoption accelerates, organizations must move beyond performance evaluation to tackle security, safety, and compliance challenges. Gaining visibility into AI assets, assessing vulnerabilities, and proactively mitigating risks is critical to ensuring responsible and secure AI deployment,” commented Dilip Bachwani, CTO, Qualys. “Qualys TotalAI provides full visibility into AI workloads, proactively detects risks, and safeguards infrastructure. By identifying security threats like prompt injection and jailbreaks, as well as safety concerns such as bias and harmful language, TotalAI ensures AI models remain secure, compliant, and resilient. With AI-specific security testing and automated risk management, organizations can confidently secure, monitor, and scale their AI deployments.”
For detailed findings from the tests, industry implications and steps organizations can take to mitigate risks associated with use of DeepSeek models, please visit: https://blog.qualys.com/vulnerabilities-threat-research/2025/01/31/deepseek-failed-over-half-of-the-jailbreak-tests-by-qualys-totalai