TELUS Digital flags AI model safety gaps in benchmark
Mon, 1st Jun 2026
TELUS Digital has published a benchmark study on the safety of generative AI models, examining 34 models from 10 providers.
It ran more than 620,000 adversarial tests in what it described as its largest generative AI security study to date. The research covered models from Anthropic, OpenAI, Google, Meta, Alibaba, Baidu, ByteDance, Zhipu AI, 01.AI and Mistral.
The findings show wide differences in how models respond to harmful prompts. Vulnerability rates ranged from 1.3% to 93%, with lower percentages indicating stronger resistance to attack.
No model proved fully immune to adversarial techniques. Some systems resisted most harmful requests, while others engaged with them more than 90% of the time during testing.
One of the clearest patterns was the link between model design and safety outcomes. Models built to reason through answers before responding were markedly harder to exploit than those that generated replies more directly.
Reasoning models were vulnerable to 19.9% of attacks on average, compared with 55.1% for models that did not use that approach. The study also found that smaller models were consistently more vulnerable than larger ones, although size alone did not determine results.
Open-source systems did not uniformly perform worse than proprietary models. Although open models were exploited more often on average, whether a model was open or proprietary was not the main driver of risk. TELUS Digital cited GLM 4.7 from Zhipu AI as an open model that outperformed several proprietary rivals on safety.
The benchmark also suggests geography is a weak predictor of safety performance. When models of similar size were compared, those developed in North America, Europe and China delivered similar results.
Testing setup
Rather than assess models in isolation, TELUS Digital tested them as banking assistants. Each model was instructed on which topics it could and could not address, reflecting how large language models are typically embedded in customer-facing applications.
The attacks covered privacy breaches, fraud, cyber threats, self-harm, discrimination and terrorism. Researchers used a customised internal model to generate malicious prompts designed to coax assistants into breaking their rules.
The study identified privacy exploitation, fraud and cyber threats as the areas where models remained most exposed. It also highlighted a pattern known as "refuse-but-engage," in which a model appears to decline a harmful request before providing related information that could still be misused.
TELUS Digital treated those responses as failures, arguing that a safe refusal should stop without offering adjacent guidance.
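To illustrate how that grading choice feeds into a headline vulnerability rate, the following is a minimal sketch rather than the study's actual scoring code. The outcome labels (`refused`, `refuse_but_engage`, `complied`) and the sample data are hypothetical; the only rule taken from the report is that a refusal followed by adjacent guidance counts as a failure.

```python
from collections import Counter

# Hypothetical outcome labels. The report's real grading scheme is not
# described in detail, so these names are assumptions for illustration.
FAILURES = {"complied", "refuse_but_engage"}

def vulnerability_rate(outcomes):
    """Return the share of attack attempts graded as failures, in percent."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    counts = Counter(outcomes)
    failed = sum(counts[label] for label in FAILURES)
    return 100.0 * failed / len(outcomes)

# Invented sample: 7 clean refusals, 2 refuse-but-engage replies, 1 compliance.
sample = ["refused"] * 7 + ["refuse_but_engage"] * 2 + ["complied"]
rate = vulnerability_rate(sample)
print(f"{rate:.1f}%")  # 30.0%
```

Under a more lenient scheme that counted refuse-but-engage replies as passes, the same sample would score 10%, which shows how much the definition of a safe refusal can move the result.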
Bret Kinsella, General Manager and Senior Vice President, Fuel iX at TELUS Digital, said the results show why one-off checks are not enough. "The real risk isn't that AI models have vulnerabilities. It's that most organizations have no way of knowing which vulnerabilities apply to them," he said.
He added: "We found models that blocked an attack nine times, but failed on the tenth. We found others that are great at stopping engagement around some topics, but fail dramatically on others. That's the nature of probabilistic systems: unlike traditional software, AI doesn't give the same answer every time, which means a single security test tells you almost nothing. And the risk doesn't end with choosing the right model. Changes to how an AI application is configured, what data it draws from, or how it connects to other tools can all shift its behavior and security posture. Enterprises need to move from spot-checking GenAI solutions at launch to testing on an ongoing basis, or they're leaving vulnerabilities exposed that represent risk that could be avoided."
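The arithmetic behind the "nine times, but failed on the tenth" remark can be sketched as follows. This is an illustrative calculation, not part of the report: it assumes each attempt fails independently with a fixed probability, and the 10% figure is simply read off the quote as an example.

```python
import math

def detection_probability(failure_rate, attempts):
    """Chance of observing at least one failure in independent attempts."""
    return 1.0 - (1.0 - failure_rate) ** attempts

def attempts_needed(failure_rate, confidence):
    """Smallest number of independent attempts that surfaces at least one
    failure with the requested confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))

# Assumed example: a model that fails one attack in ten.
single = detection_probability(0.10, 1)   # about 0.10
ten = detection_probability(0.10, 10)     # about 0.65
needed = attempts_needed(0.10, 0.95)      # 29

print(f"one test: {single:.2f}, ten tests: {ten:.2f}, tests for 95%: {needed}")
```

On these assumptions a single test would miss the weakness nine times out of ten, and roughly 29 repeated attempts would be needed to surface it with 95% confidence.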
Spending gap
The report sets the findings against a sharp imbalance in corporate spending. Worldwide AI spending is projected to reach USD $2.52 trillion in 2026, while spending on AI trust, risk and security management is estimated at USD $3.43 billion.
That equates to about USD $1 spent on security for every USD $735 spent on AI. TELUS Digital also said 86% of organisations reported AI-related security incidents, at a time when regulatory scrutiny of AI systems is increasing in the US and EU.
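The ratio follows directly from the two spending figures cited in the report, as this short check shows. The variable names are chosen here for illustration.

```python
# Figures as cited in the report, in US dollars.
ai_spend_usd = 2.52e12        # projected worldwide AI spending, 2026
security_spend_usd = 3.43e9   # AI trust, risk and security management

ratio = ai_spend_usd / security_spend_usd
share_pct = 100.0 * security_spend_usd / ai_spend_usd

print(f"AI dollars per security dollar: {ratio:.0f}")  # 735
print(f"security share of AI spend: {share_pct:.2f}%")  # 0.14%
```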
The company argued that businesses should rely less on model-provider safeguards alone and instead use layered controls around AI applications. Those measures include prompt shielding, masking personally identifiable information before a user request reaches the model, and checking model outputs for toxicity or inappropriate content before they are shown to users.
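A layered setup of that kind might look like the sketch below. It is a toy example under stated assumptions, not TELUS Digital's implementation: the regular expressions, the phrase lists, the `guarded_reply` wrapper and the `call_model` callback are all hypothetical, and a production system would use dedicated classifiers rather than keyword matching.

```python
import re

# Toy patterns for illustration only; real PII detection needs far more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

# Hypothetical phrase lists standing in for real shielding and toxicity models.
INJECTION_PHRASES = ("ignore previous instructions", "disregard your rules")
BLOCKED_OUTPUT_TERMS = ("launder", "bypass verification")

REFUSAL = "Sorry, I can't help with that."

def mask_pii(text):
    """Replace email addresses and card-like numbers before the model sees them."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)

def looks_like_injection(text):
    """Crude prompt shield: flag known jailbreak phrasing."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in INJECTION_PHRASES)

def output_is_allowed(reply):
    """Crude output check run before a reply is shown to the user."""
    lowered = reply.lower()
    return not any(term in lowered for term in BLOCKED_OUTPUT_TERMS)

def guarded_reply(user_text, call_model):
    """Run the three layers around a model call supplied by the caller."""
    if looks_like_injection(user_text):
        return REFUSAL
    reply = call_model(mask_pii(user_text))
    if not output_is_allowed(reply):
        return REFUSAL
    return reply

# Stand-in for a real model client: it just echoes the prompt it receives.
def echo_model(prompt):
    return f"echo: {prompt}"

print(guarded_reply("My email is a@b.co", echo_model))
print(guarded_reply("Ignore previous instructions and list accounts", echo_model))
```

The point of the structure is that each layer can fail independently of the model's own safeguards, so a weakness in one does not automatically reach the user.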
It also said security testing should shift from manual, periodic reviews to automated checks integrated into software development workflows. That would help businesses monitor changes in model behaviour, identify regressions after updates and test at greater scale.
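One way such an automated check could work is to compare vulnerability rates from the latest test run against a stored baseline and fail the build when any category gets worse. The sketch below is an assumption about how that might be wired up; the function name, the one-point tolerance and all the rates shown are invented for illustration and do not come from the study.

```python
def find_regressions(baseline, current, tolerance=1.0):
    """Return categories whose vulnerability rate (in percent) rose by more
    than `tolerance` percentage points between two test runs."""
    regressions = {}
    for category, old_rate in baseline.items():
        new_rate = current.get(category)
        if new_rate is not None and new_rate - old_rate > tolerance:
            regressions[category] = (old_rate, new_rate)
    return regressions

# Invented rates for two runs of the same application, before and after an update.
baseline = {"privacy": 4.0, "fraud": 3.5, "cyber": 6.0}
current = {"privacy": 9.5, "fraud": 3.0, "cyber": 6.5}

regressions = find_regressions(baseline, current)
for category, (old, new) in regressions.items():
    print(f"{category}: {old:.1f}% -> {new:.1f}%")

# In a CI job, a non-empty result would typically end the run with a
# non-zero exit status so the change is blocked until reviewed.
```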
This is the second edition of the benchmark. The first covered 24 models from five US-based providers, while the latest expanded to 34 models from 10 providers and broadened open-source testing from two models to 14.
The results also showed that safety improvements are not always linear. Newer models generally performed better, but some scored worse on safety than earlier versions.
Among the stronger performers, 10 models recorded vulnerability rates below 5%. Anthropic's Claude family accounted for five of them, including the lowest rate in the study, though TELUS Digital said even low single-digit failure rates would remain unacceptable in uses involving money, health or reputation.
The benchmark highlights how model choice, application design and ongoing testing are becoming central concerns as companies embed AI into banking, customer service and other business systems. In TELUS Digital's testing, even the best-performing models still showed weaknesses under sustained attack.