Arizona Tribune - AI in Compliance Moves From Hype to Results - Revealing Clear Advances in Latest-generation Models

ECONOMY 20.10.2025

New benchmark by EQS Group and the BCM evaluates six leading AI models across 120 real-world compliance scenarios

MUNICH, DE / ACCESS Newswire / October 20, 2025 / Artificial intelligence is rapidly entering corporate workflows - but not all models deliver equally. To assess how well AI can handle the realities of compliance, the new 'EQS Benchmark Report: AI Performance in Compliance & Ethics' tested six leading AI models with 120 real-world compliance scenarios - from risk assessments and conflict-of-interest evaluations to third-party screening. The results: near-perfect precision on structured tasks such as classification and decision-making, with accuracy rates above 95%, but steep drops when complexity or ambiguity increases. Produced in collaboration with the German association Berufsverband der Compliance Manager e.V. (BCM), the benchmark also highlights the pace of progress, with 2025 models significantly outperforming those from 2024.

"For many compliance practitioners, AI is still unfamiliar territory," said Moritz Homann, Director of Product Innovation and AI at EQS Group. "Understanding how to apply it effectively and what it can be trusted with can be difficult - especially in a field as sensitive as compliance, where accuracy, accountability, and integrity are non-negotiable."

"AI can offer compliance new levels of insight, but our responsibility is to ensure its use stays within clear ethical and legal boundaries," said Dr. Gisa Ortwein, President of BCM. "Initiatives like this benchmark help us distinguish between what AI can genuinely deliver and where human judgment remains irreplaceable. That is how we safeguard integrity while embracing innovation - ensuring AI adoption enhances, rather than undermines, our profession."

The EQS benchmark is the first to assess AI performance in the compliance domain, using tasks that reflect day-to-day responsibilities of compliance and ethics professionals. It measures model accuracy, reliability, and practical usefulness across structured, semi-structured, and open-ended tasks.

Latest models significantly outperform those released only months earlier

The benchmark results highlight how quickly model capabilities are evolving. Google's Gemini 2.5 Pro achieved the highest overall score at 86.7%, demonstrating robust performance across all task types and compliance areas. OpenAI's GPT-5 (ChatGPT's default model since August 2025) came within 0.2 percentage points at 86.5% and matched Gemini in most categories, a sign that performance is converging at the top. GPT-5 performed particularly well on open-ended content creation, while Gemini led in complex analytical and decision-making tasks.

OpenAI's o3 followed at 83.3%, illustrating both the progress of GPT-5 over its predecessor and the fast iteration cycle shaping the field. Anthropic's Claude Opus 4.1 scored 81.5%, held back by weaker results in structured evaluations and analytical reasoning, while GPT-4o (72.9%) and Mistral Large 2 (70.1%) ranked last. The gap reflects the significant generational leap between models released in 2024 and those launched in 2025.
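To put a rough number on that generational gap, the overall scores quoted above can be averaged by release year. This is only a back-of-the-envelope sketch: the unweighted mean and the year grouping (the two lowest-ranked models treated as the 2024 generation, following the article's framing) are assumptions made here, not figures from the report.

```python
# Overall scores as quoted in the press release (percent).
overall = {
    "Gemini 2.5 Pro": 86.7,
    "GPT-5": 86.5,
    "o3": 83.3,
    "Claude Opus 4.1": 81.5,
    "GPT-4o": 72.9,
    "Mistral Large 2": 70.1,
}

# Assumption: the two lowest-ranked models form the 2024 generation,
# all others the 2025 generation (the article's framing, not report data).
released_2024 = {"GPT-4o", "Mistral Large 2"}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


mean_2024 = mean(s for m, s in overall.items() if m in released_2024)
mean_2025 = mean(s for m, s in overall.items() if m not in released_2024)
gap = mean_2025 - mean_2024

print(f"2025 models: {mean_2025:.1f}%")
print(f"2024 models: {mean_2024:.1f}%")
print(f"gap: {gap:.1f} points")
```

On these six scores the two averages come out at 84.5% and 71.5%, a difference of 13 percentage points. An unweighted mean ignores how the report weights task categories, so it should be read as an illustration only.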

In compliance, AI excels with clear rules, but struggles when ambiguity rises

Overall, AI models delivered their strongest results on straightforward, structured compliance tasks. For example, performance averaged 90.8% in decision-making scenarios based on a defined situation and a set of rules or policies. In exercises involving matching or mapping data sets, models reached an average score of 91.8%, with four of six models exceeding 95%.

By contrast, performance on more complex tasks varied widely between models. The spread was largest in data analysis, with 60 percentage points separating the best and worst performers: Gemini 2.5 Pro scored 88%, GPT-5 followed with 62%, and GPT-4o ranked lowest with only 28%.

Open-ended tasks - such as drafting executive briefings or reports on internal investigations - proved more challenging even for the most recent models. The best performer in this category, GPT-5, reached a score of 67.4%. Unlike structured tasks, these assignments were evaluated by a human jury.

"There are some high-stakes tasks compliance professionals would not fully outsource to AI - nor should they," said Moritz Homann. "The strength of AI tools lies in acting as a force multiplier, supporting compliance workflows while leaving ultimate responsibility and judgment with professionals. Even for highly complex tasks, AI can take on much of the groundwork, saving valuable time on routine preparation and allowing experts to focus where their judgment is indispensable."

High consistency and low hallucination rate

The benchmark also tested reliability by repeating multiple-choice tasks three times per model. Consistency was high, with most models returning the same result in more than 95% of cases. Hallucinations - one of the most criticized risks of AI - were rare: across all tasks and models, only three clear instances were recorded, amounting to a rate of just 0.71%. This indicates that when tasks are clearly defined and contextualized, today's models can deliver stable and fact-based results in compliance scenarios. However, since hallucinations cannot be entirely ruled out, human oversight remains essential - especially for sensitive content with regulatory implications.
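A repeat-run consistency check of this kind could be sketched as follows. The press release does not spell out how consistency was scored, so this sketch assumes a task counts as consistent only when all three runs return the same answer; the function names and the sample data are hypothetical.

```python
from collections import Counter


def consistency_rate(runs_per_task):
    """Share of tasks whose repeated runs all returned the same answer."""
    if not runs_per_task:
        raise ValueError("no tasks supplied")
    stable = sum(
        1 for answers in runs_per_task.values() if len(set(answers)) == 1
    )
    return stable / len(runs_per_task)


def majority_answer(answers):
    """Most common answer across repeated runs of one task."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical data: task id -> multiple-choice answers from three runs.
example_runs = {
    "task-001": ["B", "B", "B"],
    "task-002": ["A", "A", "C"],  # one run disagrees
    "task-003": ["D", "D", "D"],
    "task-004": ["C", "C", "C"],
}

rate = consistency_rate(example_runs)
print(f"consistent on {rate:.0%} of tasks")
```

In this made-up sample three of four tasks are stable, so the rate is 75%. A team running its own checks might also log `majority_answer` for the unstable tasks and route those cases to a human reviewer.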

Model selection and prompt design influence outcomes

The report also highlights the importance of prompt specificity. In tasks where AI models were asked to extract red flags from third-party screening data, results varied depending on how narrowly the question was framed - for instance, whether to include affiliated entities or rate the severity of findings. Newer models - GPT-5 and Gemini 2.5 Pro - showed a better ability to follow complex instructions and return structured outputs, offering a clear advantage for compliance teams working with nuanced policies and large datasets.
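The effect of framing can be made concrete with a small prompt builder. This is a sketch under stated assumptions: the wording of the instructions, the option names, and the JSON output request are illustrative and are not the prompts used in the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningPromptOptions:
    """Hypothetical switches for how narrowly the question is framed."""
    include_affiliates: bool = False
    rate_severity: bool = False


def build_red_flag_prompt(screening_text: str,
                          options: ScreeningPromptOptions) -> str:
    """Assemble an explicit red-flag extraction prompt as plain text."""
    lines = [
        "You are assisting a compliance analyst.",
        "List every red flag found in the third-party screening data below.",
    ]
    if options.include_affiliates:
        lines.append("Include findings about affiliated entities, "
                     "and name the entity for each finding.")
    else:
        lines.append("Consider only the screened entity itself; "
                     "ignore affiliated entities.")
    if options.rate_severity:
        lines.append("Rate each finding as low, medium, or high severity.")
    lines.append("Return a JSON array of objects; do not add commentary.")
    lines.extend(["", "Screening data:", screening_text.strip()])
    return "\n".join(lines)


broad = build_red_flag_prompt(
    "Acme Ltd: adverse media hit, 2023.",
    ScreeningPromptOptions(include_affiliates=True, rate_severity=True),
)
narrow = build_red_flag_prompt(
    "Acme Ltd: adverse media hit, 2023.",
    ScreeningPromptOptions(),
)
print(broad)
```

Stating the scope in both directions (include or ignore affiliates) rather than leaving it implicit is the design choice here: the model is never left to guess which framing the analyst intended.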

Moritz Homann: "AI is here to stay - and the way we implement and use it today will shape its role in the compliance field for years to come. Compliance and ethics teams should not only govern AI risks, but also apply the technology themselves. Only by working hands-on with AI can we gain the insight to ask the right questions, design effective guardrails, and build trust. Our goal is to support this journey with practical tools, transparency, and dialogue."

The full EQS AI Benchmark Report is available to download here: https://www.eqs.com/compliance-wpapers/ai-performance-compliance-ethics-eqs/

Methodology

The EQS AI Benchmark Report tested six large language models - OpenAI's GPT-5, GPT-4o, and o3; Google's Gemini 2.5 Pro; Anthropic's Claude Opus 4.1; and Mistral Large 2 - across 120 tasks representing ten core compliance domains. These included areas such as risk assessment, speak-up case review, training effectiveness, policy evaluation, and regulatory gap analysis.

The task set was designed with input from compliance professionals and includes both real-world and synthetic content, such as HR datasets, training results, and policy texts. Some tasks had an objectively correct answer, while others required a more subjective, human-centered approach to scoring. For this reason, open-ended outputs were assessed with the support of the Berufsverband der Compliance Manager (BCM), whose members contributed professional evaluation and feedback on the quality and usefulness of model-generated responses.

Press contact

Christina Jahn
Tel.: +49 89 444430133
E-Mail: [email protected]

About EQS Group

EQS Group is a leading international cloud provider for compliance & ethics, data privacy, sustainability management, and investor relations. More than 14,000 companies across the world use EQS Group's products to build trust by reliably and securely meeting complex regulatory requirements, minimizing risks and transparently reporting on business performance and its impact on society and the environment.

EQS Group's solutions are bundled in a cloud-based platform. It lets companies manage whistleblower protection and case handling, policy management, and approval processes as professionally as business partners, third parties and risks, insider lists, and reporting obligations. In addition, EQS Group provides software to fulfill human rights due diligence requirements across corporate supply chains, ensure compliance with data privacy and AI regulations such as the GDPR and the EU AI Act, and support efficient ESG management and compliant sustainability reporting. Listed companies also benefit from a global newswire, investor targeting and contact management, as well as IR websites and webcasts for efficient and secure investor communication.

EQS Group was founded in Munich in 2000. Today, the group employs around 600 professionals worldwide.

https://www.eqs.com/

About the BCM

As the leading professional association exclusively for in-house compliance officers from companies, associations, and other organizations, the BCM represents the interests of its members in dealings with policymakers, business, and society. The BCM focuses on providing information, fostering networks, and strengthening the compliance profession. It offers a wide range of free services designed to keep members informed about current compliance issues and to promote and continuously develop knowledge-sharing within its network.

www.compliance-verband.de

SOURCE: EQS Group GmbH

D.Lopez--AT
