AI in Compliance Moves From Hype to Results - Revealing Clear Advances in Latest-generation Models
New benchmark by EQS Group and the BCM evaluates six leading AI models across 120 real-world compliance scenarios
MUNICH, DE / ACCESS Newswire / October 20, 2025 / Artificial intelligence is rapidly entering corporate workflows - but not all models deliver equally. To assess how well AI can handle the realities of compliance, the new 'EQS Benchmark Report: AI Performance in Compliance & Ethics' tested six leading AI models with 120 real-world compliance scenarios - from risk assessments and conflict-of-interest evaluations to third-party screening. The results: near-perfect precision on structured tasks such as classification and decision-making, with accuracy rates above 95%, but steep drops when complexity or ambiguity increases. Produced in collaboration with the German association Berufsverband der Compliance Manager e.V. (BCM), the benchmark also highlights the pace of progress, with 2025 models significantly outperforming those from 2024.
"For many compliance practitioners, AI is still unfamiliar territory," said Moritz Homann, Director of Product Innovation and AI at EQS Group. "Understanding how to apply it effectively and what it can be trusted with can be difficult - especially in a field as sensitive as compliance, where accuracy, accountability, and integrity are non-negotiable."
"AI can offer compliance new levels of insight, but our responsibility is to ensure its use stays within clear ethical and legal boundaries," said Dr. Gisa Ortwein, President of BCM. "Initiatives like this benchmark help us distinguish between what AI can genuinely deliver and where human judgment remains irreplaceable. That is how we safeguard integrity while embracing innovation - ensuring AI adoption enhances, rather than undermines, our profession."
The EQS benchmark is the first to assess AI performance in the compliance domain, using tasks that reflect day-to-day responsibilities of compliance and ethics professionals. It measures model accuracy, reliability, and practical usefulness across structured, semi-structured, and open-ended tasks.
Latest models significantly outperform those released only months earlier
The benchmark results highlight how quickly model capabilities are evolving. Google's Gemini 2.5 Pro achieved the highest overall score at 86.7%, demonstrating robust performance across all task types and compliance areas. With an overall score of 86.5%, OpenAI's GPT-5 (ChatGPT's default model since August 2025) matched Gemini in most categories, a sign that the leading models are converging. GPT-5 performed particularly well on open-ended content creation, while Gemini led in complex analytical and decision-making tasks.
OpenAI's o3 followed with a score of 83.3%, illustrating both the progress of GPT-5 over its predecessor and the fast iteration cycle shaping the field. Anthropic's Claude Opus 4.1 reached a score of 81.5%, underperforming in structured evaluations and analytical reasoning, while GPT-4o (72.9%) and Mistral Large 2 (70.1%) ranked last. The gap between these two and the rest of the field reflects the significant generational leap between models released in 2024 and those launched in 2025.
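The size of that generational gap can be checked from the overall scores quoted above. The following is a minimal sketch, not a calculation taken from the report: the grouping of GPT-4o and Mistral Large 2 as the older models follows the release's own 2024-versus-2025 framing, and the unweighted mean is an assumption chosen for illustration.

```python
# Overall scores in percent, as quoted in the release.
overall = {
    "Gemini 2.5 Pro": 86.7,
    "GPT-5": 86.5,
    "o3": 83.3,
    "Claude Opus 4.1": 81.5,
    "GPT-4o": 72.9,
    "Mistral Large 2": 70.1,
}

# Assumption: these two are treated as the older-generation models,
# following the release's 2024-versus-2025 framing.
older_models = {"GPT-4o", "Mistral Large 2"}


def mean(values):
    """Plain unweighted average of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


newer_avg = mean(s for m, s in overall.items() if m not in older_models)
older_avg = mean(s for m, s in overall.items() if m in older_models)
gap = newer_avg - older_avg

# Models ordered from highest to lowest overall score.
ranking = sorted(overall, key=overall.get, reverse=True)

print(f"newer models: {newer_avg:.1f}%  older models: {older_avg:.1f}%  gap: {gap:.1f} points")
print(" > ".join(ranking))
```

On these figures the four newer models average about 84.5% and the two older ones about 71.5%, a difference of roughly 13 percentage points.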
In compliance, AI excels with clear rules, but struggles when ambiguity rises
Overall, AI models delivered their strongest results on straightforward, structured compliance tasks. For example, performance averaged 90.8% in decision-making scenarios based on a defined situation and a set of rules or policies. In exercises involving matching or mapping data sets, models reached an average score of 91.8%, with four of six models exceeding 95%.
By contrast, performance on more complex tasks varied more widely between models. For tasks involving data analysis, the spread was particularly large: 60 percentage points separated the best and worst performers. In this category, Gemini 2.5 Pro scored 88%, followed by GPT-5 with 62%, while GPT-4o ranked lowest with only 28%.
Open-ended tasks - such as drafting executive briefings or reports on internal investigations - proved more challenging even for the most recent models. The best performer in this category, GPT-5, reached a score of 67.4%. Unlike structured tasks, these assignments were evaluated by a human jury.
"There are some high-stakes tasks compliance professionals would not fully outsource to AI - nor should they," said Moritz Homann. "The strength of AI tools lies in acting as a force multiplier, supporting compliance workflows while leaving ultimate responsibility and judgment with professionals. Even for highly complex tasks, AI can take on much of the groundwork, saving valuable time on routine preparation and allowing experts to focus where their judgment is indispensable."
High consistency and low hallucination rate
The benchmark also tested reliability by repeating multiple-choice tasks three times per model. Consistency was high, with most models returning the same result in more than 95% of cases. Hallucinations - one of the most criticized risks of AI - were rare: across all tasks and models, only three clear instances were recorded, amounting to a rate of just 0.71%. This indicates that when tasks are clearly defined and contextualized, today's models can deliver stable and fact-based results in compliance scenarios. However, since hallucinations cannot be entirely ruled out, human oversight remains essential - especially for sensitive content with regulatory implications.
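One plausible way to compute such a consistency figure is sketched below. It assumes that a task counts as consistent only when all three repeated runs return the same answer; the release does not spell out the exact definition, and the task IDs and answers are hypothetical. The last lines back out the number of responses implied by three hallucinations at a 0.71% rate, a denominator the release does not state.

```python
def consistency_rate(runs_by_task):
    """Share of tasks whose repeated runs all returned the same answer.

    runs_by_task maps a task ID to the list of answers from each run.
    """
    if not runs_by_task:
        raise ValueError("no tasks given")
    consistent = sum(
        1 for answers in runs_by_task.values() if len(set(answers)) == 1
    )
    return consistent / len(runs_by_task)


# Hypothetical multiple-choice answers for one model, three runs per task.
sample_runs = {
    "task-01": ["B", "B", "B"],
    "task-02": ["A", "C", "A"],  # one run disagrees, so not consistent
    "task-03": ["D", "D", "D"],
    "task-04": ["A", "A", "A"],
}
rate = consistency_rate(sample_runs)
print(f"consistency: {rate:.0%}")

# Back-of-envelope check on the hallucination figure: three instances at a
# rate of 0.71% imply a denominator in the low 400s (an inference only).
implied_total = 3 / 0.0071
print(f"implied number of evaluated responses: about {implied_total:.0f}")
```

A stricter or looser definition (for example, majority agreement across runs) would produce a different rate, so any comparison with the reported 95% figure depends on which definition the report used.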
Model selection and prompt design influence outcomes
The report also highlights the importance of prompt specificity. In tasks where AI models were asked to extract red flags from third-party screening data, results varied depending on how narrowly the question was framed - for instance, whether to include affiliated entities or rate the severity of findings. Newer models - GPT-5 and Gemini 2.5 Pro - showed a better ability to follow complex instructions and return structured outputs, offering a clear advantage for compliance teams working with nuanced policies and large datasets.
Moritz Homann: "AI is here to stay - and the way we implement and use it today will shape its role in the compliance field for years to come. Compliance and ethics teams should not only govern AI risks, but also apply the technology themselves. Only by working hands-on with AI can we gain the insight to ask the right questions, design effective guardrails, and build trust. Our goal is to support this journey with practical tools, transparency, and dialogue."
The full EQS AI Benchmark Report is available to download here: https://www.eqs.com/compliance-wpapers/ai-performance-compliance-ethics-eqs/
Methodology
The EQS AI Benchmark Report tested six large language models - OpenAI's GPT-5, GPT-4o, and o3; Google's Gemini 2.5 Pro; Anthropic's Claude Opus 4.1; and Mistral Large 2 - across 120 tasks representing ten core compliance domains. These included areas such as risk assessment, speak-up case review, training effectiveness, policy evaluation, and regulatory gap analysis.
The task set was designed with input from compliance professionals and includes both real-world and synthetic content, such as HR datasets, training results, and policy texts. Some tasks had an objectively correct answer, while others required a more subjective, human-centered approach to scoring. For this reason, open-ended outputs were assessed with the support of the Berufsverband der Compliance Manager (BCM), whose members contributed professional evaluation and feedback on the quality and usefulness of model-generated responses.
Press contact
Christina Jahn
Tel.: +49 89 444430133
E-Mail: [email protected]
About EQS Group
EQS Group is a leading international cloud provider for compliance & ethics, data privacy, sustainability management, and investor relations. More than 14,000 companies across the world use EQS Group's products to build trust by reliably and securely meeting complex regulatory requirements, minimizing risks and transparently reporting on business performance and its impact on society and the environment.
EQS Group's solutions are bundled in a cloud-based platform. This allows compliance processes for whistleblower protection and case handling, policy management, and approval processes to be managed just as professionally as business partners, third parties and risks, insider lists and reporting obligations. In addition, EQS Group provides software to fulfill human rights due diligence requirements across corporate supply chains, ensure compliance with regulations such as the GDPR and the EU AI Act, and support efficient ESG management and compliant sustainability reporting. Listed companies also benefit from a global newswire, investor targeting and contact management, as well as IR websites and webcasts for efficient and secure investor communication.
EQS Group was founded in Munich in 2000. Today, the group employs around 600 professionals worldwide.
About the BCM
As the leading professional association exclusively for in-house compliance officers from companies, associations, and other organizations, the BCM represents the interests of its members in dealings with policymakers, business, and society. The BCM focuses on providing information, fostering networks, and strengthening the compliance profession. It offers a wide range of free services designed to keep members informed about current compliance issues and to promote and continuously develop knowledge-sharing within its network.
SOURCE: EQS Group GmbH
D.Lopez--AT