We recently ran a survey of legal professionals to test something we’d been hearing anecdotally from our customers for a while: evaluating whether AI tools will actually deliver on their promises is genuinely difficult. The results confirmed it. Three out of four general counsel and legal leaders we surveyed agree that assessing the performance of legal AI tools is very challenging, and over half of respondents have been asked to do exactly that.
For legal teams already stretched thin, this is an obstacle to making good technology decisions.
So what makes evaluating AI tools so hard? And what should legal leaders actually be looking for?
Most respondents to the survey noted that their frustrations with evaluating AI vendors fell into three major areas:
Vendors promise too much: One respondent told us, “Many companies overstate the AI capabilities. The ideas are there and they may be starting down the road to development, but the reality is not there.” We also heard that many vendors show polished demos but haven’t really dug into the actual use cases legal teams need. “With most companies, you really need a proof of concept to attempt to actually evaluate their product,” one respondent said. “Their usefulness in a demo or on a website just doesn't show how they would work for your use case.”
Every vendor sounds the same: If every vendor says they can do the same thing, how can anyone differentiate one from another? One survey respondent said, “The accuracy of AI is hard to define. Results vary dramatically based on prompt quality, document structure, data cleanliness, and user expertise. And, after a while, I get AI vendor merge where they all seem to offer the same software functions.”
Verifying accuracy is difficult: Over a fifth of respondents mentioned this. Lawyers, rightly, are very worried about accuracy and hallucinations, and don’t want to do the manual work of checking and cross-checking AI output. As one senior counsel put it: “Sometimes these products do not include the right information when trying to really narrow down a specific law or case. Sometimes I've found fake cases.”
We know that attorneys are under pressure. In-house legal teams are leaner than ever and contract volumes are growing. Our customers tell us that increasingly, their leadership expects AI to be part of the solution. But that means that the stakes of adopting the wrong tool are very high.
The data reflects this tension. According to the ABA's 2024 Legal Technology Survey, 74.7% of attorneys identified accuracy as their top concern with AI implementation. And a Paragon Legal study found that over a third of legal professionals have relied on AI-generated outputs they don't fully trust.
Choosing the wrong AI legal tool isn’t just a waste of budget. In the worst-case scenario, it could introduce real legal risk. And when something goes wrong, and the person who championed the tool also has to explain the errors, the stakes become personal, not just professional.
No wonder lawyers are reacting strongly to a crowded AI legal tech market full of vendors making claims that may or may not be relevant to real-life use cases. The cost of failure is very high.
In a market where every vendor claims to have “AI,” the true differentiators are concrete: the most capable tools in this space collapse implementation timelines, surface patterns across an entire contract base, and keep playbooks automated and evergreen.
Quora’s approach to the problem shows what this looks like in practice; it was both comprehensive and well-suited to their particular needs. They identified seven criteria that mattered to them as they considered how an AI tool would fit into their workflow: everything from the UI, to AI features, to customer support capabilities, to security.
Adrie Christensen, Legal Operations Lead at Quora, noted that the process involved defining clear success criteria with her general counsel, which they organized into a detailed scorecard for consistent vendor evaluation.
This evaluation framework gave them a shared baseline for what was important to them as a business, and the clarity and specificity to adopt technology that served their needs and integrated into their existing ways of working.
For a starting point in developing your own framework, take a look at our whitepaper, The State of Legal AI: How to Futureproof your Tech Stack. It contains a simple decision-making framework for you to use and customize when choosing AI solutions, as well as a checklist of the capabilities you should expect from your AI tool.
Schedule a demo today.
