vLex and Vecflow have given their takeaways on the results of a first-of-its-kind in-depth GenAI benchmarking study out yesterday (27 February), which analysed their tools alongside Harvey and CoCounsel.
The study was conducted by Vals AI, a company that independently evaluates and benchmarks the performance of large language models on industry-specific tasks, assessing their accuracy and efficacy in real-world scenarios and highlighting the strengths and weaknesses of different LLMs. Vals AI was supported by Legaltech Hub and by alternative legal services company Cognia, which provided human review of the same documents analysed by the GenAI tools to establish a human baseline for comparison.
AI tools and the human baseline were set seven tasks: data extraction; document Q&A; document summarisation; redlining; transcript analysis; chronology generation; and EDGAR research.
You can see the results below:
While Harvey and CoCounsel came out on top, vLex pointed out in a statement following the findings that Vincent AI has achieved a groundbreaking milestone, meeting or surpassing human lawyer benchmarks in four out of five key tasks.
vLex is a legal database of statutes and caselaw, but the Vals Legal AI Report focused largely on document extraction tasks. Ed Walters, chief strategy officer of vLex, said: “We are quite pleased with Vincent AI’s performance, especially considering that document summary tasks do not leverage our vast legal database of global information.”
The report noted: “Although our evaluation focused on a small slice of Vincent AI’s capabilities in U.S. jurisdictions, its support for international matters is a significant strength. For global law firms, this capability may provide a level of utility unmatched by other tools, making Vincent AI an attractive choice.”
vLex points out that other takeaways from the Vals Legal AI Report include:
- Vincent “gave responses exceptionally quickly as generally one of the fastest products we evaluated.”
- “Vincent AI’s design is particularly noteworthy for its ability to infer the appropriate subskill to execute based on the user’s question, adapting to the user query. In cases where clarification was needed, Vincent AI would proactively ask follow-up questions to refine its understanding, ensuring tailored responses.”
- “When the legal research database did not have sufficient data to answer a question, Vincent AI refused to answer, rather than hallucinate an illegitimate response.”
- “The answers provided were impressively thorough . . . offering valuable additional context to aid their understanding and workflow.”
Vecflow, meanwhile, said that the study confirms that Oliver, with only six months on the market and far fewer resources than its peers, “often performed on par with or outperformed more established offerings.” Those offerings include companies valued in the billions.
Notably, Oliver outperformed the human lawyer in Document Q&A and Document Summarization. Following Harvey’s drop-out, Oliver was also the sole competitor in SEC EDGAR Research, the only task involving multi-stage, complex reasoning.
“Intelligent AI workflows represent the future of legal work,” said Vecflow’s CTO, Joe Parker. “The evaluation highlights three critical findings: AI assistants are already surpassing lawyers in several critical areas; Oliver’s research capabilities stand unmatched in the legal tech sector; and AI performs best when complementing a lawyer’s existing workflow.”
“We’ve already seen our customers able to handle more cases in less time. And there’s evidence they can do this at even higher quality,” said Thomas Bueler-Faudree, CPO of Vecflow, in an interview with Vals AI. “You’re going to see smaller firms that are technologically forward being able to rapidly do ten, 100 times more material in cases than they used to. I think you’re going to see increasing competition in law, and hopefully you’ll see one of the big four law firms appear in the US.”
Since the benchmark was conducted six months ago, Vecflow says it has made significant improvements to Oliver based on customer feedback.