Professor Perrin and Lexis+ AI: The Latest

By lead analyst Neil Cameron 

After the online discussion and commotion over the last few weeks about Professor Benjamin Perrin’s unfortunate experiences with the latest release of Lexis+ AI, we followed up the trail with a series of discussions with both the affable Ben Perrin himself and a very open and disarmingly frank Jeffrey Pfeifer, LexisNexis’ chief product officer for Lexis+ AI in the UK, US and Canada.

To recap: earlier in November, British Columbia law professor Perrin wrote an article for the Canadian Bar Association’s National Magazine on his experiences using Lexis+ AI: National – Law professor gives Lexis+ AI a failing grade

In summary, he found that the answers delivered by LexisNexis’ flagship Gen AI product were “riddled with mistakes” and concluded that it should not yet be used by law students.

Let us review what happened. Perrin tried three different prompts and was disappointed with the results of each. These were, unfortunately, his first ever queries of Lexis+ AI; however, having talked to him, we are sure that Perrin did not set out to trick or fool the AI into performing badly in any way.

Prompt Number 1 

Perrin’s first prompt was to ask Lexis+ AI to draft a motion for leave to intervene in a constitutional challenge to a drug possession offence. He says: “Its response referenced ‘Section 15.07 of the Canada Legislation,’ which does not exist. When I pointed this out, Lexis+ AI failed to acknowledge the error and displayed an automated message instead. This was a disappointing start, especially given the platform’s promise of reliable, citable results.”

He added: “However, I did notice that when Lexis+ AI provides a hyperlink to a case or statute, it’s not a hallucination. There was no such hyperlink for ‘the Canadian Legislation,’ so I learned to be aware of the lack of a hyperlink. What about the quality of the ‘draft motion’? Unfortunately, what was generated didn’t even qualify as a rough first draft.”  

In our conversation with Perrin, he said that one of the things Lexis+ AI claimed it could do was draft a motion. Yet the next time he asked it to draft a motion, it answered that this was not a supported use case.

LexisNexis’ position is that ‘drafting a motion’ was not a proffered, or supported, use case for Lexis+ AI, but that ‘drafting an argument for a motion’ was.   

Pfeifer noted that the response appears to have referenced a court rule, section 15.0, rather than the legislation requested, commenting: “This was an error in our data sourcing and has been corrected based on the professor’s feedback”.

It may be that the two key words ‘an argument’ describe the distance, and the difference, between the two parties on this issue. In any event, drafting a motion is not a Lexis+ AI use case, and it appears that if you now ask it to do so, it will clearly tell you that it does not do that.

Prompt Number 2 

In his second prompt, Perrin asked Lexis+ AI to summarize the Supreme Court of Canada’s Reference re Senate Reform. Instead of generating an original summary, it simply copied verbatim the headnote from the case (including the Supreme Court Reports page numbers), offering no added value. When Perrin next requested a shorter summary, Lexis+ AI provided another verbatim summary, but this time of an entirely unrelated case involving a construction dispute from Alberta.  

When we quizzed him on this, Pfeifer told us that Lexis+ AI is designed, if asked for a summary of a case, to use the curated headnote – if there is one – rather than generating one from scratch. LexisNexis takes the view that headnotes have been crafted by human lawyers and that there is no point in asking Gen AI to generate a new version.

He had no explanation for the subsequent incorrect case reference, observing: “Our team cannot replicate this pattern; our citation matching is a long-tested process, and I cannot comment on what may have caused the issue described.”

Prompt Number 3 

Professor Perrin then posed some legal questions to Lexis+ AI in areas of law that he teaches and knows well, such as “what is the test for causation in criminal law?” The responses were concise, confident and linked to actual cases, but in his view the content was riddled with mistakes. Its explanation of causation confused criminal law with tort law, citing several incorrect – albeit real – cases and getting the legal test wrong. He concluded: “if a law student submitted this response, they would have failed”.

Perrin later asked the same question to see if Lexis+ AI’s response had improved. He said that “this time Lexis+ AI correctly focused on causation in criminal law—not causation in tort law (although one tort case slipped in). However, its response was very basic, and the cases it cited were not the leading authorities. It preferred to cite lower court decisions rather than the leading Supreme Court of Canada jurisprudence that first-year law students learn”.

In an ideal world, one would hope that the response would begin with references to the relevant judgements of the highest court available, but would perhaps also include references to lower court cases where there were more extensive and illuminating obiter dicta.

Other Issues 

A number of other issues came up in our discussion with Perrin, which we then passed to Pfeifer for his views, although the pair have indicated they are also willing to speak directly:

Perrin – What types of potential customers (law firms, academics, internal LN people etc.) undertook the previous round of product testing for this version of Lexis+ AI?

Pfeifer – “The service was tested over an extended period by law firms of all sizes and government institutions. Faculty at law schools were granted access in August 2024”.

Perrin – In relation to current Lexis+ AI customer use, what kind of internal LN post-mortem qualitative analysis (if any) is taking place on the results provided by the AI?

Pfeifer – “LexisNexis continuously tracks answer and citation quality via a proprietary scoring method, using English- and French-speaking Canadian JDs. These individuals work full-time and regularly track quality performance and identify any issues for data science teams to review. LexisNexis also collects in-product feedback and answer assessment by users via a feedback form and ‘thumbs up and thumbs down’ answer rating. Customers have an opportunity to share direct feedback, which again, is reviewed by our data science team for service updates.”

Perrin – Might it be useful to have a secondary AI double-check answers that do not include source links?

Pfeifer – LexisNexis is testing methods to leverage secondary AI models to provide answer feedback. However, improvement to answer quality has been mixed, and the technique will not be deployed until it drives the desired quality targets.
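
To make the idea concrete, here is a minimal sketch of what such a secondary-model check could look like. Every name and canned output below is our own illustrative assumption – this is emphatically not LexisNexis’ pipeline, merely one way the technique might be wired up:

```python
# A minimal, hypothetical sketch of a secondary-AI double-check. All names
# and canned outputs are illustrative assumptions, not the real service.

from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    source_links: list = field(default_factory=list)  # hyperlinked authorities, if any

def primary_model(prompt: str) -> Answer:
    """Stand-in for the primary Gen AI service (stubbed for illustration)."""
    return Answer(text="The test for causation is ...", source_links=[])

def verifier_model(question: str, draft: str) -> str:
    """Stand-in for an independent second model asked to critique a draft,
    flagging claims it cannot ground in a cited source."""
    return "FLAG: no authority cited for the stated causation test."

def answer_with_double_check(question: str):
    answer = primary_model(question)
    critique = None
    # Only answers lacking source links get the verification pass: a
    # hyperlinked citation already signals a retrieved, non-hallucinated source.
    if not answer.source_links:
        critique = verifier_model(question, answer.text)
    return answer, critique

if __name__ == "__main__":
    ans, flags = answer_with_double_check(
        "What is the test for causation in criminal law?")
    print(ans.text)
    print(flags or "No verification pass needed: answer carried source links.")
```

The design point is that the extra (and costly) verification pass runs only on answers that arrive without source links, since – as Perrin observed – a hyperlinked citation is already evidence that the source was retrieved rather than invented.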

Perrin – Might it be more advantageous to treat the AI engine like a new law student, and give it a priming on basic legal principles (by jurisdiction), before throwing it at a massive collection of primary sources? 

Pfeifer’s response to this was thoughtful and deserves to be repeated here in full: “The description above mischaracterizes our approach to AI deployment.  LexisNexis does not ‘pre-train’ large language models.  Pre-training a large language model involves direct ingestion of source data.  Our testing indicates that doing so does not improve answer quality and risks currentness requirements of practicing lawyers.  The approach described also risks substantial increases in hallucination that we believe are a risk to practical use by lawyers.  

“LexisNexis fine-tunes large language models with instruction related to prompt processing.  We further employ a proprietary Retrieval Augmented Generation (RAG) service that is designed to expose the appropriate legal content to a large language model for synthesis and answer generation.   

“This service manages: semantic intent parsing of the prompt, query formulation for content retrieval from LexisNexis data sources, citation retrieval and validation and ultimately, answer formulation.  LexisNexis do not rely on the model directly to identify cited sources, text or legal analysis”.   
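
For readers who want to picture that pipeline, here is a schematic sketch of the stages Pfeifer lists: intent parsing, query formulation, retrieval, citation validation and answer formulation. All function names and stub outputs are illustrative assumptions on our part (the cited case is real, but chosen by us); the actual LexisNexis service will differ:

```python
# A schematic sketch of the RAG stages described above. Every function is
# an illustrative stub, not the LexisNexis implementation.

def parse_intent(prompt: str) -> dict:
    """Classify what the user wants (research, summary, drafting) and where."""
    return {"task": "research", "jurisdiction": "CA", "topic": prompt}

def formulate_queries(intent: dict) -> list:
    """Turn the parsed intent into retrieval queries against content sources."""
    return [f"{intent['topic']} jurisdiction:{intent['jurisdiction']}"]

def retrieve(queries: list) -> list:
    """Fetch candidate documents; stubbed here with one canned result."""
    return [{"citation": "R v Maybin, 2012 SCC 24", "text": "..."}]

def validate_citations(documents: list) -> list:
    """Keep only documents whose citations resolve in a citation index, so
    the model is never asked to supply a source from memory."""
    citation_index = {"R v Maybin, 2012 SCC 24"}
    return [d for d in documents if d["citation"] in citation_index]

def formulate_answer(prompt: str, documents: list) -> str:
    """Hand the prompt plus the validated sources to the LLM for synthesis."""
    sources = "; ".join(d["citation"] for d in documents)
    return f"Answer to '{prompt}', grounded in: {sources}"

if __name__ == "__main__":
    prompt = "test for causation in criminal law"
    docs = validate_citations(retrieve(formulate_queries(parse_intent(prompt))))
    print(formulate_answer(prompt, docs))
```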

Conclusion 

In summary, what lessons are there for us all – Gen AI vendors and consumers alike – to learn from these exchanges? 

We can identify the following takeaways – starting with the observation that this toothpaste is not going back into the tube, regardless of any issues people may face using it.

The kind of commercial Gen AI we are discussing, as applied to legal practice, is only a few years old. It has been developed at an astonishingly fast rate. The early promise has attracted a massive degree of capital investment which has, in turn, stimulated a ‘gold rush’ mentality in which developers have been falling over themselves to develop and improve their products and get them to market as fast as possible.

Arguably, too fast.  

We are currently in the wild west – we have hitherto unimaginable levels of legal research and work product functionality, but it is combined with two additional deadly features that we have yet to manage and control. On the one hand, Gen AI has a tendency to err and to hallucinate; on the other, it is being used by a customer base with drastically varying degrees of understanding of, and attitudes towards, the tool.

In an ideal world, the vendors and the profession would have waited until the flaws in Gen AI had been bottomed out and ironed out; maybe that would have given the law schools and the law firms the time necessary to train potential users in how to get the best results from the tools.

But the thirst for these tools is inexorably driven by the dream of competitive advantage. For me, one of the most telling recent stories came from the Ashurst Vox PopulAI report: “whilst the firm’s research did unearth instances of ‘hallucination’, the Gen AI [also] presented such a novel angle on a legal point in a court judgment that the participant refused to believe it could be correct (which it was)”.

Gen AI therefore holds out the promise not only of producing legal outcomes faster and of higher calibre than humans (or lawyers), but of producing ‘thoughts’ that humans never could.  

As such, nothing is going to stop the current pace of experimentation and adoption. 

Given that, the familiarisation and training of law students and lawyers has to be undertaken just as aggressively.

Which leads us to a related question that has arisen since the publication of Professor Perrin’s article: should law students be let loose on the current versions of tools such as Lexis+ AI?

While Perrin stated in his Canadian Bar article that law students shouldn’t yet be given access to Lexis+ AI, he acknowledged to me that these tools are out there already and it’s too late to shut them off. Students should have access to them, but controlled, managed and trained access. He firmly believes that first-year law students should be introduced to Gen AI in order to get a thorough grounding in how it works and what it does, as well as its shortcomings and how to deal with them.

We can only hope that the same strictures will be applied to practising lawyers as well.  

Of course, the student ‘horse’ has already bolted – as Perrin points out, thousands of law students in the US are already using it. One can only hope that they are not just being ‘thrown at it’, but that they are also getting the appropriate accompanying grounding in its operation and training in how best to use it. In any event, as Perrin also points out, if specialist tools such as Lexis+ AI are not made available to them, they are already using, and will continue to use, the generic AI tools.

Finally, two other interesting issues arose. The first came up when I asked Pfeifer whether one of the causes of errors and hallucinations in Gen AI is that it is “too eager to please”. His answer was an enthusiastic “yes!”.

He pointed out that asking Gen AI to deliver a longer or expanded response has been found to encourage hallucinations. Oddly, many users have already expressed the view that if you are polite in your prompts to Gen AI, you tend to get better responses.

It may simply be that, in the process of absorbing massive amounts of human writing and thinking, Gen AI is ‘inheriting’, or is even unconsciously trying to simulate, some key human cognitive biases.

After all, how can we purport to control the variety of ways that AI will seek to ‘mimic’ human behaviour?  

The principle of benchmarking was also discussed; everyone agrees that some form of comparative Gen AI benchmarking would be advantageous to consumers and potential consumers. LexisNexis is in discussions with a number of benchmarking initiatives, including Stanford and LegalTechHub; also worthy of mention is the benchmarking initiative led by CMS head of knowledge and innovation John Craske, with which Legal IT Insider is involved.

Perrin agrees that benchmarking is key, but foresees problems with any kind of simple vendor ranking system, as a comparative human review will also be problematic – after all, two equally eminent law professors can disagree.  

Better, he thinks, to specify a series of minimum standards and consider how the various Gen AI systems perform against them, such as the following (a rough scoring sketch appears after the list):

  • did it make mistakes? 
  • did it hallucinate? 
  • did it confuse jurisdictions? 
  • did it confuse crime and tort? 
  • did it quote appropriate source cases?  
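
To illustrate how minimum standards differ from a ranking, here is a rough sketch of such a pass/fail rubric in code. The criteria mirror Perrin’s list; the reviewer inputs are made-up placeholders, purely for illustration:

```python
# A rough sketch of scoring Perrin's minimum-standards checklist.
# The review values below are hypothetical placeholders.

FAILURE_FLAGS = [
    "made_mistakes",
    "hallucinated",
    "confused_jurisdictions",
    "confused_crime_and_tort",
]

def passes_minimum_standards(review: dict) -> bool:
    """Pass/fail against an absolute floor: no failure flag may be set,
    and appropriate source cases must have been cited. No ranking."""
    return (not any(review[flag] for flag in FAILURE_FLAGS)
            and review["cited_appropriate_source_cases"])

# Hypothetical reviewer assessment of one system on one prompt:
example_review = {
    "made_mistakes": False,
    "hallucinated": False,
    "confused_jurisdictions": False,
    "confused_crime_and_tort": False,
    "cited_appropriate_source_cases": True,
}
print(passes_minimum_standards(example_review))  # -> True
```

The point of the design is that each system is measured against an absolute floor rather than against its rivals, sidestepping the problem that two equally eminent human reviewers may rank systems differently.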

I have no doubt that, given the pace of change, we will continue to report on this over the next several weeks, months and years.

 
