If there was a dip in productivity in the legal tech world this week, it may be because everyone was a) getting ready for ILTACON and b) busy testing and writing about OpenAI’s latest large language model GPT-5, which was launched to predictably, and some may say disproportionately, huge fanfare.
OpenAI has said that GPT-5 is a “significant leap in intelligence over previous models” and will have fewer errors or hallucinations, significant advances in its ability to follow instructions, and a ‘levelling up’ in performance when it comes to writing.
Vals AI has been quick to publish the results of its early legal benchmarking of GPT-5, which it says comes out on top for performance when compared with 62 other public models including earlier GPT models, Claude, DeepSeek V3 and Gemini, with an accuracy rating of 84.6%. You can see the full results here: https://www.vals.ai/benchmarks/legal_bench-07-29-2025
Commenting on the results on LinkedIn, Vals AI say: “GPT 5 is the strongest in the family, ranking #1 across legal and mathematical reasoning tasks and is among the top performers on every benchmark we tested.”
They add: “Despite its smaller size and lower cost than GPT 5, GPT 5 Mini ranks top 10 on almost all benchmarks while beating GPT 5 on coding tasks and tax evaluation.” It is important to note that in the overall benchmark results, GPT 5 was given an accuracy rating of 84.6%, with Gemini 2.5 Pro Exp just one percentage point behind on 83.6%. Grok 4 was a further 0.2 of a percentage point behind that, on 83.4%.
There was a flurry among vendors this week to talk about their own experiences of using GPT-5. Harvey said that during early access, GPT-5 showed impressive performance across Harvey’s core product surfaces – Assistant, Vault and Workflows – while also demonstrating substantial improvements in legal reasoning, as measured by expert preference and its BigLaw Bench evaluation suite.
GPT-5 is available in the Lega AI Sandbox, where early comparisons between the GPT-5 family and their corresponding GPT-4.1 counterparts across a variety of tasks show the following (thanks to Lega’s CTO Rob Saccone for sharing):
🅰️ Given the same task, GPT-5 uses 4-5x more tokens than GPT-4.1
💰 As a result, GPT-5 responses are on avg 3-4x more expensive
⏱️ GPT-5 time to respond is 4-5x slower than GPT-4.1
🏆 That said, responses are reported to be subjectively “better” w/ many tasks
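To make the “more tokens means more expensive” point concrete, here is a minimal back-of-the-envelope sketch. All of the numbers below – per-million-token prices and token counts – are made-up illustrative assumptions, not OpenAI’s actual rates or Lega’s measured figures; the point is only that a model using roughly 4-5x the output tokens can end up several times more expensive per response even at broadly similar per-token prices.

```python
def response_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one model response, given per-million-token prices.

    Prices passed in here are hypothetical, chosen only to illustrate
    the arithmetic; check the provider's current price list for real figures.
    """
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m


# Same task, assumed token usage: the newer model is assumed to emit
# roughly 4-5x the output tokens (e.g. due to longer reasoning traces).
older_model = response_cost(2_000, 1_000, in_price_per_m=2.00, out_price_per_m=8.00)
newer_model = response_cost(2_000, 4_500, in_price_per_m=1.25, out_price_per_m=10.00)

print(f"older-model-style response: ${older_model:.4f}")
print(f"newer-model-style response: ${newer_model:.4f} "
      f"(~{newer_model / older_model:.1f}x more expensive)")
```

Under these assumed numbers the newer model’s response works out around 4x the cost of the older one, which is in the same ballpark as the 3-4x figure reported above – the multiplier is driven almost entirely by the extra output tokens, not the per-token price.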
Saccone also quite hilariously observes a correlation between the level of outrageous vendor funding and the length/quality of vendors’ GPT-5 announcements on LinkedIn – if you see that man, buy him a pint from us.
Speaking to Legal IT Insider, Lega’s founder and CEO Christian Lang said: “There was a lot of build up to this and we knew it would be a blockbuster release. The quality does seem to be outstanding. It does a great job in cutting right to the heart of the issue and speaking crisply. Sometimes models provide a lot of over-explanation but this is more succinct. Having said that, as Rob said in his post, it consumes more tokens and is very slow. In our sandbox you can run models side by side and I suspect there will be a lot of cases where people reach for the models that are snappier.”
He added: “OpenAI has said this is a material improvement on quality and reducing hallucination and for the late joining community that may help in overcoming their objections. For those leaning in, and who have already optimised solutions grounded in their own data, I think the improvements they see will be more modest than transformative.”
As both Saccone and Lang allude to, it’s important for the legal market to keep its head as new models are launched thick and fast. At AI marketplace Jylo, founder and CEO Shawn Curran told us: “The models are going to continue to advance and there will be an update next week from Claude that will surpass GPT-5. We need to focus on the use cases. We’ve tested GPT-5 and it’s good but is it better than GPT-4 with all of our system prompts that are very specific to our customers? It’s a hard thing to evaluate and quite subjective.”