
“The Gen AI LLM benchmarking war starts here”: Harvey releases new evaluation framework
Gen AI startup Harvey has released a benchmarking framework for quantitatively evaluating the performance of large language models on real-world legal tasks. The framework, the company says, supplements prior work that