Next level AI? Travers Smith launches open source contract labelling tool (and why legal AI vendors will probably hate it)

Travers Smith last week announced the launch of contract labelling tool Etatonna, which looks set to help law firms to take control of their own data and potentially build joint venture AI models. Editor Caroline Hill speaks to head of legal technology Shawn Curran.

I don’t know why it took me a while to get my head around the potential significance of Travers Smith’s open-sourced contract labelling tool Etatonna, but it did and, apparently, I’m not alone. Travers’ head of legal technology, Shawn Curran, prefaces our conversation on 16 October with the comment: “A lot of people don’t get it.”

The starting point is that Etatonna allows firms to label contracts and store the data in a structured format in order to train AI models. Curran says: “Instead of the firm partnering with all the different AI vendors to put data into their platform and then train the machine learning in their platform, with no ability for us to then export and retain the data when the relationship comes to an end, it makes much more sense to build and structure our data internally. So, for example, if it’s labelled ‘force majeure’ in our database, we can take a bundle of data to a provider to train their model and ask them to delete it at the end.”
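To make the idea concrete, here is a minimal Python sketch of what keeping labelled clauses in a structured format, and pulling out a bundle for a provider, might look like. The `LabelledClause` record, its fields and the `export_bundle` function are illustrative assumptions for this article, not Etatonna’s actual schema or API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LabelledClause:
    """One labelled piece of contract text (hypothetical schema)."""
    document_id: str  # identifier of the source contract
    text: str         # the clause text that was labelled
    label: str        # e.g. "force majeure"


def export_bundle(clauses, label):
    """Return a JSON string holding every clause that carries the given label."""
    selected = [asdict(c) for c in clauses if c.label == label]
    return json.dumps({"label": label, "examples": selected}, indent=2)


# A tiny in-memory stand-in for the firm's own labelled-data store.
store = [
    LabelledClause("doc-001", "Neither party is liable for events beyond its control.", "force majeure"),
    LabelledClause("doc-002", "Consent is required on any change of control.", "change of control"),
]

# The bundle a firm might hand to a provider for training.
bundle = export_bundle(store, "force majeure")
```

Because the labelled records stay in the firm’s own store, the same bundle could be regenerated for a different provider later; only the exported copy leaves the building.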

Curran adds: “If I’m a big corporate and I have a secret sauce, I buy in eg water, add my recipe, and push out the finished product. Right now, law firms are licensing their recipe to the water company. We’re consuming the water and providing the secret sauce. With Etatonna, we’re shifting things so that use of our data is more on our own terms.”

Data in Etatonna can be used to generate and regenerate machine learning models for any purpose. It potentially provides a solution to one of the biggest problems in the legal industry: ownership of the IP of trained data models.

It will also help law firms to understand exactly what data went into a model. Curran says: “There’s growing research outside of the industry around the ability to reverse engineer machine learning models and it’s important to know what data is in that model.” He adds: “Our concern is that inside that model is lots of data and nobody can attribute what was fed in.”

In a statement out last week, Sam Lansley, a software engineer at Travers, put it well when he said: “The lack of transparency with machine learning models is a big concern for most industries. Models are simply a combination of ‘learned’ algorithms and at the moment, this makes it extremely difficult to track them back to their source data. Etatonna completely solves this problem by linking the model back to the training data, and then linking that back to the original documents.”
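As a rough illustration of that chain from model to training data to original documents, the Python sketch below records which training examples fed a model and traces them back to their source contracts. The class names, fields and the use of a SHA-256 fingerprint are assumptions made for this example; the article does not describe how Etatonna implements the linkage.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    example_id: str
    document_id: str  # the original contract this example was taken from
    label: str


@dataclass(frozen=True)
class ModelRecord:
    model_id: str
    example_ids: tuple  # every training example that went into the model
    fingerprint: str    # SHA-256 hash of the sorted example ids


def register_model(model_id, examples):
    """Record which examples a model was trained on (hypothetical helper)."""
    ids = tuple(sorted(e.example_id for e in examples))
    fingerprint = hashlib.sha256("\n".join(ids).encode("utf-8")).hexdigest()
    return ModelRecord(model_id, ids, fingerprint)


def source_documents(model, examples_by_id):
    """Trace a model back to the original documents behind its training data."""
    return sorted({examples_by_id[i].document_id for i in model.example_ids})


examples = [
    TrainingExample("ex-1", "doc-001", "force majeure"),
    TrainingExample("ex-2", "doc-001", "change of control"),
    TrainingExample("ex-3", "doc-007", "force majeure"),
]
examples_by_id = {e.example_id: e for e in examples}

model = register_model("clause-classifier-v1", examples)
documents = source_documents(model, examples_by_id)
```

The fingerprint is one simple way to check later that a model record still refers to the same set of examples; retraining on a different set would produce a different value.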

Travers didn’t have to open source Etatonna but, if things go as Curran and his team plan, it has the potential to help take legal AI to the next level. Curran envisages that it will help law firms to share datasets to build joint venture-type AI models and optimise model training. He says: “If I’m a corporate lawyer, what is the point of Travers labelling a change of control clause and you at another firm labelling the same change of control clause? We need to train it so the AI is good enough for the industry as a whole.”

He adds: “The problem is not that AI is rubbish, it’s that it needs to be optimised and it’s hard to do that independently.”  

Outside of legal, labelling startups are big business. In an article in February, Fortune quoted Alexandr Wang, the 23-year-old founder and CEO of Scale AI, which has worked with a number of self-driving car companies, as saying that the “dirty secret” of artificial intelligence is that getting the software to work well in the real world requires a large amount of high-quality data. He said: “Where the rubber hits the road is what does the data these A.I. systems are trained on look like? Is that data biased? Is that data high quality? Does that data have noise? Is that data comprehensive?” This is precisely where Travers is going with Etatonna. AI is really still in its infancy in legal, but could this tool help the industry to get to the next level?

Organisations that would like to access the open source code for Etatonna can register by emailing Curran. Etatonna is licensed under the GNU GPL v3 licence and is shared through the existing Travers Smith Open Source environment on Azure DevOps.

Let me know your thoughts, good or bad. You can watch Travers’ YouTube video about Etatonna here (guess who was told to speak slowly):