The Mossack Fonseca leak: a good week for big data analytics

It’s being described as the biggest leak in history. It comes from a law firm that most people had never heard of until now, and from which politicians, banks and high-profile investors around the world are now trying to distance themselves.

But amid the Mossack Fonseca scandal, which has already seen Iceland’s prime minister stand down and looks set to run on and on, one company that has rightly been keen to stress its involvement is Nuix, the Australian eDiscovery and big data analytics provider.

Nuix, whose involvement was led by senior solutions consultant Carl Barron, worked with the International Consortium of Investigative Journalists (ICIJ) to analyse 11.5 million documents, totalling 2.6 terabytes of data, that detail the activities of the Panamanian law firm Mossack Fonseca and were leaked to the German newspaper Süddeutsche Zeitung over a year ago.

To recap, only very briefly, the background facts that have dominated the tabloids, broadsheets and websites of all major news outlets across the world this week: Mossack Fonseca is a law firm and ‘corporate service provider’, a model that anyone with a passing knowledge of offshore law firms will recognise and which invariably means that it forms and administers offshore entities such as shell companies and trusts.

While these offshore entities are generally legal in the jurisdictions in which they are registered, the investigation revealed that some have been (ought we say ‘allegedly’ here?) used for unlawful purposes, including sovereign and individual fraud, drug trafficking, and tax evasion.

For each shell company, Mossack Fonseca created a folder containing e-mails, contracts, transcripts and scanned documents. In some instances, a single folder runs to several thousand pages.

When it realised the scale of the leak, Süddeutsche Zeitung contacted the ICIJ for help in analysing the data, and Nuix was brought in. In an article simply called ‘About the Panama Papers’, Süddeutsche Zeitung journalists Frederik Obermaier, Bastian Obermayer, Vanessa Wormer and Wolfgang Jaschensky explain how the analysis took place: “First, the data had to be systematically indexed to make searching through this sea of information possible. To this end, the Süddeutsche Zeitung used Nuix, the same program that international investigators work with. Süddeutsche Zeitung and ICIJ uploaded millions of documents onto high-performance computers. They applied optical character recognition (OCR) to transform data into machine-readable and easy to search files. The process turned images – such as scanned IDs and signed contracts – into searchable text. This was an important step: it enabled journalists to comb through as large a portion of the leak as possible using a simple search mask similar to Google.”
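For readers curious about what “systematically indexed” means in practice, the core idea behind this kind of search is an inverted index, which maps each word to the documents that contain it so that a query never has to scan every file. The sketch below is a minimal, hypothetical illustration of that general technique, not Nuix’s implementation: it assumes the documents have already been through OCR and are available as plain text, and the document IDs and sample text are invented.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy, so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical OCR output for three documents
docs = {
    "doc-001": "Certificate of incorporation for Example Holdings Ltd",
    "doc-002": "Email re: nominee director for Example Holdings",
    "doc-003": "Scanned passport copy, director appointment",
}
index = build_index(docs)
print(sorted(search(index, "Example Holdings")))  # ['doc-001', 'doc-002']
print(sorted(search(index, "director")))          # ['doc-002', 'doc-003']
```

Building the index is the expensive, one-off step; after that, each search is a handful of set lookups and intersections, which is what makes a Google-style search box over millions of documents practical.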

Around 400 journalists from over 100 media organisations in over 80 countries were then involved in researching the documents. The journalists compiled lists of politicians, international criminals and well-known professional athletes, then used Nuix’s named entity extraction to identify the names of Mossack Fonseca clients and cross-reference them against those lists.
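The cross-referencing step can be pictured as a join between a watchlist and the names pulled out of each document. The sketch below is a hypothetical illustration of that idea only: it assumes entity extraction (which Nuix performed) has already produced a list of names per document, and the normalisation rule (case-folding, stripping accents and punctuation) is an assumption of ours. Real matching would also need to cope with aliases, transliterations and partial names.

```python
import unicodedata
from collections import defaultdict

def normalise(name: str) -> str:
    """Case-fold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.casefold().split())

def cross_reference(watchlist: list[str],
                    extracted: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each watchlist name to the IDs of documents whose extracted names match it."""
    wanted = {normalise(n): n for n in watchlist}
    hits: dict[str, set[str]] = defaultdict(set)
    for doc_id, names in extracted.items():
        for name in names:
            key = normalise(name)
            if key in wanted:
                hits[wanted[key]].add(doc_id)
    return dict(hits)

# Hypothetical watchlist and hypothetical entity-extraction output
watchlist = ["Zoë Example", "Jane Doe"]
extracted = {
    "doc-101": ["ZOE EXAMPLE", "Example Holdings Ltd"],
    "doc-102": ["Jane  Doe", "John Smith"],
    "doc-103": ["Zoë Example"],
}
print(cross_reference(watchlist, extracted))
```

Normalising both sides before comparing is what lets “ZOE EXAMPLE” in one scanned form match “Zoë Example” on a watchlist; without it, trivial differences in capitalisation or accents would hide genuine matches.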

“This is a huge trove of data by investigative journalism standards, around 10 times the data volume and five times the number of documents of ICIJ’s Offshore Leaks investigation in 2013,” said Eddie Sheehy, CEO of Nuix.

“At the same time, this is only a medium-sized document set in the worlds of eDiscovery or regulatory investigations – some of our customers handle similar volumes of data every day. Nuix is the only technology in the world that can handle this much data and that many documents with speed and precision.”

Nuix donated the software to Süddeutsche Zeitung and ICIJ for the purposes of the investigation, while Barron advised the investigators on hardware configurations and workflows. Nuix employees never saw or handled any of the leaked data – that task was undertaken by the journalists involved in the investigation.

“Nuix technology was an indispensable part of our work on the Panama Papers investigation, as it has been with Offshore Leaks and many of our other in-depth investigative stories,” said Gerard Ryle, director of the ICIJ.

The confidentiality surrounding most disputes and investigations means that technology suppliers rarely get to share the (often dubious) limelight of the matters their software helps to investigate. This week, perhaps this month, hell, let’s get over-excited and say this year, is a very good time to say you’re in eDiscovery and big data analytics.

Last year, Legal IT Insider published a video in which Nuix CTO Stephen Stewart explains how to take control of processing tasks with a new distributed worker framework.