Guest post: But what about the unstructured content?

In this post on data & legal engineering,’s chief scientific officer, Dr Ben Gardner, focuses on exploiting unstructured data. Show this to your lawyers: in addition to helping you form a data strategy, it provides a really clear explanation of what structured and unstructured data is.

In this series of blog posts, we have so far examined the idea of building an Enterprise Knowledge Map that combines data from information silos across a law firm or legal department. The focus to date has been on structured data (i.e. data in a database). In the legal sector, however, the vast majority of information is held in unstructured form, such as documents, wiki pages and emails.

So how do we access this information and incorporate it into our Enterprise Knowledge Map?

The short answer is that we need to extract information and create structured data, using the type of workflow described below.

What is structured data and how do we create it from unstructured information?

Structured data is information that is recorded in a table or series of tables. Typically, each row describes an item and each column provides a piece of information about that item. In the diagram below, each row holds information about a piece of land, while the columns contain the information named in the column headers (i.e. Document ID, Title, Date, etc.). Structuring data in this way allows us to see easily that, for document A3456780, the Title is MR23563.

As the diagram to the right shows, we can create structured data from an unstructured source by extracting key information and placing it into a table. In the past, this information extraction might have been performed by asking authors to fill in multiple metadata fields about the document. However, much of this information gathering can now be automated using text extraction (Artificial Intelligence) technologies.
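To make the idea concrete, here is a minimal sketch in Python of turning a piece of unstructured text into one row of a table. It is an illustration only: the sample text, the field labels and the date are hypothetical, and simple regular expressions stand in for the much richer text extraction technologies a real project would use.

```python
import re

# Hypothetical sample text; the layout, labels and date are assumptions
# made up for this illustration.
SAMPLE_TEXT = """Document ID: A3456780
Title number: MR23563
Dated: 12 March 2015
This transfer is made between the parties named below...
"""

# One simple pattern per column. Real extraction tools use far richer
# models than regular expressions; these are stand-ins.
PATTERNS = {
    "document_id": re.compile(r"Document ID:\s*(\w+)"),
    "title": re.compile(r"Title number:\s*(\w+)"),
    "date": re.compile(r"Dated:\s*(\d{1,2} \w+ \d{4})"),
}


def extract_row(text):
    """Return one table row (a dict of column name -> value) for a document."""
    row = {}
    for column, pattern in PATTERNS.items():
        match = pattern.search(text)
        # Leave the cell empty (None) when the value cannot be found.
        row[column] = match.group(1) if match else None
    return row


row = extract_row(SAMPLE_TEXT)
print(row)
```

Each key in the resulting dictionary corresponds to a column header and each extracted document becomes one row, which is exactly the shape needed to load the information into a database.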

Freeing people to add value

The example above shows how the typical metadata that a document author might once have been asked to supply when saving a document can now be extracted automatically. This approach saves the author time and typically improves the quality of the basic metadata captured. Furthermore, because the author is no longer asked for basic metadata, she can focus on adding high-value information that describes the document, such as her interpretation of the position taken within it (e.g. that a contract is buyer or seller friendly).

Creating Clause Databases

By applying text extraction in this way, it is possible to break a document down at a more granular level and create databases of clauses. If this data is added to an Enterprise Knowledge Map, the user can search for specific clauses rather than for the documents that contain them. By capturing clauses and integrating them with the wider metadata available through an Enterprise Knowledge Map, it is possible to build a clause recommendation engine.
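As a rough sketch of what a clause database could look like, the Python below splits a contract into clauses and stores them in an in-memory SQLite table so that clauses can be searched directly. The contract text, the "number, full stop, heading" clause layout, the document ID DOC-001 and the table design are all assumptions made for illustration; real contracts would need a far more robust clause splitter.

```python
import re
import sqlite3

# Hypothetical contract text; the numbered-heading layout is an assumption.
CONTRACT = """1. Confidentiality
Each party shall keep the other's information confidential.
2. Termination
Either party may terminate on 30 days' written notice.
3. Governing Law
This agreement is governed by the laws of England and Wales.
"""

HEADING = re.compile(r"^\d+\.\s+(.+)$")


def split_clauses(text):
    """Return a list of (clause_type, clause_text) pairs, one per heading."""
    clauses = []
    heading, body = None, []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous clause, if there was one.
            if heading is not None:
                clauses.append((heading, " ".join(body).strip()))
            heading, body = match.group(1).strip(), []
        elif heading is not None:
            body.append(line.strip())
    if heading is not None:
        clauses.append((heading, " ".join(body).strip()))
    return clauses


# Illustrative table design: one row per clause, linked to its document.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clauses (document_id TEXT, clause_type TEXT, clause_text TEXT)"
)
conn.executemany(
    "INSERT INTO clauses VALUES (?, ?, ?)",
    [("DOC-001", kind, body) for kind, body in split_clauses(CONTRACT)],
)


def find_clauses(connection, clause_type):
    """Search for clauses of a given type rather than for whole documents."""
    cursor = connection.execute(
        "SELECT document_id, clause_text FROM clauses WHERE clause_type = ?",
        (clause_type,),
    )
    return cursor.fetchall()


results = find_clauses(conn, "Termination")
print(results)
```

Because every clause row carries a document ID, the clause table can be joined to whatever document and matter metadata is already held elsewhere.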

Using these techniques, it is possible to access a huge amount of information about the lawyers in the team and the documents they are working on. By combining this information with what we know about the matter from the Enterprise Knowledge Map, we can automatically construct a query in the form of “Find clauses of type X from document type Y from matters like the one being worked on”. Using this approach, recommended clauses could be presented to the lawyer in a sidebar in their drafting tool, which they could use to streamline the drafting process. This is an example of moving away from a pull model, where the lawyer has to break their train of thought to go and find the information, to one where the information they need is delivered to them promptly and in real time.
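A query of that form can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the clause records and the `matter_sector` attribute are hypothetical, and "matters like the one being worked on" is reduced to "matters in the same sector", whereas a real recommendation engine would draw on much richer similarity signals from the Enterprise Knowledge Map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    clause_type: str
    document_type: str
    matter_sector: str  # hypothetical matter attribute from the knowledge map
    text: str


# Hypothetical clause records, made up for this illustration.
CLAUSES = [
    Clause("Indemnity", "Share Purchase Agreement", "Energy",
           "The Seller shall indemnify..."),
    Clause("Indemnity", "Share Purchase Agreement", "Retail",
           "The Buyer shall be indemnified..."),
    Clause("Indemnity", "Lease", "Energy",
           "The Tenant shall indemnify..."),
    Clause("Termination", "Share Purchase Agreement", "Energy",
           "This agreement may be terminated..."),
]


def recommend_clauses(clauses, clause_type, document_type, matter_sector):
    """Find clauses of type X from document type Y from similar matters.

    Similarity is simplified here to an exact match on matter sector.
    """
    return [
        clause
        for clause in clauses
        if clause.clause_type == clause_type
        and clause.document_type == document_type
        and clause.matter_sector == matter_sector
    ]


# Context the drafting tool is assumed to know without asking the lawyer.
context = {
    "clause_type": "Indemnity",
    "document_type": "Share Purchase Agreement",
    "matter_sector": "Energy",
}

recommended = recommend_clauses(CLAUSES, **context)
for clause in recommended:
    print(clause.text)
```

The point of the sketch is that the lawyer never types the query: the drafting context supplies X, Y and the matter details, and the results are pushed to them.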

Wrapping up a focus on data

In this series of posts, we have focused on describing how building an Enterprise Knowledge Map can simplify data access, enhance data utilisation and drive data-driven decision making. We have:

  • Examined how Google has utilised technologies to add structure and meaning to the web, and how it uses this structure to drive the user experience.
  • Explored how we can apply learnings from Google to data inside a law firm or legal department to eliminate information silos and create Enterprise Knowledge Maps.
  • Discussed how an Enterprise Knowledge Map can unlock the value trapped inside your systems and why getting your data right is the foundation for future innovation.
  • Shown how text analytics can be used to extract structured data from unstructured sources and thereby integrate documents into an Enterprise Knowledge Map.

This is really the foundation of a data strategy, and adopting an approach like the one described will ensure your data becomes the fuel that drives innovation. Hopefully, the next time someone asks you what your data strategy looks like, you will have a clear answer.
