Comment: As Data Evolution Charges Full Steam Ahead, Is E-Discovery Along for the Ride?

By Tom Jackson, FTI Technology

E-discovery professionals must constantly keep pace with the evolving technologies that generate and retain data. This means developing the technical knowledge needed for comprehensive data extraction and, more broadly, understanding how clients deploy and use these systems, so that extracted data can be placed in context. As the pandemic accelerated digital transformation within corporations over the last 18 months, legal teams began to face the challenge of identifying, preserving and collecting a wider range of electronic evidence generated and stored in cloud-based applications, a growing number of collaboration tools and larger volumes of data from Bring Your Own Device (BYOD) programmes. Lessons learned and solutions for the awareness, preservation and extraction of new data sources have been widely discussed, and teams have started to align their early-stage e-discovery methodologies and forensic readiness accordingly.

However, in the context of the Electronic Discovery Reference Model (EDRM), impacts are also emerging further downstream, and legal teams now face a new set of challenges in the workflows they use for processing, analysis, review and production. Without a pivot towards new approaches, and an appreciation of how technology usage should drive review, e-discovery professionals risk being left behind as the diversity of data, and of the ways it is used, continues to grow rapidly.

In most cases, processing, analysis and review are the most time-intensive and costly phases of e-discovery, and analytics tools are often critical to keeping them efficient. Legal and e-discovery teams must therefore understand how applying legacy approaches to emerging data sources can undermine sophisticated analytics and review workflows, and how it can change both the inputs to, and the demands on outputs from, the downstream phases of the EDRM.

Processing

In data processing, the data source’s output and metadata structure are key to reconstructing data accurately in a review platform. However, some technologies, by design, store such data, or additional detail about it, separately from the core file content. For example, many cloud-based file sharing systems hold rich metadata (such as file owners, editors, viewers and version history) that can aid e-discovery search, but that metadata may not be attached to a document when it is processed and loaded into the review platform if the document is simply handled as though it were a file extracted from a laptop. Similarly, video conferencing tools may store audio recordings, video recordings and transcripts, all with metadata relating to a specific meeting, but that metadata does not always remain attached to the associated recording or transcript files and may need to be reconstructed prior to search and review.

One challenge we’ve seen in recent matters arises when an audio platform produces a WAV file recording of a call of interest. At first, it seems as though the necessary data has been collected, but in reality the WAV file has no metadata linked to it, so all of the context that is typically relevant in e-discovery (such as participants, date and time) is missing. In processing, that recording must therefore be stitched together with other pieces of related information. Similar challenges can arise in chat applications, where voice notes, pictures, emojis and shared documents all form part of a single conversation thread and are thus intermingled in one e-discovery artifact.
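To make the stitching step concrete, here is a minimal Python sketch. It assumes, hypothetically, that the audio platform also exports a separate call log with a call ID, start time and participant list, and that each WAV file's only usable clue is its file-system timestamp. The class names, fields and the two-minute tolerance are illustrative assumptions, not features of any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CallRecord:
    # Hypothetical call-log entry exported separately from the recording
    call_id: str
    start: datetime
    participants: list[str]


@dataclass
class Recording:
    # A bare WAV file: only its name and a file-system timestamp are known
    filename: str
    created: datetime


def match_recording(rec: Recording, calls: list[CallRecord],
                    tolerance: timedelta = timedelta(minutes=2)) -> Optional[CallRecord]:
    """Return the call whose start time is closest to the recording's timestamp,
    provided it lies within the tolerance; otherwise return None."""
    candidates = [c for c in calls if abs(c.start - rec.created) <= tolerance]
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c.start - rec.created))


def stitch(rec: Recording, calls: list[CallRecord]) -> dict:
    """Build a review-ready record joining the recording to its call metadata.
    Unmatched recordings keep empty metadata fields so they can be flagged."""
    call = match_recording(rec, calls)
    return {
        "file": rec.filename,
        "call_id": call.call_id if call else None,
        "date": call.start.isoformat() if call else None,
        "participants": call.participants if call else [],
    }
```

The sketch picks the closest call when several fall within the tolerance; a more cautious workflow might flag such ambiguous matches, and any unmatched recordings, for manual review. A timestamp is a weak join key, so an explicit identifier shared by the recording and the log would be preferable wherever one exists.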

Such processing challenges should not be overlooked, as rejoining content with its associated metadata can be key to an effective and complete review. However, this task often requires custom enhancements to existing processing workflows.

Analysis and Review

Files from chat applications, video conferencing, file sharing and other new data sources often do not fit seamlessly into a Technology Assisted Review (TAR) workflow. Beyond the metadata challenges discussed above, the way chat messages are split out of a larger conversation thread can affect TAR’s accuracy and speed. If chats are not split into shorter segments, each document may contain a high volume of irrelevant material, which can undermine a tool’s ability to distinguish relevant content from irrelevant content. Conversely, when short messages are split out of the conversation thread, context is lost, making it harder for both analytics tools and reviewers to understand how the messages relate to the key facts or the bigger picture of a case.
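One way to balance these two failure modes is to group messages into conversation windows rather than treating either the whole thread or each individual message as the review unit. The Python sketch below is illustrative only: the `Message` fields, the one-hour silence gap and the 50-message cap are assumed values, not settings from any specific review platform or TAR tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    # Hypothetical normalised chat message
    sender: str
    sent: datetime
    text: str


def window_thread(messages: list[Message],
                  max_gap: timedelta = timedelta(hours=1),
                  max_messages: int = 50) -> list[list[Message]]:
    """Split one chat thread into review units. A new unit starts when the silence
    between consecutive messages exceeds max_gap, or when the current unit
    already holds max_messages messages."""
    units: list[list[Message]] = []
    current: list[Message] = []
    for msg in sorted(messages, key=lambda m: m.sent):
        if current and (msg.sent - current[-1].sent > max_gap
                        or len(current) >= max_messages):
            units.append(current)
            current = []
        current.append(msg)
    if current:
        units.append(current)
    return units


def unit_text(unit: list[Message]) -> str:
    # Flatten a unit into one text document, keeping sender and time as context
    return "\n".join(f"[{m.sent:%Y-%m-%d %H:%M}] {m.sender}: {m.text}" for m in unit)
```

Tuning the gap and the cap trades context against noise in exactly the way described above; splitting on calendar-day boundaries is another possible rule.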

A further challenge that chat-based data presents for tools such as TAR and sentiment analysis is the increased use of audio, images, animated GIFs and emojis. These forms of communication can carry strong non-textual meaning and give reviewers and investigators additional data points for both searching and content review. However, because several analytics tools operate only on the text extracted from files, these elements may be poorly represented in, or entirely absent from, the analysis.
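Where a tool can only read text, one partial mitigation is to render emoji and similar symbols as text tokens before analysis. The Python sketch below is an illustrative assumption rather than a feature of any particular analytics tool: it replaces characters in the Unicode "Symbol, other" (`So`) category, which covers most emoji, with their Unicode names. It does nothing for audio, images or animated GIFs, which would need transcription or tagging by other means.

```python
import unicodedata


def annotate_symbols(text: str) -> str:
    """Replace each character in Unicode category 'So' (most emoji fall here)
    with its bracketed Unicode name, so text-only analytics can index it."""
    out = []
    for ch in text:
        if ch == "\ufe0f":
            # Emoji presentation selector: carries no meaning on its own
            continue
        if unicodedata.category(ch) == "So":
            out.append(f"[{unicodedata.name(ch, 'UNKNOWN SYMBOL')}]")
        else:
            out.append(ch)
    return "".join(out)
```

Because it works character by character, multi-character emoji sequences (for example, those using skin-tone modifiers or zero-width joiners) are only partially handled, and non-emoji symbols in the same category, such as ©, are annotated too.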

Common e-discovery tactics such as deduplication and threading may also be affected when strings of chat messages are included in a dataset, or when emails are collected from mobile devices rather than from an email server. These examples are just a subset of the many complications the new data landscape can create during analysis and review. Legal teams should work with experts who know how to build processes and workarounds that maintain efficiency and ensure an accurate, reliable review when non-traditional data types are in scope.
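To illustrate why collection source matters for deduplication: the same chat message exported from a phone and from a server may differ in whitespace, Unicode normalisation, the letter case of the sender or the time-zone offset, so hashing the exported files will not reveal the duplicate. The hypothetical Python sketch below builds a normalised per-message key instead. The fields, normalisation rules and use of SHA-256 are assumptions for illustration, and it expects time-zone-aware timestamps.

```python
import hashlib
import unicodedata
from datetime import datetime, timezone


def message_key(thread_id: str, sender: str, sent: datetime, text: str) -> str:
    """Hypothetical dedup key: thread, sender, UTC timestamp to the second and
    whitespace/Unicode-normalised text, hashed with SHA-256.
    `sent` should be time-zone aware; a naive datetime is treated as local time."""
    norm_text = " ".join(unicodedata.normalize("NFC", text).split())
    utc = sent.astimezone(timezone.utc).replace(microsecond=0).isoformat()
    payload = "\x1f".join([thread_id, sender.strip().lower(), utc, norm_text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(messages):
    """Keep the first occurrence of each key.
    `messages` is an iterable of (thread_id, sender, sent, text, source) tuples."""
    seen = {}
    for thread_id, sender, sent, text, source in messages:
        key = message_key(thread_id, sender, sent, text)
        seen.setdefault(key, (thread_id, sender, sent, text, source))
    return list(seen.values())
```

Which normalisations are safe to apply is a judgement call for each matter; normalising too aggressively risks collapsing messages that are genuinely distinct.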

Production

The range of what will be required in productions for legal and regulatory matters may expand significantly going forward, and there will likely be significant debate over what should be disclosed and how. For example, with chat-based data, teams may be required to produce a full chat thread even if only small portions of it are actually in scope. This will prompt discussions about whether to disclose everything and redact sensitive portions, or to split each message into an individual artifact and produce only the pieces that are relevant to the matter, and that is before chat attachments are even considered. Likewise, standalone video and audio recordings may be considered sufficient in one matter, while another may require recordings to be produced together with metadata or transcripts. Some courts or agencies may also require data to be produced in a specific format, or require the production of metadata known to exist outside the “core” file content. In a landscape where nearly everyone is learning as they go, legal teams that understand the nuances of emerging data types will be in a much stronger position to negotiate reasonable and realistic production requirements.

The growth in data volume, variety and complexity shows no sign of slowing; if anything, it will continue to gather speed. Its impacts on e-discovery are already significant and cannot be ignored or handled through ad hoc workflows. The key to meeting these challenges effectively is to be proactive and to build flexibility into existing e-discovery workflows and tools. Legal and e-discovery professionals who invest as far as possible in readiness for emerging data sources will minimise the risk of unexpected and costly issues. Moreover, those who embrace change will be in a position to keep pace with the rapid evolution of data and to find opportunities to enrich their e-discovery processes with new layers of information.

Tom Jackson is a Senior Director in FTI Consulting’s Technology segment. He has extensive e-discovery experience and specialises in the development and customisation of review and reporting workflows and the reduction of review volumes through the application of both Relativity and broader data analytics practices. 

The views expressed in this article are those of the author(s) and not necessarily the views of FTI Consulting, its management, its subsidiaries, its affiliates, or its other professionals.