
The Evolution of Information Retrieval Technologies

Guest article by Luigi Salzano, Business Services Director – Systems Development, Pannone LLP & Masoud Saeedi, the Managing Director of Purple Frog Text*

The case for effective knowledge management and enterprise search is well rehearsed.  Information plays a vital role in a business’s success and survival. Yet research shows only a small percentage of corporate information is accessible and utilised. Law firms which cannot easily access all their information face a high cost to re-invent “lost” material. A recent IDC study, the Hidden Costs of Information Work, suggests that on average more than ten hours per week is wasted on surfacing information which already exists within an organisation.

It is small wonder, then, that firms are seeking not only to reduce the cost of retrieval but also, in a regime of cost budgets and fixed fees, to leverage their existing investments and knowledge.

In recent years there have been many developments in computer science which have required a fundamental shift in how Enterprise Search is approached. Natural language understanding, artificial intelligence, information retrieval, data mining, and entity recognition and extraction have all come to the fore, and it is in new technologies that these advances are best leveraged. They offer users the ability to interact with, direct and navigate their searches – an ability which is absolutely necessary in professional services, where lawyers are increasingly expected to operate systems autonomously.

We can all agree that surfacing the knowledge held within a firm is a positive step, but too often systems have provided just part of the solution and have required a substantial outlay, not just in licence costs but also in implementation costs and timeframes. We have seen significant advances in technology and theory, but we have not seen these filter through in their entirety to the frontrunners in the legal KM and ES markets.

Firms should consider different approaches which offer advances over conventional technologies, together with expedited implementation times. They should also consider how their information is organised, as the creation and curation of information sources can prove costly. Whilst for some this is seen as a necessary part of a Knowledge Management implementation, for others it is an expense which can be avoided. We have all heard horror stories of two-year implementation cycles for Knowledge Management and Enterprise Search systems. A system which does not disrupt the sources from which it is compiled, but rather consumes the information in situ, offers a lower cost of implementation with an equal quality of data collection.

In this information age, the solution to the problem of classification should not be to invest huge resources in manually classifying and tagging documents, but rather to use the content of documents to deduce and infer their classification. A system which can be trained by providing a set of documents for each taxonomy (or class), or which can even cluster documents based solely on their content, and which uses contextual information to build better and more transparent classes of documents that are directly relevant to specific users or groups of users, would provide a much more efficient implementation cycle at a much lower total implementation cost.
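
To make the idea of training on a set of documents per class concrete, here is a minimal sketch in Python. It is an illustration only and not a description of any product's method: it assumes a simple bag-of-words representation and a nearest-centroid rule with cosine similarity, and the class names, sample documents and function names (`train`, `classify`) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one summed term-count vector (a centroid) per class."""
    centroids = {}
    for label, docs in examples.items():
        total = Counter()
        for doc in docs:
            total.update(tokenize(doc))
        centroids[label] = total
    return centroids


def classify(centroids, text):
    """Return the class whose centroid is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical training set: a handful of documents per class,
# with no manual tagging of individual terms.
TRAINING = {
    "employment": [
        "The employee was dismissed without notice and claims unfair dismissal.",
        "Contract of employment including salary, holiday and notice period.",
    ],
    "property": [
        "Lease of commercial premises with rent review clause.",
        "Transfer of freehold title to the property and land registry fees.",
    ],
}

centroids = train(TRAINING)
print(classify(centroids, "Advice on notice period and redundancy for an employee"))
print(classify(centroids, "Rent review under the commercial lease"))
```

A production system would need far more than this (term weighting, contextual information about the user, handling of documents that fit no class), but the sketch shows the essential point: the classification is inferred from document content rather than assigned by hand.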

Search is the fundamental platform for enterprise knowledge management functions. It should be intelligent, intuitive and adaptive. It should learn from a user’s behaviour, profile and area of expertise and create a suitable and relevant context for each search. And if we are to enable lawyers to operate autonomously and efficiently, information dilution and redundancy must be avoided so as to minimise the time a lawyer spends using an enterprise search system.

In the conventional federated search model used by most current providers, the host search engine receives the query, performs its own retrieval operation and passes the query on to other search engines. Each engine receiving the query performs its own computation to retrieve what it considers relevant. Because the interpretation and computational models of search engines differ, a query may be interpreted in one way by one engine and in a different way by another, so their results can be dramatically different and not necessarily relevant to the original intention of the search. Once every search engine has returned its results, the host search engine ranks and presents them to the user according to its own specific algorithm. Because each search engine has its own mechanisms for finding and ranking information, this can create a huge amount of irrelevancy and dilution. The result can be far less accurate, with a huge amount of irrelevant information presented to the user.
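
The ranking problem described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the two engines, their scoring rules, the document identifiers and the naive merge by raw score are hypothetical, and real federated systems vary in how they combine results.

```python
def engine_a(query, docs):
    """Hypothetical engine: score is the total count of query-term occurrences."""
    terms = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score:
            results.append((doc_id, float(score)))
    return results


def engine_b(query, docs):
    """Hypothetical engine: score is the fraction of distinct query terms present."""
    terms = set(query.lower().split())
    results = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = len(terms & words) / len(terms)
        if score:
            results.append((doc_id, score))
    return results


def federated_search(query, sources):
    """Host engine: forward the query to every source, then merge by raw score."""
    merged = []
    for engine, docs in sources:
        merged.extend(engine(query, docs))
    return sorted(merged, key=lambda result: result[1], reverse=True)


# Two hypothetical repositories, each fronted by a different engine.
DMS = {"dms-1": "lease lease lease template index"}
KB = {"kb-1": "commercial lease rent review precedent"}

ranked = federated_search(
    "commercial lease rent review",
    [(engine_a, DMS), (engine_b, KB)],
)
for doc_id, score in ranked:
    print(doc_id, score)
```

In this toy setup the document that matches only one of the four query terms is ranked first, simply because its engine scores on an unbounded scale while the other scores between 0 and 1. The scores were never comparable, which is the source of the dilution the article describes.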

It’s clear that a more efficient and effective model is one in which there is only one search engine, which performs all the operations according to the same computational model and returns all relevant documents with a high level of accuracy. Federated search can no longer provide satisfactory solutions – the level of accuracy, precision and recall it offers is not reliable.

Further, most Enterprise Search systems provide a “top-down” design forcing subsequent search and other related operations to be based on general and non-specific information. No matter how sophisticated the search technology, lawyers find themselves in a search space that is diluted with unrelated information, which reduces the search focus and accuracy. Information organisation and collaboration are also adversely affected by this design.

A more useful paradigm is a “bottom-up” design, where the system is geared towards the requirements of each individual lawyer, with information sources specifically tailored for them. The benefits of this user-oriented design are better and more transparent knowledge management, a high level of accuracy and efficiency, and cost savings in both manpower and time. This approach can dramatically reduce information irrelevancy, duplication, redundancy and dilution.

The key challenges for firms in the information retrieval space are familiar – a high licence cost, together with a high cost of implementation, can prove to be barriers to implementing an effective KM and information retrieval strategy. Whilst such a strategy can unlock hidden knowledge, save money, mitigate risk and, importantly, reduce firms’ cost to serve their clients, technology limitations and cost have in the past obscured the path to a clear strategy. Firms should explore the new technologies on the market to rapidly unlock the benefits of enterprise search and knowledge retrieval.


* SynAPPs by Purple Frog Text addresses the key themes identified here. Its innovative design has made it possible to customise and adapt all our products, including our SynAPPs Hub, for each user according to his or her requirements. The direct benefits of this user-oriented system are high-level accuracy and efficiency together with substantial cost savings in both manpower and time. SynAPPs is designed to overcome information irrelevancy, duplication and dilution. For more information visit