Systematic Discovery and Organization of Electronic Evidence
by Joseph L. Kashi
February 2003

Discovering electronic evidence, particularly from a large organization, can be more time-consuming and expensive than a litigator might imagine. Unless a litigator understands the many places where electronic evidence may be found, and how information flows within an organization, electronic discovery will likely be an unproductive, but expensive, hit or miss affair. This article proposes some thoughts about making discovery of electronic evidence a systematic and increasingly efficient process. Given that the information to be discovered is, by definition, already electronically searchable and subject to relatively easier organization before trial, systematic electronic discovery has a high payoff.

Although the nature and flow of information is frequently idiosyncratic, varying greatly from organization to organization, it's safe to generalize that the flow of information between different components of any organization is its lifeblood. When confronted with major litigation that demands sophisticated discovery, whether electronic discovery or traditional paper document discovery, identifying and modeling the manner in which information moves within your target organization is a key aspect of knowing what documents and information to look for, who the major actors are, where to look for that evidence, and how to secure it. Even if, by some quirk, your own client has unlimited finances and the ability to sift through every single record in any organization, there are some obvious drawbacks. First, this would take years, and your case might never get to trial or you would run up against discovery deadlines. Second, it would be so expensive as to make the cost disproportionate to almost any litigation advantage. Third, of course, a court would realistically enter a protective order barring that sort of discovery. Finally, even if you could surmount these problems, it's most unlikely that you would be able to sort through the masses of essentially irrelevant information to effectively find and use the few gems that will make or break your case before a jury. Intelligence agencies have a similar problem - sorting through mountains of contradictory, secondary and marginal data in order to find the few gems that show what's really happening.

It's fair to say that gaining an early, effective, and systematic approach to your electronic discovery efforts can make or break your case. Achieving that focus requires that you understand how information flows within your client's organization and your opponent's organization and understand some basic theoretical concepts about information.
Understanding how information flows within a particular organization is useful because it helps us identify:

  1. The types of data that are collected and the form and method by which such data is collected.

  2. The types of formal computer data structures and paper records that might be retained.

  3. Where we are most likely to find pertinent records and data.

  4. The identity of the principal actors, not all of whom may be obvious from an organizational chart.

  5. The types of informal records we might expect to find that are maintained idiosyncratically by involved individuals.

  6. The most likely areas of concentration, which I'll call "nodes," where particularly rich concentrations of useful documents might be found, which in turn may help us further focus our efforts in a more productive, cost-effective manner that in turn sharpens our litigation strategy.

  7. Where potential breakdowns in communication are most likely to occur either in our client's organization or in the target organization, which assists us in understanding what may have gone wrong, or conversely what did not go wrong - a crucial part of any plaintiff's or defendant's case.

  8. Which participants seem to have the greatest affinity for particular types of records, and who is talking to whom about the issue at hand, helping us to home in on particularly rich lodes of electronic discovery.
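
As a rough illustration of how "nodes" and "who is talking to whom" might be surfaced from message metadata, the following Python sketch counts traffic per person and per pair of correspondents. It is only a sketch under stated assumptions: the sender and recipient names are hypothetical, and the functions `rank_nodes` and `busiest_links` are illustrative helpers, not part of any product mentioned in this article.

```python
from collections import Counter

# Hypothetical message metadata: (sender, recipient) pairs taken from
# email headers or memo routing slips. All names are illustrative.
messages = [
    ("alvarez", "baker"),
    ("baker", "alvarez"),
    ("alvarez", "chen"),
    ("chen", "alvarez"),
    ("alvarez", "baker"),
    ("dunn", "baker"),
]


def rank_nodes(pairs):
    """Count how many messages each person sent or received."""
    traffic = Counter()
    for sender, recipient in pairs:
        traffic[sender] += 1
        traffic[recipient] += 1
    # Most active participants first; ties broken alphabetically.
    return sorted(traffic.items(), key=lambda item: (-item[1], item[0]))


def busiest_links(pairs):
    """Count messages between each pair of people, ignoring direction."""
    links = Counter(tuple(sorted(pair)) for pair in pairs)
    return sorted(links.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for person, count in rank_nodes(messages):
        print(person, count)
    for link, count in busiest_links(messages):
        print(link, count)
```

In this toy data the most active participant and the busiest pair stand out immediately. With real discovery material the same counts would only suggest where to look first, since raw volume says nothing about the content of the messages.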

Organizing and Using the Fruit of Electronic Evidence Discovery

In order to track, analyze, and use this sort of data, a litigation database of some sort is almost mandatory. For cases in which you might expect to find tens or hundreds of thousands of documents, Summation is usually the program of choice. For small to medium cases, though, I personally prefer CaseMap 4, along with its associated time line program TimeMap. These programs are particularly flexible means of organizing discovery, understanding significant time lines, and using the resulting data at trial. You can download 30-day evaluation copies from CaseSoft. After 30 days, the full-featured evaluation version won't work unless you purchase a permanent activation code from CaseSoft. CaseSoft charges $495 for one user, with discounts for multiple-user purchases, a cost that I consider worthwhile given the power and value that this program brings to small to mid-sized litigation.

If you have scanned and imaged your electronic and paper discovery using a standard program like Adobe Acrobat, then you'll be able to associate a PDF file of each discovered document directly with its CaseMap or TimeMap entry and call up the imaged document with a single click within CaseMap - a fast and neat way to work with the discovered documents. You'll need the full Acrobat program, not the limited-feature reader available over the Internet. Plan on spending about $270 for Acrobat.

If you're dealing with many possible actors and large quantities of information, then an industrial strength management system such as IBM's Lotus Discovery Server 2.0 will greatly reduce the manual effort otherwise needed to identify and model information flow within a large entity. Programs like Discovery Server 2.0 automatically sort through the discovery target's electronic systems for relevant information, identify potentially rich "nodes," which may be particular authors, recipients, departments, or document types, and then map the relationships between the potentially most profitable targets for more focused discovery. The resources to do this sort of highly automated discovery will be expensive and will require counsel to formulate reasonable ground rules, probably incorporated into a discovery order using a neutral third party to conduct such discovery and protect privileged documents. If you're not using an integrated knowledge management program like Discovery Server, then you'll also need an advanced indexed search program with highly specific Boolean search functions, such as Concordance or DT Search.

Data mining software, often but not always used on mainframe computers, is also potentially useful in making sense out of large masses of otherwise undigested data. Some well-established data mining software, such as SPSS (now in version 11.5), works statistically with numerical data. Other knowledge management programs, such as Lotus Discovery Server, discover and group related textual documents, explicitly mapping the links between specific authors and various clusters of potentially interesting documents. The ability of Discovery Server to map the relationships between individuals and clusters of pertinent documents potentially makes it a very powerful electronic discovery tool in highly complex corporate and organizational situations, but setting up and using this tool requires experience and technical savvy that's beyond most attorneys.

Concept searching may be considered another form of textual data mining. Some very basic concept searching is employed with Internet meta-search engines. You'll be able to find a somewhat more complex and effective example of concept searching at the National Criminal Justice Reference Service (NCJRS) web site. Using NCJRS's mainframe-based Internet search engine, I did a quick concept search looking for studies that measured the accuracy of psychological evaluations in predicting future violence, a fairly complex concept, and was very impressed by the precision of the weighted search results, which were exactly on point. Copernic Enterprise Server has similar capabilities: it's designed to work with all common file formats and across an entire company. Copernic Enterprise Server has real potential as a low cost and relatively simple electronic evidence concept search tool.

Why Study Your Opponent's Information Patterns

The purpose of understanding how information flows within an organization is to make your discovery a deft and swift scalpel rather than a blunt instrument. Because data and the manner in which information flows may vary widely from organization to organization, you will need to have a good understanding of the organization's informational content, the people who create it, the types of documents that are used, the numerous categories and subcategories that define the information kept by an organization, how and where the information flows, and where the organization's information is stored, indexed, backed up, or otherwise maintained. It's also important to be able to quickly spot the most significant and important information (and significant gaps in that information), rather than be bogged down by reviewing, and possibly being distracted by, numerous documents and records that have only a tangential relationship to the issue at hand. You'll also need to understand the relationship between the people who create the information, the documents and information they create, and the end users of those documents and information. You'll need to understand who is an internal expert, or at least highly knowledgeable, about topics critical to your search, and this may not always be obvious from organizational tables. You'll want to identify who is most often handling certain types of information. Finally, you'll need to understand when and how data errors and distortions occur within a particular organization's internal data flow, because not all communication within an organization is clear, concise, and accurate.

Getting Some Litigation Guidance From Information Theory

Information theory is a mathematical framework that models how electronic communication works. Although primarily applicable to assessing electronic communication systems, information theory and related cybernetics theories include several theoretical concepts that provide useful analogies to the discovery process, such as the "noise" inherent to any communication process and "feedback loops." Information theory concepts are now used much more broadly - indeed, even the National Institutes of Health funds entire laboratories devoted to applying information theory to the biology of living organisms.

As analogies, information theory and cybernetic concepts have several important lessons for litigators struggling to undertake very large electronic discovery efforts in a systematic and productive manner. Luckily, because electronic evidence is typically already in a searchable format that lets us zero in upon, and get closer to, the original data, information theory concepts are particularly applicable to electronic data discovery and have the potential to greatly sharpen our discovery efforts.

Here are some crude, but real world, examples of how our thinking can be focused and our discovery efforts sharpened. These concepts apply whether you are asserting discovery demands or attempting to comply with reasonable discovery and disclosure while protecting your client from over-reaching. When we talk of "information" within an organization, we refer to any records and human communication, whether or not that communication contains factually accurate data. Indeed, there's a lot of inaccurate information floating around.
Because the author typically represents plaintiffs against larger entities and corporations, these concepts are phrased from the point of view of the party seeking discovery.

  1. As information is repeated within an organization, its meaning often becomes more and more diffuse until much of the original information becomes highly uncertain and very possibly lost. This is not some esoteric scientific concept - it is simply the underlying basis for the hearsay rule, stated more specifically and theoretically.
  2. It is not possible to readily reverse the noise process and filter out the "noise" in the communications process. That means that we cannot work backwards and arrive at a completely accurate and certain knowledge of the original information. That's due to the human imprecision and uncertainty typically introduced into information as it is repeated throughout an organization. Thus, it is by far the most accurate, of course, to obtain documents that have been written directly by the person whose actions are being questioned or who recorded the original data.
  3. As information is repeated, it tends to become distorted, making it more difficult to ascertain the basis upon which an organization did or did not act. The information that ultimately motivates and controls an act or omission becomes increasingly uncertain as we rely upon diffused data and secondary sources. Thus, again, the most accurate and powerful documents upon which you might later rely in litigation are those written personally by the actor whose acts are being questioned. Bill Gates's emails come to mind.
  4. Even when the information is transmitted in a relatively certain and unambiguous manner, the recipient may still misunderstand it because of his or her own experiences, biases and idiosyncratic use of language. Thus, extra weight should be placed upon clear declarations of intent and knowledge when made by directly involved actors, particularly when the document's recipients then act in conformance with clear directives and statements. An estoppel situation may have occurred.
  5. As "noise" and blurring increase, the resulting imprecision tends to drown out the true message, the "signal." In electronic communications, such as long distance radio or telephone relays, communications engineers use a concept of a signal to noise ratio. When the signal is high and the noise is low, then there is a great deal of certainty about the true data state. When noise predominates, then the signal to noise ratio drops and there is much more uncertainty about what is in fact being said and done. Thus, when only one or two persons are speaking for an organization, particularly when such people have been conferring, much higher reliability may be placed upon their statements as reflecting the true state of organizational intent and action. We have a high signal to noise ratio. On the other hand, when there are many different actors and persons influencing the process, and they are all producing documents that may be pertinent to a questioned transaction, or when they are issuing contradictory statements about what has been happening and why, then the organization's signal to noise ratio becomes very low. As a result, it's harder to understand what's important, what's not, what really happened, and why.
  6. As uncertainty increases, our ability to discern what is true and important in proving what actually occurred (the crux of any litigation) decreases. Thus, we should look, for example, for a series of documents written by a single major actor which state a consistent theme either throughout or alternatively which initially state one consistent theme or position and then sharply deviate to a new direction, corporate position, and theme. In the latter case, the actual reason for the sudden change may be very interesting and probative.
  7. There will always be some uncertainty about finding specific information, although the amount of uncertainty can be considerably narrowed, indeed calculated with a fair degree of precision, as we get more experience searching through large amounts of data. Standard information theory equations can help you decide when further discovery efforts or efforts to find data and comply with discovery requests will likely be unproductive, or, conversely, when further discovery will probably be money well-spent. Particularly when defending against accusations that discovery compliance is inadequate, uncertainty calculations showing the low probability of finding any more discoverable data, even with massive and costly efforts, may be very useful in establishing a well-founded basis for a protective order or resisting further discovery attempts.
  8. Look for feedback loops. Communications and documents do not exist in a vacuum. They're usually made for a specific purpose and will typically elicit one or more rounds of responses and comments by recipients and other interested persons. The concept of feedback has been corrupted in the popular mind to something akin to interpersonal communication, but it's much more. The American Heritage Dictionary defines a feedback loop as "The section of a control system that allows for feedback and self-correction and that adjusts its operation according to differences between the actual output and the desired output." Although that definition may seem most akin to the thermostats that keep our homes at a constant temperature or an aircraft's autopilot, the concept of feedback loops has surprisingly strong analogies in everyday communication. Look, for example, at email message threads and replies - these are verbal feedback loops where the initial authors and recipients regularly exchange roles, clarify concepts and intents, and expand or narrow a topic of discussion. Consider the documents and counter-documents drafted and circulated back and forth within a business that's trying to decide some issue crucial to your litigation - for example, whether to correct a known product defect. For that matter, look at the sequence of summary judgment motion practice: motion, opposition, reply, oral argument, decision, appeal. Built into these procedures is an implicit feedback mechanism designed to ensure that faulty evidence, arguments, and decisions are ultimately corrected. Particularly in larger corporations and other entities, finding these feedback loops identifies the important actors and their relationship to each other and to the issue at hand, helping you focus your discovery efforts. The concept of feedback is useful to litigators in other, more direct, ways as well. For example, if you're trying to prove deliberate intent, what stronger evidence than plotting out a time line showing the incriminating documents and responses forming a feedback loop? Again, certain knowledge management programs like Lotus Discovery Server are specifically tuned to map out these relationships.
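
The idea of plotting a feedback loop on a time line can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any commercial tool works: the records, thread labels, and the simple rule used in `is_feedback_loop` (an author speaks again after someone else has responded) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (thread_id, timestamp, author, summary).
records = [
    ("defect-17", "2002-03-04 09:15", "engineer", "Reports a possible latch defect"),
    ("budget", "2002-03-04 10:00", "controller", "Quarterly budget reminder"),
    ("defect-17", "2002-03-06 08:45", "engineer", "Estimates recall cost"),
    ("defect-17", "2002-03-05 14:30", "manager", "Asks for cost of a recall"),
    ("defect-17", "2002-03-07 16:20", "manager", "Decides against recall"),
]


def build_timelines(rows):
    """Group records by thread and sort each thread chronologically."""
    threads = defaultdict(list)
    for thread_id, stamp, author, summary in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        threads[thread_id].append((when, author, summary))
    return {tid: sorted(items) for tid, items in threads.items()}


def is_feedback_loop(timeline):
    """True if some author speaks again after a different author has replied."""
    turns = []
    for _, author, _ in timeline:
        # Collapse consecutive messages by the same author into one turn.
        if not turns or turns[-1] != author:
            turns.append(author)
    return len(turns) > len(set(turns))


if __name__ == "__main__":
    for thread_id, timeline in build_timelines(records).items():
        label = "feedback loop" if is_feedback_loop(timeline) else "one-way"
        print(thread_id, "-", label)
        for when, author, summary in timeline:
            print("   ", when.strftime("%Y-%m-%d %H:%M"), author, "-", summary)
```

Sorting each thread by date turns a pile of messages into a time line, and the back-and-forth between the same two authors is what marks the thread as a loop worth closer review.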

The Problems Inherent In Language Usage

Communication problems are not unique to the litigation process: they arise in all human endeavor and all human interaction. Even in physics, the hardest of the "hard sciences," discerning which scientific experiments should be relied upon and which contradictory data should be discarded has always been a fundamental challenge.

One of the best approaches to reducing the noise inherent to the litigation discovery process is the use of a software program which acts as a filter to help us sharply focus upon what we have in our case while reducing noise, that is, redundant or distracting documents and information. Filtering out discovery "noise" was hard work for anyone doing traditional manual discovery. If you had a few hundred thousand documents to review, the process might take years and it is still highly likely that you would either overlook the most important documents or perhaps miss their significance and relationship to other discovery. Modern electronic discovery tools, using indexed search programs with highly specific Boolean search functions and thesaurus-based searching, tremendously speed our search process while assuring a much more comprehensive search. Similarly, advanced knowledge management programs, such as Lotus Discovery Server 2.0, discern the central themes to any organization's information, organize it into appropriate subcategories by content, ascertain which people have particular affinities for what sorts of information and documents, determine which documents are the most important or frequently used within an organization, and generally relate people and discoverable documents. Although complex to set up, an automated knowledge management tool like Lotus Discovery Server can be the filter that brings your discovery into razor-sharp focus.
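
To make the notion of an indexed Boolean search concrete, here is a minimal Python sketch of a word index with AND, OR, and NOT conditions. It is a toy under stated assumptions: the document IDs and text are hypothetical, the function names `build_index` and `boolean_search` are illustrative, and commercial products such as those named in this article are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical document set; IDs and text are illustrative only.
documents = {
    "DOC-001": "Engineering memo on the latch defect and proposed recall.",
    "DOC-002": "Marketing plan for the spring product launch.",
    "DOC-003": "Recall cost estimate prepared by finance.",
    "DOC-004": "Email: latch supplier changed in March, no defect reported.",
}


def build_index(docs):
    """Map each lowercase word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_search(index, all_of=(), any_of=(), none_of=()):
    """Return IDs containing every all_of term, at least one any_of term
    (if any are given), and none of the none_of terms."""
    hits = set().union(*index.values())  # start from every indexed document
    for term in all_of:
        hits &= index.get(term.lower(), set())
    if any_of:
        hits &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in none_of:
        hits -= index.get(term.lower(), set())
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(documents)
    print(boolean_search(index, all_of=["latch", "defect"]))
    print(boolean_search(index, all_of=["latch", "defect"], none_of=["supplier"]))
    print(boolean_search(index, any_of=["recall", "launch"]))
```

Note that the first query returns the hypothetical DOC-004 even though that message says no defect was reported. A word index matches vocabulary, not meaning, which is one source of the search "noise" discussed here.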

Of course, using a crude electronic filter to sort through electronic discovery has its own drawbacks. Sometimes, such a filter is too selective, reducing the serendipitous but crucial discoveries. And it's highly probable that searching for specific words or phrases will miss some very important evidence because human beings are not entirely predictable in their use of language and idioms. Our written and spoken words are imprecise from a computer's almost inhumanly precise point of view. For example, while you or I would understand from its context an ungrammatical conversation or document, or one filled with idioms, pronouns and synonyms, searching such materials electronically would probably miss many important documents and concepts. Indexed search engines can partially overcome this problem by searching with common synonyms as well as the original search term. For example, if you were deposing expert witnesses in an airplane crash, the witness might refer to the "aircraft," a vocabulary term that might not be found by a relatively simple search engine looking for "airplane" or "plane." Yet, the witness's phrasing is entirely understandable. A good synonym-based search program would build a thesaurus of synonymous terms such as "airplane," "plane," "aircraft," "airliner," "Boeing," "727," or "B727," all of which would realistically relate to the same concept, a Boeing 727 aircraft that crashed.
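
The airplane example can be sketched as a small thesaurus-expanded search in Python. The synonym group comes from the terms listed above. The transcript lines, page and line references, and the function names `expand` and `find_lines` are hypothetical, and this is a simplified sketch rather than the method used by any particular search product.

```python
import re

# One synonym group, built from the airplane example in the text.
SYNONYM_GROUPS = [
    {"airplane", "plane", "aircraft", "airliner", "boeing", "727", "b727"},
]

# Hypothetical deposition excerpt: (page:line, testimony).
transcript = [
    ("12:3", "The aircraft began its descent at dusk."),
    ("12:9", "I planned to review the maintenance log."),
    ("14:1", "The B727 had been serviced the week before."),
    ("15:6", "Weather at the airport was clear."),
]


def expand(term, groups):
    """Return the term plus every synonym that shares a group with it."""
    term = term.lower()
    wanted = {term}
    for group in groups:
        if term in group:
            wanted |= group
    return wanted


def find_lines(lines, term, groups=SYNONYM_GROUPS):
    """Return references of lines containing the term or any synonym."""
    wanted = expand(term, groups)
    matches = []
    for ref, text in lines:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if wanted & words:  # whole-word match, so "planned" is not "plane"
            matches.append(ref)
    return matches


if __name__ == "__main__":
    print("literal: ", find_lines(transcript, "airplane", groups=()))
    print("expanded:", find_lines(transcript, "airplane"))
```

In this excerpt the literal search for "airplane" finds nothing, while the expanded search picks up both the "aircraft" and the "B727" testimony. The quality of the results depends entirely on how well the synonym groups were built for the case at hand.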

As a result, although a good search program can partially correct vocabulary variations and grammatical imprecision, there will be inevitable noise that causes at least some imprecision in the original document or transcript and also in any later searches. Further, you, the searcher, are also human and have your own implicit search terms and concepts in mind, which arise at least in part from the culture in which you were raised and educated and which may or may not match the words used by the original author or witness or by anyone who has previously prepared the litigation database or indexed any documents in it. One solution to this potential mismatch is to first review the raw discovery product using a concordance program and then, before actually indexing the discovery documents, work out a very carefully controlled indexing and litigation database vocabulary, carefully training all indexers and users.

Even then, the electronic litigator must be prepared to cope with human imprecision and linguistic variations. Some years ago, I did an experiment where I used a number of different legal research tools to look for leading Alaska Supreme Court decisions relating to slip and fall accidents. I already knew two leading cases but wanted to see how readily legal research tools would help a novice find the two leading cases. Searching through a legal research database seemed to be the tightest possible test - generally, West's attorneys, practicing lawyers and Alaska Supreme Court Justices seem much more likely to use and reuse the same learned vocabulary and concepts to describe similar situations, at least compared to how an average corporate officer might phrase documents.

One would think that a generally precise and consistent database of this sort would produce nearly identical search results no matter how the electronic search is conducted or phrased. Yet, despite my own expectations, each different search method (full text, key number, searching for specific words or phrases, natural language searching, and Boolean searching) produced very different results. Each method either failed to find the leading case or returned so many hundreds of cases that the desired case was lost in the background noise. Had I been a litigator who did not already know that leading case, I would not have found it and my briefing would be far more precarious.

Thus, it would seem that even relatively seasoned lawyers and State Supreme Court Justices do not always use identical words and phrasings and therein lie lessons for the electronic litigator. Noise is an inevitable concomitant of human communication. As a result, a highly automated, narrowly focused brute force electronic search will likely not find all critical documents and data - the breadth or narrowness of your electronic search will inevitably balance the convenience and speed of a tightly focused search against the increasingly greater manual effort inherent to increasing the probability of finding every critical document. Even when a search is broadened greatly, a litigator cannot be assured of finding everything - he or she can only be assured of finding more data to review and consider. Thesaurus-based weighted searches using concept searching and lists of search term synonyms seem to be the most effective single search method but multiple searches using varying search phrasing and differing search methods increase your chance of finding the smoking gun or leading case.
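
One simple way to act on the advice about running multiple searches is to pool the results and note how many methods found each document. The Python sketch below assumes hypothetical result lists and document IDs; `combine` and `found_by_one_method` are illustrative helpers, not features of any named product.

```python
from collections import Counter

# Hypothetical result lists returned by three different search methods.
results_by_method = {
    "boolean": ["DOC-014", "DOC-102"],
    "thesaurus": ["DOC-014", "DOC-233", "DOC-307"],
    "natural_language": ["DOC-102", "DOC-233", "DOC-014"],
}


def combine(results):
    """Pool all hits and count how many methods found each document."""
    votes = Counter()
    for hits in results.values():
        votes.update(set(hits))  # count each method at most once per document
    # Documents found by the most methods first; ties broken by ID.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


def found_by_one_method(results):
    """Documents that any single-method search strategy might have missed."""
    return [doc for doc, count in combine(results) if count == 1]


if __name__ == "__main__":
    for doc, count in combine(results_by_method):
        print(doc, "found by", count, "method(s)")
    print("Single-method finds:", found_by_one_method(results_by_method))
```

Documents found by several methods are natural candidates for early review, while those found by only one method show what a single search would have left behind. The pooled list is still only as complete as the searches that fed it.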

Most estimates of business recordkeeping suggest that more than 90% of all primary data now resides in computer systems rather than on paper. Because technology now makes creating and replicating that data very quick and easy, the amount of potentially discoverable data has increased dramatically, probably exponentially, over the past twenty years or so. As a result, our discovery burdens are potentially much greater, whether we are asserting discovery or seeking to comply with reasonable demands.

Effective technology, used in a systematic and well thought out manner, is the only way to deal effectively with today's information overload and the need to find the critical items that make or break our case.


Joseph Kashi is an attorney and litigator living in Soldotna, Alaska, who is active in the Law Practice Management Section and a technology editor for Law Practice Today. He has written regularly on legal technology for the Law Practice Management Section, Law Office Computing Magazine and other publications since 1990. He received his B.S. and M.S. degrees from MIT in 1973 and his J.D. from Georgetown University in 1976, and is admitted to practice in Alaska and Pennsylvania and before the Ninth Circuit and the U.S. Supreme Court.

© 2003 American Bar Association