American Bar Association

ABA Section of Business Law

Business Law Today

The Tech Side of E-Discovery
Understanding Electronically Stored Information
By Robert L. Kelly
Lawyers have been dragged into the computer age despite some kicking and screaming. The new federal e-discovery rules provide yet another reason for practitioners to embrace computer technology. The proliferation of information technology (IT), with advances in both hardware and software, has required that the legal profession invest heavily in computer systems. First there were word processing programs and e-mail. Then technology advanced with instant messaging (IM), blogging ("blawging" for law blogs), streaming video, and podcasting. While the practice of law does not require a degree in computer information systems (CIS), a working knowledge of computer technology has become a marketable skill for those lawyers who have taken the time to become computer literate. If you think Moore's law has something to do with the rule against perpetuities, then it is time to brush up on your computer IQ.

Electronically stored information (ESI) is remarkable due primarily to its volume; a standard desktop computer can store the equivalent of 40,000,000 typewritten pages of information. New desktop hard drives have been developed that hold a terabyte of data. As printed text, a terabyte would occupy 100 million reams of paper (made from 50,000 trees).

E-Filing in Courts and Agencies
Various courts have motivated the bar to learn about computers by moving to ESI and Internet filing. For example, in federal courts, the Public Access to Court Electronic Records (PACER) system is an electronic public access service that allows users to obtain case and docket information from federal appellate, district, and bankruptcy courts via the Internet. Federal courts also have gone to electronic filing of pleadings through the case management/electronic case filing (CM/ECF) system. CM/ECF is a comprehensive case management system that allows federal courts to maintain electronic case files and offer electronic filing over the Internet. You can now get copies of most pleadings in recent federal cases without leaving your office. In some specialized fields, state and federal agencies have implemented electronic filing systems. The U.S. Patent and Trademark Office has the Trademark Electronic Application System (TEAS), which allows you to fill out an application for federal trademark registration online, check it for completeness, and file it over the Internet. And, for the first time in its history, more new patent applications are now filed electronically with the U.S. Patent and Trademark Office than through the traditional paper application process.

While a secretary or paralegal may learn the specific sequence of commands needed to properly file a case or check a court record, coping with the new Federal Rules on e-discovery requires someone with a more thorough understanding of ESI. This may be a lawyer or it may be a third-party vendor. If lawyers intend to handle the matter, they need to know enough computerese to talk intelligently with the client's IT personnel. This article will discuss a few basic concepts for those interested in learning and will describe several benefits of third-party vendors of e-discovery services for those who are not.

Computer Basics
As virtually everyone knows, computers (more specifically, digital computers) operate by converting information into a binary (two-state) code. That means every instruction that a digital computer executes and all of the information it stores are ultimately converted into ones and zeros. Although the adjective "digital" is rarely used, it serves as a reminder that nomenclature, particularly in written discovery, is important; although almost all computers are digital, there are some analog computers.
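To make the "ones and zeros" point concrete, the short Python sketch below shows how three letters of text look once they are encoded. The function names `to_bits` and `from_bits` are illustrative only, and the sketch assumes the common UTF-8 text encoding; other encodings would produce different bit patterns.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: turn 8-bit groups back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("ESI")
print(encoded)             # 01000101 01010011 01001001
print(from_bits(encoded))  # ESI
```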

Computers have three basic functionalities. Input/output allows the computer to receive instructions and data and to display information, for example, on the screen. The CPU (central processing unit) does the computing. Memory stores information. These three functionalities are present in the most basic laptop computer and the largest mainframe. They also allow you to retrieve deleted e-mails and documents that you thought were erased and even track past Internet activity.

Definitions from Sedona
The following key terms are useful when discussing e-discovery. For the most part, the definitions have been adopted from the Sedona Conference on e-discovery. The Sedona Conference provides transitory and focused think-tanks to develop principles, guidelines, and best practices in the areas of antitrust, intellectual property rights, and complex litigation. It has been at the forefront in establishing widely adopted definitions for dealing with ESI in discovery.

Active Data. Active data is information residing on the direct access storage media of computer systems, which is readily visible to the operating system and/or application software with which it was created and immediately accessible to users without restoration or reconstruction.

Backup Data. Backup data is an exact copy of system data that serves as a source for recovery in the event of a system problem or disaster. Backup data is generally stored separately from active data on portable media, for example, magnetic backup tapes.

Data Filtering. Data filtering is the process of identifying for extraction specific data based on specified parameters (e.g., by key word, file type, or name).

De-Duplication. De-duplication is the process of comparing electronic records based on their characteristics and removing or marking duplicate records within the data set.

Legacy Data, Legacy System. Legacy data is information that an organization may have invested significant resources in developing and that has retained its importance, but that was created or stored using software and/or hardware that has become obsolete or has been replaced. Legacy data may be costly to restore or reconstruct when required for investigation or litigation analysis or discovery.

Metadata. Metadata is information about a particular data set or document that describes how, when, and by whom it was collected, created, accessed, and modified and how it is formatted. It can be altered intentionally or inadvertently. It also may be extracted when native files are converted to image. Some metadata such as file dates and sizes can be seen easily by users; other metadata can be hidden or embedded and unavailable to computer users who are not technically adept. Metadata is generally not reproduced in full form when a document is printed.

Native Format. Electronic documents have an associated file structure defined by the original creating application. This file structure is the document's native format.

PDF (Portable Document Format). A PDF captures formatting information from a variety of applications in such a way that it can be viewed and printed as intended in its original application by practically any computer, on multiple platforms, regardless of the specific application in which the original was created. PDF files may be text-searchable or image-only.

Sampling. Sampling usually refers to the process of testing a database for the existence or frequency of relevant information. It can be a useful technique in addressing a number of issues relating to litigation, including decisions about what repositories of data are appropriate to search in a particular litigation and determinations of the validity and effectiveness of searches or other data extraction procedures.

Slack Data. When a file name is deleted, the underlying ESI is not automatically erased from the hard drive. A partial overwrite of the file with new material leaves part of the original file intact, and forensically retrievable, as slack data.

TIFF (Tagged Image File Format). A TIFF file has a .tif (or .tiff) extension. Images are stored in tagged fields and programs use the tags to accept or ignore fields, depending on the application.

(Sedona Conference Glossary: E-Discovery and Digital Information Management, May 2005.)
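Two of the definitions above, data filtering and de-duplication, describe mechanical processes that can be sketched in a few lines of Python. The example below is a simplified illustration, not a description of how any particular vendor's software works. The `Record` type, the sample file names, and the choice to treat two documents as duplicates when their text has the same SHA-256 hash are all assumptions made for the sketch; commercial tools may compare other characteristics as well.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str      # file name, e.g. "contract_v1.doc" (hypothetical)
    content: str   # extracted text of the document


def filter_records(records, keyword, extensions):
    """Data filtering: keep records whose file type is in `extensions`
    and whose text contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [
        r for r in records
        if r.name.lower().endswith(tuple(extensions))
        and keyword in r.content.lower()
    ]


def de_duplicate(records):
    """De-duplication: keep the first record for each distinct content hash."""
    seen = set()
    unique = []
    for r in records:
        digest = hashlib.sha256(r.content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(r)
    return unique


records = [
    Record("contract_v1.doc", "Draft contract between Acme and Doe."),
    Record("copy_of_contract.doc", "Draft contract between Acme and Doe."),
    Record("lunch.txt", "Lunch order for Friday."),
    Record("notes.pdf", "Notes on the Acme contract negotiation."),
]

hits = filter_records(records, "contract", [".doc", ".pdf"])
print([r.name for r in de_duplicate(hits)])
# ['contract_v1.doc', 'notes.pdf']
```

Filtering first and de-duplicating second keeps the hashing work limited to documents that are already candidates for review.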

Avoiding Metadata: PDF and TIFF
A computer's input/output functionality allows scanners to convert a typewritten page or picture into an electronic image. It also allows electronic images to be converted to hard copy via a printer. There is an important distinction between image files and text files. Text files are easily searched using key words. An imaged document cannot be searched using a word search unless it is first converted to a text file using optical character recognition (OCR) or has been tagged with key words. OCR programs have evolved in the past decade so that an imaged page of legible text can be accurately converted to searchable text. The days of choosing between retyping a document and spending editing time on poorly converted OCR documents are almost behind us.

There are numerous formats for storing digital images. PDF and TIFF are two of the most common types of electronically stored images. These can be recognized by their file extensions of .pdf and .tif (or .tiff). File extensions follow the file name and tell the computer which format was used to store the file. This information allows the computer to successfully convert the stored file back into an image for viewing or printing. Both TIFF and PDF formats can be ported into popular litigation-support software such as Summation.

By producing documents as TIFF files, you can avoid producing metadata. Think of metadata as data that describes other data. For example, metadata may indicate that John Doe spent three hours on 1/1/99 editing a contract. If there is an issue regarding when the document was edited, the metadata may be critical evidence. As this article is being typed in Microsoft Word, metadata is being accumulated (as shown in the metadata analysis reproduced at the end of this article).

In this example, there is no hidden data or comments, but if this were a pleading or brief in which the lawyers have filled the comment field with remarks on strategy, producing the metadata could be a serious error. As with all discovery, ESI must be evaluated in terms of relevancy. In one case, metadata may be irrelevant. In another, metadata may be the most important evidence.

If you have produced a document in its native format (the format in which the files were created and in which they are normally maintained), it will typically include metadata that is easily accessible. Remember, since metadata is compiled automatically, simply accessing a file may change the metadata or even replace a date on a document. To avoid changing metadata, hardware is available that images a complete hard drive. In operation, the hard drive of a laptop computer, for example, can be physically removed from its housing, placed in a drive imager, and copied without changing any metadata.
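The point that metadata is compiled automatically can be illustrated with the metadata the operating system itself keeps about a file. The Python sketch below is a minimal illustration under stated assumptions: the helper names `describe_file` and `demo`, the file name, and the 2007 date are hypothetical, and the sketch covers only file-system metadata (size and last-modified time), not the richer metadata embedded inside a word processing document. It shows an edit changing the recorded values; whether merely opening a file updates its access time depends on how the operating system is configured.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of file-system metadata for `path`."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


def demo():
    """Create a file, backdate it, edit it, and return metadata before and after."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "contract.txt")  # hypothetical file
        with open(path, "w", encoding="utf-8") as f:
            f.write("first draft")
        # Assumption for the demo: pretend the file was last modified in 2007.
        old = datetime(2007, 4, 18, tzinfo=timezone.utc).timestamp()
        os.utime(path, (old, old))
        before = describe_file(path)
        # Any later edit silently replaces the recorded modification time.
        with open(path, "a", encoding="utf-8") as f:
            f.write(" plus edits")
        after = describe_file(path)
        return before, after


before, after = demo()
print(before["modified_utc"].year, before["size_bytes"])     # 2007 11
print(after["modified_utc"] > before["modified_utc"])        # True
```

This is why forensic collection works from an image of the drive rather than from the live files: nothing in the sketch asked the system to update the date, yet the earlier value is gone after the edit.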

Under the new Federal Rules, a request to produce ESI that does not designate the form of production is satisfied by producing ESI in the form in which it is ordinarily maintained or in a form that is reasonably usable. Again, the native form will typically include metadata unless it is first scrubbed using one of the many commercial programs available for that purpose.

The TIFF and PDF formats emerged early as the formats of choice for requesting and producing ESI. With TIFF and PDF, the documents are produced as image files. In particular, TIFFs closely mimic paper. The TIFF format allows you to Bates stamp each image (apply a sequential number), redact privileged materials, and conduct word searches. If you do not wish to disclose the metadata associated with the document, TIFF allows you to remove it prior to production. A number of e-discovery vendors have developed proprietary software that performs the imaging function of TIFF files but is enhanced to preserve additional information such as metadata. If metadata from an opposing party's documents is important, you will need to request native form files or an alternative format that includes metadata. Beware of requesting production without specifying the format. Opposing counsel may produce paper copies. In a case with literally millions of pages of documents, production in a specified electronic format may allow you to conduct word searches to narrow a bulk production to a handful of hot documents. In general, also remember that if you ask for the haystack, the court may not help you find the needle.
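Bates stamping, mentioned above, is simply a sequential labeling scheme applied to each page image. A minimal Python sketch follows. The prefix "ABC", the six-digit width, and the function name `bates_numbers` are assumptions chosen for illustration; actual numbering conventions are set by the parties or the vendor.

```python
def bates_numbers(prefix, start, count, width=6):
    """Yield `count` sequential Bates labels such as ABC000001."""
    for n in range(start, start + count):
        yield f"{prefix}{n:0{width}d}"


# Placeholder page images; in practice each would be one TIFF page.
pages = ["page image 1", "page image 2", "page image 3"]
stamped = list(zip(bates_numbers("ABC", 1, len(pages)), pages))

for label, page in stamped:
    print(label, page)
# ABC000001 page image 1
# ABC000002 page image 2
# ABC000003 page image 3
```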

Metadata Ethics
The disclosure of metadata also raises ethical questions. While ABA Formal Opinion 06-442 advises that lawyers have no ethical duty to refrain from reviewing and using metadata embedded in e-mail and other electronic documents received from opposing counsel or adverse parties, some states such as Florida and Alabama have taken a contrary position. The unauthorized "mining" of metadata to uncover confidential information in electronic documents constitutes professional misconduct according to the Alabama State Bar's ethics panel (Alabama State Bar Disciplinary Commission Opinion 2007-02, March 14, 2007). The panel opined that a search for metadata could lead to the disclosure of client confidences and secrets, litigation strategy, editorial comments, legal issues, and other confidential information. The Alabama opinion also recognized an ethical obligation to use reasonable care when transmitting electronic documents to prevent the inadvertent disclosure of metadata containing client confidences or secrets:

Just as a sending lawyer has an ethical obligation to reasonably protect the confidences of a client, the receiving lawyer also has an ethical obligation to refrain from mining an electronic document.

Third-Party Vendors
Third-party vendors who claim expertise in e-discovery have proliferated at a rate almost as fast as the increase in computing power. E-discovery vendors (one hopes) have experience, and they may have proprietary software and even specialized hardware that can make document production and review less of a nightmare. There are literally hundreds of vendors in this nascent billion-dollar industry. Nothing in litigation involving millions of documents is inexpensive. For class action lawsuits, mass tort actions, and antitrust and patent litigation, the stakes may be so great that the cost of e-discovery is not a limiting consideration. Fees charged by most e-discovery vendors are typically broken down into hourly rates, per-page costs, and data volume rates. For example, one service charges $175 per hour for technical time; 11 cents per page for conversion to TIFF format, which includes Bates and confidentiality stamping; and $250 per gigabyte for initial data filtering.
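To see how the three fee components combine, the Python sketch below applies the rates quoted above to a hypothetical project. The project size (40 hours of technical time, 100,000 pages, and 50 gigabytes of data) is an assumption invented for the example, as is the function name `estimate_cents`. Amounts are kept in whole cents to avoid rounding surprises.

```python
# Rates quoted in the text, expressed in cents.
RATE_PER_HOUR = 175_00      # $175 per hour of technical time
RATE_PER_PAGE = 11          # 11 cents per page converted to TIFF
RATE_PER_GIGABYTE = 250_00  # $250 per gigabyte of initial data filtering


def estimate_cents(hours, pages, gigabytes):
    """Return the estimated vendor fee in cents for a project of the given size."""
    return (hours * RATE_PER_HOUR
            + pages * RATE_PER_PAGE
            + gigabytes * RATE_PER_GIGABYTE)


# Hypothetical project size, chosen only for illustration.
total = estimate_cents(hours=40, pages=100_000, gigabytes=50)
print(f"${total / 100:,.2f}")  # $30,500.00
```

In this hypothetical, the per-page and per-gigabyte charges ($11,000 and $12,500) each exceed the hourly charge ($7,000), which is one reason an accurate estimate of document volume matters when requesting proposals.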

There are several inquiries that should be made when choosing an e-discovery vendor. As with most service vendors, they are defined by their reputation. Accordingly, you should start by asking your partners about their experience with e-discovery vendors. If someone in your firm had good luck with a particular vendor, include it in the list of vendors that will receive your request for proposal (RFP). All e-discovery vendors have Web sites, so try to ignore the hype.

You need to figure out a few basic parameters of the project in order to get an accurate proposal from an e-discovery vendor. These include an estimate of the number of documents involved, the timing of discovery, and the approximate number of hard drives and servers that will need to be reviewed. Cost must not be the sole driver of your decision to select a particular vendor. You need to ask whether the prospective vendor is experienced in the type of project presented. The most important due diligence you conduct will be to ask for references from lawyers and law firms. If a vendor is reluctant for you to contact its past clients, it should be a red flag. Also ask about the vendor's standard representations and warranties and whether the vendor has been sued by any of its clients. If the vendor intends to use subcontractors, you should know. The issue of conflicts of interest also should be discussed. If the vendor has done work previously for your adversary, make sure there are no confidentiality issues. The Sedona Conference Working Group Series has an excellent paper on "Best Practices for the Selection of Electronic Discovery Vendors: Navigating the Vendor Proposal Process" at www.discoverymining.com/files/SedonaPropProc.pdf. If you hire the vendor and a problem arises, you should have a record of your due diligence investigation. Keep your notes. They may be the only thing that saves you from sanctions; Moore's law, which is the prediction by Gordon Moore (cofounder of the Intel Corporation) that the number of transistors on a microprocessor would double periodically (approximately every 18-24 months), will not help you.
Analyzing BLOOMFIELD-#832991-v1-e-discovery_paper.DOC

Document Name: BLOOMFIELD-#832991-v1-e-discovery_paper.DOC
Document Format: Word Document
Compatibility Options: Microsoft Office Word 2003

Built-in document properties:
Built-in Properties Containing Metadata: 3
Title: The Tech Side of E-Discovery
Company: Dickinson Wright PLLC

Document Statistics:
Document Statistics Containing Metadata: 6
Creation Date: 4/18/2007 11:53:00 AM
Last Save Time: 4/24/2007 4:17:00 PM
Time Last Printed: [Blank]
Last Saved By: *
Revision Number: 17
Total Edit Time (Minutes): 400 Minutes

Custom document properties:
No Custom Document Properties

Last 10 authors: NOT PROCESSED
An error occurred. See details.
Error Description: Cannot be processed against an open document.

Attached Template (Convert to Normal):
Attached to Normal

Routing slip:
No Routing Slip

No Versions

Track Changes:
No Tracked Changes

Fast Saves:
Fast Saves is Off

Hidden text:
No Hidden Text

No Comments

No Objects to be converted to Pictures

No Hyperlinks

Document Variables (VBA):
No Document Variables

Smart Tags:
No Smart Tags in Document

Remove Personal Information:
Remove Personal Information: Off

Include Fields:
Does not contain any Include Fields

Small Font Size:
No Small Fonts

White font:
No White Font Text

Kelly is a member of Dickinson Wright PLLC in Bloomfield Hills, Michigan. His e-mail is rkelly@dickinsonwright.com.
