American Bar Association

Litigation News
Tips from the Trenches

Archival Research on the Web

By Don MacLeod

 

“The past is never dead. It’s not even past.” William Faulkner wrote that in his 1951 book Requiem for a Nun, but he may as well have been writing it today about the Internet. When most of us think of the online world, we think of its characteristic immediacy: ephemeral tweets, disposable pictures on Instagram, and up-to-the-second text messages. What doesn’t come to mind, though, is that the web is actually a giant archive because, in the electronic world, nothing is ever truly deleted.


For researchers, the web’s value as an immense collection of times past outweighs its usefulness as a conduit of current information. For litigators in particular, this online attic represents a largely overlooked resource for turning up factual information. In fact, the web is a library of stored intelligence on people and companies, products and ideas, and a terrific place to learn the backstory on any imaginable subject. Digging up the past often sheds light on the present, and smart lawyers should know how to winnow out useful nuggets from a variety of interesting websites—free and commercial.


The Wayback Machine
When it comes to looking at the web’s past, the place to start is the Wayback Machine, a search tool from the nonprofit organization Internet Archive. The idea behind the Wayback Machine is perfectly simple. It takes a periodic snapshot of the web—yes, the whole, publicly available web, in all its messy glory—and then allows anyone who cares to catch a glimpse of what a website looked like at some time in the past. Just plug in a URL, such as www.whitehouse.gov, and the Wayback Machine will produce a clickable calendar to show the dates on which it snagged that website’s close-up.
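Researchers who want to script these lookups can lean on the Wayback Machine’s predictable addresses. The Python sketch below assumes three publicly documented patterns: `https://web.archive.org/web/*/<url>` for the calendar of captures, `https://web.archive.org/web/<timestamp>/<url>` for the capture nearest a given date, and the Wayback Availability API at `https://archive.org/wayback/available`, which answers in JSON. The function names and the sample response are hypothetical illustrations, not part of any Internet Archive library, and the endpoints should be confirmed against the organization’s current documentation.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Public Wayback Machine addresses (assumed current; verify before relying on them).
WAYBACK = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def calendar_url(site: str) -> str:
    """Address of the calendar page listing every capture of `site`."""
    return f"{WAYBACK}/*/{site}"


def snapshot_url(site: str, timestamp: str) -> str:
    """Address of the capture nearest `timestamp` (YYYYMMDDhhmmss or a shorter prefix)."""
    return f"{WAYBACK}/{timestamp}/{site}"


def availability_query(site: str, timestamp: str) -> str:
    """Query URL for the Wayback Availability API, which answers in JSON."""
    return f"{AVAILABILITY_API}?{urlencode({'url': site, 'timestamp': timestamp})}"


def closest_snapshot(response_text: str) -> Optional[Tuple[str, str]]:
    """Return (snapshot URL, timestamp) from an API response, or None if nothing was archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


if __name__ == "__main__":
    print(calendar_url("www.whitehouse.gov"))
    print(availability_query("www.whitehouse.gov", "20090120"))
    # Hypothetical response in the API's documented shape (not real data).
    sample = (
        '{"archived_snapshots": {"closest": {"available": true, "status": "200", '
        '"timestamp": "20090120120000", '
        '"url": "http://web.archive.org/web/20090120120000/http://www.whitehouse.gov/"}}}'
    )
    print(closest_snapshot(sample))
```

Pasting the calendar address into a browser brings up the same clickable calendar; the availability query is handier when only the capture nearest a particular date matters, such as the day a disputed statement was allegedly posted. The sketch only builds addresses and parses responses, leaving the actual network request to whatever HTTP tool you already use.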


The Wayback Machine is like a family photo album for websites; it shows how the pages have changed over the years. This vast electronic archive stores a mind-blowing 398 billion webpages in its servers as part of its effort to create what it calls “a digital library of Internet sites and other cultural artifacts in digital form.” Best of all, it’s free. With a few clicks, you can travel to the web’s past and see the digital evolution unfold, year by year.
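To see that year-by-year evolution without clicking through the calendar, Internet Archive also runs a CDX server that lists captures as rows of data. The Python sketch below assumes the CDX JSON output format, in which the first row is a header naming the columns (including `timestamp`) and each later row describes one capture; it simply tallies captures by year. The endpoint, the `from`/`to`/`output` parameters, and the helper names should be treated as assumptions to verify against Internet Archive’s documentation, and the sample rows are invented.

```python
from collections import Counter
from typing import Dict, List
from urllib.parse import urlencode

# Wayback CDX server endpoint (assumed current; verify before relying on it).
CDX_API = "https://web.archive.org/cdx/search/cdx"


def cdx_query(site: str, year_from: str, year_to: str) -> str:
    """Build a CDX query that asks for JSON rows between two years."""
    params = {"url": site, "output": "json", "from": year_from, "to": year_to}
    return f"{CDX_API}?{urlencode(params)}"


def captures_per_year(rows: List[List[str]]) -> Dict[str, int]:
    """Count captures per year. rows[0] is the header row naming each column."""
    if not rows:
        return {}  # nothing to count
    header, captures = rows[0], rows[1:]
    ts = header.index("timestamp")
    # Timestamps look like YYYYMMDDhhmmss, so the first four characters are the year.
    counts = Counter(row[ts][:4] for row in captures)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(cdx_query("www.whitehouse.gov", "2008", "2009"))
    # Hypothetical rows in the CDX JSON layout (not real capture data).
    sample = [
        ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
        ["gov,whitehouse)/", "20080101000000", "http://www.whitehouse.gov/", "text/html", "200", "AAA", "100"],
        ["gov,whitehouse)/", "20081231235959", "http://www.whitehouse.gov/", "text/html", "200", "BBB", "100"],
        ["gov,whitehouse)/", "20090120000000", "http://www.whitehouse.gov/", "text/html", "200", "CCC", "100"],
    ]
    print(captures_per_year(sample))
```

A sudden jump or gap in the yearly tally is a quick cue for which snapshots are worth opening first.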


When using the Wayback Machine, keep one important caveat in mind. Even though Internet Archive recognizes the value of its archive for litigation, it is a nonprofit organization with no in-house legal staff. And because Internet Archive is no deep-pocketed Google, it tries not to become embroiled in legal disputes. The site explicitly states, “The Wayback Machine tool was not designed for legal use.” Nevertheless, the organization does try, however reluctantly, to accommodate legal requests. In its frequently asked questions, it provides a standard affidavit for use in proceedings and outlines the fees for notarizing the affidavit, as well as the limitations on its use. Still, Internet Archive encourages lawyers to rely on stipulation and judicial notice as ways of providing archival materials to the courts without incurring unnecessary costs.


Google’s Cache
Not every look into the past requires digging up years of old information. Sometimes, merely seeing a page in a state before it was edited—that is, before embarrassing information was removed or an incriminating statement was scrubbed—will do. And that is a job for the Google “cached” feature.


As most people know, Google periodically dispatches its crawler software to swallow the Internet whole. When you run a Google search, you get a list of the websites and pages from Google’s latest crawl. That way, your Google search results are fresh and timely. Interestingly, though, Google also lets you see what a page looked like when Google last visited it, even if the page has changed since. The cached feature is accessible by running an ordinary Google search and then clicking the dropdown arrow that appears next to the URL. Why might this be useful? When you bump up against the dreaded 404 error (“Page Not Found”), the cached feature may be able to show what was on a page before the content disappeared. Surprising information may still be within reach, even though it no longer exists on the current web.


A quick illustration: Some years ago, the Department of Homeland Security distributed a confidential report on planning for hypothetical terrorist attacks to the state agencies that would be responsible for responding to them. The report was inadvertently posted to a state government website in Hawaii and then taken down once the agency realized it should not have been posted. But an enterprising journalist caught wind of the report, Googled it, and, as expected, found that it had been removed. The Google cache, however, still held a copy of the report, and the reporter had a scoop. The cached feature is a backdoor to yesterday’s information.


Another untapped resource is online media databases. The cliché that today’s newspaper is the first draft of history is put to the test with the advent of online newspaper archives. These archives are no longer the musty “morgue” of clips that reporters once rifled through while fleshing out a story; instead, immense databases of daily news now provide immediate access to facts, names, business trivia, and the day-to-day minutiae of newsworthy events. The New York Times alone, which makes all of its editorial content fully searchable from 1851 on, represents a triumph of search technology. Like the reporter of old who nosed out useful facts from clips in file cabinets, today’s litigator looking for insight could do worse than to search the Times archives. The self-proclaimed “newspaper of record” is indeed an indispensable repository of news, but it is merely one of hundreds of newspapers that can be electronically winnowed for useful facts.


In addition to the Times, try the subscription website Newspapers.com. It provides access to more than 2,100 American newspapers, dating from the 1700s to the 2000s. Unlimited access is $79.99 a year.


Diligent researchers should also consult the Library of Congress’s guide to newspaper archives to discover where repositories of local newspapers can be found. These smaller publications can shine a light on past disputes, help locate heirs, or trace family members. Even the local Tribune can provide the kind of detail that once was accessible only to those willing to travel to physical repositories in remote towns and then spend tedious hours trawling through paper or microfilmed contents. Searchable PDFs have made archival research faster and neater, and litigators should avail themselves of these superb resources. It takes some effort, yes, but mining for gems, literal or figurative, is always a challenging business.


For the well-heeled litigator, the gold standard for searching the news of the past is a Nexis password. The Nexis collection serves up more than 30 years of full-text news from thousands of newspapers, magazines, wire services, and public relations sources. It’s pricey but thorough, and in the hands of a skilled librarian or frequent searcher, Nexis can find that proverbial needle in a haystack that might clinch a key point in your argument.


The U.S. Census Bureau’s compilation of statistical information adds to the variety of online resources. Its decennial headcount doesn’t merely tot up the number of people living in the 50 states of the Union. It also pulls together a vivid statistical portrait of the nation. There are many ways to search the Census Bureau’s site, but the most useful is American FactFinder, which is designed to search dozens of data points from the most recent census and compare them with trends from previous years. Search by demographic characteristics, economic data, housing figures, and other official statistics. The results will offer statistical insight into communities across the country, covering wages, occupations, political districts, and a host of other factual questions that crop up during litigation.


Public Records
But what about more individualized or personalized information? “Skip tracing” used to be a job for hard-boiled investigators in hard-soiled trench coats. Now that public records flood the Internet, digging up information on individuals has never been easier. Yes, some of the most interesting and sensitive information about an individual will require a court order or a search warrant, but enough personal data are hiding in plain sight to make online searches for public records worth the time and effort.


As individuals live their lives, they sign contracts, get into trouble, buy houses, and obtain licenses to conduct a business or a profession. All of life’s major activities leave a trail that once was paper but now is electronic. Background checks, criminal records, and social media searches are all part of a person’s publicly available curriculum vitae.


There are two ways to round up public records data: do it yourself, or pay a content aggregator for the results of the rounding up it has already done. The first way is cheaper, but it’s also time-consuming. A confirmed do-it-yourselfer could sleuth her way to an extensive dossier of information on individuals and organizations by sifting through myriad databases for everything from political contributions at the Federal Election Commission to real estate records at the local recorder’s office. You will spare yourself time and aggravation, though, by paying for the digestible package of public records offered by content aggregators like Lexis’s People Finder or the web-based vendor Intelius. These companies do the legwork for you by harvesting the interesting data from county, state, and federal agencies and then offering the results of their work for a fee. Access restrictions apply, and you will need a legitimate reason for looking at personal records. But whether you slog through the files yourself or rely on a commercial service to pin down an individual’s latest address, real estate assets, pilot’s license, incarceration record, or bankruptcy filing, you will be looking through the same collection of records.


The long look backward is not just the province of services like Google or public records research. Smart litigators who need to see the text of federal statutes from the past have a superb resource in the widely available U.S. Code archive from the redoubtable legal publisher William S. Hein. Its web service, HeinOnline, is best known for its comprehensive collection of law reviews. But Hein delivers on legal archives, too. Available by subscription or at your local law library, Hein cooks up a smorgasbord of old legal materials, including the complete texts of early federal codes and U.S. Statutes at Large, antique American case law, and, for the rules-ravenous, the complete Code of Federal Regulations from 1938 to present. Also look to Hein for the Federal Register from volume 1. If you still need to appease the legal scholar lurking inside your mind, there’s always Hein’s “Legal Classics Library” to scratch your intellectual itch. When your practice demands that you reconstruct the legal ecosphere from the Eisenhower years, HeinOnline should be your first stop on your trip down legal memory lane.


No compilation of online archival information would be complete without a reminder that Google Books rescues ink-and-paper damsels from the clutches of dusty library shelves by digitizing millions of out-of-print and out-of-circulation titles. Your law firm library probably doesn’t have a copy of the 1766 edition of Blackstone’s Commentaries on the Laws of England sitting on the shelf, but no matter: Google Books scanned the copy held in Munich’s Staatsbibliothek to make it as readily accessible and searchable as this month’s issue of the American Lawyer.


The depth of archival materials on the web cannot be overstated. Historical information makes up the vast majority of the web’s content. The sites outlined here are only a small selection of resources to aid your look backward. The conscientious researcher should have little trouble digging up the recent past. As Faulkner said, the past isn’t even past. With so much older information still floating around the Internet, that’s truer than ever.


Keywords: web research, web archive, cache, public record, archive, content


Don MacLeod is the manager of knowledge management for Debevoise & Plimpton LLP in New York and the author of How to Find Out Anything (Prentice Hall Press, 2012).


This article was adapted from a longer one that was published in the Summer 2014 issue of LITIGATION.


 
Copyright © 2017, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).

