Firewire Column

There is a lot of data out there . . .
by John C. Tredennick, Jr.
January 2004

(Report from the UC Berkeley School of Information Management and Systems.)

How much data are we creating every year? The folks at UC Berkeley’s School of Information Management and Systems released a groundbreaking study about the amount of data created in 2002 that reached some pretty startling conclusions. Here are some of the highlights.

  1. Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly on hard disks.

    Ever wonder how big 5 exabytes is? According to the authors, the 19 million books and other print collections in the Library of Congress would take up only about 10 terabytes of digital space. So, in 2002 we created enough new information to fill half a million new libraries the size of the Library of Congress.

    Paper storage represents about 0.1% of the total.
     
  2. The amount of new information stored on paper, film, magnetic, and optical media has almost doubled in the last three years.

    New information is now growing by about 30% a year. While paper is also increasing, the vast majority of new information is being stored digitally.
     
  3. Information flowing through electronic channels contained almost 18 exabytes of new information in 2002--about three and a half times the amount recorded on storage media.
     
    To be sure, telephone calls worldwide represent the overwhelming majority of this transmitted data, but consider this:
    • Instant messaging (partly thanks to my daughter) generated 5 billion messages a day or 274 terabytes a year.
    • E-mail generated about 400,000 terabytes of new information.
    • The World Wide Web contains about 170 terabytes of accessible information--about 17 times what is available in the Library of Congress.
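
For anyone who wants to check the arithmetic behind these comparisons, here is a quick sketch in Python. It uses only the figures quoted above and assumes the study's decimal units (1 terabyte = 10^12 bytes, 1 exabyte = 10^18 bytes). The variable names are mine, not the study's.

```python
# Decimal units, as in the study's table: 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
TB = 10**12
EB = 10**18

# Figures quoted in the column (all approximate).
library_of_congress = 10 * TB   # print collections of the Library of Congress
new_stored_2002 = 5 * EB        # new information stored in 2002
flows_2002 = 18 * EB            # new information in electronic channels in 2002
web_accessible = 170 * TB       # accessible World Wide Web

# How many Library of Congress print collections fit in 5 exabytes?
libraries = new_stored_2002 // library_of_congress

# How do electronic flows compare with stored information?
flow_ratio = flows_2002 / new_stored_2002

# How does the Web compare with the Library of Congress?
web_ratio = web_accessible / library_of_congress

# Instant messaging: 5 billion messages a day, 274 terabytes a year.
im_messages_per_year = 5 * 10**9 * 365
im_bytes_per_message = 274 * TB / im_messages_per_year

print(f"Libraries of Congress in 5 EB: {libraries:,}")
print(f"Flows vs. stored information: {flow_ratio:.1f}x")
print(f"Web vs. Library of Congress: {web_ratio:.0f}x")
print(f"Average instant message: {im_bytes_per_message:.0f} bytes")
```

By this reckoning, the instant messaging figures imply an average message of roughly 150 bytes, which sounds about right for a teenager's one-liner.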

The group conducted its first study in 2000 and estimated that the world produced between 1 and 2 exabytes of unique information in 1999 (later revised to between 2 and 3 exabytes). In the summer of 2003, the group repeated the study, leading to these conclusions.
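
The growth figures can be sanity-checked the same way. The sketch below assumes that the revised 1999 estimate (2 to 3 exabytes) and the 2002 estimate (5 exabytes) measure comparable things, which the study itself may qualify. The function name annual_growth is my own.

```python
def annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

# 1999 (revised estimate: 2 to 3 exabytes) to 2002 (5 exabytes) is three years.
slowest = annual_growth(3, 5, 3)   # starting from the high 1999 estimate
fastest = annual_growth(2, 5, 3)   # starting from the low 1999 estimate

# What does 30% a year compound to over three years?
three_year_factor = 1.30 ** 3

print(f"Implied growth: {slowest:.1%} to {fastest:.1%} a year")
print(f"30% a year for three years multiplies the total by {three_year_factor:.2f}")
```

The 30% figure falls inside the implied range, and three years at that rate roughly doubles the total.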

Here is my favorite chart from the study:

Table 1.1: How Big is an Exabyte?

Kilobyte (KB): 1,000 bytes OR 10^3 bytes
    2 Kilobytes: A typewritten page.
    100 Kilobytes: A low-resolution photograph.
Megabyte (MB): 1,000,000 bytes OR 10^6 bytes
    1 Megabyte: A small novel OR a 3.5 inch floppy disk.
    2 Megabytes: A high-resolution photograph.
    5 Megabytes: The complete works of Shakespeare.
    10 Megabytes: A minute of high-fidelity sound.
    100 Megabytes: 1 meter of shelved books.
    500 Megabytes: A CD-ROM.
Gigabyte (GB): 1,000,000,000 bytes OR 10^9 bytes
    1 Gigabyte: A pickup truck filled with books.
    20 Gigabytes: A good collection of the works of Beethoven.
    100 Gigabytes: A library floor of academic journals.
Terabyte (TB): 1,000,000,000,000 bytes OR 10^12 bytes
    1 Terabyte: 50,000 trees made into paper and printed.
    2 Terabytes: An academic research library.
    10 Terabytes: The print collections of the U.S. Library of Congress.
    400 Terabytes: National Climatic Data Center (NOAA) database.
Petabyte (PB): 1,000,000,000,000,000 bytes OR 10^15 bytes
    1 Petabyte: 3 years of EOS data (2001).
    2 Petabytes: All U.S. academic research libraries.
    20 Petabytes: Production of hard-disk drives in 1995.
    200 Petabytes: All printed material.
Exabyte (EB): 1,000,000,000,000,000,000 bytes OR 10^18 bytes
    2 Exabytes: Total volume of information generated in 1999.
    5 Exabytes: All words ever spoken by human beings.

Source: Many of these examples were taken from Roy Williams' “Data Powers of Ten” web page at Caltech.
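
If you would rather let the computer count the zeros, here is a small sketch that turns a raw byte count into the units used in the table. It assumes the table's decimal definitions (each step is a factor of 1,000, not 1,024). The names PREFIXES and describe are my own inventions for illustration.

```python
# Decimal prefixes from the table, smallest to largest.
PREFIXES = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def describe(n_bytes):
    """Format a byte count using the largest decimal prefix that fits."""
    index = 0
    # Step up one prefix for each full factor of 1,000, stopping at exabytes.
    while index < len(PREFIXES) - 1 and n_bytes >= 1000 ** (index + 1):
        index += 1
    value = n_bytes / 1000 ** index
    return f"{value:g} {PREFIXES[index]}"

print(describe(2_000))               # a typewritten page
print(describe(170 * 10**12))        # the accessible Web
print(describe(400_000 * 10**12))    # e-mail in 2002
print(describe(5 * 10**18))          # new stored information in 2002
```

Run against the e-mail figure, it shows that 400,000 terabytes is the same thing as 400 petabytes, twice the table's entry for all printed material.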

This report has a wealth of interesting information and a great deal of food for thought. You can find it at http://www.sims.berkeley.edu/research/projects/how-much-info-2003/.

John C. Tredennick, Jr. (jtredennick@caseshare.com) is a partner at Holland & Hart and CEO of CaseShare Systems, an Internet company building paperless systems for the legal and business communities. He is also the Editor-in-Chief of Law Practice Today.