This past March, a British newspaper called The Observer made a disturbing observation:
"16 years after it was created, the £2.5 million [$4.3 million] BBC Domesday Project has achieved an unexpected and unwelcome status: it is now unreadable.... The special computers developed to play the 12-inch video discs of text, photographs, maps and archive footage of British life are, quite simply, obsolete.
"As a result, no one can access the reams of project information, equivalent to several sets of encyclopaedias, that were assembled about the state of the nation in 1986. By contrast, the original Domesday Book, an inventory of eleventh-century England compiled in 1086 by Norman monks, is in fine condition and can be accessed by anyone who can read and has the right credentials."
This ironic death of Domesday has been taken as a rallying cry for an increasingly vocal group of computer scientists and archivists who argue that we are in danger of losing our cultural heritage, or at least that part of our cultural heritage that we have been foolish enough to commit to electronic storage devices.
There's just one problem with this reasoning: it's wrong.
Recently I was at a conference with David Stork, chief scientist of Ricoh Innovations, the Silicon Valley research center of that giant Japanese office products company. We were there to talk about computer security, but all Stork wanted to discuss was his idea for the Digital Lock Box, an as-yet nonexistent service that would allow people to digitally archive information in such a way that they would be guaranteed, with 99.9999 percent confidence, of being able to retrieve it at least 15 years later.
As Stork put it to me at the conference, a Word Pro 2.5 document on a [Macintosh low-density] 3.5-inch floppy, with an Illustrator 2.0 image with an out-of-date compression scheme, cannot be easily retrieved and viewed even a few years after it is created. Building a system that can store and retrieve digital information with high fidelity is an engineering project that is worthy of major government, corporate, and academic support. Stork even has a slogan at his fingertips worthy of a bumper sticker: "Just save it!"
But consider the Domesday Project. It's true that the original discs can be played only on a Philips VP 415 Videodisc Player, a system, designed by Philips specifically for the project, that could overlay every frame of extraordinarily sharp analog photo with 6 kilobytes of digital data. But advances in digital image compression technology made the VP 415 obsolete. Domesday was its first and last significant application.
That doesn't mean, however, that the data on the Domesday discs are gone forever. A group of dedicated engineers and electronic preservationists have painstakingly copied the information off the original discs and onto more modern systems. They have also created a computer program that emulates the BBC Micro, the special-purpose computer on which the Domesday system ran. This emulation allows today's standard PCs to play back the original Domesday videodiscs.
To be sure, this has all been an expensive and time-consuming process. But it has been done, proving that the process is possible. Not all digital material is worth preserving; most, in fact, is not. But Domesday was worth preserving and, as a result, it has been.
The real lesson of the Domesday Project is that nonstandard file formats carry a huge hidden cost. Because high-quality image and video compression hadn't been invented yet in 1986, the BBC saved a tremendous amount of money by putting the Domesday Project on a pair of videodiscs rather than stamping the data onto perhaps a hundred CD-ROMs. But those savings must now be weighed against the real cost borne by those who must migrate the data into a modern format.
Some argue that it's impossible to look into the future and determine which of today's formats will survive and which will go the way of the VP 415. Poppycock! As a society we have a very good understanding of what will make one file format endure while another one is likely to perish. The key to survival is openness and documentation.
It is simply inconceivable that documents created today in Adobe's Portable Document Format (PDF), or images stored in the Joint Photographic Experts Group (JPEG) format, won't be decipherable on computers in the year 2030. That's because both the PDF and the JPEG formats are well-defined and widely understood. Adobe has lost control of PDF: there are more than a dozen programs that can create PDFs and display them on a wide range of computers. In other words, PDF is no longer a proprietary format. The same goes for JPEG. Yes, Adobe may fail and new 3D cameras may make two-dimensional photography obsolete. But we will always be able to read files in these formats, because the detailed technical knowledge of how to do so is widely distributed throughout society.
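To give a sense of how little it takes to recognize an openly documented format, here is a minimal sketch in Python. It relies only on the published file signatures: a PDF file opens with the characters "%PDF-", and a JPEG file opens with the bytes FF D8 FF. The function name sniff_format is purely illustrative, and a real archival tool would check far more than the first few bytes.

```python
# Illustrative sketch only: identify a file by its documented signature.
# The helper name sniff_format is hypothetical, not part of any library.

PDF_MAGIC = b"%PDF-"          # header that begins a PDF file
JPEG_MAGIC = b"\xff\xd8\xff"  # start-of-image marker, then the next marker's first byte


def sniff_format(data: bytes) -> str:
    """Return 'pdf', 'jpeg', or 'unknown' based on the leading bytes."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"
```

Anyone with the specification in hand could write the same check decades from now, which is the point: the knowledge lives in public documents rather than in one vendor's software.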
What about the physical media itself? Although there are many examples of tapes and floppy disks being unreadable five or 10 years after they are created, there are many counterexamples as well. Generally speaking, people who make an effort to preserve digital documents have no problem doing so.
Take, for example, the electrical standard (sometimes called IDE, now called ATA) that's used by the disk drives in most PCs. Developed in the 1980s, the ATA interface has been significantly enhanced over the past 20 years. Yet with rare exceptions, you can take a hard disk drive from the late 1980s or early 1990s, plug it into a modern desktop computer, and read the files that the disk contains. That's because the power cables, physical mounting brackets, data connectors, and even the electrical signals used by today's computers are compatible with the old drives. What's more, today's PCs, Macs, and Linux boxes all can read DOS file systems created in the 1980s. If the disk spins, you can frequently get back the data.
Consumer optical storage media has evolved into an even more stable standard. Music CDs and CD-ROMs created in the 1980s are still readable on today's DVD drives. When the next generation of optical storage comes out, it's likely to be backwards compatible as well. A disk drive unable to read old CDs would not be commercially viable.
Electronic archivists do have a significant challenge facing them: computer systems make it easy to put a tremendous amount of information in a single place. If you aren't careful, it's easy to lose all of this information at once. And today's computer systems are so tremendously reliable that fewer and fewer users are properly backing up their data; people just don't remember the bad old days when a computer might fail at a moment's notice.
But on the whole, I think that electronic records are far more stable, more durable, and more likely to last than their paper equivalents. The technical problems are largely solved. We know how to create David Stork's Digital Lock Box. What's needed now is a plan to make long-term electronic archival services available to the masses.
Copyright 2003 Technology Review, Inc. All rights reserved