The Digital Dark Age - the Internet

One of the oldest surviving electronic ‘texts’ is the Index Thomisticus (an index of the writings attributed to Thomas Aquinas), created by the Jesuit priest and scholar Father Roberto Busa. The computerisation of this project started in 1949, when Busa was able to gain sponsorship from the founder of IBM, and for many long years this huge project’s data storage medium was many thousands of punched cards.

Aside: There are probably younger readers who don’t understand what I mean by a punched card. See below for a Wikipedia reference to this relic of computing history; it will help you appreciate why the obsolescence of data storage formats can be a good thing.

Busa’s index was first published in book form in the 1970s (a 56-volume set), one of the few ways at the time that the fruits of its creator’s labour could be made available. In 1989 it became available on CD-ROM, as the commonly available storage formats finally began catching up with the magnitude of his work, and in 2005 it became available on the World Wide Web.

This project has been Busa’s life’s work (he was born in 1913) and has no doubt been transferred through most of the digital storage platforms that computing technology has developed since the days of the punched card. Considering the subject of this series of articles, one cannot help but think that without his dedication to the work (especially the added value of the computerised searches that the digital version provides) it might have been left languishing to die in some digital backwater.

Use It Or Lose It:

What I hope the example of Father Busa shows is that digital resources need to be maintained rather than merely preserved, and that it is only through this vigilance that a text can be guaranteed to propagate onto the next ‘latest and greatest’ platform as it comes along. However, even the regular checking of a digital document becomes a non-trivial problem when the application software, operating systems and hardware used to do so are continually changing.

So what are we to do?

In the last article I mentioned that HTML, the backbone of the world wide web, was one of the file formats that was good for preservation because it did not obfuscate the text. HyperText Markup Language (HTML), along with XML and DocBook, belongs to the family of SGML (the Standard Generalised Markup Language): HTML and DocBook were defined as applications of SGML, while XML is a simplified subset of it. SGML was created for the long-term storage of official text-based records and was designed to encode some of the semantics of the data as well as the text (both in a human-readable form), to make it easier for the information to be reused at a later date.
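
To make the point about unobfuscated text a little more concrete, here is a minimal Python sketch of how little is needed to recover the words from a marked-up file. It uses only the HTML parser that ships with the language. The sample page and the names TextExtractor and extract_text are my own inventions for illustration, so treat this as one possible approach under those assumptions rather than the way it must be done.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the human-readable text of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text found between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(markup):
    """Return the non-empty runs of text in the markup, in document order."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.chunks


# Hypothetical sample page, invented for this example.
SAMPLE = """
<html>
  <head><title>Index Thomisticus</title></head>
  <body>
    <h1>About the index</h1>
    <p>Created by <em>Roberto Busa</em></p>
  </body>
</html>
"""

if __name__ == "__main__":
    for chunk in extract_text(SAMPLE):
        print(chunk)
```

Because the text sits in the file as plain characters between the tags, a few lines of code on any platform can lift it out again, which is much harder to say of a proprietary binary format.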

SGML isn’t an efficient file format. SGML isn’t a good file format for page layout and styled typography. So SGML did not find much use in the rapidly expanding personal computing marketplace where more ‘efficient’ products that could provide a better user experience in terms of output were deemed necessary.

However, SGML and its children are very robust in terms of long-term survivability, and because they are open, self-explanatory formats, it’s easy for programmers on any platform to work with the information they contain. This is why Tim Berners-Lee and Robert Cailliau, the creators of the original world wide web project at CERN, made HTML an SGML-like language. They knew that, for the project to work as they envisioned, it would have to be a collaborative effort developed on as many different platforms as possible.

They were right.

These days, the world wide web is an extremely rich and sophisticated information source supporting many different media types. (Note: despite this complexity, HTML files contain only references to files of other media types; it is the browser that brings these various resources together and renders them as a single entity on screen.) There are also plenty of WYSIWYG (is this a term known by the youngest generation?) web page editors available, so self-publishing your texts on the web is now entirely feasible if you can’t get anybody else to publish them and/or you want to ensure that your work is available to anybody who wants to read it (as with these articles).
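
The note above, that an HTML file only points at its images and other media, can be sketched in a few lines of Python using the standard library parser. The sample page, its file names, and the names ReferenceCollector and collect_references are hypothetical, made up purely for this illustration, and the sketch only looks at src and href attributes.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Lists the other files an HTML page points to via src and href attributes."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.references.append((tag, value))


def collect_references(markup):
    """Return (tag, target) pairs for every src or href found in the markup."""
    collector = ReferenceCollector()
    collector.feed(markup)
    collector.close()
    return collector.references


# Hypothetical sample page; the file names are invented for this example.
PAGE = """
<html>
  <head><link rel="stylesheet" href="style.css"></head>
  <body>
    <p>Some text about punched cards.</p>
    <img src="punched-card.jpg" alt="A punched card">
    <a href="next-article.html">Next article</a>
  </body>
</html>
"""

if __name__ == "__main__":
    for tag, target in collect_references(PAGE):
        print(tag, "->", target)
```

For the sample page this lists the stylesheet, the image and the linked article. Those are the files you would need to keep alongside the HTML itself if the page is to survive intact.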

What advantage does this give you with regard to the preservation of your work?

Billions of people (and many spiders) use the world wide web every day. If you have a site where people come to read your material (even infrequently) you will be willing to take that little bit of extra effort to keep it live and up to date, especially if someone reports a problem. This means your work is more likely to survive the inherent obsolescence that comes with the advance of digital technology than if it is sitting on a CD somewhere (a technology ripe for obsolescence, in my opinion).

Ummm ... What about the spiders?

Well, that will have to wait for next week’s article, I’m afraid.


About Father Roberto Busa

About punched (Hollerith) cards

About Standard Generalised Markup Language

About Sir Tim Berners-Lee, Robert Cailliau and the origin of the World Wide Web and HTML

N.B. Although I use the Wikipedia (and WikiMedia Commons) a lot for references, this is for expediency and the familiarity of my readers. Anyone interested in further study should make use of the references where available, and should understand that the Wikipedia is a co-operative project to which anyone can contribute and must always be looked at in that light.

Phill Berrie, August, 2008.