David Madore's WebLog: How Web pages bit rot


Entry #0959

(Sunday)

How Web pages bit rot

Moving this site to its new location was the occasion for me to try to form a clear picture of what's actually on it. The site map is supposed to answer that question, but I'm not sure it does so in a satisfactory manner. It's not even easy to say how many documents there are: there are presently 1689 files in my public_html directory, but I estimate that about 1200 make up the actual public content (the rest can be private links, ancillary data, configuration files, CVS control structures, backups, temporary files that escaped deletion or programs of various kinds). Similarly, the total disk size is in excess of 90 megabytes, but it's hard to say how much of it is real public content: probably around 40 to 60 megabytes.

What really amazes me is how much of it is out of date, sometimes hopelessly so. It's not so much a Web site as an archaeological site. One can even discern several strata according to the page style (or HTML version) used. Indeed, I've just gone through my archives and found a version[#] of what my Web site was seven years ago (on 1998-03-11, to be precise): it turns out that some pages have hardly changed in all that time.

Bit rot is a very real phenomenon, especially on the Web. You write a Web page with something that is accurate and valid and then, five or ten years later, you find that everything is wrong: all the links are broken, the HTML is obsolete, and the content itself no longer describes the current situation. Sometimes you had left the page alone and forgotten about it and then, out of the blue, you get an email asking you to change something (fix an obvious mistake, or correct a specific broken link). Now what do you do about it? You can remove the page altogether (and instruct the Web server to respond with a 410 Gone HTTP status code): but that's not very nice for people who might have found the content useful or interesting, albeit outdated. You can leave the page as is, or perhaps add a prominent notice that it is out of date and will not be updated any more: but that's a wishy-washy solution that isn't intellectually satisfying. You can make the effort of freshening the page, rewriting the parts that need it, and so on: but that takes a lot of work. You can make only cosmetic changes (correcting a few glaring mistakes, for example, perhaps those that someone emailed you about): but that leaves the unpleasant feeling that you're not addressing the real problem. There's simply no good answer to that question. And I feel that huge chunks of my site are in that situation.

One particularly serious offender is my math page on this site (distinct from my professional Web site), which hasn't been modified in any way in nearly three years: for example, it quite wrongly states that I have no publications. This is ridiculous, and it could even cause me some professional difficulties. On the other hand, the amount of work needed to make it accurate is not at all negligible, and I have so many more important things to do (in mathematics, at the very least) than freshen that page.

Some may wonder how it is that I find the time to write such an enormous amount of text and then am unable to maintain it. Well, first of all, the original impetus for writing a Web page on a specific subject is that I've come to learn a certain number of things on that subject and I want to be able to forget all about it while making sure that the information is not lost; so when I write the page, I'm interested in what I'm writing about, but I may no longer be several years later. Second, one often fails to realize that updating a page can be even more difficult and painful than rewriting it from scratch.

This is how I find myself the owner of hundreds of Web pages that I regard as heirlooms of a former self, sometimes embarrassing ones, and that I don't know what to do with.

[#] I thought I'd put it online, as a sort of Web exhibit. But I realized that most of the links I used at the time were absolute (in other words, they contain the full address, which is no longer valid), so I'd have to change them, and perhaps remove some broken internal links (which I simply can't resurrect), and I guess I'd also have to remove all the email addresses to avoid spamming innocent people, and perhaps add a prominent disclaimer to every page saying it's an old version kept only for historical interest, and while I'm at it maybe I should make the HTML valid, and why stop there… only (1) it takes a lot of work to make all these changes and (2) it's no longer the original file, so the whole point of the Web exhibit is lost. So unless there's a lot of pressure, you won't get to see what my Web site looked like seven years ago. Not that it's of any interest, really, except to me.

