David Madore's WebLog: Some fossils found in the Turing tarpit

For those not familiar with the expression, the phrase Turing tarpit refers to a situation in computer science (typically, a programming language) in which “everything” is possible (everything that a Turing machine can do can be done) but nothing is easy. Some programming languages (such as the dire Unlambda which I unleashed upon the world) achieve this situation deliberately, but more frequently it occurs unsought. As a matter of fact, I do believe we are all (all of computing, that is) in the Turing tarpit. Some are deeper than others, of course.

Here, then, is a ~~thought~~ rant as to what archaeologists of future times will find within the tarpit, under the auspices of a famous epigram by Alan Perlis,

A programming language is low level when its programs require attention to the irrelevant.

The irrelevant refers to anything that is not part of the algorithm behind the problem, the latter being what one would describe to a fellow human being when explaining how things work; or, if you will, the algorithm is what requires inspiration whereas the irrelevant is merely perspiration (with apologies to Thomas Edison). The irrelevant can be further categorized into: syntactic sugar (which, Perlis notes in another epigram, causes cancer of the semicolon), propitiatory rites (to ward off evil deities), reinventing the wheel (because you need a different color of wheel than the ones provided in the tarpit), feeding fuel to the Shaddock pump (if you haven't lived in France, you might not know what Shaddocks are, besides a kind of grapefruit, but it doesn't matter much), keeping track of the zorkmids, taking out the garbage, and generally working around the many gratuitous barriers which someone decided to erect in your way to prevent you from reaching your goal too easily. Naturally, what is irrelevant to one man, or in one context, can be of the highest importance elsewhere: another insidious form of tarpit is that in which one lacks control over something because it was arbitrarily categorized as irrelevant.

The dominant programming language of our times is probably C. Accordingly, C is responsible for the sorry state of most programs nowadays: it lacks even elective bounds checking, strict type checking, stack checking and integer overflow checking, whether static (at compile time) or dynamic (at run time); and, certainly, C's absence of any kind of optional verification is the cause of a great many bugs and security holes in a vast repertoire of programs (one might argue, of course, that the bug is always the programmer's fault: prima facie this is admittedly true, but it is at least debatable whether the programmer's role is to count the zorkmids like a Byzantine monk, just as it is debatable whether a mathematician's work is to provide complete proofs in some formal system, a similarly tedious kind of job). Another of C's nastinesses is the lack of any sort of exception (even flow control is very limited, in the absence of labeled breaks, which require goto to be used instead), other than setjmp()/longjmp(), which has been so carefully maimed as to make it useless. Let us also mention the lack of inner (nested) functions, and the necessity of explicitly constructing any kind of closure or continuation as a manually allocated structure (for this reason, all callback data is systematically passed as void* and benefits from no kind of type checking), making all manner of functional programming or polymorphism absolutely impractical. C encourages the use of null-terminated strings, which cause all sorts of problems (such as mishandling of null characters, possibly with underlying security problems, in a huge number of programs).
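To make the point about null characters concrete, here is a minimal sketch. The `byte_string` type and the three helper functions are hypothetical names invented for this illustration, not part of any library: they merely contrast what the standard `strlen`/`strcmp` see with what a length-carrying representation sees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string: the length is stored explicitly,
   so an embedded '\0' byte is ordinary data rather than a terminator. */
struct byte_string {
    const char *data;
    size_t len;
};

/* Length as the C string functions see it: stops at the first '\0'. */
size_t c_string_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Equality as strcmp sees it: everything after the first '\0' is ignored. */
int c_string_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Equality over the full stored length, embedded '\0' bytes included. */
int byte_string_equal(struct byte_string a, struct byte_string b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

Given the ten-byte buffers "user\0admin" and "user\0guest", the C string functions report a length of 4 and declare the two equal, whereas the length-carrying comparison tells them apart. A program that validates one view of the data and then acts on the other has exactly the kind of hole alluded to above.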
C requires the programmer to collect all his garbage himself: not only does it not promote the use of a garbage collector (or promote a specific one, and this is probably a good thing, because having a GC, let alone a specific GC, forced upon oneself, is not always nice), it actually discourages it in every way (for example, most of C's very extensive external library is only remotely usable with a GC), and allows only very conservative garbage collection. And let's not even mention the worthlessness of C's absurdly complex and aggressively syntactical preprocessor.
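The remark about closures having to be built by hand can be sketched as follows. All names here (`adder_env`, `int_fn`, `add_increment`, `map_ints`) are hypothetical and chosen for the illustration; the pattern itself, a function pointer paired with an untyped `void*` environment, is the usual C idiom for callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* The "closure environment": what a nested function would simply
   capture from its enclosing scope has to be spelled out as a struct. */
struct adder_env {
    int increment;
};

/* Callback type: the environment travels as an untyped void*. */
typedef int (*int_fn)(int value, void *env);

/* The "closure body": it must convert the void* back by hand,
   and nothing checks that the conversion is the right one. */
int add_increment(int value, void *env)
{
    const struct adder_env *e = env; /* unchecked conversion */
    return value + e->increment;
}

/* A generic "map" over an array of ints; it cannot know (or verify)
   what kind of environment the callback expects. */
void map_ints(int *items, size_t count, int_fn fn, void *env)
{
    assert(fn != NULL);
    for (size_t i = 0; i < count; i++)
        items[i] = fn(items[i], env);
}
```

Calling `map_ints(xs, 3, add_increment, &env)` with `env.increment` set to 10 adds 10 to each element. Passing the address of some unrelated structure instead of `&env` compiles without a murmur, which is precisely the absence of type checking complained about.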

C was invented so that Ken Thompson and Dennis Ritchie could port Unix to the PDP-11, and it shows: it was designed to function as a portable assembler for Digital's computers, and it is dubious whether it is good for anything else (even as a general-purpose portable assembler, it isn't very good as it obscures many implementation details). C makes modular programming (at least top-down, functional and callbackable modular programming) very difficult, and any kind of code reusability is severely limited by the way C works (and the flat namespace is only one aspect of the problem). De facto, code reusability is low, metaprogramming is practically nonexistent, even automated code analysis is extremely difficult, data structures are kept to a bare minimum, and all maintenance work on the code (such as upgrading to a new set of specifications, or providing compatibility bindings) must be done by hand. Worse: all these limitations have so deeply penetrated the minds of programmers that many of them are persuaded that they are some fundamental part of computer science.

Given all these problems, one may ask of me: why program mostly in C, then? Besides inertia, there is the simple fact that all libraries are written for and in C, and using them in any other language means a load of extra work to convert the bindings and very little gain since the library semantics are in any case restricted to those that C can afford; and among these libraries I count the operating system I use, Unix, which is sadly dependent on C from its inception. So C is the mammoth that is dragging us all with it into the tarpit. In any case, I have almost entirely ceased to program, given that all programming languages that currently exist are either full of tar or unusable, or both: it seems to me that the only really useful program to write would be a compiler for an entirely new programming language.

What about existing languages other than C?

Java, for example, seems to improve considerably upon some of C's most ridiculous limitations. Still, Java is rather conservative in the way it departs from C, its main improvements being usable exceptions and generalized garbage collection: they are not certain to compensate for the practical problems associated with Java (the lack of a really usable free-as-in-speech implementation, and the slowness of the existing environments). Even where it differs from C, Java is not perfect (it has pretty much failed at providing polymorphism). Likewise, C++ provides some improvements on C (but still no garbage collector), at the cost of an unreasonable complexity of the language. Functional programming languages are no better: OCaml is plagued by, besides a horrible syntax, a type system with an unreasonable number of features, the sum of all previous research work and experimentation; Haskell is highly elegant and very understandable, but its lazy evaluation is so pervasive and impossible to escape that it becomes a plague (not even counting the consequently atrocious performance); Scheme is a toy language whose ridiculous standard library (not even a decent printf!) competes with the fact that one must create types “by hand” by composing pairs and tags to make the language useless. Common Lisp presumably has every imaginable feature from every imaginable language, but this makes it impossibly complicated. Perl is as ugly as my worst nightmares (except that perhaps Perl6 will be better, but so far this is complete vaporware); Python is better but still has some annoying defects (the gratuitous separation of expressions and instructions, which has no place in this kind of imperative high-level language, is certainly one). Prolog, or its much better cousin Mercury, is too specialized. The list is long (even just counting languages which I know: I have yet to learn Smalltalk, for example).

The bottom line is that good—high-level—programming languages, in my opinion, are still to be invented. But I don't think it would be impossibly complicated: merely learning the design principles and the motivating logic behind a number of very different languages which come closest to being good (say: Algol, ML, Haskell, Scheme, Common Lisp, Smalltalk, Java, PostScript, Mercury, Dylan, Erlang and Python), and avoiding each one's most glaring mistakes, should be a good start. (I'm not saying it is possible to produce a language that is excellent in every circumstance; but it is assuredly possible to produce one which is uniformly catastrophic.) Then we might start escaping the Turing tarpit, and then we can start thinking about truly novel features for languages (one that comes to my mind is: bounded resource reflexivity/sandboxes, which I believe no known language implements in any way).