Comments on Un peu de programmation transfinie

jonas (2017-09-16T06:09:08Z)

In the first part of "Qu'est-ce qu'une machine hyperarithmétique ?" <URL: >, you create more and more powerful computation models by adding a halting oracle for a previous model. You then say that this construction stops after a countable number of steps. I hadn't understood why, but this same argument in your comment <URL: 2017-08-22T16:48:29+0200 <URL: > explains that too.

jonas (2017-08-27T19:33:28Z)

You should probably link to one more MathOverflow question you asked, about the construction of the interpreter: <URL: >.

Ruxor (2017-08-27T17:46:29Z)

@jonas: I think what you wrote is correct. At least, I'm quite confident that the general principle is correct, so if there are any errors, they are inessential to the argument.

As of writing this, however, the exact reason why arrays don't add anything to languages (0) and (2), and/or why it is possible to write an interpreter with fuel for language (0) in itself, continues to elude me. I've managed to get a copy of Stanley Stahl's 1974 doctoral dissertation, but while he gives a clear construction of the self-interpreter-with-fuel (Kleene T predicate), he does it assuming that a function of the language can refer to arbitrary prior values computed by the function (i.e., construction by transfinite recursion, which can be done with arrays just like you explain the converse construction), and for the reason why *that* is true, he still refers to the Jensen & Karp paper, so the latter seems to contain the _only_ proof of the fact in the mathematical literature, and it is really not enlightening. This is very irritating.

jonas (2017-08-23T01:53:01Z)

(My reaction during writing that proof: ARGH this is like my thesis all over again, I keep running out of sufficiently mnemonic letters I can use and keep having to do global renamings.)

jonas (2017-08-23T01:51:02Z)

Ok, now for something trickier. I want to prove that arrays don't add any power to the language, without a self-interpreter, but only for the languages (1) and (3)_λ. My question is whether this works.

So we have a (1) or (3) program with arrays. We want to transform this to a new program that doesn't use arrays. The main idea is that whenever the original program reads an array element, the new program will rerun the whole program up to that point, but remembers the index of the array element, tracks the element of that index only of the array in a scalar variable, and also remembers when to stop.

Here again, I assume the language specifies the order of evaluation of expressions in the program uniquely. I also assume that break statements cannot break out of a function.

First, rewrite any complex expression in the program so that function calls and array element reads and writes and returns only appear as separate declaration statements of the following forms:

local e=G(j…);
local e=b[j];
return v;

Here G is the function called, b is the array variable accessed, e is a newly introduced temporary variable, and j… and v are plain variables. For this, you may have to break out some more expressions to separate statements, convert short-circuit operators to if-else statements, and introduce more temporary variables.

For the top-level function M(i…), we'll create a top-level function M(i…) in the new program that shall behave just like M. For each function F(a…) in the original program, we create a functions F'(t, g, w, i…, a…) in the new program. The single prime functions will maintain a timer value that increases monotonically during the run of the original program, and increases by one at every array element read in the original program. These functions take the starting value of the timer t as an argument, and return the new value of the timer t' as the car of their return value. In the function F', the second argument is a goal value telling when to stop execution, and w gives an array index.

When you call r'=F'(t, g, w, i…, a…), it simulates the call r=F(a…) up to the point when the timer value jumps to g, or up to the completion of the call if the timer value is never equal to g during that. In the former case, when the timer becomes equal to g, an array element read would happen in the original program with the index w, and r'=g#e where e shall be the value of the array element read. In the latter case, r'=t'#r where t' is the timer value after the call, and necessarily 0==g. The precondition is that this call to F happens at timer value t in the original program, whose top-level call was M(i…), the and timer started at 1.

Now we have to write the new program. We forward declare every single prime function first. The new top-level function is easy:

function M(i…) { return M'(1, 0, 0, i…, i…); }

Note that the timer never becomes 0, so this will run to completion. Now we have to define each function F'(t, g, w, i…, a…). The body for this shall be the body of the function F(a…), except with the following rewrite rules that replace array declarations, function calls, array element writes and reads, and return statements.

local b[]; ===> local b'; {0}

local e = G(j…); ===> local e' = G'(t, g, w, i…, j…); t = e'.p; if (g == t) return e'; local e = e'.q; {1}

b[j] = v; ===> if (w == j) b' = v; {2}

local e = b[j]; ===> t+=1; if (g == t) return t#b'; else M'(0, t, j, i…, i…).q; {3}

return r; ===> return t#r; {4}

Notes. Throughout this call, t simulates the timer and b' simulates b[w]. In {3}, the call to M' starts a new sub-simulation of the whole original program, with the timer following the timer in the original program and so decoupled from the timer t in this function body. This sub-simulation is identical to the outer simulation except that the values of g and w are different, and that it will stop right at the array element read this very statement is simulating. In the sub-simulation, the timer will reach t exactly when this read statement happens, with a stack frame corresponding to the same stack frame of the original program as the stack frame in this outer simulation. (The w and j of the sub-simulation will also be equal to the j here.) Thus, in the sub-simulation at that point, the condition in the same transformed statement will be true, and the simulation stops.* Rule {1} simulates function calls of the original program, and ensures that if the simulated function call chain broke out early in a {3} statement, this early break is propagated to the top of the simulation, until the call to M' in the same {3} statement and corresponding stack frame one level of simulation above.

All scalar variables of the original program are simulated by scalar variables of the new program in a straightforward way, so the limsup rule in the new program simulates the limsup rule of the original program for those scalar variables. For each array variable b, the new program keeps only the value of b[w] for the value of w in that particular simulation in a new scalar variable b', so the limsup rule of the new program simulates the limsup rule of the original program for array elements. The values of g and w and i… are constant within the same invocation of the same function, so the limsup rule of the new program also leaves them constant. The value t tracks the timer and only never decreases within a function invocation, so when the limsup rule of the new program sets t to a limsup, its value will be a greater than equal to all the previous values in the same function invocation. (Sometimes the limsup rule will set t to greater than each of its previous values, in which case that value of t doesn't correspond to any array element read.) Other variables of the new program are newly introduced temporary variables that are assigned only once, so the limsup rule doesn't change them either.

To make the proof of the simulation rigorous, you would have to use a transfinite induction to make sure the new program won't run into infinite recursion. The induction is on pairs (T, g) where T is an evaluation step of the original program, and g is the parameter g in that simulation. The induction statement is that the simulation is accurate up to that point. T only ever increases within a level of the simulation, so we can use induction for the previous steps of this level of simulation. When the simulation starts a sub-simulation, then it sets g of the sub-simulation equal to the timestamp t in the simulation, so the sub-simulation will stop right at the evaluation step T, and also either t < g or 0 == g, because a state where 0 < g && g < t is never reached during the simulation, and if t == g then we're breaking out of this level of the simulation and not calling sub-simulations anymore.


* If this paragraph makes your head hurt, that just means I don't write about mathematics in such a clear style as David usually does.

jonas (2017-08-22T20:25:27Z)

Yes, I did suspect that ω_1 is admissible, which is why I asked for countable admissible ordinals. I hadn't worked out the proof, and “everything remains countable” isn't precise enough, but you do give the right hint in your comment. Cofinality is the key. (I'm writing down the proof you already understand to myself, to better understand it, just like how you write your blog entries to better understand their topic.)

Assume for simplicity that the language actually specifies a unique sequential order of execution, you just didn't document that in the article. (This makes sense because you clearly wanted (0) to be a deterministic and total language, which more or less requires a well-defined order of execution.) Also assume that you have a successor operation in the language, and that you remove addition, multiplication, pair construction and pair deconstruction from the language, implementing it from successor, comparison, and the control structures the language provides. (You have pointed out that this is possible.)

Let λ be an infinite regular ordinal, and all inputs less than λ. Run any (3)_λ program. If you put the steps of the execution trace of the program in time order. This is a total order because of sequential evaluation, and in fact a well-order because you define language runtime semantics as the least function satisfying your inference rules. (That sentence makes me cringe as I think of the language lawyers that argue about the precise definition of multi-threaded semantics that is useful enough but future-proof.) The only execution steps that are not successors of another execution step are the ones where you have to use the limsup rule: loop iterations when the loop counter is a limit ordinal, and loop completion when the loop bound is a limit ordinal.

Now you prove by transfinite induction on the steps that at every step, the number of steps so far, the contents of all the variables, and the value of expressions computed must be less than λ. In successor steps, the number of steps is a successor, and any newly computed value is either a natural number or is computed from existing values by the successor operation. Since λ is a limit ordinal, it can't be computed this way. In limit steps, the loop counter or the loop boundary of a completed loop is a previously computed value, and all newly computed values are either that loop counter or loop boundary, or the limit of some sequence of values computed in the previous iterations of this loop, where the loop iterations were indexed by values already computed. Since λ is a regular ordinal, you can't get it this way as a limit indexed by smaller ordinals.

ω_1 is known to be a regular ordinal, so this proves that it's an admissible ordinal. (It also proves that ω is an admissible ordinal and that the language cannot create infinite values from finite ones.)

You are right of course that a simple argument can prove that there are admissible ordinals between ω and ω_1. (This works for any of the languages you define separately.) This proof still needs to use the fact that ω_1 is admissible. So let λ be at most ω_1. There are countably many programs, and each program takes a fixed finite number of inputs, so there are countably many pairs of programs and input vectors with each input less than λ for that program. If you execute the program with the input in each of these pairs, you always get an ordinal less than ω_1 as the result. Since there are only countably many of these results, the supremum of these results has a cofinality of less than ω_1. Since ω_1 is regular, the supremum must be less than ω_1. Call this supremum Φ(λ). Clearly Φ commutes with limits on increasing sequences of ordinals, and this argument proved that Φ(ω_1)<ω_1, so by a fixed point theorem, Φ has a fixed point less than ω_1, and in fact it has fixed points strictly between any countable ordinal and ω_1.

Ruxor (2017-08-22T14:48:29Z)

@jonas: If you just want the existence of many admissible ordinals, the argument is fairly easy: take the first uncountable ordinal ω₁ (or in fact any regular cardinal) as loop ordinal: this is admissible for cardinality reasons (everything remains countable, and countable sequences in ω₁ remain bounded). Now you probably want *countable* admissible ordinals, but this is still fairly easy: keep ω₁ as loop ordinal for now: then there are only countably many computable functions in the language (3), and by taking the closure of ω, or any given countable ordinal α, under *all* of these functions (as well as "taking any ordinal smaller than any ordinal we already have", so we construct ordinals, not arbitrary sets of ordinals), we get a countable ordinal σ (because we close in countably many steps by countably many functions), which is ≥α, and stable under all ω₁-recursive functions; but if it is stable under all ω₁-recursive functions then *a fortiori* it is stable under σ-recursive functions, i.e., it is admissible. This shows that for all countable α there is a countable admissible ordinal ≥α (i.e., the admissible ordinals are "cofinal" in ω₁ or, in fact, in any regular cardinal). In fact, the ordinals constructed by this reasoning are way more than admissible, they are "stable", which is a very (very!) strong condition.

Proving that a specific ordinal is admissible, on the other hand, *is* difficult, you're completely right about this. But then the question is how this ordinal arises or how it is described. The obvious one is ω₁^CK, and it is indeed not obvious to show that it is admissible: this is essentially the theorem known as the Σ¹₁-boundedness theorem (theorem IV.2.1(iii) in Hinman's book I referred to), but of course, this is for a particular construction of ω₁^CK.

jonas (2017-08-21T23:38:04Z)

Your short proof about how a single unbounded loop at top level is enough does look elegant, but as I think more about it, I'm starting to worry that the complexity of the proof may be hidden in proving that a particular power λ is admissible. Is it possible to prove that there is a countable ordinal greater than ω that is admissible, for either the (2) or (3) language without arrays you define here, without going through a Gödel coding argument that you want to avoid?

Ruxor (2017-08-21T12:33:23Z)

@Arnaud Spiwack:

Merci pour la précision sur le terme d'« essence ».

Pour les paires, effectivement, le manque de continuité est source de problèmes. Disons que c'est la raison pour laquelle même les tableaux finis (non bornés a priori) ne sont pas évidents à implémenter ; pour les tableaux possiblement infinis, la difficulté est encore plus grande. J'avoue que je ne comprends toujours pas bien comment se fait la magie, ni si on peut échapper à essentiellement refaire la construction de l'univers de Gödel.

Pour le reste, je n'ai pas de réponse… je me suis aussi posé des questions relativement proches. Par exemple, est-ce qu'on peut définir un lambda-calcul non typé qui soit équivalent à la α-récursion ? Je n'en sais rien (le problème a l'air d'être proche de celui de donner un sens transfini à la récursion), je ne vois même pas à quoi ça pourrait ressembler.

Pour ce qui est de définir quelque chose de transfini typé de façon analogue du système T, en revanche, je soupçonne que ça doit être possible, et que la classe des ordinaux stables par les fonctions en question serait une classe relativement naturelle dont le plus petit élément serait l'ordinal de Bachmann-Howard (probablement assez technique à prouver, même si une inégalité doit se déduire si on peut formuler le système T dans la théorie des ensembles de Kripke-Platek et y montrer sa normalisation forte ou je ne sais plus ce qu'on est censé dire).

Arnaud Spiwack (2017-08-21T10:02:40Z)

Très intéressant tout ça, merci.

Pour l'essence, c'est un terme courant dans le monde des langages de programmation totaux. Mais c'est une communauté essentiellement disjointe de celle de la calculabilité (qui utilise peut-être une expression différente, ou pas d'expression du tout, pour la même chose). Je pense que c'est de là que vient la difficulté à la googler.

Quelques remarques & questions:

- Une des difficultés en passant aux ordinaux me semble être que, pour le codage de la pair x#y, la limite sup de x#y n'a pas de relation particulière avec la limite sup de x et celle de y (il me semble comprendre que c'est un peu la même chose qui rend le codage des tableau difficile, non?)

- Du coup, ce serait intéressant de se donner une théorie des types de données (probablement les types de donnée récursif du premier ordre, avec du coup, les listes chaînées qui peuvent avoir des longueurs transfinies, je suppose; et d'ailleurs, le codage des entiers à la Peano: `N = Z | S N` coïnciderait-il avec les ordinaux? Ce serait curieux).

- Une extension qui n'est pas considérée dans ce post c'est l'ordre supérieur (prendre des fonctions en argument (qui elles mêmes peuvent… etc…)). Dans le cas fini, on tombera sur le système T de Gödel. Ce n'est pas Turing complet (toutes les fonctions terminent) mais c'est tout de même très expressif: on peut définir toutes les fonctions PA-définissables (et réciproquement). Ce serait intéressant de voir quel effet ça aurait sur le cas transfini (je n'en ai aucune idée).

- C'est dommage (un peu troublant même) que la récursion n'admette pas d'extension intéressante. Est-ce sans espoir, ou est-ce juste compliqué?

Ruxor (2017-08-21T07:12:46Z)

@jonas: Good questions. I hope the following clarifies the situation:

• The operation # on ordinals ("Gödel pairing") extends the operation # on natural numbers, but the simple definition I gave for the natural numbers (y²+x or x²+x+y) does not work.

• Yes, it's still a bijection between pairs of ordinals and ordinals (that's it's whole raison d'être; that's pretty much the only thing we care about the operation, apart from some weak monotony properties).

• Explicit formulas could be given, but they must either involve a tedious number of cases (I gave a sample of what they would look like) or use an ordered sum (in which case it's not too complicated, but not very explicit: x#y is sum(u·2+1 for u<y)+x when x<y, and sum(u·2+1 for u<x)+x+y when y≤x; here, sum(u·2+1 for u<t) is the sum of the u·2+1 for u ranging over the ordinals <t in order; what fails for ordinals is the rewriting of sum(u·2+1 for u<t) as t²).

• The operation and its inverses could indeed be defined in the language (0) if they were not part of it. In fact, I think the following will work:

function make_pair(x,y)
    local max = x<y ? y : x;
    local sum = 0;
    loop (local u < max)
        sum += u*2+1;
    return x<y ? sum+x : sum+x+y;

function unmake_pair_p(z)
    local max = 0;
    local sum = 0;
    loop (local u < z) {
        local newsum = sum + u*2+1;
        if (newsum > z)
        max += 1;
        sum = newsum;
    loop (local x < max) {
        if (sum == z)
            return x;
        sum += 1;
    return max;

function unmake_pair_q(z)
    local max = 0;
    local sum = 0;
    loop (local u < z) {
        local newsum = sum + u*2+1;
        if (newsum > z)
        max += 1;
        sum = newsum;
    loop (local x < max) {
        if (sum == z)
            return max;
        sum += 1;
    loop (local y < max) {
        if (sum == z)
            return y;
        sum += 1;
    return max;

— So maybe I shouldn't have put the operation in the language itself; but then, by the same logic, I might similarly have dispensed with multiplication (and even addition, using only successor).

jonas (2017-08-21T02:39:32Z)

I'm a bit confused about the cons operator #. Are you saying that when you extend it from the natural numbers to the ordinals, then you can't describe it with a simple formula like the one you gave for the natural domain? Is it still an extension of the operation for the naturals? Is it bijective in the sense that for any ordinal x, there is exactly one pair of ordinals p and q such that x = p#q ? Is it possible to define the operation in the (0) language on ordinals, without using the built-in cons operator or its built-in inverses?

ooten (2017-08-20T21:06:58Z)

Merci pour ton expertise, Ruxor 😉

Ruxor (2017-08-20T20:51:06Z)

@ooten: OK, mais ta question est vraiment mal définie, donc je peux donner plusieurs éléments de réponse dans des sens opposés :

Dans le sens « non » :

• Dans ZFC, ω est fourni par un axiome ad hoc (l'axiome de l'infini), qu'on sait qu'on ne peut pas démontrer parce que l'ensemble de tous les ensembles hériditairement finis (c'est-à-dire finis et d'éléments finis à tous les niveaux) forme un modèle de ZFC sans cet axiome. Ceci suggère qu'on ne peut pas faire apparaître ω autrement qu'en postulant son existence.

• S'il y avait un calcul sur les entiers naturels, menable par une machine de Turing ordinaire, donnant ω comme réponse, cela permettrait aux ordinateurs de faire des calculs infinis, ce qui n'est manifestement pas le cas.

Dans le sens « oui » :

• Certes, dans ZFC, il faut postuler l'existence de ω, mais c'est quelque chose d'essentiellement technique (lié à la façon dont ZFC envisage les ensembles). Intuitivement, si ZFC fabrique les entiers naturels, il devrait être permis d'envisager leur totalité, ce qui est précisément ω.

• Les ordinateurs (machines de Turing) ordinaires peuvent tout à fait manipuler ω au sens de faire des calculs avec : par exemple, l'ensemble des ordinaux <ε₀ (ou même bien au-delà) est tout à fait susceptible de représentation informatique (on peut faire des additions, multiplications, puissances, comparaisons, etc., dessus), ce n'est même pas spécialement difficile. En ce sens, la limite de puissance des ordinateurs est vraiment fournie par ω₁^CK et pas par ω.

Je te laisse décider ce que tu préfères…

ooten (2017-08-20T19:31:18Z)

@Ruxor : je ne suis pas d'accord car dans les langages (2) et (3), tu présupposes l'existence de ω. Moi ma question est : peut-il exister une procédure algorithmique dans le sens d'aujourd'hui ou une autre à trouver qui permettrait de "calculer" ω à partir des entiers naturels ? Bon ma question est sûrement farfelue et mal formulée mais pourtant notre cerveau comprends très bien ce qu'est ω mais la machine ne sais pas le calculer ce qui est intuitivement évident.

Ruxor (2017-08-20T19:09:05Z)

@ooten: Je réponds à cette question dans mon post. 😉

Dans les langages (0) et (1), on ne peut pas générer ω (ni aucun ordinal infini) à partir de rien. Dans les langages (2) et (3), on peut le faire, à condition que l'ordinal de boucle soit >ω, et le code n'est pas compliqué :

function compute_ω() // Return the ordinal ω
    uloop (x) {
        if (1+x == x)  // Return the first x such that 1+x == x
            return x;

ooten (2017-08-20T17:40:19Z)

Bon post ! Mais la question que je me pose est la suivante : est-ce que la machine de Turing peut générer ω (que cela ne soit pas un symbole de sa grammaire définie à priori) à mon avis non mais peut être que je me trompe ou peut-il exister une machine définie formellement qui pourrait en faire ainsi ?

Ruxor (2017-08-20T14:29:03Z)

@Bob: Effectivement, c'est aussi ce que je trouve fascinant ici, d'avoir des choses infinies qui se comportent comme si elles étaient finies. Historiquement, la α-calculabilité a d'abord été introduite pour ω₁^CK par Kreisel sous le nom de « métarécursion », et un point clé pour cela était de considérer les parties hyperarithmétiques de ℕ comme « métafinies », c'est-à-dire de faire comme si elles étaient finies : c'est vraiment cette idée-là qui a débloqué toute la théorie. (Le cas admissible général a ensuite été développé par Kripke et Platek indépendamment, l'un par une approche plus récursive et l'autre par une approche plus ensembliste.)

Pour ce qui est du codage, c'est le « codage de Gödel », qui est standard sur les ordinaux parce qu'il a pas mal de propriétés sympathiques. Sur les entiers naturels, il s'écrit avec des formules pas trop pénibles, donc je l'ai présenté comme ça.

Bob (2017-08-20T13:38:58Z)

Les ordinaux sont vraiment fascinants dans ce que je perçois comme leur double nature, infinie mais aussi "pratiquement" finie (propriété des suites strictement décroissantes).

Ce petit traité de programmation transfinie rend bien cela. Par exemple le fait qu'une fonction du programme (0) termine toujours est une très bonne illustration concrète de cette double nature, et ne laisse pas de me surprendre (je le vois, mais ne le crois pas).

Remarque orthogonale : je ne connaissais pas le codage des paires d'entiers a#b, est-ce un codage courant ?

Ruxor (2017-08-19T23:16:04Z)

@jonas: Ah yes, of course, this was silly. I rewrote this differently: I hope it's correct now. I should probably have used a loop syntax that allows a part which is executed if and only if the loop is not prematurely interrupted: then a uloop would just be a loop with a "do_not_terminate()" for that part.

(I also fixed the syntax mistake.)

jonas (2017-08-19T22:36:03Z)

> La définition est très simple : uloop (a) s est équivalent à { loop (a<λ) s; do_not_terminate(); } où do_not_terminate() est supposé être une fonction qui ne fait rien et ne termine pas

I think that definition is wrong. A break; statement in s is supposed to be able to make the loop terminating, but that definition makes it look like you can only terminate the loop with a more distant break or return.

I think this phrase has a syntax error: “Il que le point suivant est notable : ”

Ruxor (2017-08-19T17:46:11Z)

@« @/2 »: Merci de confirmer que je n'ai pas complètement fumé. Ça doit quand même être un terme très marginal parce que je ne trouve aucune référence dans un texte d'exposition quelconque, ni aucune mention en lien avec des termes comme « primitive recursive » ou « Kleene T ».

@Laurent Claessens: J'ai corrigé le lapsus. Et je suis content d'apprendre que cette entrée n'est, au moins, pas complètement illisible.

Laurent Claessens (2017-08-19T15:27:36Z)

Petite coquille dans la définitions de «calculable avec paramètres». La première est *sans* paramètres.

En tout cas merci pour cet article qui est dorénavant mon préféré de ce blog.

@/2 (2017-08-19T14:51:02Z)

Pour ton problème d'essence, il me semble que « fuel » est de fait relativement fréquent en preuve formelle.

<URL:"fuel"+coq+filetype:v >

You can post a comment using the following fields:
Name or nick (mandatory):
Web site URL (optional):
Email address (optional, will not appear):
Identifier phrase (optional, see below):
Attempt to remember the values above?
The comment itself (mandatory):

Optional message for moderator (hidden to others):

Spam protection: please enter below the following signs in reverse order: 8f5418

Recent comments