Comments on A Unicode-obfuscated programming language proposal

Ruxor (2019-11-20T15:10:01Z)

@Salpynx: I admit I had completely forgotten about this language. Congratulations on doing something with it, and you're welcome to take up ownership if you feel like developing it further.

(Hmmm… The Wikipedia page about me as the initiator of Unlambda was removed for lack of notability. But maybe if one esolang isn't enough, two will do? 😇)

Salpynx (2019-11-19T05:08:15Z)

Bonjour!

Some time ago I put together what I am calling a "compiler toolchain" named Ю⎔⎔ (Юᓂ곧⎔ SOFTWARE-FUNCTION SOFTWARE-FUNCTIONER) for this language which can compile the provided examples, and I took the liberty of extending the spec a bit further, hopefully in the same spirit. https://github.com/hornc/- <-- note that github doesn't support Unicode repo names.

The examples from these posts and a couple I created myself are probably the most interesting to look at: https://github.com/hornc/-/tree/master/examples

If I remember correctly, one of the examples is 99 bottles of beer in (very sketchily) Egyptian hieroglyphics.

I had plans of taking it further and was being serious about the various keywords in the various languages and scripts. I think I have more notes about adding more Norse. It was a fun exercise. My code is deliberately a bit silly, and tries to exclusively (over)use sed, but it is meant to compile and create usable binaries (at least under Linux).

It was a fun exercise to follow this through, and I agree that the original point of surfacing Unicode issues with this is actually valuable, and working with Unicode and code should not be so difficult. I think things have improved slightly since this post was first written. Anyway, thanks for the post! I thought I should let you, the creator, and anyone else who is interested, know that this thing exists. Feel free to try it out, and if there is interest, I'm prepared to accept PRs or suggestions.

SB (2016-04-15T18:25:39Z)

@Hélène: https://web.archive.org/web/20080414231249/http://lorien.sdsu.edu/~carroll/shrub.html

Ruxor (2016-04-14T14:09:10Z)

I think it was a reference to this: <URL: http://www.youtube.com/watch?v=zIV4poUZAQo >.

Hélène (2016-04-14T06:35:59Z)

> it says NI and GOD at once

The link is completely broken and a Google search turns up nothing relevant. Can you explain what it was about? (if you remember…)

Fork (2011-08-01T15:28:06Z)

Maman Oo

bi (2005-11-07T11:34:51Z)

R, phi, Ruxor: Mongolian letters have initial, medial, and final forms, like in Arabic ( http://www.omniglot.com/writing/arabic.htm ). The Unicode charts list only one form for each letter, usually the initial form. If I remember correctly, the Code2000 font also includes only 1 form for each letter. The MonTeX package does generate the correct forms and even the correct (vertical) layout (!), but it works only with eLaTeX it seems.

jonas (2005-09-06T10:04:14Z)

Let me argue with your statement that most programming languages do not allow non-ASCII characters.

Java and Perl definitely allow arbitrary non-ASCII characters in their identifiers, even though people rarely use this feature. In Perl, you can declare the encoding of the source with the encoding pragma (the default is utf-8), so that you can write accented characters in your native encoding.

C++ also has some support for non-ASCII characters in identifiers, although I don't really know how.

AFAIK, Visual Basic allows non-ASCII characters in identifiers too, but I'm not sure how.

Non-English versions of MS Excel (or at least the Hungarian version) have the names of most built-in functions translated, so they include non-ASCII characters too. User-defined identifiers can also contain (certain) non-ASCII chars.

Perl6 will have at least three non-ASCII punctuation characters (0xAB, 0xBB, 0xA5) which have built-in meaning, although there are ASCII-only constructs with the same function as these.

(I personally don't like the way these last two languages use non-ASCII characters.)
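
A minimal sketch of the identifier point above, using Python 3 as a stand-in. Python is not one of the languages named in this comment, so treat it only as an illustration of what such a feature can look like; the names `größe`, `naïve_total` and `is_valid_identifier` are made up for the example.

```python
# Python 3 reads source files as UTF-8 by default and accepts
# non-ASCII letters in identifiers (the names below are hypothetical).
größe = 3
naïve_total = größe * 2


def is_valid_identifier(name: str) -> bool:
    """Report whether Python would accept `name` as an identifier."""
    return name.isidentifier()


if __name__ == "__main__":
    # Letters from any script qualify; punctuation and leading digits do not.
    for candidate in ["größe", "無", "«x»", "2x"]:
        print(candidate, is_valid_identifier(candidate))
```

Letters are accepted whatever their script, while punctuation such as the guillemets (0xAB, 0xBB) is not, which is why a language has to give those characters a built-in meaning of their own if it wants to use them.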

Ewww (2005-09-05T15:43:02Z)

My mind is in a world of hurt!

Muriel (2004-12-12T09:18:24Z)

Just for the author:
it's funny, the temporary variable "無 "
looks to me like a Chinese character meaning
"there is not", "not". This Chinese character is the only place in the text that makes sense to me.

phi (2004-12-05T15:39:29Z)

one more page full of stuff
http://www.omniglot.com/writing/mongolian.htm
It's very adequate for other languages as well.

phi (2004-12-04T23:51:32Z)

btw, see also http://www.babelstone.co.uk/Scripts/Mongolian.html
One should also define the corresponding characters for hexadecimal digits whose value is greater than or equal to the normal number of our fingers.

R (2004-12-04T20:03:24Z)

There actually is a free Mongolian font available; it is in Type 1 format and is used for the support of Mongolian in LaTeX (which was achieved by the quite impressive Oliver Corff, a researcher in linguistics now working at "Freie Universitaet Berlin"). Quite unfortunately, it is not Unicode-compliant, but could be converted (something I already started to do a while ago, actually).

Anyway, you may find it at
<URL: ftp://ftp.ctan.org/tex-archive/language/mongolian/montex/contrib/montex.type1/bicig/>
(try bcghsm.pfb, for example) and its various mirrors all over the world (see http://www.ctan.org/ for a complete list).

And by the way, I will remember to try and avoid speaking on specific subjects at dinner from now on ;-)

bort (2004-12-04T15:24:50Z)

To a certain extent, the relatively well-known paper "Generalizing Overloading for C++2000" by Bjarne Stroustrup uses some of the same ideas as your proposed language. It doesn't mandate non-ASCII identifiers, although it allows them.

<URL: http://www.research.att.com/~bs/whitespace98.pdf>

phi (2004-12-04T13:12:40Z)

The TrueType font Code2000 does have Mongolian characters, from U+1800 : MONGOLIAN BIRGA to U+18A9 : MONGOLIAN LETTER ALI GALI DAGALGA. And it's donation-ware.
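
Even without a font, the range quoted above can be listed by name. The following is a small sketch using Python's standard `unicodedata` module; the function name `named_characters` is hypothetical, and which code points count as assigned depends on the Unicode version shipped with the interpreter.

```python
import unicodedata

FIRST, LAST = 0x1800, 0x18A9  # range quoted in the comment above


def named_characters(first: int = FIRST, last: int = LAST) -> dict:
    """Map each assigned code point in [first, last] to its Unicode name."""
    names = {}
    for cp in range(first, last + 1):
        # unicodedata.name returns the default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            names[cp] = name
    return names


if __name__ == "__main__":
    for cp, name in named_characters().items():
        if "DIGIT" in name:
            print(f"U+{cp:04X} {name}")
```

Run as a script, it prints only the digits of the block, which are the characters asked about further down the thread.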

Ruxor (2004-12-04T13:00:30Z)

kox → I'm afraid I don't know of any Mongolian fonts that would be freely available. (Besides, there's the annoying complication that it's supposed to be vertically written, and only CSS3 handles this properly, and there exist no full implementations of CSS3 to date, only very partial ones.) If you merely wish to know what they look like, you can go to the Unicode web site and download the PDF code chart for Mongolian (<URL: http://www.unicode.org/charts/PDF/U1800.pdf >); and if you're a PDF guru you can actually extract the vector font which is embedded in the PDF file (doing so is illegal, of course).

phi (2004-12-04T12:47:03Z)

Oh no, come on, let's stop doing bad popularization; our readers have the right to know, and to know everything.

For variables, one can use the Latin alphabet, provided of course that one uses the series U+1D434 : MATHEMATICAL ITALIC CAPITAL A from plane 01. For subscripts, it is enough to use U+2080 : SUBSCRIPT ZERO.

For numbers, the series U+1D7CE : MATHEMATICAL BOLD DIGIT ZERO is a must for all values that are represented exactly. For floating-point approximations, it is better to use U+1D7E2 : MATHEMATICAL SANS-SERIF DIGIT ZERO. For fixed-size numbers (32-bit integers, for example), one must use U+1D7E2 : MATHEMATICAL SANS-SERIF DIGIT ZERO. Thanks to this scheme, you no longer have to declare in advance the integer, float, or real type of your numeric constants.

Ah, one last detail: each program line must carry a line number as in BASIC, and that line number must be written with U+FF10 : FULLWIDTH DIGIT ZERO. In addition, you are entitled to 10 goto labels, named U+E0030 : TAG DIGIT ZERO.

There, I hope I have properly clarified the use of digits. I wish you long hours of happy programming.
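
The digit scheme above can be sketched as a tiny classifier. This is only an illustration in Python, not part of any implementation of the language: the labels and the names `SERIES`, `series_kind` and `classify_literal` are assumptions of the sketch. Since the comment gives the same series (U+1D7E2) for both floating-point and fixed-size numbers, the sketch cannot tell those two apart and returns one combined label for them.

```python
# Start of each ten-digit series named in the comment; labels are this sketch's own.
SERIES = {
    0x1D7CE: "exact",           # MATHEMATICAL BOLD DIGIT ZERO
    0x1D7E2: "float-or-fixed",  # MATHEMATICAL SANS-SERIF DIGIT ZERO (given for both)
    0xFF10: "line-number",      # FULLWIDTH DIGIT ZERO
}


def series_kind(ch: str) -> str:
    """Return the label of the digit series that `ch` belongs to."""
    cp = ord(ch)
    for start, kind in SERIES.items():
        if start <= cp <= start + 9:
            return kind
    raise ValueError(f"{ch!r} is not in a known digit series")


def classify_literal(text: str) -> tuple:
    """Return (kind, value) for a run of digits taken from a single series."""
    kinds = {series_kind(ch) for ch in text}
    if len(kinds) != 1:
        raise ValueError("a literal must use digits from exactly one series")
    # int() accepts any Unicode decimal digits, so no manual conversion is needed.
    return kinds.pop(), int(text)


if __name__ == "__main__":
    print(classify_literal("\U0001D7D2\U0001D7D0"))  # bold digits four, two
    print(classify_literal("\uFF11\uFF10"))          # fullwidth digits one, zero
```

The goto labels (U+E0030 onward) are left out on purpose: tag characters are format characters rather than decimal digits, so `int()` would not accept them.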

kox (2004-12-04T09:31:29Z)

And if I don't have the Mongolian digits, how can I get them? At least under Firefox, because I want to see what they look like.

bidibulle (2004-12-04T09:30:52Z)

And what about Brainfuck???

Ruxor (2004-12-04T01:14:21Z)

Ouarf…

Joël (2004-12-04T01:01:41Z)

U+2302 HOUSE / U+00A0 NO-BREAK SPACE / U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK / U+1D157 MUSICAL SYMBOL VOID NOTEHEAD / U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
U+2639 WHITE FROWNING FACE
U+043F CYRILLIC SMALL LETTER PE / U+0438 CYRILLIC SMALL LETTER I / U+0441 CYRILLIC SMALL LETTER ES / U+0430 CYRILLIC SMALL LETTER A / U+0442 CYRILLIC SMALL LETTER TE / U+044C CYRILLIC SMALL LETTER SOFT SIGN / U+00A0 NO-BREAK SPACE / U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK / U+201C LEFT DOUBLE QUOTATION MARK / U+261E WHITE RIGHT POINTING INDEX / U+2420 SYMBOL FOR SPACE / U+2642 MALE SIGN / U+2420 SYMBOL FOR SPACE / U+24D4 CIRCLED LATIN SMALL LETTER E / U+24E2 CIRCLED LATIN SMALL LETTER S / U+24E3 CIRCLED LATIN SMALL LETTER T / U+2420 SYMBOL FOR SPACE / U+24BB CIRCLED LATIN CAPITAL LETTER F / U+24C4 CIRCLED LATIN CAPITAL LETTER O / U+24CA CIRCLED LATIN CAPITAL LETTER U / U+0589 ARMENIAN FULL STOP / U+2424 SYMBOL FOR NEWLINE / U+201D RIGHT DOUBLE QUOTATION MARK / U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK / U+0964 DEVANAGARI DANDA
U+263A WHITE SMILING FACE
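
To see the listing above as actual characters rather than names, a small decoder is enough. This Python sketch assumes each entry starts with `U+` followed by four to six hexadecimal digits, as in the comment; `decode_listing` is a hypothetical helper, and it ignores the character names entirely instead of checking them.

```python
import re

# One code point per entry: "U+" followed by 4 to 6 hexadecimal digits.
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def decode_listing(line: str) -> str:
    """Turn 'U+XXXX NAME / U+YYYY NAME / ...' into the characters themselves."""
    return "".join(chr(int(hex_digits, 16)) for hex_digits in CODE_POINT.findall(line))


if __name__ == "__main__":
    first_line = (
        "U+2302 HOUSE / U+00A0 NO-BREAK SPACE / "
        "U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK / "
        "U+1D157 MUSICAL SYMBOL VOID NOTEHEAD / "
        "U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK"
    )
    print(decode_listing(first_line))
```

Whether the result displays properly still depends on the fonts installed, which is the same problem discussed earlier in this thread.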