David Madore's WebLog: A Unicode-obfuscated programming language proposal

I'm feeling geeky today, so here's a very geeky post.

People have invented vast repertoires of obfuscated programming languages (one of my favorites is Shakespeare), but none, so far, seems to make use of Unicode's great resources for obfuscation (except perhaps, with a remarkable amount of foresight, the infamous APL). So here are a few suggestions for a language that I think someone (but not me!) should have the patience to implement. Let's call it Юᓂ곧⎔ (U+042E CYRILLIC CAPITAL LETTER YU / U+14C2 CANADIAN SYLLABICS NI / U+ACE7 HANGUL SYLLABLE GOD / U+2394 SOFTWARE-FUNCTION SYMBOL) so that (1) nobody will know how to pronounce it (except simply as “Unicode”, which is confusing), (2) nobody will have the fonts to write it properly, because nobody ever has a Unified Canadian Aboriginal Syllabics font and a Hangul font and a font with this strange SOFTWARE-FUNCTION SYMBOL (even Microsoft's supposedly all-inclusive Arial Unicode MS font does not recognize that symbol), (3) it says NI and GOD at once, (4) it's another of these obnoxious names which include symbols (like DivX ;-)), and (5) I'm sure nobody has ever used that particular combination of Unicode characters before me (and Google agrees, of course).

Now, until quite recently, most programming languages allowed only ASCII characters in source code (they will give you some stupid error like “illegal character” if you try). Юᓂ곧⎔, on the contrary, does not allow ASCII characters in source files at all (except for line feed). For example, instead of using the * (U+002A ASTERISK) character to denote multiplication as is done in C, one uses × (U+00D7 MULTIPLICATION SIGN). And of course, one does not write numbers with the usual “European/Arabic” digits (they are in ASCII); rather, one uses Mongolian digits. So, for example, the Юᓂ곧⎔ code to store the product of 42 and 1729 in the temporary variable 無 would be: 無 ← ᠔᠒×᠑᠗᠒᠙।. More generally, here are the basic syntax rules of Юᓂ곧⎔:
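Since nobody has (yet) implemented Юᓂ곧⎔, here is a small Python sketch of the two lexical rules just described: decimal numbers are written with Mongolian digits (U+1810 to U+1819), and no ASCII character other than line feed may appear in a source file. The function names are hypothetical, and the statement it builds uses NO-BREAK SPACE as whitespace, anticipating the first syntax rule.

```python
# Two lexical rules of Юᓂ곧⎔, sketched in Python:
# decimal numbers use Mongolian digits, and no ASCII is allowed except line feed.
MONGOLIAN_ZERO = 0x1810  # U+1810 MONGOLIAN DIGIT ZERO; digits run to U+1819


def to_mongolian(n: int) -> str:
    """Write a non-negative integer with Mongolian decimal digits."""
    if n < 0:
        raise ValueError("negative literals are not covered by this sketch")
    return "".join(chr(MONGOLIAN_ZERO + int(d)) for d in str(n))


def from_mongolian(s: str) -> int:
    """Parse a run of Mongolian digits (and nothing else) back into an int."""
    if not s:
        raise ValueError("empty literal")
    value = 0
    for ch in s:
        digit = ord(ch) - MONGOLIAN_ZERO
        if not 0 <= digit <= 9:
            raise ValueError(f"not a Mongolian digit: {ch!r}")
        value = value * 10 + digit
    return value


def ascii_violations(source: str) -> list:
    """Offsets of ASCII characters other than line feed, i.e. forbidden ones."""
    return [i for i, ch in enumerate(source) if ord(ch) < 0x80 and ch != "\n"]


if __name__ == "__main__":
    nbsp = "\u00a0"  # generic whitespace must be NO-BREAK SPACE
    stmt = f"無{nbsp}←{nbsp}{to_mongolian(42)}×{to_mongolian(1729)}।"
    print(stmt, ascii_violations(stmt))        # the statement, then []
    print(ascii_violations("x = 42 * 1729;"))  # every offset is flagged
```

Note that Python's own `int()` would happily accept Mongolian digits too, but it also accepts ASCII and every other decimal script, which is exactly what Юᓂ곧⎔ forbids; hence the explicit range check.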

Generic whitespace in the program must consist of U+00A0 NO-BREAK SPACE characters.
Instructions are separated by U+0964 DEVANAGARI DANDA. Code blocks start with U+2639 WHITE FROWNING FACE and end with U+263A WHITE SMILING FACE.
Function names must be written in Cyrillic, except for the reserved “main” function, which is called U+2302 HOUSE.
Function arguments start with U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK and end with U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK; they are separated with U+00A6 BROKEN BAR.
Global variable names must be written in the Hebrew alphabet.
Type names must be written in Ethiopic, except for certain reserved names like U+2124 DOUBLE-STRUCK CAPITAL Z for integers.
Local/temporary variable names consist of a single character, which must be a CJK unified ideograph.
Function argument names are written in hiragana when the function is declared, and the corresponding name in katakana is used within the function body.
Preprocessor directives are written in Runic. Comments start with U+275D HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT and end with U+275E HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT.
Decimal digits are Mongolian. For octal digits, one uses the Yijing trigram symbols: U+2630 TRIGRAM FOR HEAVEN for the octal digit 0, U+2631 TRIGRAM FOR LAKE for the octal digit 1, and so on up to U+2637 TRIGRAM FOR EARTH for 7. For hexadecimal, one enters digits two at a time, each pair being written as one of the 256 Braille patterns (U+2800 to U+28FF).
Character strings are written between U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK, which can contain arbitrary Unicode characters, except for a few special characters (such as the quotes themselves) which must be prefixed with U+2041 CARET INSERTION POINT. One can also enter characters numerically (in decimal, octal or hexadecimal) by prefixing and postfixing the code with U+204C BLACK LEFTWARDS BULLET and U+204D BLACK RIGHTWARDS BULLET respectively: this is necessary to enter ASCII characters in a string (since ASCII can never be part of Юᓂ곧⎔ source code, even between quotes). For a single character (rather than a character string), use U+2018 LEFT SINGLE QUOTATION MARK and U+2019 RIGHT SINGLE QUOTATION MARK instead of double quotes.
The assignment operator is U+2190 LEFTWARDS ARROW. One can also reverse assignment and use U+2192 RIGHTWARDS ARROW. Addition, subtraction, multiplication and division are written U+2A22 PLUS SIGN WITH SMALL CIRCLE ABOVE, U+2212 MINUS SIGN, U+00D7 MULTIPLICATION SIGN and U+00F7 DIVISION SIGN (there are lots of other operators, too). A pointer to an object is obtained with U+261B BLACK RIGHT POINTING INDEX, and the object a pointer points to with U+261A BLACK LEFT POINTING INDEX.
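The three numeric alphabets in the rules above can be sketched in Python as follows. The decimal and octal mappings come straight from the rules (Mongolian digits; ☰ for 0, ☱ for 1, and so on). The hexadecimal mapping is an assumption: the rules only say that each pair of hex digits becomes a Braille pattern, and this sketch takes the natural choice of U+2800 plus the byte value, most significant byte first. The names `ALPHABETS`, `encode` and `parse_literal` are hypothetical.

```python
# Numeric literals in Юᓂ곧⎔, sketched in Python.
# Each digit alphabet is (first code point, number of symbols); the number of
# symbols is also the factor by which each further symbol scales the value.
ALPHABETS = {
    "decimal": (0x1810, 10),  # Mongolian digits U+1810..U+1819 (from the rules)
    "octal": (0x2630, 8),     # trigrams: U+2630 HEAVEN = 0 ... U+2637 EARTH = 7
    "hex": (0x2800, 256),     # ASSUMPTION: one Braille pattern U+2800+b per byte b
}


def encode(n: int, alphabet: str) -> str:
    """Write a non-negative integer using the given digit alphabet."""
    if n < 0:
        raise ValueError("negative literals are not covered by this sketch")
    first, size = ALPHABETS[alphabet]
    symbols = []
    while True:
        n, digit = divmod(n, size)
        symbols.append(chr(first + digit))
        if n == 0:
            break
    return "".join(reversed(symbols))  # most significant symbol first


def parse_literal(s: str) -> int:
    """Read a literal; the base is implied by the script its digits use."""
    for first, size in ALPHABETS.values():
        if s and all(first <= ord(c) < first + size for c in s):
            value = 0
            for c in s:
                value = value * size + (ord(c) - first)
            return value
    raise ValueError(f"not a numeric literal: {s!r}")


if __name__ == "__main__":
    for name in ALPHABETS:
        print(name, encode(1729, name))
```

Because the three alphabets occupy disjoint code point ranges, a literal needs no base prefix like C's `0x`: the script of its digits already says which base it is in. Leading zeros (☰) are harmless, so fixed-width octal escapes such as ☰☵☴ parse as expected.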

We conclude with the usual “Hello, world!” program, written in Юᓂ곧⎔:

⌂ «» ☹ писать «“⁌☱☱☰⁍⁌☱☴☵⁍⁌☱☵☴⁍⁌☱☵☴⁍⁌☱☵☷⁍⁌☰☵☴⁍⁌☰☴☰⁍⁌☱☶☷⁍⁌☱☵☷⁍⁌☱☶☲⁍⁌☱☵☴⁍⁌☱☴☴⁍⁌☰☴☱⁍␊”»। ☺
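To check that the escapes in this program really spell the greeting, here is a Python sketch of a decoder for a string body (the text between “ and ”). It assumes, as the program suggests, that the base of a ⁌…⁍ escape is implied by the script of its digits, and that ⁁ makes the following character literal; the Braille-as-byte mapping for hexadecimal is likewise an assumption. The names are hypothetical. The string ends with ␊ (U+240A SYMBOL FOR LINE FEED), which the rules do not define, so the sketch leaves it unchanged rather than guessing that it stands for a newline.

```python
import re

# Digit alphabets as (first code point, number of symbols):
# Mongolian decimal, trigram octal, and (assumed) one Braille pattern per byte.
ALPHABETS = [(0x1810, 10), (0x2630, 8), (0x2800, 256)]

# Either ⁁ (U+2041) followed by one character taken literally,
# or ⁌ (U+204C) digits ⁍ (U+204D) giving a code point.
ESCAPE = re.compile("\u2041(.)|\u204c([^\u204d]+)\u204d", re.DOTALL)


def parse_digits(s: str) -> int:
    """Value of a digit run; the base is implied by the script of its digits."""
    for first, size in ALPHABETS:
        if all(first <= ord(c) < first + size for c in s):
            value = 0
            for c in s:
                value = value * size + (ord(c) - first)
            return value
    raise ValueError(f"bad numeric escape: {s!r}")


def decode_string(body: str) -> str:
    """Expand the escapes in the text found between “ and ”."""
    def expand(m):
        if m.group(1) is not None:  # ⁁x becomes x
            return m.group(1)
        return chr(parse_digits(m.group(2)))  # ⁌digits⁍ becomes that code point
    return ESCAPE.sub(expand, body)


# String body of the Hello, world! program (U+240A at the end is left as-is).
HELLO_BODY = "⁌☱☱☰⁍⁌☱☴☵⁍⁌☱☵☴⁍⁌☱☵☴⁍⁌☱☵☷⁍⁌☰☵☴⁍⁌☰☴☰⁍⁌☱☶☷⁍⁌☱☵☷⁍⁌☱☶☲⁍⁌☱☵☴⁍⁌☱☴☴⁍⁌☰☴☱⁍␊"

if __name__ == "__main__":
    print(decode_string(HELLO_BODY))  # Hello, world!␊
```

For instance, ⁌☱☱☰⁍ is octal 110, i.e. 72, which is “H”, and ⁌☰☴☰⁍ is octal 040, i.e. 32, the space that ASCII-free source code cannot contain directly.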