David Madore's WebLog: All human beings… in Babel

All human beings… in Babel

More Unicode fun: here is the first article of the Universal Declaration of Human Rights, probably one of the most translated pieces of text ever, in a number of languages. The High Commissioner's Web site, where I got the texts, has the full declaration in vastly more languages, but, unfortunately, in many cases (especially the most interesting cases as far as Unicode is concerned) we are given either a PDF file or an image scan of a printed version. This site also displays the article in question (using Unicode) in a great number of languages, but sometimes differs from the High Commissioner's text (for example for Thai or Vietnamese).

The Thai language gave me some trouble, because the source text is badly printed and badly scanned and the Thai script is pretty complex. I solved the problem by typing not the article itself, but the declaration's title (it's shorter and printed bigger, so that was much easier), and then searching for that in Google until I found a page which contained the text I was looking for (a secondary problem is that Thai doesn't seem too keen on word breaks).

The Yi syllables were even worse: at first I thought I could do the typing myself, but even for a mere thirty-six characters, the effort of visually matching each tiny symbol from the source text against Unicode's array of over one thousand syllables was too much for me. And to make matters worse, Google refuses to search for Yi text; fortunately, though, AllTheWeb will do it. And I was lucky to find what is probably the only fragment of Unicode-encoded Yi text on the Web: the Yi version of Wikipedia, with a single page on it, and a single sentence on that page, namely, the sentence I was trying to encode! Still, doubt remains, because two of the glyphs do not match what I decipher on the text from the High Commissioner's Web site (the Wiki page says ꊿꂷꃅꃧꆘꐥ, ꌅꅍꀂꏽꐯꒈꃅꐥꌐ. ꊿꊇꉪꍆꌋꆀꁨꉌꑌꐥ, ꄷꀋꁨꂛꂰꅫꃀꃅꐥꄡꑟ. and I believe I read ꊿꂷꃅꃧꐨꐥ, ꌅꅍꀂꏽꐯꒈꃅꐥꌐ. ꊿꊇꉪꍆꌋꆀꁨꉌꑌꐥ, ꄷꀋꁨꂛꊨꅫꃀꃅꐥꄡꑟ. on the printed text; of course, it's all Yi to me); I have no idea whether the author of the Wiki page is fluent in Yi and obviously got it right, or is just another geek like me with a little more patience for matching glyphs against a table. Well, there's perhaps one chance out of two that I got the Yi text right. But I know of at least one potential reader of this blog who might be able to help me.
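
For anyone who would rather not squint at the two Yi readings quoted above, the discrepancy can be located mechanically. What follows is only a minimal sketch in Python, assuming both readings have been pasted in as ordinary Unicode strings of equal length; the names `differing_positions`, `describe` and `count_yi_syllables` are made up for illustration, and the character names printed depend on the Unicode data shipped with the Python interpreter.

```python
import unicodedata

# The two readings quoted above: the Wiki page's text and the reading
# deciphered from the printed text.
wiki_text = "ꊿꂷꃅꃧꆘꐥ, ꌅꅍꀂꏽꐯꒈꃅꐥꌐ. ꊿꊇꉪꍆꌋꆀꁨꉌꑌꐥ, ꄷꀋꁨꂛꂰꅫꃀꃅꐥꄡꑟ."
printed_reading = "ꊿꂷꃅꃧꐨꐥ, ꌅꅍꀂꏽꐯꒈꃅꐥꌐ. ꊿꊇꉪꍆꌋꆀꁨꉌꑌꐥ, ꄷꀋꁨꂛꊨꅫꃀꃅꐥꄡꑟ."


def differing_positions(a, b):
    """Return (index, char_a, char_b) wherever a and b differ.

    This is a plain position-by-position comparison, not a real diff,
    so it assumes (and checks) that both strings have the same length.
    """
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def describe(ch):
    """Format a character as 'U+XXXX NAME', with a fallback for unnamed ones."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def count_yi_syllables(s):
    """Count the characters of s that lie in the Yi Syllables block (U+A000..U+A48F)."""
    return sum(1 for ch in s if 0xA000 <= ord(ch) <= 0xA48F)


if __name__ == "__main__":
    print(count_yi_syllables(wiki_text), "Yi syllables in the Wiki text")
    for i, x, y in differing_positions(wiki_text, printed_reading):
        print(f"position {i}: {x} ({describe(x)}) vs {y} ({describe(y)})")
```

This says nothing about which reading is right, of course; it only pins down which two code points are in dispute, so that someone who actually reads Yi knows where to look.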

I'm disappointed that the High Commissioner did not provide a Cherokee version of the declaration. True, far fewer people speak Cherokee than Yi, but there's a much more developed Cherokee Wikipedia (ᏫᎩᏇᏗᏯ) than the Yi version: it has a whopping twenty-eight articles.

One last comment concerns the Armenian script. I find that alphabet amazing, not only for its intrinsic beauty, but also because it dazzles me in a strange way: it looks vaguely like the Latin script, except of course that I can't make heads or tails of it (to me it looks something like: fununu uwunnnh uqunulu wqwnu—and so on), so it's kind of tantalizing, much like the Voynich manuscript defies imagination.

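הבה נרדה ונבלה שם שפתם אשר לא ישמעו איש שפת רעהו (Genesis 11:7: "Come, let us go down, and there confound their language, that they may not understand one another's speech.")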

Many thanks to Φ for pointing out to me the existence of the Code2000 font, which is remarkably complete (before this I didn't have any font to display Yi syllables or Cherokee). Even though I dislike shareware, I believe I'll send this guy (I mean, the font's author) a check. Combine with Yudit for best results.

