Comments on Martine fait encore des bêtises avec UTF-8 ("Martine is up to mischief with UTF-8 again")

jonas (2016-08-29T20:04:16Z)

By the way, there's a worse but much rarer encoding problem that is sort of the opposite of the double-utf8 problem you show here. You could call it negative-utf8. It happens when some software encodes your text in an iso-8859-1-derived byte encoding, then tries to decode it as utf-8. In that case, the accented letters come out as question marks, and with a lax decoder they also swallow up to three of the characters that follow. This makes the resulting garbage much harder to decode than double-utf8.

Ruxor (2012-02-15T19:18:01Z)

<URL: http://en.wikipedia.org/wiki/File:Letter_to_Russia_with_krokozyabry.jpg > :-)

jonas (2012-02-15T09:52:02Z)

In some clinics, I still routinely receive misencoded medical documents. Luckily, it's not as bad as here: it's not utf-8 versus a byte encoding, only two 437-derived byte encodings, so most of the accented letters (namely áéíóöúü and perhaps ÖÜ) come out right and only a few of them get mixed up. This of course happens with old-style line printers, which were very likely technologically capable of printing all letters correctly; the developers apparently just didn't bother.
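A small Python sketch of this kind of partial mix-up. The comment does not name the two code pages, so the pair used here, Python's `cp852` codec (a DOS code page covering Hungarian) on the writing side and `cp437` on the reading side, is an assumption for illustration. Letters stored at the same byte value in both tables survive; the others turn into unrelated characters.

```python
# Assumption: the writer uses cp852 and the reader assumes cp437;
# the original comment does not say which code pages are involved.
WRITER = "cp852"  # hypothetical encoding of the sending system
READER = "cp437"  # hypothetical encoding assumed by the printer or receiver


def misread(text: str) -> str:
    """Encode with the writer's code page, then decode with the reader's."""
    return text.encode(WRITER).decode(READER)


for ch in "áéíóöúüÖÜÁőű":
    out = misread(ch)
    status = "ok" if out == ch else "garbled"
    print(f"{ch} -> {out} ({status})")
```

Under this assumption the lowercase letters and Ö, Ü come through unchanged, while Á, ő and ű, which cp437 does not contain at all, cannot survive.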
