Comments on Martine fait encore des bêtises avec UTF-8

jonas (2016-08-29T20:04:16Z)

By the way, there's a worse but much rarer encoding problem that is more or less the opposite of the double-utf8 problem you show here. You could call it negative-utf8. It happens when some software encodes your text in an iso-8859-1-derived byte encoding, and something then tries to decode it as utf-8. In that case, the accented letters come out as question marks, and with a lenient decoder each one also swallows the next one to three characters of the text, depending on the letter. This makes the resulting garbage much harder to repair than plain double-utf8.
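To make the difference concrete, here is a minimal Python sketch of both failure modes. The function names are made up for illustration, and `naive_utf8_decode` is a hypothetical lenient decoder written only to reproduce the swallowing effect; it does not model any particular piece of software.

```python
def double_utf8(text: str) -> str:
    """UTF-8 bytes wrongly read back as ISO-8859-1 (the double-utf8 case)."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_double_utf8(garbled: str) -> str:
    """Double-utf8 is reversible: no byte was lost, only misread."""
    return garbled.encode("iso-8859-1").decode("utf-8")


def naive_utf8_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: it trusts the lead byte's announced
    sequence length and consumes that many bytes even if they are invalid."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            length = 1          # plain ASCII
        elif 0xC0 <= b <= 0xDF:
            length = 2          # lead byte of a 2-byte sequence
        elif 0xE0 <= b <= 0xEF:
            length = 3          # lead byte of a 3-byte sequence
        elif 0xF0 <= b <= 0xF7:
            length = 4          # lead byte of a 4-byte sequence
        else:
            length = 1          # stray continuation or invalid byte
        chunk = data[i:i + length]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            out.append("?")     # the whole announced sequence is lost
        i += length
    return "".join(out)


latin1_bytes = "café noir".encode("iso-8859-1")   # b'caf\xe9 noir'
print(double_utf8("café"))                # cafÃ©
print(undo_double_utf8("cafÃ©"))          # café
print(naive_utf8_decode(latin1_bytes))    # caf?oir
```

In the last line the byte 0xE9 (é in iso-8859-1) looks like the start of a three-byte sequence, so the space and the "n" disappear along with it. A strict decoder behaves better: Python's own `latin1_bytes.decode("utf-8", errors="replace")` replaces only the offending byte and keeps the following characters.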

Ruxor (2012-02-15T19:18:01Z)

jonas (2012-02-15T09:52:02Z)

In some clinics, I still routinely receive medical documents that are misencoded. Luckily, it's not as bad as here: the mix-up is not utf-8 versus a byte encoding, but only between two byte encodings derived from code page 437, so most of the accented letters (namely áéíóöúü and perhaps ÖÜ) come out right and only a few get mixed up. This of course happens with old-style line printers. The printers were very likely technically capable of printing all letters correctly, but the developers apparently just didn't bother.
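For illustration only, here is a small Python sketch of that kind of mix-up. The comment does not say which two encodings are involved; the pair below (text written as code page 852, read as code page 437) is purely an assumption chosen because both codecs ship with Python and share the positions of the letters listed above.

```python
# Assumption: the writer uses cp852 and the reader uses cp437.
# The actual encodings behind the clinic printouts are not known.
WRITER_ENCODING = "cp852"
READER_ENCODING = "cp437"


def misread(text: str) -> str:
    """Encode with one DOS code page and decode with the other."""
    return text.encode(WRITER_ENCODING).decode(READER_ENCODING)


survivors = "áéíóöúüÖÜ"
print(misread(survivors) == survivors)   # True: same byte values in both

for letter in "őű":
    # These letters exist only in cp852, so they come back as something else.
    print(letter, "->", misread(letter))
```

Since every byte maps to some character in both code pages, nothing is lost and the damage can be undone by encoding with the reader's code page and decoding with the writer's.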

