This issue made me angry on Monday the 4th of September, 2006.
slrn is a nice terminal-based newsreader with lots of lovely scripting support that makes it wonderfully useful.
Usenet is an international medium in which people post using many different character sets.
Unicode is a character set that includes almost every character known to man.
UTF-8 is a standard for encoding Unicode in 8-bit streams.
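To make the encoding concrete, here is a hand-rolled sketch of how a single Unicode code point becomes UTF-8 bytes, written in Python purely for illustration (the function name `utf8_encode` is hypothetical; in real code `str.encode("utf-8")` does this for you). Note that every byte with the high bit set belongs to a multi-byte sequence, which matters below.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical hand-rolled UTF-8 encoder for one code point (illustration only)."""
    # Surrogates and values past U+10FFFF are not encodable scalar values.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:  # 1 byte: 0xxxxxxx, i.e. plain ASCII, high bit clear
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For example, `utf8_encode(0xE4)` (ä) gives `b'\xc3\xa4'`, the same two bytes Python's built-in codec produces.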
In order to display text on a screen usefully, two things must be known: the character set the text is written in, and the character set the terminal expects.
Once these are both known, converting from one to the other is easy. Where no one-to-one mapping exists (say, converting ä to 7-bit ASCII), there are various ways to produce a meaningful approximation (say, a").
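The decode-then-encode conversion described above can be sketched in Python: decode the raw bytes using the source character set, then encode for the terminal, approximating anything the terminal cannot show. This is a minimal sketch, not how slrn works; the names `to_terminal`, `_approximate`, the error-handler name `"approximate"` and the small mark table are all hypothetical, and it assumes the terminal's character set is an ASCII superset.

```python
import codecs
import unicodedata

# Hypothetical table: ASCII stand-ins for a few combining marks.
_MARK_APPROX = {
    "\u0308": '"',  # combining diaeresis: ä -> a"
    "\u0301": "'",  # combining acute:     é -> e'
    "\u0300": "`",  # combining grave:     è -> e`
}

def _approximate(exc):
    """Codec error handler: replace unencodable characters with ASCII approximations."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = []
    for ch in exc.object[exc.start:exc.end]:
        parts = []
        # Split e.g. 'ä' into base letter 'a' plus combining diaeresis.
        for c in unicodedata.normalize("NFD", ch):
            if c.isascii():
                parts.append(c)
            elif c in _MARK_APPROX:
                parts.append(_MARK_APPROX[c])
        # Nothing usable survived (e.g. CJK): fall back to '?'.
        out.append("".join(parts) or "?")
    return "".join(out), exc.end

codecs.register_error("approximate", _approximate)

def to_terminal(raw: bytes, source_charset: str, terminal_charset: str) -> bytes:
    """Convert bytes in source_charset into bytes the terminal can display."""
    text = raw.decode(source_charset)  # step 1: bytes -> Unicode (strict)
    return text.encode(terminal_charset, errors="approximate")  # step 2
```

With this, `to_terminal(b"\xe4", "latin-1", "ascii")` returns `b'a"'`, while `to_terminal(b"\xe4", "latin-1", "utf-8")` returns the proper two-byte `b'\xc3\xa4'`.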
Now, 8-bit characters are not themselves displayable under UTF-8. In UTF-8, a byte with the high bit set only ever appears as part of a multi-byte sequence encoding a non-ASCII Unicode character, so a Latin-1 character above 0x7F (ä, which is 0xE4 in Latin-1, for instance) is encoded as two bytes (0xC3 0xA4). Passing a raw Latin-1 character to a UTF-8 terminal will therefore result in undefined behaviour, as on its own it is not a valid UTF-8 string (if you're really unlucky, it and the following characters will in fact form a valid UTF-8 string and things will be even more confused).
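Both failure modes are easy to reproduce. This Python sketch takes some Latin-1 text, hands its raw bytes to a UTF-8 decoder (standing in for a UTF-8 terminal), and reports what comes out; the helper name `utf8_view` is hypothetical, and a real terminal does not raise an error but shows whatever it likes.

```python
def utf8_view(latin1_text: str):
    """Interpret the raw Latin-1 bytes of latin1_text as UTF-8; None if invalid."""
    raw = latin1_text.encode("latin-1")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

print("ä".encode("latin-1"))  # b'\xe4'      one byte in Latin-1
print("ä".encode("utf-8"))    # b'\xc3\xa4'  two bytes in UTF-8
print(utf8_view("ä"))         # None: a lone 0xE4 is not valid UTF-8
print(utf8_view("Ã¤"))        # ä: the unlucky case, valid UTF-8 but wrong text
```

The last line is the really unlucky case: the Latin-1 pair "Ã¤" happens to be the bytes 0xC3 0xA4, which a UTF-8 terminal happily displays as ä.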
So, doing the following is almost certainly going to be wrong:
And yet slrn does all of these things.