[rescue] T5220 update
Jonathan Patschke
jp at celestrion.net
Tue Oct 31 12:09:56 CDT 2017
On Tue, 31 Oct 2017, Dave McGuire wrote:
> BTW, all of your messages are littered with random 'b' characters.
>
> Windows?
UTF-8 does that when you strip the high bit.
b^X = 0x62, 0x18
Add back the high bits and it's 0xE2, 0x98, which is an invalid UTF-8
sequence (0xE2 is the lead byte of a three-byte sequence, but only two
bytes are present). Add a byte containing just the high bit (0x80, which
strips to NUL and presumably got dropped along the way), and it's:
0xE2, 0x80, 0x98
which is Unicode code point U+2018, or "left single quotation mark."
This is Unix/Plan9 all the way down. UTF-8 degrades poorly when you throw
away 12.5% of the data.
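
The arithmetic above can be checked with a minimal Python sketch. It assumes the mail path simply clears bit 7 of every byte and that the resulting NUL is dropped or not displayed somewhere along the way; both are guesses on my part, and the variable names are only illustrative.

```python
# Sketch: simulate a 7-bit-only mail path mangling one UTF-8 character.
original = "\u2018".encode("utf-8")            # bytes 0xE2, 0x80, 0x98

# Strip the high bit of every byte (assumed behaviour of the mail path).
stripped = bytes(b & 0x7F for b in original)   # bytes 0x62 ('b'), 0x00, 0x18 (^X)

# Assumption: the NUL is dropped (or just not shown), leaving b^X.
visible = stripped.replace(b"\x00", b"")

# Setting the high bit on only the two surviving bytes gives 0xE2, 0x98,
# a truncated three-byte sequence that will not decode.
try:
    bytes(b | 0x80 for b in visible).decode("utf-8")
    two_bytes_decode = True
except UnicodeDecodeError:
    two_bytes_decode = False

# With the middle byte back in place, the original character returns.
restored = bytes(b | 0x80 for b in stripped)
recovered = restored.decode("utf-8")
```

Setting the high bit on everything only works here because all three bytes of this character had it set to begin with. Plain ASCII text in the same message has the bit clear, so the damage cannot be undone blindly in general.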
--
Jonathan Patschke
Austin, TX
USA