[rescue] Looking for some Solaris patch clusters
Mouse
mouse at Rodents-Montreal.ORG
Tue Dec 6 06:57:47 EST 2022
>> For some reason "B"s are being added after specific characters /
>> places. B --B I suspect it's tab related.
> It's not tabs.
> This seems to be related to the mailing list. I say this because the
> message that I sent to the mailing list is different in my sent items
> than it is coming in from the mailing list.
> 8bdd2b278818a9e914cfded08ed701a7 10_recommended_cpu_20180111.zip
> > 8bdd2b278818a9e914cfded08ed701a7 10_recommended_cpu_20180111.zip
> Do either of those lines get a B added to the end of the MD5 sum?
Not in the copy I got.
I think this is a bad interaction of mail clients with the list.
This list, unusually for mailing lists, strips high bits, apparently
without paying any attention to the Content-* headers. When combined
with depressingly common mail client misbehaviour, various artifacts
result. Here's my analysis.
ASCII B is 0x42. If we postulate that that is the high-bit-stripped
version of something, putting back the high bit gives 0xc2, which is
the first octet of the UTF-8 representation of any Unicode codepoint
in the U+0080 to U+00bf range.
The following character in the list mail I saw was a space in some
cases, another B in others. The space is 0x20, which as 0xa0 is a
plausible second UTF-8 octet; with the 0xc2, it becomes UTF-8 for
U+00a0, which is non-break space.
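The octet arithmetic above can be checked with a short Python sketch.
The helper name strip_high_bits is hypothetical; it only assumes the
list clears bit 7 of every octet, which is inferred from the symptoms
rather than from the list's actual configuration.

```python
def strip_high_bits(data: bytes) -> bytes:
    """Clear the top bit of every octet (assumed model of the list's behaviour)."""
    return bytes(b & 0x7F for b in data)

# U+00A0 (non-break space) encoded as UTF-8 is the two octets 0xc2 0xa0.
nbsp_utf8 = "\u00a0".encode("utf-8")

# Stripping bit 7 turns 0xc2 into 0x42 ('B') and 0xa0 into 0x20 (space).
stripped = strip_high_bits(nbsp_utf8)

print(nbsp_utf8.hex(), "->", stripped.hex(), repr(stripped))
```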
I see a (comparative) lot of mail that appears to have converted the
first of two consecutive spaces into a non-break space; I speculate
that the intended original was MD5, two spaces, and filename (this
reinforced by seeing that form of it in an earlier mail). The
composing MUA then "helpfully" decided the user wanted the first of the
two spaces to be a non-break space, encoded it as UTF-8, and sent that,
resulting in it becoming "B " after the list stripped high bits, thus
effectively appending a B to the MD5.
The cases with two Bs I think arose because someone quoted the former
sort, B and all, and the MUA involved once again silently converted the
first of the two spaces to a non-break space, thus appending another B.
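To make the single-B and double-B cases concrete, here is a minimal
simulation. Both functions are hypothetical models, not real MUA or
list code: mua_convert assumes the MUA rewrites the first space of each
pair of spaces as U+00A0, and list_strip assumes the list clears bit 7
of every octet. The "> " quote prefix is left out for simplicity.

```python
NBSP = "\u00a0"

def mua_convert(text: str) -> str:
    """Assumed MUA misbehaviour: first of two consecutive spaces becomes NBSP."""
    return text.replace("  ", NBSP + " ")

def list_strip(data: bytes) -> bytes:
    """Assumed list behaviour: clear the high bit of every octet."""
    return bytes(b & 0x7F for b in data)

def send_via_list(text: str) -> str:
    """Compose with the misbehaving MUA, send as UTF-8, receive from the list."""
    return list_strip(mua_convert(text).encode("utf-8")).decode("ascii")

original = "8bdd2b278818a9e914cfded08ed701a7  10_recommended_cpu_20180111.zip"
once = send_via_list(original)   # first trip through the list
twice = send_via_list(once)      # someone quotes it and sends it again

print(once)
print(twice)
```

Under these assumptions, each trip leaves two spaces after the MD5, so
the next MUA in the chain has a fresh pair to convert and a further B
accumulates.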
This "convert first of two spaces to a non-break space" is merely
annoying when it happens to user-typed text. It is definitely broken
for it to be done to quoted text, because it results in quoted text
that misrepresents the pre-quoting text. (It also definitely should be
turned off when sending mail to this list, for exactly this reason.)
I thus think there are three things interacting here:
- MUA misbehaviour, silently converting spaces to non-break spaces
under certain circumstances;
- MUA insistence on use of UTF-8, even when (say) 8859-1 would do fine;
- List stripping high bits, ignoring Content-* headers.
The first one is definitely broken, especially when applied to quoted
text, as I mentioned above.
The second one is annoying, but not really the problem; using 8859-1
would actually make it worse, in that it would conceal the problem (the
non-break spaces would get turned back into single spaces by the
high-bit stripping, thus hiding the first misbehaviour).
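The difference between the two encodings can be seen in a few lines of
Python. As before, strip_high_bits is a hypothetical stand-in for the
list's behaviour, assumed to clear bit 7 of every octet.

```python
def strip_high_bits(data: bytes) -> bytes:
    """Clear the top bit of every octet (assumed model of the list's behaviour)."""
    return bytes(b & 0x7F for b in data)

text = "md5\u00a0 name"  # non-break space followed by an ordinary space

# UTF-8: NBSP is 0xc2 0xa0, so stripping leaves a visible "B " artifact.
via_utf8 = strip_high_bits(text.encode("utf-8")).decode("ascii")

# ISO 8859-1: NBSP is the single octet 0xa0, which strips to a plain space,
# so the result is indistinguishable from the text the user intended.
via_latin1 = strip_high_bits(text.encode("iso-8859-1")).decode("ascii")

print(repr(via_utf8))
print(repr(via_latin1))
```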
The third is...I'm not sure what I think of it. It's well within a
list's purview to impose ASCII on all traffic, and, given the
retrocomputing nature of this list, this would be one of the first
lists I would expect it from. And I'm no fan of UTF-8. But it does
seem excessive to me to blindly strip all high bits; I'd prefer to see
the list reject mail that either has non-ASCII content or that is
marked as being non-ASCII (I'm not sure which I'd prefer). But the
list config is unlikely to change now, and I'd much rather keep what we
have than rock the boat and risk whoever's running things now getting
annoyed with us.
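For illustration only, the two rejection criteria mentioned above could
be checked along these lines with Python's standard email package. The
function name classify is hypothetical, this is not how the list
software works, and the sketch only looks at the top-level charset
parameter, so multipart mail would need more care.

```python
import email

def classify(raw: bytes) -> tuple[bool, bool]:
    """Return (has_non_ascii_octets, is_marked_non_ascii) for a raw message."""
    # Criterion 1: any octet with the high bit set, anywhere in the message.
    has_non_ascii = any(b > 0x7F for b in raw)

    # Criterion 2: a declared charset other than us-ascii.
    msg = email.message_from_bytes(raw)
    charset = msg.get_content_charset()  # None when no charset is declared
    marked_non_ascii = charset is not None and charset != "us-ascii"

    return has_non_ascii, marked_non_ascii

plain = b"Subject: t\r\n\r\nmd5  name\r\n"
marked = b"Subject: t\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nmd5  name\r\n"
dirty = b"Subject: t\r\n\r\nmd5\xc2\xa0 name\r\n"

for raw in (plain, marked, dirty):
    print(classify(raw))
```

The two flags are returned separately because either one could serve as
the rejection rule.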
So, people, if you don't want to contribute to the problem, check your
MUAs, and reconfigure, fix, or replace them as necessary to stop
converting ordinary spaces to non-break spaces. The first place I see
such stuff in this thread is Dave McGuire quoting Grant Taylor quoting
Dave McGuire. I'm not sure whom I suspect most. The above would lead
me to suspect Grant Taylor's MUA, but the first list of MD5 sums I saw
came from Grant Taylor and did not get mangled, despite having doubled
spaces between MD5s and names, so I'd actually check Dave McGuire's
first.
If someone wants to test it without bothering the list, you could try
sending me a note, but make sure you put two consecutive spaces in it
somewhere. I can then see whether one got converted to a non-break
space. Or, if you use an over-the-network mail client, you could send
yourself mail and snoop the network traffic to see what's actually
being sent.
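If the raw octets of a test message are available (from a saved message
file or a packet capture), a small Python sketch like the following can
report where UTF-8 non-break spaces occur. The name find_nbsp is
hypothetical, and the sketch assumes the body was sent as 8-bit UTF-8;
a body sent as quoted-printable or base64 would need decoding first.

```python
def find_nbsp(raw: bytes) -> list[int]:
    """Return the byte offset of every UTF-8 non-break space (0xc2 0xa0) in raw."""
    offsets = []
    pos = raw.find(b"\xc2\xa0")
    while pos != -1:
        offsets.append(pos)
        pos = raw.find(b"\xc2\xa0", pos + 2)
    return offsets

clean = b"test line with two  spaces\r\n"
mangled = "test line with two\u00a0 spaces\r\n".encode("utf-8")

print(find_nbsp(clean))
print(find_nbsp(mangled))
```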
Mouse