Re: (Not-so) hypothetical question: What to do about NULs?
From: Steffen Nurpmeso
Subject: Re: (Not-so) hypothetical question: What to do about NULs?
Date: Sun, 19 Feb 2023 01:48:10 +0100
User-agent: s-nail v14.9.24-411-g8db62d75cb
Ken Hornstein wrote in
<20230219001921.597AD1E0839@pb-smtp20.pobox.com>:
...
|- mutt
...
|[.]Internally mutt does
|have an idea if the content contains a NUL (the CONTENT structure contains
|a 'nulbin' member which contains the number of NUL bytes), but it's not
|clear to me what happens when a NUL is encountered.
Seems to me this is classification of attachment data, which will
end up as octet-stream in that case.
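As a rough illustration of that kind of classification (this is
neither mutt's nor S-nail's actual code; the names count_nuls and
classify are hypothetical, and the two MIME types are the only
outcomes considered), one could count NUL bytes and fall back to
octet-stream as soon as one is seen:

```c
#include <stddef.h>

/* Hypothetical helper: count the NUL bytes in buf[0..len). */
static size_t count_nuls(const unsigned char *buf, size_t len){
   size_t i, n = 0;

   for(i = 0; i < len; ++i)
      if(buf[i] == '\0')
         ++n;
   return n;
}

/* Hypothetical classifier: any NUL demotes the data to octet-stream.
 * Real classifiers look at far more than this (8-bit bytes, line
 * lengths, control characters, ...). */
static const char *classify(const unsigned char *buf, size_t len){
   return (count_nuls(buf, len) > 0) ? "application/octet-stream"
      : "text/plain";
}
```

The length is passed explicitly because the whole point is that the
data may contain NULs, so str*() style functions cannot be used on it.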
For S-nail we more or less do what Heirloom mailx has done.
For classification purposes we switch to octet-stream.
For display purposes we happily display it after passing it
through some kind of makeprint.
   isuni = ((n_psonce & n_PSO_UNICODE) != 0);
   ...
      if(!iswprint(wc) && wc != '\n' /*&& wc != '\r' && wc != '\b'*/ &&
            wc != '\t'){
         if ((wc & ~S(wchar_t,037)) == 0)
            wc = isuni ? 0x2400 | wc : '?';
         else if(wc == 0177)
            wc = isuni ? 0x2421 : '?';
         else
            wc = isuni ? 0x2426 : '?';
      }else if(isuni){ /* TODO ctext */
         /* Need to filter out L-TO-R and R-TO-L marks TODO ctext */
         if(wc == 0x200E || wc == 0x200F || (wc >= 0x202A && wc <= 0x202E))
            continue;
         /* And some zero-width messes */
         if(wc == 0x00AD || (wc >= 0x200B && wc <= 0x200D))
            continue;
         /* Oh about the ISO C wide character interfaces, baby! */
         if(wc == 0xFEFF)
            continue;
      }
Or, without mb* and wc* sausage,
   {
      int c;
      while(inp < maxp){
         c = *inp++ & 0377;
         if(!su_cs_is_print(c) &&
               c != '\n' && c != '\r' && c != '\b' && c != '\t')
            c = '?';
         *outp++ = c;
      }
      out->l = in->l;
   }
This is even a regression against Heirloom mailx that Jörg
Schilling was very dissatisfied with, as the above only handles
ASCII printables regardless of the locale. (My plan was to write
a CText library for Unicode handling, and it had progressed quite
far, with only about two months left until decomposition and
normalization were implemented (Christmas 2014), when something
very bad happened. Maybe i will do it someday. Or simply do what
OpenBSD does and use perl's fantastic Unicode support to generate
some tables.)
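For reference, the wide character mapping from the first excerpt can
be written as a standalone sketch. The function names makeprint_wc
and is_invisible_wc are made up for this example, the S() cast macro
is replaced by a plain cast, and the result of iswprint() depends on
the current locale, so treat this as an approximation and not as the
S-nail code:

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: map one wide character to something displayable.
 * With isuni set, C0 controls become U+2400.. "control pictures",
 * DEL becomes U+2421 and anything else unprintable U+2426. */
static wchar_t makeprint_wc(wchar_t wc, int isuni){
   if(iswprint((wint_t)wc) || wc == L'\n' || wc == L'\t')
      return wc;
   if((wc & ~(wchar_t)037) == 0)
      return isuni ? (wchar_t)(0x2400 | wc) : L'?';
   if(wc == 0177)
      return isuni ? (wchar_t)0x2421 : L'?';
   return isuni ? (wchar_t)0x2426 : L'?';
}

/* Hypothetical: the characters the excerpt skips with "continue":
 * directional marks and overrides, soft hyphen, zero-width
 * characters and the byte order mark. */
static int is_invisible_wc(wchar_t wc){
   unsigned long u = (unsigned long)wc;

   return (u == 0x200E || u == 0x200F || (u >= 0x202A && u <= 0x202E) ||
      u == 0x00AD || (u >= 0x200B && u <= 0x200D) || u == 0xFEFF);
}
```

A caller would drop a character when is_invisible_wc() is true (in
Unicode mode only) and otherwise emit makeprint_wc() of it, so a NUL
shows up as U+2400 in a Unicode locale and as '?' elsewhere.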
The implementation is total crap. (longjmp codebase, data leaks,
blocking I/O, all that (it was).) All of these (mailbox read,
content-transfer decoding, character set conversion, ..., display
preparation) should be "filters" with input and output plugged
together, with internal buffers as necessary. That is the v15
MIME and I/O layer rewrite that has not happened for nine years.
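To make the "filters plugged together" idea concrete, here is a very
small sketch. It is not the planned v15 design: filter_fn, strip_cr,
mask_ctrl and run_chain are hypothetical names, the filters work on
whole buffers instead of streaming, and the fixed 256 byte internal
buffers silently truncate longer input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical filter type: read in[0..inlen), write at most outcap
 * bytes to out, return the number of bytes written. */
typedef size_t (*filter_fn)(const unsigned char *in, size_t inlen,
      unsigned char *out, size_t outcap);

/* Example filter: drop carriage returns. */
static size_t strip_cr(const unsigned char *in, size_t inlen,
      unsigned char *out, size_t outcap){
   size_t i, o = 0;

   for(i = 0; i < inlen && o < outcap; ++i)
      if(in[i] != '\r')
         out[o++] = in[i];
   return o;
}

/* Example filter: replace non-printable ASCII (NUL included) with
 * '?', keeping newline and tab. */
static size_t mask_ctrl(const unsigned char *in, size_t inlen,
      unsigned char *out, size_t outcap){
   size_t i, o = 0;

   for(i = 0; i < inlen && o < outcap; ++i){
      unsigned char c = in[i];

      if((c < 0x20 && c != '\n' && c != '\t') || c >= 0x7F)
         c = '?';
      out[o++] = c;
   }
   return o;
}

enum {CHAIN_BUFSZ = 256};

/* Run the filters in order, output of one being input of the next,
 * using two internal buffers that are swapped after each stage. */
static size_t run_chain(const filter_fn *chain, size_t n,
      const unsigned char *in, size_t inlen,
      unsigned char *out, size_t outcap){
   unsigned char a[CHAIN_BUFSZ], b[CHAIN_BUFSZ];
   unsigned char *src = a, *dst = b, *tmp;
   size_t i, len = (inlen < CHAIN_BUFSZ) ? inlen : CHAIN_BUFSZ;

   memcpy(src, in, len);
   for(i = 0; i < n; ++i){
      len = chain[i](src, len, dst, CHAIN_BUFSZ);
      tmp = src, src = dst, dst = tmp;
   }
   if(len > outcap)
      len = outcap;
   memcpy(out, src, len);
   return len;
}
```

With a chain of {strip_cr, mask_ctrl}, the five bytes "a\r\n\0b" come
out as the four bytes "a\n?b". A real implementation would keep
per-filter state and push data through in chunks, so that decoding,
character set conversion and display preparation never need the whole
message in memory.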
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)