Re: [Pan-users] pan reformatting my posts
From: Duncan
Subject: Re: [Pan-users] pan reformatting my posts
Date: Mon, 21 Oct 2024 03:33:26 -0000 (UTC)
User-agent: Pan/0.160 (Toresk; fa1e697052a6485cde62654cfa15e55c318e51a9)
David Chmelik posted on Sun, 20 Oct 2024 11:31:46 -0000 (UTC) as
excerpted:
> Okay; I've read Duncan's explanation, but dislike unnecessary
> 'newlines'.
So that explanation described the practical situation, but skipped over
the more technical RFC[1] standards references and the history behind why
the behavior is what it is, which should help explain your "unnecessary
newlines". Additionally, I explained the reader-side wrap toggle but
forgot entirely the poster-side option. Since your post gives me the
opportunity to revisit and I have the time this afternoon/evening...
Most of the foundational RFCs originated in the 1970s and 80s, many based
on even earlier ad-hoc private network implementations and early RFCs,
often from before the network inter-operation that defines the INTERnet
became a thing, with those efforts at inter-operation forcing the
standardization that the RFCs defined.
Back then, displays/monitors were text-based, with hardware-defined lines
commonly 40 or 80 characters wide[2].
Thus the 80-character per-line limit, extremely common in that era but now
legacy. That 80 included the line-terminating two-character CRLF sequence,
so in practice it was 78 displayed characters. That 78-character practical
maximum was in turn implemented as a 72-character nominal line length,
allowing for a few levels of quoting before the 78-character hard limit
was exceeded. Of course that was if you were lucky enough to have an
expensive monitor (or dot-matrix printer; many computers of the era had no
monitor and log-printed their output) that didn't force breaking those
80-character lines into two 40-character lines!
Of course in context, this was also the era of 300-baud acoustically-
coupled modems (if you were lucky!), roughly 30 characters per second (300
bits per second at about 10 bits per character once start and stop bits
are counted) if the line quality was perfect, meaning over normal-quality
connections you could watch the characters as they were downloaded and
drawn on-screen or printed, a line or even (for monitors; log-printers
were line-oriented) a character at a time in real-time!! ...
So while the original internet message RFCs defined a MUST-level mandatory
limit of 1000 characters per line (998 plus terminating CRLF), they
/recommended/ (in RFC language SHOULD, as opposed to the 998/1000 MUST
limit above) sticking within the 80-character limit that display/print
hardware often imposed, which as explained, ended up being 72 characters
per line to allow for some layers of nested quoting before wrapping.
Meanwhile, exceed that 80 characters, with too deep a quote or with
messages trying to squeeze in a few extra characters, and you hit the
dreaded "jaggies!" -- quotes where full-length lines alternated with short
lines because the line length exceeded the 78/80-char SHOULD so the line
was split! In practice it was most often both causes at once: say a 74 or
76 character original line length hitting the 78-plus-terminating-CRLF
limit after fewer levels of quoting, exacerbated by clients that inserted
a space between nested quote-level indicators, thus halving the allowed
nesting depth.
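
To make "the jaggies" concrete, here is a small Python sketch. It is only an illustration: `quote` and `hard_split` are made-up helper names, not code from pan or any other client, and the naive split-at-the-last-space behaviour is an assumption about how a client of the era might have enforced the limit.

```python
import textwrap


def quote(lines, prefix="> "):
    """Prefix every line the way a replying client would."""
    return [prefix + line for line in lines]


def hard_split(lines, limit=78):
    """Naively break lines longer than `limit` at the last space that fits."""
    out = []
    for line in lines:
        while len(line) > limit:
            cut = line.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit  # no usable space: cut mid-word
            out.append(line[:cut].rstrip())
            line = line[cut:].lstrip()
        out.append(line)
    return out


# A sender who wraps at 76 instead of the nominal 72...
original = textwrap.wrap(" ".join(["abc"] * 38), width=76)
# ...leaves no room for two levels of "> " quoting under the 78 limit.
twice_quoted = quote(quote(original))
jagged = hard_split(twice_quoted)
for line in jagged:
    print(len(line), line)
# line lengths come out as 75, 3, 75, 3: long/short/long/short
```

Wrapping the original at 72 instead leaves room for up to three "> " levels before anything has to be split, which is the whole point of the 72-character nominal length.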
That 80-char hard-limit of course long ago disappeared, but it is within
the context of how various individual internet messaging applications
differently implemented updated policies and how strictly they continued
to adhere to that 80-char-limit SHOULD, that we come to our current line-
length and wrapping implementation discussion.
To avoid "the jaggies", particularly as displays improved and were no
longer hard-limited to 80 character line lengths, various implementations
used different strategies. As it happens, there is an RFC (RFC 3676) that
updates the earlier RFC standards in the context of the MIME RFCs[3]. It
defines a new format=flowed parameter on the Content-Type header that in
effect allows dynamic rewrapping of lines within paragraphs, with strict
"hard wrapping" the fallback if it isn't specified. Unfortunately, the
format=flowed RFC was
late to the party, with various implementations already coping with the
problem in their own way, and it never got the necessary traction to
become a near-universal "agreed common standard" implementation. (I don't
believe I ever bothered to check whether it actually formally graduated
from "RFC" to "Standard" level; I assume not given the still-per-
implementation differing behavior all these decades later.)
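
For the curious, the core idea of format=flowed is simple: a line that ends in a space is a "soft" break that the reader may rejoin with the next line, while a line without a trailing space is a "hard" break. In the message it shows up as a Content-Type parameter, along the lines of "Content-Type: text/plain; format=flowed". Below is a minimal Python sketch of the receiving side. `unflow` is a hypothetical name, and the sketch deliberately skips the quote-depth tracking and the DelSp parameter that the RFC also covers, so treat it as an outline rather than a conforming implementation.

```python
def unflow(body):
    """Join soft-wrapped lines of a format=flowed body into paragraphs.

    Simplified: no quote-depth tracking, no DelSp handling.
    """
    paragraphs = []
    current = ""
    for raw in body.split("\n"):
        line = raw.rstrip("\r")
        if line.startswith(" "):
            line = line[1:]  # undo space-stuffing
        if line == "-- ":
            # signature separator: never flowed, and it ends any open paragraph
            if current:
                paragraphs.append(current.rstrip())
                current = ""
            paragraphs.append(line)
        elif line.endswith(" "):
            current += line  # trailing space marks a soft break: keep joining
        else:
            paragraphs.append(current + line)  # hard break ends the paragraph
            current = ""
    if current:
        paragraphs.append(current.rstrip())
    return paragraphs


body = (
    "This is a long paragraph that was \n"
    "soft-wrapped by the sender. \n"
    "It ends here.\n"
    "- item one\n"
    "- item two\n"
    "-- \n"
    "Duncan"
)
for paragraph in unflow(body):
    print(repr(paragraph))
```

Note how the two list items survive as separate lines because they carry no trailing space, while the three prose lines collapse into one paragraph the reader can wrap to any width.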
*NOW* we have the necessary historical context to understand pan's line-
wrapping behavior! =:^)
As I mentioned, format=flowed never gained traction, because most
implementations including pan already coped with the problem in their own
way.
For the pan line-wrapping implementation, as described in the previous
explanation, this means *TWO* options, one for composition, which I
honestly entirely forgot about in my previous explanation, plus the one
for reading which I explained.
For composition/posting, pan optionally auto-wraps during the composition
process. Hard CRLF line terminations end up in the post where you enter
them manually, where pan wraps at send, or where it wraps when the "wrap"
button/option is toggled.
What's interesting/nice about pan's composition-mode wrap button/option is
that it on-demand wraps what's already there. This allows (forces?) a
strategy where you ignore the wrap and let it automatically soft-wrap when
composing the "prose" of mixed-content, hit "wrap" to force it to hard-
wrap what's there ("setting" the existing auto-wraps, but also rewrapping
any short/long lines), and only /then/ insert the hard-wrapped content,
say by pasting it in, such that any preexisting new-lines in the pasted
content will be retained as will any manual new-lines you enter.
Don't rewrap after inserting your own hard-wraps, however, lest your just-
inserted hard-wrapped content be rewrapped along with everything else and
you have to either delete/reinsert (if pasted in) or manually edit to
correct the problem.
For reading downloaded posts, we have the previously explained dynamic-
wrap toggle, best used with the hotkey. In dynamic-wrap mode it ignores
single line-termination CRLFs, combining and rewrapping while
automatically dealing with quote indicators in combined lines. This works
well to eliminate "the jaggies" but is frustrating for single-spaced lists
and code that needs literal as-posted line handling because it dynamic-
rewraps them too.
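
As a rough illustration of what such a reader-side dynamic rewrap involves, here is a Python sketch. It is emphatically not pan's actual code: `split_quote` and `rewrap` are hypothetical names, the "> " output prefix and the 72-column width are assumptions, and real clients handle many more corner cases. It strips quote indicators, joins consecutive lines of the same quote depth, and rewraps each joined block with the prefix put back.

```python
import re
import textwrap

QUOTE_RE = re.compile(r"^((?:>\s?)*)(.*)$")


def split_quote(line):
    """Return (quote depth, text without quote indicators)."""
    prefix, text = QUOTE_RE.match(line).groups()
    return prefix.count(">"), text.strip()


def rewrap(lines, width=72):
    """Ignore single line breaks; rewrap each same-depth block to `width`."""
    out, block, block_depth = [], [], 0
    for line in list(lines) + [""]:  # sentinel blank line flushes the last block
        depth, text = split_quote(line)
        if block and (not text or depth != block_depth):
            prefix = "> " * block_depth
            out.extend(textwrap.wrap(" ".join(block), width=width,
                                     initial_indent=prefix,
                                     subsequent_indent=prefix))
            block = []
        if text:
            block.append(text)
            block_depth = depth
        else:
            out.append(("> " * depth).rstrip())  # keep blank lines as breaks
    return out[:-1]  # drop the blank line produced by the sentinel


jagged = [
    "> This line was wrapped at the sender's limit and then quoted, pushing",
    "> it",
    "> past the hard limit so the next client split it again, leaving short",
    "> orphans",
    "> like these.",
    "",
    "My reply.",
]
print("\n".join(rewrap(jagged)))

# The flip side: a single-spaced list is treated as one paragraph too.
print(rewrap(["- one", "- two"]))  # ['- one - two']
```

The last line shows exactly the frustration described above: without blank lines between them, list items look just like a hard-wrapped paragraph to this kind of logic.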
As-posted mode works where literal as-posted lines need to be retained,
generally single-spaced lists and code, but also ASCII-art (which also
needs mono-spaced fonts, the reason pan has that option too), but not so
well for "the jaggies" where the poster's client hard-wrapped too-long
lines at the (normally) 72 or 80 char limit, resulting in alternating
long/short lines if not rewrapped. It also has problems with
format=flowed posts, either explicit (with the RFC-specified header) or
implicit (without that header, just using the full 998/1000-char MUST
limit or simply lines of different length than your display window).
And as mentioned, the /real/ challenge is posts that contain both types of
content, unwrapped "prose" lines along with to-be-displayed-as-posted
content such as lists, code or ASCII-art. It is for these posts that the
hotkey really comes in handy, allowing the quickest and most convenient
toggling between displayed wrap modes depending on whether you're reading
the prose or the literal-line bits.
The key to keep in mind when posting is that due to all this "messy"
history, pan-specific mode-toggling behavior aside, not all clients will
present your posted content in the same way. As a poster you can't really
do anything about mono-space vs. variable-space font choices on the
receiving client except explicitly saying "best when viewed with monospace
fonts" or similar when posting ASCII-art, but you CAN double-line-space
your lists and similar as-posted-line-oriented content if desired,
effectively presenting it as single-line paragraphs, which forces most
clients (including pan) to present it with lines as-posted, regardless of
what sort of line-wrap solution they've otherwise implemented for their
display.
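
The double-spacing trick can be sketched in a few lines of Python. This assumes a generic reader that rewraps each blank-line-separated paragraph independently; `reflow_paragraphs` is a hypothetical stand-in for that behaviour, not any particular client's code.

```python
import textwrap


def reflow_paragraphs(text, width=72):
    """Join the lines of each blank-line-separated paragraph, then rewrap."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )


single_spaced = "- apples\n- pears\n- plums"
double_spaced = "- apples\n\n- pears\n\n- plums"

print(reflow_paragraphs(single_spaced))  # items run together on one line
print(reflow_paragraphs(double_spaced))  # each item stays on its own line
```

Since each double-spaced item is its own one-line paragraph, there is nothing for the rewrapper to join it with, so it comes out exactly as posted.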
---
[1] RFC: Request For Comments. These are the formal (or occasionally
less formal, there's a(n in)famous April Fools one that describes an
implementation of IP/internet-protocol over carrier pigeon! =:^) documents
that describe the technical foundations of the internet and its various
protocols, including a decent number of RFCs describing the internet
message format we're discussing here, which is common to both internet
news and email, thus explaining why many clients that work with one work
with both -- once you've written the code to properly deal with one you're
most of the way to dealing with both. RFCs are assigned numbers by which
they are commonly referenced, and later become Standards, which are also
numbered. But in practice an RFC doesn't normally become a full standard
until after the fact, after there are multiple implementations and it's in
widespread use, so the RFC number is the much more commonly known
reference, with the standard number almost a historical footnote denoting
completion of the process after everyone's adopted it already.
https://en.wikipedia.org/wiki/Request_for_Comments
[2] 40/80 characters: Note that the origin for this hardware limit was
even earlier, based on 80-column Hollerith punch-cards standardized by IBM
in 1928, with other (non-80-column, including 24- and 40-column) formats
going back to Hollerith's 1889 patents! That in turn has roots going back
at least as far as the punched-paper-tape loom of 1725! An example of the
process of iterative-invention building upon earlier invention. But by
the same token, those earlier inventions can constrain newer ones too,
thus the fact that we're still dealing with the 80-char-line legacy, which
remains encoded in the RFCs that define the common internet message format
used by both smtp/email and nntp/news.
https://en.wikipedia.org/wiki/Punched_card
[3] MIME: Multipurpose Internet Mail Extensions. These RFCs did in fact
become standards and date from the 90s. They defined extensions to the
original RFC internet message definition, remaining compatible with it but
standardizing specific header and "message part" definitions for the
purpose of standardizing attachments, allowing separate plain-text and
HTML format message parts, etc. Among other things the MIME RFCs defined
the Content-Type header and enumerated some of the basic MIME types, with
the MIME-type spec then repurposed for various other things including HTTP
MIME-types and to form the basis for the Unix/Linux local file-type
handling used to this day.
https://en.wikipedia.org/wiki/MIME
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman