Re: [Pan-users] pan reformatting my posts


From:	Duncan
Subject:	Re: [Pan-users] pan reformatting my posts
Date:	Mon, 21 Oct 2024 03:33:26 -0000 (UTC)
User-agent:	Pan/0.160 (Toresk; fa1e697052a6485cde62654cfa15e55c318e51a9)
David Chmelik posted on Sun, 20 Oct 2024 11:31:46 -0000 (UTC) as
excerpted:

> Okay; I've read Duncan's explanation, but dislike unnecessary
> 'newlines'.

So that explanation described the practical situation, but skipped over 
the more technical RFC[1] standards references and the history behind why 
the behavior is what it is, which should help explain your "unnecessary 
newlines".  Additionally, I explained the reader-side wrap toggle but 
forgot entirely the poster-side option.  Since your post gives me the 
opportunity to revisit and I have the time this afternoon/evening...

Most of the foundational RFCs originated in the 1970s and 80s, many based 
on even earlier ad-hoc private network implementations and early RFCs, 
often from before the network inter-operation that defines the INTERnet 
became a thing, with those efforts at inter-operation forcing the 
standardization that the RFCs defined.

Back then, displays/monitors were text-based, with hardware-defined lines 
commonly 40 or 80 characters wide[2].

Thus the 80-character per-line limit, extremely common in that era but now 
legacy.  It included the line-terminating two-character CRLF sequence, so 
in practice it was 78 displayed characters.  That 78-character 
max-practical limit was in turn implemented as a 72-character nominal line length, 
allowing for a few levels of quoting before the 78 character hard-limit 
was exceeded.  Of course that was if you were lucky enough to have an 
expensive monitor (or dot-matrix printer, many computers of the era had no 
monitor and log-printed output) that didn't force breaking those 80-
character lines into two 40-character lines!

Of course in context, this was also the era of 300-baud acoustically-
coupled modems (if you were lucky!), basically 300 bits or about 30 
characters (including framing and parity overhead) per second if the line 
quality was perfect, meaning over normal-quality connections you could 
watch the characters as they were downloaded and drawn on-screen or 
printed a line or even (for monitors, log-printers were line-oriented) a 
character at a time in real-time!! ...

So while the original internet message RFCs defined a MUST-level mandatory 
limit up to 1000-character lines (998 plus terminating CRLF), they 
/recommended/ (in RFC language SHOULD, as opposed to the 998/1000 MUST 
limit above) sticking within the 80-character often display/print-
hardware-defined limit, which as explained, ended up being 72-characters 
per line to allow for some layers of nested quoting before wrapping.
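
To put numbers on those limits, here is a minimal Python sketch.  The 
constant and function names are mine, invented for illustration; they are 
not from any RFC or from pan.  It classifies a body line (without its 
CRLF) against the 998 MUST, the 78 SHOULD and the 72 nominal wrap, and 
computes how many quote levels a full 72-character line survives.

```python
# Limits as described above: 998 characters plus CRLF is the MUST,
# 78 plus CRLF is the SHOULD, and 72 is the customary wrap column.
HARD_LIMIT = 998
SOFT_LIMIT = 78
NOMINAL_WRAP = 72


def classify_line(line):
    """Classify one body line (CRLF already stripped) against the limits."""
    n = len(line)
    if n > HARD_LIMIT:
        return "violates MUST"
    if n > SOFT_LIMIT:
        return "exceeds SHOULD"
    if n > NOMINAL_WRAP:
        return "over nominal wrap"
    return "ok"


def quote_headroom(prefix=">"):
    """Quote levels a full 72-character line survives before passing 78."""
    return (SOFT_LIMIT - NOMINAL_WRAP) // len(prefix)
```

With a bare ">" per level, quote_headroom() gives 6 levels; with "> " (the 
marker plus a space) it gives 3, which is the halving that space-inserting 
clients cause.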

Meanwhile, exceed that 80 characters with too deep a quote or with 
messages trying to squeeze in a few extra characters (in practice, most 
often both, say a 74 or 76 character original content length hitting the 
78-plus-terminating-CRLF limit with fewer levels of quoting, exacerbated 
by clients that inserted a space between nested quote-level indicators 
thus halving the allowed nested quote level), and you hit the dreaded 
"jaggies!" -- quotes where full-length lines alternated with short lines 
because the line-length exceeded the 78/80-char SHOULD so the line was 
split!
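
A small Python sketch of how "the jaggies" arise, assuming a naive client 
that simply re-splits any quoted line that ends up over 78 characters.  
The quote_and_split() helper and the sample text are hypothetical, for 
illustration only, and not how any particular client is implemented.

```python
import textwrap


def quote_and_split(lines, prefix="> > > ", limit=78):
    """Prefix each line with quote markers, hard-splitting any that overflow."""
    out = []
    for line in lines:
        quoted = prefix + line
        if len(quoted) <= limit:
            out.append(quoted)
        else:
            # Too long once quoted: rewrap just this line, which leaves a
            # nearly full line followed by a short leftover line.
            for piece in textwrap.wrap(line, width=limit - len(prefix)):
                out.append(prefix + piece)
    return out


# An original message wrapped at 76 columns instead of the nominal 72.
original = textwrap.wrap(" ".join(["word"] * 40), width=76)
jagged = quote_and_split(original)
for row in jagged:
    print(len(row), row)
```

The printed lengths alternate between nearly full lines and very short 
leftovers, which is exactly the long/short pattern described above.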

That 80-char hard-limit of course long ago disappeared, but it is within 
the context of how various individual internet messaging applications 
differently implemented updated policies and how strictly they continued 
to adhere to that 80-char-limit SHOULD, that we come to our current line-
length and wrapping implementation discussion.

To avoid "the jaggies", particularly as displays improved and were no 
longer hard-limited to 80 character line lengths, various implementations 
used different strategies.  As it happens, there is an RFC (RFC 3676, 
which replaced the earlier RFC 2646) updating the earlier RFC-standards in 
the context of the MIME RFCs[3], that defines a new format=flowed 
parameter on the Content-Type header, that in effect allows 
dynamic-rewrapping of lines within paragraphs, with strict "hard wrapping" the 
fallback if it wasn't specified.  Unfortunately, the format=flowed RFC was 
late to the party, with various implementations already coping with the 
problem in their own way, and it never got the necessary traction to 
become a near-universal "agreed common standard" implementation.  (I don't 
believe I ever bothered to check whether it actually formally graduated 
from "RFC" to "Standard" level; I assume not given the still-per-
implementation differing behavior all these decades later.)
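
For the curious, the core trick of format=flowed is that a line ending in 
a space is a "soft" break to be joined with the next line, while any other 
line ending is a "hard" break.  Below is a deliberately simplified Python 
sketch of the receiving side.  The unflow() name is mine, and the sketch 
ignores quote-depth handling and the DelSp parameter, so treat it as an 
illustration of the idea rather than a conforming implementation.

```python
def unflow(lines):
    """Join format=flowed soft-broken lines into logical paragraphs.

    Simplified: no quote-depth tracking and no DelSp support.
    """
    paragraphs = []
    current = ""
    for line in lines:
        if line.startswith(" "):
            line = line[1:]  # undo "space-stuffing" of the first column
        if line == "-- ":
            # The signature separator ends in a space but is never flowed.
            if current:
                paragraphs.append(current)
                current = ""
            paragraphs.append(line)
            continue
        current += line
        if not line.endswith(" "):
            # No trailing space: a hard break ends the paragraph here.
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

The display client is then free to rewrap each returned paragraph to 
whatever width its window happens to be.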

*NOW* we have the necessary historical context to understand pan's line-
wrapping behavior! =:^)


As I mentioned, format=flowed never gained traction, because most 
implementations including pan already cope with the problem in their own 
way.

For the pan line-wrapping implementation, as described in the previous 
explanation, this means *TWO* options, one for composition, which I 
honestly entirely forgot about in my previous explanation, plus the one 
for reading which I explained.


For composition/posting, pan optionally auto-wraps during the composition 
process, with hard CRLF line terminations inserted where you add them 
manually, at send, or when the "wrap" button/option is toggled.

What's interesting/nice about pan's composition-mode wrap button/option is 
that it on-demand wraps what's already there.  This allows (forces?) a 
strategy where you ignore the wrap and let it automatically soft-wrap when 
composing the "prose" of mixed-content, hit "wrap" to force it to hard-
wrap what's there ("setting" the existing auto-wraps, but also rewrapping 
any short/long lines), and only /then/ insert the hard-wrapped content, 
say by pasting it in, such that any preexisting new-lines in the pasted 
content will be retained as will any manual new-lines you enter.

Don't rewrap after inserting your own hard-wraps, however, lest your just-
inserted hard-wrapped content be rewrapped along with everything else and 
you have to either delete/reinsert (if pasted in) or manually edit to 
correct the problem.


For reading downloaded posts, we have the previously explained dynamic-
wrap toggle, best used with the hotkey.  In dynamic-wrap mode it ignores 
single line-termination CRLFs, combining and rewrapping while 
automatically dealing with quote indicators in combined lines.  This works 
well to eliminate "the jaggies" but is frustrating for single-spaced lists 
and code that needs literal as-posted line handling because it dynamic-
rewraps them too.
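
To make the dynamic-wrap idea concrete, here is a rough Python sketch of 
that kind of rewrapping.  To be clear, this is NOT pan's code (pan is 
C++); dynamic_rewrap(), its width parameter and the "> " re-quoting style 
are my own assumptions for illustration.  It joins consecutive non-blank 
lines at the same quote depth, rewraps them, and puts the quote markers 
back.

```python
import re
import textwrap

# Leading quote markers (each ">" optionally followed by one space), then text.
QUOTE_RE = re.compile(r"^((?:> ?)*)(.*)$")


def dynamic_rewrap(lines, width=60):
    """Rewrap paragraphs, treating single line breaks as soft."""
    out = []
    block = []
    block_depth = 0

    def flush():
        if block:
            prefix = "> " * block_depth
            room = max(width - len(prefix), 20)
            text = " ".join(block)
            for piece in textwrap.wrap(text, width=room, break_on_hyphens=False):
                out.append(prefix + piece)
            block.clear()

    for line in lines:
        match = QUOTE_RE.match(line)
        depth = match.group(1).count(">")
        text = match.group(2).strip()
        if not text:
            flush()
            out.append(">" * depth)  # a blank line keeps the paragraph break
            continue
        if block and depth != block_depth:
            flush()  # a change of quote level also ends the paragraph
        block_depth = depth
        block.append(text)
    flush()
    return out
```

Feed it a jagged quote and the long/short alternation disappears.  Feed it 
a single-spaced list such as ["- one", "- two"] and you get the single 
line "- one - two", which is the frustration mentioned above.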

As-posted mode works where literal as-posted lines need to be retained, 
generally single-spaced lists and code, but also ASCII-art (which also 
needs mono-spaced fonts, the reason pan has that option too), but not so 
well for "the jaggies" where the poster's client hard-wrapped too-long 
lines at the (normally) 72 or 80 char limit, resulting in alternating 
long/short lines if not rewrapped.  It also has problems with 
format=flowed posts, either explicit (with the RFC-specified header) or 
implicit (without that header, just using the full 998/1000-char MUST 
limit or simply lines of different length than your display window).

And as mentioned, the /real/ challenge is posts that contain both types of 
content, unwrapped "prose" lines along with to-be-displayed-as-posted 
content such as lists, code or ASCII-art.  It is for these posts that the 
hotkey really comes in handy, allowing the quickest and most convenient 
toggling between displayed wrap modes depending on whether you're reading 
the prose or the literal-line bits.


The key to keep in mind when posting is that due to all this "messy" 
history, pan-specific mode-toggling behavior aside, not all clients will 
present your posted content in the same way.  As a poster you can't really 
do anything about mono-space vs. variable-space font choices on the 
receiving client except explicitly saying "best when viewed with monospace 
fonts" or similar when posting ASCII-art, but you CAN double-line-space 
your lists and similar as-posted-line-oriented content if desired, 
effectively presenting it as single-line paragraphs, which forces most 
clients (including pan) to present it with lines as-posted, regardless of 
what sort of line-wrap solution they've otherwise implemented for their 
display.
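
A quick Python sketch of why the double-spacing trick works, assuming a 
reading client that treats blank lines as paragraph breaks and rewraps 
everything in between.  The rewrap_paragraphs() helper is hypothetical and 
only stands in for whatever a given client really does.

```python
import textwrap


def rewrap_paragraphs(text, width=60):
    """Rewrap each blank-line-separated paragraph independently."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )


single_spaced = "- apples\n- pears\n- plums"
double_spaced = "- apples\n\n- pears\n\n- plums"

print(rewrap_paragraphs(single_spaced))  # items run together on one line
print(rewrap_paragraphs(double_spaced))  # each item stays on its own line
```

Each double-spaced item is its own one-line paragraph, so there is nothing 
for the rewrapper to combine it with.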

---
[1] RFC:  Request For Comments.  These are the formal (or occasionally 
less formal, there's a(n in)famous April Fools one that describes an 
implementation of IP/internet-protocol over carrier pigeon! =:^) documents 
that describe the technical foundations of the internet and its various 
protocols, including a decent number of RFCs describing the internet 
message format we're discussing here, which is common to both internet 
news and email, thus explaining why many clients that work with one work 
with both -- once you've written the code to properly deal with one you're 
most of the way to dealing with both.  RFCs are assigned numbers by which 
they are commonly referenced, and some later become Standards, which are also 
numbered.  But in practice an RFC doesn't normally become a full standard 
until after the fact, after there are multiple implementations and it's in 
widespread use, so the RFC number is the much more commonly known 
reference, with the standard number almost a historical footnote denoting 
completion of the process after everyone's adopted it already.

https://en.wikipedia.org/wiki/Request_for_Comments

[2] 40/80 characters:  Note that the origin for this hardware limit was 
even earlier, based on 80-column Hollerith punch-cards standardized by IBM 
in 1928, with other (non-80-column, including 24- and 40-column) formats 
going back to Hollerith's 1889 patents!  That in turn has roots going back 
at least as far as the punched-paper-tape loom of 1725!  An example of the 
process of iterative-invention building upon earlier invention.  But by 
the same token, those earlier inventions can constrain newer ones too, 
thus the fact that we're still dealing with the 80-char-line legacy, which 
remains encoded in the RFCs that define the common internet message format 
used by both smtp/email and nntp/news.

https://en.wikipedia.org/wiki/Punched_card

[3] MIME: Multipurpose Internet Mail Extensions.  These RFCs did in fact 
become standards and dated from the 90s. They defined extensions to the 
original RFC internet message definition, remaining compatible with it but 
standardizing specific header and "message part" definitions for the 
purpose of standardizing attachments, allowing separate plain-text and 
HTML format message parts, etc.  Among other things the MIME RFCs defined 
the Content-Type header and enumerated some of the basic MIME types, with 
the MIME-type spec then repurposed for various other things including HTTP 
MIME-types and to form the basis for the Unix/Linux local file-type 
handling used to this day.

https://en.wikipedia.org/wiki/MIME

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman