Handling files with CRLF line ending

From: Dale R. Worley
Subject: Re: Handling files with CRLF line ending
Date: Tue, 06 Dec 2022 11:00:14 -0500

It seems to me that there's more going on than first meets the eye.

My understanding is that the Posix file open function, fopen(), allows
specifying whether the file is text or binary, and that in text mode, if
the underlying system natively uses CRLF for EOL, CRLF in the file is
transparently turned into LF for the code.  And so I'd expect that Bash
considers the file that it is reading to execute to be text, and that
Bash's command parser wouldn't see CRs if it were running on a system
that uses CRLF on disk for EOL.

And conversely, if you use "echo" to write a line, it goes to stdout,
which presumably has been opened in text mode.

Generally, when a command has a redirection, Bash doesn't have to think
about this, since Bash only opens an FD; it's the command that will call
fdopen() to wrap a Posix FILE* around the open FD, and in doing so will
specify the I/O mode as text or binary.

So far, everybody is happy -- things automatically work as intuition
suggests.

The trouble happens when a Bash built-in command reads or writes an FD.
Then Bash needs to handle the text/binary decision itself, implicitly or
explicitly, just as the C startup code does when a C program starts up
and does an fdopen() on FD 0 to create the FILE* "stdin".

Looking at the code of Bash 5.2 -- and I am no expert, and I didn't
study it deeply -- it looks like "readarray/mapfile"
(builtins/mapfile.def) uses "zgetline" (lib/sh/zgetline.c) to read input
directly from the FD rather than through a FILE* created by the
underlying Posix implementation's fdopen().  And that function's comment
says:

/* Derived from GNU libc's getline.
   The behavior is almost the same as getline. See man getline.
   The differences are
        (1) using file descriptor instead of FILE *;
        (2) the order of arguments: the file descriptor comes first;
        (3) the addition of a fourth argument, DELIM; sets the delimiter to
            be something other than newline if desired.  If setting DELIM,
            the next argument should be 1; and
        (4) the addition of a fifth argument, UNBUFFERED_READ; this argument
            controls whether get_line uses buffering or not to get a byte data
            from FD. get_line uses zreadc if UNBUFFERED_READ is zero; and
            uses zread if UNBUFFERED_READ is non-zero.

   Returns number of bytes read or -1 on error. */

And zgetline() doesn't have a "mode" argument for setting the
text/binary mode.  (getline() doesn't have such an argument either, but
it takes a FILE*, not an FD.)


