bug-bash

Re: [PATCH 1/2] <<# indent-stripping heredoc


From: Martin D Kealey
Subject: Re: [PATCH 1/2] <<# indent-stripping heredoc
Date: Fri, 14 Jul 2023 17:44:35 +1000

On Fri, 14 Jul 2023 at 09:47, Grisha Levit <grishalevit@gmail.com> wrote:

> This patch implements the ksh93-style <<# redirection operator to enable
> indentatable heredocs.


On the whole I think this is great, and thank you for working up the patch,
but I would like to offer some comments and suggestions:

Firstly, it's impossible to have the initial line of output indented. This
might not seem important, but it would make certain kinds of code
generation more awkward. Consider:

cat <<# EndOfHead
    <html>
     <body>
      <table>
    EndOfHead
generator_thingy |
while IFS= read -r
do
    cat <<# EndOfRow
        <tr>
          <td>
            $REPLY
          </td>
        </tr>
    EndOfRow
done
cat <<# EndOfTail
      </table>         <!-- this line won't be indented properly (and nor will the following lines) -->
     </body>
    </html>
    EndOfTail

(If anyone is about to suggest that HTML isn't space-sensitive, then
imagine this outputs YAML or Python instead.)

One option that some other languages use is to find the terminator, and
then use its indentation as the pattern to remove from the content lines.
The problem of course is that it would take a double run over the content,
but the benefit is that there'd only be one in-band signaling line instead
of two.
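
To make the terminator-based idea concrete, here is a rough sketch in
plain bash. It is only an emulation in a shell function, not how the
parser in the patch works; the name dedent_by_terminator is made up for
illustration. It buffers the body (the "double run"), takes the
terminator's leading whitespace as the prefix, and strips exactly that
prefix from each content line. Lines that don't start with the prefix are
passed through unchanged here, which is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: strip from every body line
# the leading whitespace found in front of the terminator word ($1).
dedent_by_terminator() {
    local word=$1 line lead prefix found=0
    local -a body=()

    # First pass: buffer lines until the (possibly indented) terminator.
    while IFS= read -r line; do
        lead=${line%%[![:blank:]]*}          # leading blanks of this line
        if [[ ${line#"$lead"} == "$word" ]]; then
            prefix=$lead
            found=1
            break
        fi
        body+=("$line")
    done

    if (( ! found )); then
        echo "delimiter '$word' not found" >&2
        return 1
    fi

    # Second pass: remove exactly the terminator's indentation.
    for line in "${body[@]}"; do
        printf '%s\n' "${line#"$prefix"}"
    done
}

# Usage: the terminator is indented by four spaces, so four are removed
# and the first output line keeps its extra indentation.
dedent_by_terminator EndOfTail <<'DEMO'
      </table>
     </body>
    </html>
    EndOfTail
DEMO
```

With the terminator setting the indentation, the </table> line comes out
indented two spaces further than </html>, which is what the EndOfTail
case above needs.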

Secondly, the battle for 8-space tabs has been well and truly lost at this
point, so hard-coding that constant seems likely to be a source of errors.

Thirdly, allowing lines to have less than the specified indentation seems
likely to be a net loss: worse maintenance, and no visual improvement
(except in the case where there is no fixed indentation and it's just
"remove all").

In order to be tab-agnostic, I can see two reasonable options:
1. remove only an exact match for the sequence of whitespace characters
that occurs in the indicator line
2. the same, but only accept an indicator line whose indentation is some
number T of tabs followed by some number S of spaces.
(A side benefit would be that "ordinary" indented heredocs (<<-) can use
the same logic with T=INT_MAX and S=0.)
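
As a rough sketch of what these options mean, again emulated in a bash
function rather than in the parser: the function below is handed the
indicator line's whitespace as its first argument, insists that it is tabs
followed by spaces (option 2), and then removes only an exact match of it
(option 1). The name strip_exact_indent and the return codes are
invented for illustration, and treating an under-indented line as a hard
error is my assumption based on the third point above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.
# $1 is the exact indentation taken from the indicator line.
# Returns 2 if $1 is not tabs-then-spaces, 1 if a line lacks the prefix.
strip_exact_indent() {
    local prefix=$1 tab=$'\t' tabs rest line

    tabs="${prefix%%[!$tab]*}"     # leading run of tabs (may be empty)
    rest="${prefix#"$tabs"}"       # whatever follows the tabs
    if [[ -n ${rest// /} ]]; then  # anything other than spaces is rejected
        echo "indentation must be tabs followed by spaces" >&2
        return 2
    fi

    while IFS= read -r line; do
        if [[ $line == "$prefix"* ]]; then
            # Exact match only: no tab-width arithmetic is involved.
            printf '%s\n' "${line#"$prefix"}"
        else
            echo "line is indented less than the indicator line: $line" >&2
            return 1
        fi
    done
}
```

Because the comparison is byte-for-byte, no assumption about tab width is
needed anywhere.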

To aid error reporting, I think the terminator token should be identified
regardless of the combination of whitespace in its indentation, but if its
leading whitespace was not tabs-then-spaces, or doesn't match the indicator
line, then this should have the same consequences as "delimiter not found"
only with a better error message.
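
The terminator check could look something like the following sketch. The
function name, messages and return codes are all invented for
illustration: 0 means a well-formed terminator, 1 means the line is not
the terminator at all, and 2 means the word was recognised but its
indentation is unacceptable (so the caller can fail as for "delimiter not
found", with the more specific message).

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only.
# $1 = candidate line, $2 = terminator word, $3 = indicator indentation.
check_terminator() {
    local line=$1 word=$2 indent=$3
    local tab=$'\t' lead tabs rest

    lead="${line%%[![:blank:]]*}"  # leading blanks, whatever their mix
    # Identify the terminator regardless of how it is indented.
    [[ ${line#"$lead"} == "$word" ]] || return 1

    tabs="${lead%%[!$tab]*}"
    rest="${lead#"$tabs"}"
    if [[ -n ${rest// /} ]]; then
        echo "'$word': indentation is not tabs followed by spaces" >&2
        return 2
    fi
    if [[ $lead != "$indent" ]]; then
        echo "'$word': indentation does not match the indicator line" >&2
        return 2
    fi
    return 0
}
```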

I wonder if this should be called "<<--" rather than "<<#" if it's not
(quite) compatible with what ksh does?

I will work up a modified version of the patch to implement this.

