help-bash

Re: printf '%s\n' "$@" versus <<< redirection


From: Mike Jonkmans
Subject: Re: printf '%s\n' "$@" versus <<< redirection
Date: Tue, 21 Feb 2023 22:15:44 +0100

On Mon, Feb 20, 2023 at 08:11:52PM -0500, Greg Wooledge wrote:
> On Tue, Feb 21, 2023 at 12:44:02AM +0100, Mike Jonkmans wrote:
> > On Mon, Feb 20, 2023 at 10:40:13AM -0500, Chet Ramey wrote:
> > > On 2/18/23 12:45 PM, Mike Jonkmans wrote:
> > > 
> > > > > "If IFS is unset, or its value is exactly <space><tab><newline>, the 
> > > > > default, then sequences of <space>, <tab>, and <newline> at the 
> > > > > beginning and end of the results of the previous expansions are 
> > > > > ignored, and any sequence of IFS characters not at the beginning or 
> > > > > end serves to delimit words."
> > > > 
> > > > If IFS is unset then it is slightly ambiguous as to what the IFS 
> > > > characters are,
> > > > in the last part of this sentence.
> > > 
> > > If IFS is unset, word splitting and other uses behave as if it had the
> > > default value of space, tab, newline.
> > 
> > True.
> > But, if i have not overlooked this, it is not stated in the documentation.
> > 
> > Perhaps your text could be added to the description of IFS
> > in the 'Shell Variables' section.
> 
> You have to start with the POSIX documentation, because in most cases
> the bash documentation follows the POSIX wording, in whole or in part.
> 
> Field splitting is documented at
> <https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05>
> and says, in part:
> 
>  1. If the value of IFS is a <space>, <tab>, and <newline>, or if it is
>     unset, any sequence of <space>, <tab>, or <newline> characters at
>     the beginning or end of the input shall be ignored and any sequence
>     of those characters within the input shall delimit a field.
> 
>  [...]
> 
>  3. a. IFS white space shall be ignored at the beginning and end of the input.
> 
> This is where the crazy wording comes from.  Instead of just saying
> "IFS whitespace is ignored at the beginning and end of the input" and
> "If IFS is unset, it behaves like <space><tab><newline>" it uses this
> redundant wording.
> 
> The wording implies that $' \t\n' is special and should somehow be
> treated differently from $'\t \n' or any other rearrangement when it
> comes to field splitting, but it's not.  It's just confusing.

Not really relevant here, but the first character in IFS is special in the
expansion of "$*". (I know you know.)

> 
> All the shell documentation which starts from this definition is going
> to inherit that confusion.

Reading that POSIX section is confusing indeed.


Starting with the first paragraph:
``After expansion ... the shell shall scan the results of expansions and
substitutions that did not occur in double-quotes for field splitting
and multiple fields can result.''

- Should each expansion be scanned this way
  or just the result after all expansions?
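
A small experiment for the question above. It only shows what bash does,
not what the POSIX text intends. The helper name `show` is made up for
this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every argument in angle brackets, then a newline.
show() { printf '<%s>' "$@"; echo; }

a='1 2'
b='3 4'

# Two unquoted expansions in one word: the text around the '-' stays
# glued together, so the word yields three fields, not four.
show $a-$b      # <1><2-3><4>

# Only the unquoted expansion is scanned; the quoted part is kept whole
# and attached to the last field that came out of $a.
show $a"$b"     # <1><23 4>
```

So in bash the scan applies to the unquoted expansion results within a word,
and fields from neighbouring expansions can join at the edges.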


``3. ... The term "IFS white space" is used to mean any sequence
(zero or more instances) of white-space characters that are in the IFS value
(for example, if IFS contains <space>/ <comma>/ <tab>, any sequence of
<space> and <tab> characters is considered IFS white space).''

- "zero or more instances" Why zero? That could be problematic.

- Missing is a definition of 'white-space' characters.
  These could be defined by either isspace(3) or isblank(3).
  Maybe even by their locale-aware variants, isspace_l and isblank_l.
  Bash seems to use isspace (or isspace_l), as shown by:
    $ IFS=$'\f'; foo=$'x\fy'; printf '<%s>' $foo; echo
    <x><y>

- The example shouldn't use the slashes to separate the characters.
  It is not consistent with the IFS notation in 1.
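
A possible way to probe the white-space question more directly. A single
separator between x and y splits the same way whether or not the character
counts as IFS white space, so the sketch below doubles it: two adjacent
white-space separators collapse into one delimiter, two adjacent
non-white-space separators leave an empty field in between. The function
name `count_fields` is invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_fields IFS_VALUE STRING
# Prints the number of fields STRING splits into under the given IFS.
count_fields() {
  local IFS=$1
  local -a fields
  set -f                 # keep glob characters literal while splitting
  fields=( $2 )          # unquoted on purpose: this is the split under test
  set +f
  echo "${#fields[@]}"
}

# Reference cases with known behaviour:
printf 'colon:     %s\n' "$(count_fields ':' 'x::y')"   # 3 (x, empty, y)
printf 'space:     %s\n' "$(count_fields ' ' 'x  y')"   # 2 (x, y)

# The case in question. 2 would mean form feed is treated as IFS white
# space, 3 would mean it is treated as an ordinary IFS character.
# The result may depend on the shell and its version.
printf 'form feed: %s\n' "$(count_fields $'\f' $'x\f\fy')"
```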


``3b. Each occurrence in the input of an IFS character
that is not IFS white space, along with any adjacent IFS white space,
shall delimit a field, as described previously.''

- The term 'IFS character' is slightly confusing.
  At first I thought that that would mean any of <space><tab><newline>.
  But it just means 'a character that is in IFS'.
  Reword: Each occurrence in the input of a character that is in IFS
  but is not IFS white space ...

- So 3b. allows for fields separated by IFS white space and by other
  characters from IFS, where the fields get trimmed of IFS white space.
  Let's see...
    $ IFS=$' :'; foo=' : x : y z : '; printf '<%s>' $foo; echo
    <><x><y><z>
  (y z become two words because of 3c.)
  At first it seems strange that the first word is empty,
  while an empty one in the last position seems to be missing.
  But that is because delimiters are terminators as described
  in the second paragraph of the section. POSIX is a little weird here
  (too late to change).
  POSIX should use the word 'terminate' instead of 'delimit'.

- Nowhere is it stated that IFS characters are removed.
  Nor is it stated that IFS whitespace is removed.
  It seems implicit that characters, delimiting a field, are removed.
  This may seem logical, though a standard should be more precise.
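
The terminator reading of 3b. can be checked with a few more inputs.
This is a sketch of bash behaviour only; `split_show` is a name made up
for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_show IFS_VALUE STRING
# Splits STRING under the given IFS and prints each field in angle brackets.
# Note: if splitting yields no fields at all, printf still prints one "<>".
split_show() {
  local IFS=$1
  set -f                 # keep glob characters literal while splitting
  printf '<%s>' $2       # unquoted on purpose: this is the split under test
  set +f
  echo
}

split_show ':' ':a'       # <><a>      leading ':' terminates an empty field
split_show ':' 'a:b:'     # <a><b>     trailing ':' terminates b, nothing follows
split_show ':' 'a:b::'    # <a><b><>   the second ':' terminates an empty field
split_show ' :' ' : x : y z : '   # <><x><y><z>
```

None of the printed fields contains a ':' or a surrounding space, so the
delimiting characters are indeed dropped, even though the text never
says so explicitly.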


``3c. Non-zero-length IFS white space shall delimit a field.''

- Because of the 'zero' in 3. the 'Non-zero' is needed.


I can see the difficulties in writing the bash manpage from this POSIX text.
Still the bash manpage is very nicely written.
Good job, Chet and all other contributors!

-- 
Regards, Mike Jonkmans


