bug#20638: BUG: standard & extended RE's don't find NUL's :-(


From: Linda Walsh
Subject: bug#20638: BUG: standard & extended RE's don't find NUL's :-(
Date: Mon, 25 May 2015 15:22:30 -0700
User-agent: Thunderbird



Paul Eggert wrote:
> Linda Walsh wrote:
>> it is documented that '\ddd' or '\xHH' can be used
>> to match a single character of the value specified.
>
> I don't see where it's documented to behave that way.  Perhaps you're
> looking at the wrong documentation?
Perhaps you could tell me which documentation for the standard and/or
extended REs you are using?  I was referred to a number of different
manpages; the first reference under "See Also" at the bottom of the grep
manpage is awk.  From the awk manpage:

   String Constants
       String constants in AWK are sequences of characters enclosed between
       double quotes (like "value").  Within strings, certain escape
       sequences are recognized, as in C.  These are:

       \\   A literal backslash.
       \a   The "alert" character; usually the ASCII BEL character.
       \b   Backspace.
       \f   Form-feed.
       \n   Newline.
       \r   Carriage return.
       \t   Horizontal tab.
       \v   Vertical tab.
       \xhex digits
            The character represented by the string of hexadecimal digits
            following the \x.  As in ISO C, all following hexadecimal
            digits are considered part of the escape sequence.  (This
            feature should tell us something about language design by
            committee.)  E.g., "\x1B" is the ASCII ESC (escape) character.
       \ddd The character represented by the 1-, 2-, or 3-digit sequence of
            octal digits.  E.g., "\033" is the ASCII ESC (escape) character.
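
To illustrate those two escapes (a minimal sketch assuming gawk; POSIX awk
does not require \x support):

    # Both escapes yield the same single ESC (0x1B) character in a gawk
    # string constant.  Note these are *string* escapes in awk, not part
    # of grep's BRE/ERE syntax.
    gawk 'BEGIN {
        esc_oct = "\033"    # octal escape for ESC
        esc_hex = "\x1B"    # hex escape for ESC (gawk extension)
        if (esc_oct == esc_hex)
            print "same single character, length " length(esc_oct)
        else
            print "different"
    }'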




>> The argument was that a NUL in a file made it non-text -- therefore it
>> wouldn't be a "line".
>
> Obviously -z changes the definition of a line.  -z is explicitly designed
> to operate on files containing NUL bytes.  So that argument was not
> coherent.
---
That is my opinion, also, but nevertheless, the claim that '\000' implies
binary was made early in this bug discussion -- I was refuting that.  The
other thing that trips up some tools is a text file with no terminating LF
at the end (i.e. some editors will "fix" such files by adding an extra LF
at the end, which can cause problems with config files in some cases).
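
For what it's worth, both points about NULs are easy to demonstrate at the
shell (a minimal sketch assuming GNU grep; -z and -P are GNU extensions,
-P needs a grep built with PCRE support, and "some_file" is just a
placeholder):

    # With -z, NUL is the record separator, so NUL-delimited data is
    # searched record by record instead of being dismissed as binary:
    printf 'alpha\0beta\0' | grep -z 'beta' | tr '\0' '\n'
    # prints: beta

    # Checking whether a file contains a NUL byte at all; \x00 is only
    # understood by the PCRE matcher (-P), not by the BRE/ERE matchers:
    grep -qP '\x00' some_file && echo "some_file contains NUL bytes"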



> I'm afraid you've gone off the deep end here.
>> I didn't bring up POSIX, Eric did.
> Eric's comments didn't incorporate conspiracy theories about corporate
> payoffs; yours did.
---
I am stating facts.  The ones who had the most influence on POSIX in the
past were the largest "gold sponsors".  Now there are fewer of them and
more "silver" sponsors, but historically they have had the most influence
on such standards organizations.

   I will remind you that POSIX described its initial mission as
"descriptive", not "prescriptive".  That changed around 2003, when they
started telling implementors what they had to remove to be POSIX
compliant.  The worst violation I can think of is removing the ability for
rm to be used easily and safely to remove everything under a specific
directory: "rm -fr --one-file-system ." -- it might be good to have a
one-character name for that.  For some reason I remember "-x" being a
reasonable choice.

"rm" was always described to do a depth-first traversal, which means it shouldn't
even look at top-paths except to descend into them.That was changed
making coreutils rm's that follow that standard, unreliable for removing dir contents (w/o removing the dir).
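
A minimal sketch of the usual workaround for clearing a directory without
removing it, assuming GNU find (the path /some/dir is just an example):

    # Delete everything under /some/dir but keep the directory itself,
    # staying on one filesystem.  -delete implies a depth-first walk;
    # -xdev is the analogue of rm's --one-file-system.
    find /some/dir -xdev -mindepth 1 -delete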

   I have good reasons -- not conspiracy, but capitalistic reasons -- for
what I say, and if you don't believe money and capitalism run this
country, I'd have to say it was you who had gone off the deep end.

But if you had -- I can probably welcome you -- I think I live in the
deep end... ;-)

linda




