[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Fixed incomplete and incorrect treatment of comments and tra
From: |
Tim Rice |
Subject: |
Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace |
Date: |
Sat, 28 May 2022 04:54:03 +0000 |
Hi Dima,
AWK doesn't have a concept of comments in the data it parses.
Yep, that's exactly the point.
Whether or not Awk has a concept of comments in the data it processes is
irrelevant. We're relying on analogies between Datamash and Awk to think about
correct behavior, and so the analogies need to match up in the right way.
Datamash accepting commented lines in data is not in the same category of
functionality as Awk accepting comments in Awk scripts. Therefore, you can't
rely on Awk's implementation of commenting to reason about this specific issue
for Datamash.
I disagree, but I also don't particularly care what datamash does by
default. In --vnlog mode it should suppress trailing comments.
Then we needn't worry about it until we're sure the vnlog patch is ready for
merging. I might have lost track of all developments in that thread, but iirc
there was still more to be done.
Currently, I'm thinking that vnlog support might be a good candidate for v2.0.
For v1.8, we should focus on bug fixes and general tidying up.
Remember, I am still new to maintaining a GNU project. For this release, I'd
appreciate having a chance to try out the process without too many
complications involved.
That is,
bar 5
bbb
The second line has trailing spaces. At the moment, Datamash handles this
in a way that is arguably correct:
```
$ ./datamash -W transpose < ~/tmp/testing.txt
bar bbb
5
```
The patch adds a test to the test suite to flag the faulty behavior, and
to demonstrate that it has been fixed. See that for detail.
Yep, your test is what my example was based on. My point was that if you
arbitrarily skip end-of-line whitespace, that will break other things that
currently work.
And this whole discussion is obviated if you use sed to clean up eol
whitespace. Datamash is intended to work in a pipeline with the rest of the GNU
ecosystem. We needn't solve all problems related to cleaning messy data, so
there is no need to break what currently works.
So, the whole point of datamash and vnlog and all the others is to be
convenient. I can do everything with AWK, or with C code, and nobody
NEEDS any of these tools. Handling obviously-quirky input in the obvious
way improves the convenience of datamash, and we should strive to do
that.
Data like in my previous example isn't "obviously quirky." This means that the
'if' guard proposed in your GitHub code is really not the obvious way to handle this.
At the very least, you would need to introduce more book-keeping, to only
discard the whitespace if the number of fields detected so far in the current
record is not less than previously detected record lengths.
Which would introduce a lot more complexity. For a task that can already be
handled quickly and simply with Sed.
Once you decide what you want the base code to be, I'll rebase my
patches on top, to do the extra stuff with --vnlog. Kinda busy right
now. Probably will get to it in a week or two.
Sure, that's okay. Like I said, vnlog support is a better match for what I have
in mind for v2.0, and I'd prefer to roll v1.8 without it.
This gives us more time to think about the form vnlog support should take, and
what accommodations will need to be made. Remember, for v2.0 we're thinking
about making Datamash more modular and extensible, so vnlog might be able to
take advantage of that work, too.
~ Tim
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, (continued)
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Shawn Wagner, 2022/05/18
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Dima Kogan, 2022/05/19
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/20
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Dima Kogan, 2022/05/20
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/20
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/20
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/21
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Tim Rice, 2022/05/27
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Dima Kogan, 2022/05/28
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace,
Tim Rice <=
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Tim Rice, 2022/05/27
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/28
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/28
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Tim Rice, 2022/05/28
- Re: [PATCH] Fixed incomplete and incorrect treatment of comments and trailing whitespace, Erik Auerswald, 2022/05/29