Re: [PATCH] vnlog support
From: Erik Auerswald
Subject: Re: [PATCH] vnlog support
Date: Sun, 15 May 2022 17:09:16 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

Hello Dima,
On 14.05.22 22:18, Dima Kogan wrote:
> Since we're talking about working on this again, and making a new
> release, I'd like to ping this feature request. I exchanged a few emails
> about it with Assaf right before he disappeared, and it sounded like he
> was going to add this feature. I've no idea what, if anything, he wanted
> to change about the patch.
>
> vnlog support would make both projects much more useful. The original
> mailing list post (quoted in full below) contains a demo and a patch.
> The patch needs to be updated such that -v implies -W. If I can get an
> ACK from whoever is intending to take over datamash, I can re-test the
> patch, finalize things, add tests, and so on.

I am not a GNU datamash maintainer, but I'd like to provide some
high-level comments on the vnlog support patches:
1. While GNU datamash, when given the option -C, --skip-comments,
   recognizes lines where the first non-whitespace character is
   either '#' or ';' as comments, the vnlog format does not treat
   ';' as starting a comment. Thus keeping ';' as a comment start
   in vnlog mode creates a new and slightly different vnlog format.
   This could result in incompatibilities with existing data and
   tools. Is this intended?
2. The patches do not add any special treatment of '-' to GNU
   datamash, but '-' does have a special meaning in vnlog. I
   would expect a vnlog mode in GNU datamash to support the
   following use case:

       $ cat vnlog.example
       # v1 v2 v3
       1 2 3
       4 - 6
       - 8 9
       $ # GNU datamash does not interpret '-'
       $ ./datamash -C -W sum 1-3 < vnlog.example
       ./datamash: invalid numeric value in line 2 field 2: '-'
       $ # tr can be used for this example, but not in general
       $ tr -- - 0 < vnlog.example | ./datamash -C -W sum 1-3
       5 10 18
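
   But then missing values do not work with "sum" anyway (the
   fields in missing_value are TAB-separated, with an empty second
   field in line 2 and an empty first field in line 3):

       $ cat missing_value
       1       2       3
       4               6
               8       9
       $ ./datamash sum 1-3 < missing_value
       ./datamash: invalid numeric value in line 2 field 2: ''
       $ ./datamash sum 3 < missing_value
       18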
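
To make the expectation in item 2 concrete, here is a minimal sketch of the desired behavior outside of GNU datamash: a column sum that treats a field consisting solely of '-' as a missing value and skips it. It is plain awk wrapped in a shell function. The name vnlog_sum is hypothetical, and whitespace-separated fields with '#' comment lines are assumed. It does not reflect how the patches or datamash work internally.

```shell
# Sketch only: sum each column of a vnlog-like file, skipping '-' fields.
# Assumes whitespace-separated fields and '#' comment lines.
# vnlog_sum is a hypothetical helper, not part of GNU datamash or vnlog.
vnlog_sum() {
  awk '
    /^[ \t]*#/ { next }              # skip the legend and comment lines
    NF == 0    { next }              # skip blank lines
    {
      if (NF > n) n = NF
      for (i = 1; i <= NF; i++)
        if ($i != "-") sum[i] += $i  # a lone "-" marks a missing value
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s%s", sum[i] + 0, (i < n ? " " : "\n")
    }
  '
}

printf '# v1 v2 v3\n1 2 3\n4 - 6\n- 8 9\n' | vnlog_sum
```

For the vnlog.example data this prints the totals 5, 10 and 18, the same as the tr workaround. Unlike tr, it only matches a whole field, so a value such as -5 is left alone.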
3. The patches seem to create a vnlog mode where both input and
   output are in vnlog format. Could it be useful to be able to
   specify vnlog format separately for input and output?
4. If one wanted to create vnlog output from character-separated
   input data via GNU datamash, empty fields would need to be
   replaced with '-'. While GNU datamash has some support for
   missing values via the --no-strict and --filler=X options,
   this does not seem to replace empty fields with the specified
   filler, and missing fields seem to be replaced only sometimes,
   e.g., with the "transpose" operation, but not the "reverse"
   operation. Would it be useful to add optionally generating
   '-' fields?
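
As an illustration of what item 4 asks for, the following sketch fills empty fields with '-' as a separate filter step. It assumes TAB-separated input and that no field contains whitespace. The name fill_empty is hypothetical. It handles only empty fields, not rows that are short of fields, and it is not an existing datamash feature.

```shell
# Sketch only: turn TAB-separated input into space-separated output,
# writing '-' for every empty field.
# fill_empty is a hypothetical helper, not a GNU datamash option.
fill_empty() {
  awk -F '\t' -v OFS=' ' '
    {
      for (i = 1; i <= NF; i++)
        if ($i == "") $i = "-"   # empty field becomes the vnlog null marker
      $1 = $1                    # force the record to be rebuilt with OFS
      print
    }
  '
}

printf '1\t2\t3\n4\t\t6\n\t8\t9\n' | fill_empty
```

Applied to the missing_value data from item 2, this yields the three data rows of vnlog.example.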
5. Would it make sense to add the functionality required for
   vnlog format support via separate options? There could be a
   --vnlog option that sets all those correctly and then adds
   the vnlog-specific prologue handling.

   Perhaps the functionality could be added using variables that
   could be controlled via options, without adding all those
   controlling options immediately.

   - There is already a -W, --whitespace option.
   - There is already an --output-delimiter option.
   - There is already a -C, --skip-comments option.
   - There could be a new option to specify the comment
     character.
   - There could be a new option to treat some value, e.g., the
     filler value, as representing an empty field.
   - There could be a new option to replace empty and missing
     fields in the output with the filler value.
   - There could be a new option to add a prefix to the output
     header line.
   - There could be a new option to read the input header line
     from a vnlog prologue.
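
The last two bullets can be sketched as two small filters: one that reads the header from a vnlog prologue, and one that prefixes a plain header line. This rests on an assumption about the vnlog format that should be checked against the vnlog documentation: the legend is the first line starting with '#' but not with '##' or '#!', and every other '#' line is a comment. Both function names are hypothetical. Lines starting with ';' are passed through as data, in line with item 1.

```shell
# Sketch only; both helpers are hypothetical, not datamash options.
# Assumption: the legend is the first line starting with '#' but not
# with '##' or '#!'; all other '#' lines are comments.
vnlog_to_header() {
  awk '
    /^[ \t]*#/ {
      if (!seen && $0 !~ /^[ \t]*#[#!]/) {
        sub(/^[ \t]*#[ \t]*/, "")   # strip the legend prefix
        print                       # emit the legend as a plain header
        seen = 1
      }
      next                          # drop every other comment line
    }
    { print }                       # data lines, including ones starting with ";"
  '
}

# Prefix the first line (the header) with "# " to form a vnlog legend.
header_to_vnlog() {
  awk 'NR == 1 { print "# " $0; next } { print }'
}

printf '## made by a test\n# v1 v2\n; 2\n# note\n3 4\n' | vnlog_to_header
```

The example prints the header line "v1 v2" followed by the data rows "; 2" and "3 4". Piping that result through header_to_vnlog restores the "# v1 v2" legend.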
I have trimmed the patches from my email, since I did not directly
comment on the code details. Here are mailing list archive URLs
for easy reference:
- Original posting of vnlog support patches:
https://lists.gnu.org/archive/html/bug-datamash/2020-04/msg00006.html
- Current re-posting of vnlog support patches:
https://lists.gnu.org/archive/html/bug-datamash/2022-05/msg00015.html
The above comments are only questions and suggestions, of course.
Best regards,
Erik