Re: [Bug-datamash] New feature: extended identifiers

From: Stefan Vargyas
Subject: Re: [Bug-datamash] New feature: extended identifiers
Date: Wed, 5 Oct 2016 09:18:05 -0700

Hello Assaf.

Thank you for your reply!

> I think this could be made the default, without the need for additional
> option.

True indeed. I added `-E|--extended-identifiers' solely as a proof of concept.

> We just need to make it well defined (since quotes are also used for shell
> quoting).
> Then we document it, ensure it doesn't cause any regressions with the current
> tests,
> and add new tests for this new behavior.
> IIUC, you want quotes (single and double) to allow minus characters
> (and in fact, any other character?) in a token, which will later be used as a
> field name.
> Currently, datamash rejects field names with dashes as:
>     $ datamash -H sum total-node-mem
>     datamash: field range for ‘sum’ must be numeric
> However, simple quotes won't help, as they will be discarded by the shell:
>     $ datamash -H sum "total-node-mem"
>     datamash: field range for ‘sum’ must be numeric
>     $ datamash -H sum 'total-node-mem'
>     datamash: field range for ‘sum’ must be numeric
> Which means it will either require awkward double quoting:
>     $ datamash -H sum "'total-node-mem'"
> Or slightly less awkward, with the entire 'program' as a string:
>     $ datamash -H "sum 'total-node-mem'"
> Both of these are not intuitive (definitely so for less savvy unix users).
> It will be tricky to explain this in the documentation,
> as saying "use quotes for fields with minus characters" is incorrect and
> insufficient,
> and then we'll need to go into explaining shell quotes.
> Perhaps we should consider another escaping scheme?
> one that's easy to type, and does not conflict with other possible shell
> characters?
> Something like this (just a thought, not necessarily the optimal solution):
>     datamash -H sum {total-node-mem}

I am by no means attached to '\'' and '"' as quoting chars for datamash. With
them, the quotes themselves have to be escaped from the shell:

  $ ...|datamash -H sum \"total-node-mem\"

  $ ...|datamash -H sum \'total-node-mem\'

But I found no way to use a single char for this. Maybe I'm wrong.
Curly braces -- '{' and '}' -- are used by bash. In fact, one needs to quote
curly braces too if commas and dots are to be allowed as part of a quoted
identifier. Why not allow them, along with many other characters, if datamash
is supposed to understand quoting of its identifiers? Datamash itself produces
column names of the following form:

  $ (echo 4; seq 4)|datamash -H sum 1

Note that brace expansion is controlled by bash, which might well choose to
extend its present-day syntax at some point in the future. Consequently, even
if certain use cases need not quote the curly braces now, they might well have
to do so later!
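
To make the brace issue concrete, here is a small sketch of what bash does with
braces today. It assumes bash as the shell; `show_args' is a hypothetical
helper (not part of datamash) that stands in for the program and prints one
line per argument it receives:

```shell
# Hypothetical stand-in for datamash: print each received argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Run under bash (brace expansion is a bash feature, not POSIX sh).
show_args {total-node-mem}    # no comma and no '..': passed through literally
show_args {total,node}        # comma inside braces: expanded to 'total' and 'node'
show_args {1..3}              # sequence expression: expanded to 1, 2 and 3
show_args '{total,node}'      # quoted: reaches the program as a single argument
```

So an unquoted {total-node-mem} happens to survive, while an identifier
containing a comma would be split before datamash ever sees it.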

If we find that there is indeed no chance of using a single quoting char, then
(I suppose) escaped quoting chars will be the most natural variant to opt for.
We should look around for other well-established GNU/Linux tools that use
something similar. I don't have an example at hand right now, but I will try
to find one.
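
For reference, a minimal sketch of what actually reaches the program under
each spelling discussed in this thread. Again `show_args' is a hypothetical
stand-in for datamash, here wrapping every received argument in brackets:

```shell
# Hypothetical stand-in for datamash: print each received argument in brackets.
show_args() { printf '[%s]\n' "$@"; }

show_args sum "total-node-mem"     # shell strips the quotes: [sum] [total-node-mem]
show_args sum \"total-node-mem\"   # escaped quotes survive:  [sum] ["total-node-mem"]
show_args sum "'total-node-mem'"   # nested quotes survive:   [sum] ['total-node-mem']
show_args "sum 'total-node-mem'"   # one single argument:     [sum 'total-node-mem']
```

Only the last three forms deliver quote characters that datamash could then
interpret itself.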

What do you think?

Stefan V.
