bug-coreutils

bug#14224: Feature request for the `cut`: record delimiter


From: George Brink
Subject: bug#14224: Feature request for the `cut`: record delimiter
Date: Thu, 18 Apr 2013 11:41:17 -0400

Pádraig,

Thank you for the alternative suggestions.
Actually I just found yet another way to solve my problem:
perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]),
\"\002\");" data.dat >new_data.dat
It works fine, but I am a little concerned about the speed. I have over three
hundred such files, from 3 MB to 30 MB each, and this process has to run
every day. I thought that by using cut (which just looks for delimiters) I
could save a few minutes on the whole process.

Originally I thought of adding "-r, --record-delimiter=DELIM" and
"--output-record-delimiter=DELIM" options to cut.
Then the example above could be done with
cut -d$'\001' -r$'\002' --output-delimiter=$'\001' \
    --output-record-delimiter=$'\002' -f1-3,15-47 data.dat >new_data.dat
I think it is feasible and would be more convenient (and hopefully faster)
than using a whole perl or two calls to tr.
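
For comparison, the "two calls to tr" alternative for this kind of data might
look like the sketch below. It assumes \001 as the field delimiter and \002 as
the record terminator, as in the perl command above. The function name
cut_records and the sample input are made up for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: cut_records FIELD_LIST < input > output
# Assumes fields are separated by \001 and records are terminated by \002.
cut_records() {
    fs=$(printf '\001')          # field separator as a literal byte
    tr '\002\n' '\n\002' |       # swap the record terminator and newline
        cut -d "$fs" -f "$1" |   # cut now sees one record per line
        tr '\002\n' '\n\002'     # swap them back
}

# Two sample records; the second field of the first record contains a newline.
printf 'a\001b\nB\001c\002d\001e\001f\002' | cut_records 1,2 | od -An -c
```

Because newline and \002 are swapped rather than replaced, newlines embedded in
fields survive the round trip, at the cost of copying the data through two
extra processes.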




Bob,
I understand that you would prefer feature discussions to take place outside
the bug-report mailing list, but here is an extract from the README:
> Mail suggestions and bug reports for these programs to
> the address on the last line of --help output.
And guess what, the last line of `cut --help` gives the bug-coreutils
address! The coreutils address is not mentioned in the README at all, while
bug-coreutils is mentioned several times in different contexts.
I apologize for using this mailing list inappropriately, but I did not know
about any other mailing lists.



On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <address@hidden> wrote:

> On 04/17/2013 02:26 PM, George Brink wrote:
> > Hello,
> >
> > I have a task of extracting several "fields" from the text file. The
> > standard `cut` tool could be a perfect tool for a job, but...
> > In my file the '\n' character is a legal symbol inside fields and therefore
> > the text file uses other symbol for record-separator. And the `cut` has a
> > hard-coded '\n' for record separator (I just checked the source from the
> > coreutils-8.21 package).
>
> The patch would be simple but not without compatibility cost.
> I.E. scripts using this would immediately become incompatible
> with any systems without this feature.
>
> So you'd like something like tac -s, --separator
> However cut -s is taken, so we'd have to avoid the short -s at least.
> Also tac -s takes a string rather than a character, so
> that gives some extra credence (and complexity) to that option there.
>
> Also related would be to support the -z, --zero-terminated option.
> join, sort and uniq all have this option to use NUL as the record separator,
> however they're all closely related sort dependent utilities
> and we're trying to unify options between them.
>
> If it is just a character you want to separate on,
> then you can always use tr to convert before processing,
> albeit with associated data copying overhead.
>
> SEP=^
> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>
> So given that cut is not special here among the text filters,
> and there is a workaround available, I'm 60:40 against
> adding this feature.
>
> thanks,
> Pádraig.
>

