Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields


From: Arnold Robbins
Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
Date: Tue, 23 May 2017 05:59:04 +0300
User-agent: Heirloom mailx 12.5 6/20/10

The changes have been made and pushed out to gawk's master. They'll
be in the next release.  The doc and tests have been greatly
improved as well.

Thanks everyone for the feedback related to this.

Arnold

> From: address@hidden
> Message-Id: <address@hidden>
> Date: Sun, 21 May 2017 21:01:24 -0600
> To: address@hidden, address@hidden
> Subject: Re: [bug-gawk] FIELDWIDTHS can miscount the number of fields
> Cc: address@hidden
>
> I think I'll go with this. Thanks for the feedback.
>
> Arnold
>
> "Andrew J. Schorr" <address@hidden> wrote:
>
> > On Sun, May 21, 2017 at 10:12:44PM +0300, Arnold Robbins wrote:
> > > Q1. Given FIELDWIDTHS = "2 3 4" and input data "aabb". How many fields
> > >    should there be?
> > >    A. Two, since that's all the data that's there
> > >    B. Three, with $3 == "", since it's supposed to be all fixed width data
> > > 
> > > A1. Gawk currently says three. Arnold leans towards two, since it reflects
> > >     the actual data and allows code expecting three fields to weed out
> > >     bad records.
> >
> > I agree.
> >
> > > Q2. Given FIELDWIDTHS = "2 3 4" and input data "aab", should $2 have a
> > >     value?
> > >     A. No - we're expecting three characters and they weren't all there
> > >     B. Yes - something was there, make it available
> > > 
> > > A2. Gawk currently says "yes".  Arnold isn't sure what's right here.
> > >     Input is welcome.
> >
> > I agree with current behavior (B).
> >
> > > Q3. Given FIELDWIDTHS = "2 3 4" and input data "aabbbccccdddd" what should
> > >     be done with the dddd?
> > >     A. Nothing - it's extra, ignore it. NF should be set to 3. Code that
> > >        wants to know if there's something extra can use length() and
> > >        substr() to get it out of the record.
> > >     B. Stick it into $4 anyway.
> > > 
> > > A3. Arnold and gawk agree on (A).
> >
> > Since we plan to add support for trailing "*" as in Q4 below, I would
> > choose the approach that is easiest to implement. I think that's probably A,
> > since that's what we do now. Those who are interested in trailing data
> > can use "*".
> >
> > > Q4. Given the idea of using "*" at the end of FIELDWIDTHS to mean
> > >     "anything else": with FIELDWIDTHS = "2 3 4 *" and input
> > >     data "aabbbccccdddd" the dddd would go into $4. The final data
> > >     would be optional.  Is there any reason not to add this to gawk?
> > >     It seems to be actually useful and not just theoretically useful.
> > > 
> > > A4. Arnold thinks it's right to add it.
> >
> > Agreed. I presume that NF will be 3 if the record length is 9 and 4 for
> > 10 or longer.
> >
> > Regards,
> > Andy
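
The answers settled on in the quoted thread (Q1: A, Q2: B, Q3: A, Q4: add
"*") can be sketched as a small Python model. This is not gawk's
implementation and is not taken from its source; the function names
parse_fieldwidths and split_fixed_width are hypothetical, and only plain
widths plus an optional trailing "*" are handled. Check the gawk manual for
the behavior of any particular release.

```python
def parse_fieldwidths(spec):
    """Parse a FIELDWIDTHS-like string such as "2 3 4 *".

    Returns (widths, rest), where rest is True if the spec ends in "*".
    Hypothetical helper: models only the syntax discussed in the thread.
    """
    parts = spec.split()
    rest = bool(parts) and parts[-1] == "*"
    if rest:
        parts = parts[:-1]
    return [int(p) for p in parts], rest


def split_fixed_width(record, spec):
    """Split record into fields following the answers agreed in the thread."""
    widths, rest = parse_fieldwidths(spec)
    fields = []
    pos = 0
    for w in widths:
        if pos >= len(record):
            break  # Q1 (A): no data left, so no empty trailing fields
        fields.append(record[pos:pos + w])  # Q2 (B): keep a short field
        pos += w
    if rest and pos < len(record):
        fields.append(record[pos:])  # Q4: "*" collects any leftover data
    # Q3 (A): without "*", data past the last width is ignored
    return fields


if __name__ == "__main__":
    for rec, spec in [("aabb", "2 3 4"), ("aab", "2 3 4"),
                      ("aabbbccccdddd", "2 3 4"),
                      ("aabbbccccdddd", "2 3 4 *")]:
        fields = split_fixed_width(rec, spec)
        print(spec, rec, "NF =", len(fields), fields)
```

Under this model a 9-character record with "2 3 4 *" yields three fields
and a record of 10 or more characters yields four, which matches Andy's
presumption above.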
