bug-gawk

Re: Records longer than INT_MAX mishandled


From: arnold
Subject: Re: Records longer than INT_MAX mishandled
Date: Tue, 04 May 2021 02:17:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thanks for the report and patch.

If you can produce a patch to allow records to be SIZE_T_MAX (or whatever
the right constant is), I'd be interested.

I will try to find some time to look into this also.

Arnold

"Miguel Pineiro Jr." <mpj@pineiro.cc> wrote:

> Hello, gawk devs.
>
> gawk mishandles records longer than INT_MAX when get_a_record stuffs their 
> size_t length in an int (io.c:4081: `retval = recm.len`).
>
> All of the following examples are paired, first a success using a record of 
> length INT_MAX, then a failure using INT_MAX + 1.
>
>
> In the main i/o loop, records vanish when their corrupted length is negative, 
> since inrec doesn't consider a negative value a valid record.
>
> $ gawk 'BEGIN {printf("%2147483647s\n", "a")}' | gawk 'END {print NR}'
> 1
> $ gawk 'BEGIN {printf("%2147483648s\n", "a")}' | gawk 'END {print NR}'
> 0
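
A minimal C sketch of the narrowing described above, assuming a typical platform where int is 32 bits and an out-of-range conversion wraps (the C standard leaves that result implementation-defined). `narrow_len` is a hypothetical stand-in for the assignment at `retval = recm.len`; it is not gawk code.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for storing a size_t record length in an int.
 * The conversion is implementation-defined whenever len > INT_MAX.
 */
static int narrow_len(size_t len)
{
	return (int) len;
}
```

On such a platform, `narrow_len((size_t) INT_MAX)` returns INT_MAX unchanged, while `narrow_len((size_t) INT_MAX + 1)` returns a negative value, the kind of length the report says inrec does not treat as a valid record.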
>
>
> In getline (do_getline/do_getline_redir), if the corrupted length is equal to 
> EOF, it will trigger a silent bypass of the rest of the file. More likely, 
> some other value will mislead buffer memory management routines and crash 
> gawk.
>
> This bare getline fails fatally in set_record's buffer resizing loop, when it 
> gives up trying to accommodate what it thinks is a humongous record 
> (field.c:284: `cnt >= databuf_size` promotes a negative int cnt to unsigned 
> long).
>
> $ gawk 'BEGIN {printf("\n%2147483647s\n", "a")}' | gawk '{getline} END {print NR}'
> 2
> $ gawk 'BEGIN {printf("\n%2147483648s\n", "a")}' | gawk '{getline} END {print NR}'
> gawk: cmd. line:1: (FILENAME=- FNR=2) fatal: input record too large
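
The signed/unsigned comparison mentioned above can be sketched in isolation. This assumes size_t has at least the rank of int, as on common platforms; `needs_bigger_buffer` is a hypothetical name that only mirrors the shape of the `cnt >= databuf_size` test, with the implicit conversion written out as a cast.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical mirror of `cnt >= databuf_size` with an int cnt.
 * The usual arithmetic conversions turn cnt into an unsigned value
 * before comparing; the cast below spells that conversion out.
 */
static bool needs_bigger_buffer(int cnt, size_t databuf_size)
{
	return (size_t) cnt >= databuf_size;
}
```

With a negative cnt the converted value is enormous, so the test keeps reporting that the buffer is too small for any realistic databuf_size. That is consistent with a resizing loop that eventually gives up and reports the record as too large.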
>
>
> This getline var dies in make_string (make_str_node) from a corrupted 
> allocation request:
>
> $ gawk 'BEGIN {printf("\n%2147483647s\n", "a")}' | gawk '{getline var} END {print NR}'
> 2
> $ gawk 'BEGIN {printf("\n%2147483648s\n", "a")}' | gawk '{getline var} END {print NR}'
> gawk: cmd. line:1: (FILENAME=- FNR=2) fatal: node.c:415:make_str_node: r->stptr: cannot allocate -2147483647 bytes of memory: Cannot allocate memory
>
>
> If INT_MAX is deemed sufficient, despite the use of capacious size_t i/o 
> buffers, here's a diff.
>
> diff --git a/io.c b/io.c
> index 91c94d9b..4e777d75 100644
> --- a/io.c
> +++ b/io.c
> @@ -4026,6 +4026,9 @@ get_a_record(char **out,        /* pointer to pointer to data */
>                       iop->dataend += iop->count;
>       }
>  
> +     if (recm.len > INT_MAX)
> +             fatal(_("input record length too large to return"));
> +
>       /* set record, RT, return right value */
>  
>       /*
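
The range check in the diff can be illustrated outside gawk with a small sketch. `record_len_fits` is a hypothetical helper, not part of the patch: it returns false where the patch would call fatal(), so the idea can be exercised without aborting.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: store len in *out only when it fits in an int.
 * Returns false and leaves *out untouched for lengths above INT_MAX.
 */
static bool record_len_fits(size_t len, int *out)
{
	if (len > (size_t) INT_MAX)	/* compare in size_t, before narrowing */
		return false;
	*out = (int) len;		/* safe: value is within int range */
	return true;
}
```

The point of the ordering is that the comparison happens while the length is still a size_t, so the narrowing conversion is only reached for values that an int can represent.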


