Re: Records longer than INT_MAX mishandled
From: arnold
Subject: Re: Records longer than INT_MAX mishandled
Date: Tue, 04 May 2021 02:17:25 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.
Thanks for the report and patch.
If you can produce a patch to allow records to be SIZE_T_MAX (or whatever
the right constant is), I'd be interested.
I will try to find some time to look into this also.
Arnold
"Miguel Pineiro Jr." <mpj@pineiro.cc> wrote:
> Hello, gawk devs.
>
> gawk mishandles records longer than INT_MAX when get_a_record stuffs their
> size_t length in an int (io.c:4081: `retval = recm.len`).
>
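To make the narrowing concrete, here is a minimal C sketch. It is not gawk code: the helper names narrow_length and length_fits_int are hypothetical, and it assumes a platform where size_t is wider than int. Converting an out-of-range size_t to int is implementation-defined in C; on common two's-complement compilers the value wraps, so INT_MAX + 1 typically comes out negative.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not a gawk function): mimics storing a size_t
 * record length in an int, as in `retval = recm.len`. */
static int narrow_length(size_t len)
{
	/* For len > INT_MAX the result is implementation-defined;
	 * common compilers reduce it modulo 2^N, giving a negative int. */
	return (int) len;
}

/* Hypothetical helper: true when the length can be held in an int
 * without changing its value. */
static int length_fits_int(size_t len)
{
	return len <= (size_t) INT_MAX;
}
```

With these helpers, narrow_length((size_t) INT_MAX) keeps its value, while (size_t) INT_MAX + 1 fails length_fits_int and is expected to narrow to a negative number on such platforms.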
> All of the following examples are paired, first a success using a record of
> length INT_MAX, then a failure using INT_MAX + 1.
>
>
> In the main i/o loop, records vanish when their corrupted length is negative,
> since inrec doesn't consider a negative value a valid record.
>
> $ gawk 'BEGIN {printf("%2147483647s\n", "a")}' | gawk 'END {print NR}'
> 1
> $ gawk 'BEGIN {printf("%2147483648s\n", "a")}' | gawk 'END {print NR}'
> 0
>
>
> In getline (do_getline/do_getline_redir), if the corrupted length is equal to
> EOF, it will trigger a silent bypass of the rest of the file. More likely,
> some other value will mislead buffer memory management routines and crash
> gawk.
>
> This bare getline fails fatally in set_record's buffer resizing loop, when it
> gives up trying to accommodate what it thinks is a humongous record
> (field.c:284: `cnt >= databuf_size` promotes a negative int cnt to unsigned
> long).
>
> $ gawk 'BEGIN {printf("\n%2147483647s\n", "a")}' | gawk '{getline} END {print NR}'
> 2
> $ gawk 'BEGIN {printf("\n%2147483648s\n", "a")}' | gawk '{getline} END {print NR}'
> gawk: cmd. line:1: (FILENAME=- FNR=2) fatal: input record too large
>
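The signed-to-unsigned conversion behind the "input record too large" failure can be sketched in C as follows. This is an illustration only: needs_bigger_buffer is a hypothetical stand-in for the `cnt >= databuf_size` test quoted above, with the implicit conversion written out as an explicit cast.

```c
#include <stddef.h>

/* Hypothetical stand-in for a `cnt >= databuf_size` check.
 * When an int is compared with a size_t, C converts the int to the
 * unsigned type first; the cast below spells that conversion out. */
static int needs_bigger_buffer(int cnt, size_t databuf_size)
{
	/* A negative cnt becomes a huge unsigned value, so the test
	 * succeeds no matter how large the buffer already is. */
	return (size_t) cnt >= databuf_size;
}
```

For an ordinary length such as 10 against a 1024-byte buffer the check is false, but for any negative cnt it is true, which is why a resizing loop driven by it can never be satisfied.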
>
> This getline var dies in make_string (make_str_node) from a corrupted
> allocation request:
>
> $ gawk 'BEGIN {printf("\n%2147483647s\n", "a")}' | gawk '{getline var} END {print NR}'
> 2
> $ gawk 'BEGIN {printf("\n%2147483648s\n", "a")}' | gawk '{getline var} END {print NR}'
> gawk: cmd. line:1: (FILENAME=- FNR=2) fatal: node.c:415:make_str_node: r->stptr: cannot allocate -2147483647 bytes of memory: Cannot allocate memory
>
>
> If INT_MAX is deemed sufficient, despite the use of capacious size_t i/o
> buffers, here's a diff.
>
> diff --git a/io.c b/io.c
> index 91c94d9b..4e777d75 100644
> --- a/io.c
> +++ b/io.c
> @@ -4026,6 +4026,9 @@ get_a_record(char **out, /* pointer to pointer to data */
>  		iop->dataend += iop->count;
>  	}
>
> +	if (recm.len > INT_MAX)
> +		fatal(_("input record length too large to return"));
> +
>  	/* set record, RT, return right value */
>
>  	/*
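
The guard in the diff above follows a common pattern: check the range before narrowing. A standalone C sketch of that pattern is below. It is not the patch itself: checked_length is a hypothetical name, and it reports failure with a return code where the patch calls gawk's fatal().

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: store len in *out only if it fits in an int.
 * Returns 0 on success, -1 if the length is too large to represent. */
static int checked_length(size_t len, int *out)
{
	if (len > (size_t) INT_MAX)
		return -1;	/* caller decides how to report the error */
	*out = (int) len;	/* safe: the value is known to be in range */
	return 0;
}
```

Because the comparison is done in size_t before any conversion, the narrowed value can never be negative or otherwise altered.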