bug-libunistring
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug-libunistring] libunistring-0.9.10: coverity scan shows a few mo


From: Bruno Haible
Subject: Re: [bug-libunistring] libunistring-0.9.10: coverity scan shows a few more problems in bundled gnulib
Date: Wed, 01 May 2024 17:40:05 +0200

Hi Mike,

> A while ago, I reported 4 problems discovered by coverity in
> libunistring-0.9.10:
> 
> https://lists.gnu.org/archive/html/bug-libunistring/2021-06/msg00000.html
> 
> and Bruno Haible fixed them:
> 
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4d288a80bf7ebe29334b9805cdcc70eacb6059c1
> 
> A new scan with Bruno’s patch applied now complains about several more
> problems. I tried to have a look and as far as I understand it, most of
> them could be false positives and no real problems. But I am not 100%
> sure.

libunistring version 0.9.10 is pretty old. Since then, several fixes have
been applied to vasnprintf.c in particular:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=278b4175c9d7dd47c1a3071554aac02add3b3c35
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4d288a80bf7ebe29334b9805cdcc70eacb6059c1
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=5527d5c548702b89d217bbe58036996066a709b6
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=54c80fb6f106d7f3430dd075fcb7327bab07f368
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=ca1cd9b39787fe8a2329c77bc60d4a7c3ab2334e
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=c9e246f596836d1eb57120a2d3b65687170356a1

It would therefore be better to do it with libunistring 1.2.

Also, for coverity findings on Gnulib source code, we use the coverity portal 
where we mark
false positives etc. - so that when Gnulib source code is used in other 
packages, redundant
work does not need to be done.

> Here are the problems reported:
> 
> Error: BAD_ALLOC_ARITHMETIC (CWE-131):
> libunistring-0.9.10/lib/striconveha.c:347: bad_alloc_arithmetic: Adding an 
> offset to the result of a call to "__builtin_alloca" might indicate an 
> under-allocation.
> libunistring-0.9.10/lib/striconveha.c:347: remediation: Did you intend for 
> the size argument to be "len + 10UL + 1UL + 32UL - 1UL + 31UL"?

This warning occurs for every malloca() or xmalloca() invocation. Ignore.

> Error: UNINIT (CWE-457):
> libunistring-0.9.10/lib/vasnprintf.c:5352: var_decl: Declaring variable 
> "tmpdst_len" without initializer.
> libunistring-0.9.10/lib/vasnprintf.c:5360: uninit_use_in_call: Using 
> uninitialized value "tmpdst_len" when calling "u16_conv_from_encoding".
> # 5358|                           tmpsrc = tmp;
> # 5359|   # endif
> # 5360|->                         tmpdst =
> # 5361|                             DCHAR_CONV_FROM_ENCODING (locale_charset 
> (),
> # 5362|                                                       
> iconveh_question_mark,
> 
> I checked what is done with that unitialized value of tmpdest_len and it
> appears to me that there is no real bug because &tmpdest_len is passed
> to some function, which pass it to other functions and eventually the
> unitialized value is overwritten with some length and returned.

Yes, no problem because this is the calling convention that is documented in the
GNU libunistring manual.

> But it
> seems clearer to initialize it to 0 anyway, just in cases, because it is
> hard to track what all these functions do with the unitialized value. So
> maybe something like this would be good:
> 
> diff -ru libunistring-0.9.10.orig/lib/vasnprintf.c 
> libunistring-0.9.10/lib/vasnprintf.c
> --- libunistring-0.9.10.orig/lib/vasnprintf.c 2024-04-30 18:11:17.942095093 
> +0200
> +++ libunistring-0.9.10/lib/vasnprintf.c      2024-04-30 18:10:50.256958451 
> +0200
> @@ -5349,7 +5349,7 @@
>                             The result string is not certainly ASCII.  */
>                          const TCHAR_T *tmpsrc;
>                          DCHAR_T *tmpdst;
> -                        size_t tmpdst_len;
> +                        size_t tmpdst_len = 0;
>                          /* This code assumes that TCHAR_T is 'char'.  */
>                          verify (sizeof (TCHAR_T) == 1);
>  # if USE_SNPRINTF

No, we don't want this in our source code, because
  - It does not make the code clearer to add unnecessary initializations.
  - Tools like coverity or a particular GCC version have certain shortcomings, 
which
    can be fixed in 2 years. Whereas this source code is meant to be readable 
in 10
    or 20 years from now.

> Error: OVERRUN (CWE-119):
> libunistring-0.9.10/lib/vasnprintf.c:1906: assignment: Assigning: "length" = 
> "augmented_length".
> libunistring-0.9.10/lib/vasnprintf.c:1996: return_constant: Function call 
> "xsum(((width <= 1844674407370955161UL) ? (size_t)width * 10UL : 
> 18446744073709551615UL), *digitp++ - 48)" may return 18446744073709551615.
> libunistring-0.9.10/lib/vasnprintf.c:1996: assignment: Assigning: "width" = 
> "xsum(((width <= 1844674407370955161UL) ? (size_t)width * 10UL : 
> 18446744073709551615UL), *digitp++ - 48)". The value of "width" is now 
> 18446744073709551615.
> libunistring-0.9.10/lib/vasnprintf.c:2171: assignment: Assigning: 
> "characters" = "0UL".
> libunistring-0.9.10/lib/vasnprintf.c:2226: assignment: Assigning: "n" = 
> "width - characters". The value of "n" is now 18446744073709551615.
> libunistring-0.9.10/lib/vasnprintf.c:2228: overrun-buffer-arg: Calling 
> "memset" with "result + length" and "n" is suspicious because of the very 
> large index, 18446744073709551615. The index may be due to a negative 
> parameter being interpreted as unsigned. [Note: The source code 
> implementation of the function has been overridden by a builtin model.]
> # 2226|                             size_t n = width - characters;
> # 2227|                             ENSURE_ALLOCATION (xsum (length, n));
> # 2228|->                           DCHAR_SET (result + length, ' ', n);
> # 2229|                             length += n;
> # 2230|                           }
> 
> I think the size calculations are OK and SIZE_MAX is normally not reached 
> with the format string specifiers.

... and the SIZE_MAX is there precisely in order to avoid integer overflow.

> Error: USE_AFTER_FREE (CWE-416):
> libunistring-0.9.10/lib/vasnprintf.c:2142: assign: Assigning: "result" = 
> "memory".
> libunistring-0.9.10/lib/vasnprintf.c:2114: assign: Assigning: "converted" = 
> "result + length".
> libunistring-0.9.10/lib/vasnprintf.c:2125: identity_transfer: Passing 
> "converted" as argument 3 to function "u8_to_u32", which returns that 
> argument.
> libunistring-0.9.10/lib/vasnprintf.c:2125: assign: Assigning: "converted" = 
> "u8_to_u32(arg, arg_end - arg, converted, &converted_len)".
> libunistring-0.9.10/lib/vasnprintf.c:2145: freed_arg: "free" frees 
> "converted".
> libunistring-0.9.10/lib/vasnprintf.c:1899: pass_freed_arg: Passing freed 
> pointer "result" as an argument to "u32_cpy".
> # 1897|               size_t augmented_length = xsum (length, n);
> # 1898|   
> # 1899|->             ENSURE_ALLOCATION (augmented_length);
> # 1900|               /* This copies a piece of FCHAR_T[] into a DCHAR_T[].  
> Here we
> # 1901|                  need that the format string contains only ASCII 
> characters
> 
> 
> This seems OK to me.
> 
> if (converted != result + length) then converted is copied to
> "result+length", after that it can be freed.

Yes.

> Error: UNINIT (CWE-457):
> libunistring-0.9.10/lib/uniconv/u-strconv-from-enc.h:32: var_decl: Declaring 
> variable "length" without initializer.
> libunistring-0.9.10/lib/uniconv/u-strconv-from-enc.h:34: uninit_use_in_call: 
> Using uninitialized value "length" when calling "u32_conv_from_encoding".
> #   32|     size_t length;
> #   33|   
> #   34|->   result =
> #   35|       U_CONV_FROM_ENCODING (fromcode, handler,
> #   36|                             string, strlen (string) + 1, NULL,
> 
> Initializing length = 0  would probably be good practice, even if this
> does not really make a difference here because length is written to
> during the function call.

See above.

> Error: USE_AFTER_FREE (CWE-416):
> libunistring-0.9.10/lib/vasnprintf.c:1926: assign: Assigning: "result" = 
> "memory".
> libunistring-0.9.10/lib/vasnprintf.c:2114: assign: Assigning: "converted" = 
> "result + length".
> libunistring-0.9.10/lib/vasnprintf.c:2118: identity_transfer: Passing 
> "converted" as argument 6 to function "u8_conv_to_encoding", which returns 
> that argument.
> libunistring-0.9.10/lib/vasnprintf.c:2118: assign: Assigning: "converted" = 
> "u8_conv_to_encoding(locale_charset(), iconveh_question_mark, arg, arg_end - 
> arg, NULL, converted, &converted_len)".
> libunistring-0.9.10/lib/vasnprintf.c:2142: freed_arg: "free" frees 
> "converted".
> libunistring-0.9.10/lib/vasnprintf.c:5612: double_free: Calling "free" frees 
> pointer "result" which has already been freed.
> # 5610|     out_of_memory:
> # 5611|       if (!(result == resultbuf || result == NULL))
> # 5612|->       free (result);
> # 5613|       if (buf_malloced != NULL)
> # 5614|         free (buf_malloced);
> 
> I don't see a double free here.

Yes, coverity cannot deal well with a variable being malloc()ed only in
a certain case and not in another case. In other code, as a workaround,
I introduced a variable that is either NULL or malloc()ed. Then at least
GCC's analyzer reports fewer false positives. But in vasnprintf.c we already
have a ton of local variables.

> Error: UNINIT (CWE-457):
> libunistring-0.9.10/lib/striconveh.c:374: var_decl: Declaring variable "tmp" 
> without initializer.
> libunistring-0.9.10/lib/striconveh.c:1006: uninit_use_in_call: Using 
> uninitialized value "*tmp.buf" when calling "memcpy". [Note: The source code 
> implementation of the function has been overridden by a builtin model.]
> # 1004|               }
> # 1005|           }
> # 1006|->       memcpy (result, tmpbuf, length);
> # 1007|       }
> # 1008|     else if (result != *resultp && length + extra_alloc < allocated)
> 
> I think the code is OK and tmpbuf is written to and thus it is initialized 
> before the `memcpy (result, tmpbuf, length);` happens. But maybe it would be 
> good to initialize it more obviously to silence the warning?:
> 
> diff -ru libunistring-0.9.10.orig/lib/striconveh.c 
> libunistring-0.9.10/lib/striconveh.c
> --- libunistring-0.9.10.orig/lib/striconveh.c 2018-05-25 18:02:11.000000000 
> +0200
> +++ libunistring-0.9.10/lib/striconveh.c      2024-04-30 18:22:26.711392895 
> +0200
> @@ -380,6 +380,9 @@
>    size_t length;
>    size_t last_length = (size_t)(-1); /* only needed if offsets != NULL */
>  
> +  for (int i = 0; i < tmpbufsize; ++i)
> +    tmpbuf[i] = '\0';
> +
>    if (*resultp != NULL && *lengthp >= sizeof (tmpbuf))
>      {
>        initial_result = *resultp;

See above. Better work on improving the static analyzers, not on deteriorating 
the
source code.

> Error: OVERRUN (CWE-119):
> libunistring-0.9.10/lib/vasnprintf.c:1906: assignment: Assigning: "length" = 
> "augmented_length".
> libunistring-0.9.10/lib/vasnprintf.c:1996: return_constant: Function call 
> "xsum(((width <= 1844674407370955161UL) ? (size_t)width * 10UL : 
> 18446744073709551615UL), *digitp++ - 48)" may return 18446744073709551615.
> libunistring-0.9.10/lib/vasnprintf.c:1996: assignment: Assigning: "width" = 
> "xsum(((width <= 1844674407370955161UL) ? (size_t)width * 10UL : 
> 18446744073709551615UL), *digitp++ - 48)". The value of "width" is now 
> 18446744073709551615.
> libunistring-0.9.10/lib/vasnprintf.c:2044: assignment: Assigning: 
> "characters" = "0UL".
> libunistring-0.9.10/lib/vasnprintf.c:2099: assignment: Assigning: "n" = 
> "width - characters". The value of "n" is now 18446744073709551615.
> libunistring-0.9.10/lib/vasnprintf.c:2101: overrun-buffer-arg: Calling 
> "u8_set" with "result + length" and "n" is suspicious because of the very 
> large index, 18446744073709551615. The index may be due to a negative 
> parameter being interpreted as unsigned.
> # 2099|                             size_t n = width - characters;
> # 2100|                             ENSURE_ALLOCATION (xsum (length, n));
> # 2101|->                           DCHAR_SET (result + length, ' ', n);
> # 2102|                             length += n;
> # 2103|                           }
> 
> I think the size calculations are OK and SIZE_MAX is normally not
> reached with the format string specifiers.

... and SIZE_MAX is mentioned for a good purpose.

> Error: OVERRUN (CWE-119):
> libunistring-0.9.10/lib/uninorm/canonical-decomposition.c:88: cond_at_most: 
> Checking "entry < 32768" implies that "entry" may be up to 32767 on the true 
> branch.
> libunistring-0.9.10/lib/uninorm/canonical-decomposition.c:94: alias: 
> Assigning: "p" = "&gl_uninorm_decomp_chars_table[3 * entry]". "p" may now 
> point to as high as byte 98301 of "gl_uninorm_decomp_chars_table" (which 
> consists of 25575 bytes).
> libunistring-0.9.10/lib/uninorm/canonical-decomposition.c:95: overrun-local: 
> Overrunning array of 25575 bytes at byte offset 98301 by dereferencing 
> pointer "p + 0".
> #   93|   
> #   94|             p = &gl_uninorm_decomp_chars_table[3 * entry];
> #   95|->           element = (p[0] << 16) | (p[1] << 8) | p[2];
> #   96|             /* The first element has 5 bits for the decomposition 
> type.  */
> #   97|             if (((element >> 18) & 0x1f) != UC_DECOMP_CANONICAL)
> 
> gl_uninorm_decomp_chars_table is indeed only 25575 bytes long and if entry 
> can be up to 0x8000-1, then 3*entry could bee too big as an index for 
> gl_uninorm_decomp_chars_table

We have extensive unit tests of this functionality, and when compiled with
clang + ASAN, they don't trigger anything.

In summary, while coverity can point to real bugs, nowadays clang + ASAN
provides a better way (more findings with less effort) to check for mistakes.

Bruno






reply via email to

[Prev in Thread] Current Thread [Next in Thread]