qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [Qemu-devel] [PATCH for-2.10] Use qemu_tolower() and qemu


From: Eric Blake
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH for-2.10] Use qemu_tolower() and qemu_toupper(), not tolower() and toupper()
Date: Thu, 20 Jul 2017 16:29:03 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/20/2017 04:03 PM, Peter Maydell wrote:
> On 20 July 2017 at 19:26, Richard Henderson <address@hidden> wrote:
>> On 07/20/2017 06:31 AM, Peter Maydell wrote:
>>>
>>> gdbstub.c:914:13: warning: array subscript has type 'char'
>>>
>>> This reflects the fact that toupper() and tolower() give
>>> undefined behaviour if they are passed a value that isn't
>>> a valid 'unsigned char' or EOF.
>>
>>
>> Not saying we shouldn't use qemu_tolower etc, but this statement is not true
>> at all.  Officially, the argument to toupper and tolower is type int.
> 
> (I was going to quote POSIX here but I see Eric has done so already.)
> 
>> This sounds like a bug in NetBSD -- though it may not even be that, as they
>> may have done something clever and put the symbol in the middle of the data.
>> A trick that worked before compiler warnings got smarter.
> 
> https://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/sys/sys/ctype_inline.h
> The implementation is
>   #define toupper(c) ((int)((_toupper_tab_ + 1)[(c)]))
> 
> (where I assume _toupper_tab_ is the obvious array.)

In fact, if NetBSD's implementation is anything like Cygwin's, then it
is a safe bet that _toupper_tab_ is a symbol that lands in the middle of
a larger 384-byte array, such that _toupper_tab_[-1] (for (signed
char)'\xfe') happens to resolve to a valid image address, and will have
the same contents as _toupper_tab_[255] (for (unsigned char)'\xfe'),
precisely to cater to so many clueless programs have forgotten about
signed char widening to negative values (even though the standard says
the behavior is undefined, quality-of-implementation demands that you
try to do the sane thing anyway).

The table is of course only MOSTLY symmetric; there are some single-byte
locales where isspace((unsigned char)'\xff') is true but isspace(EOF) is
false (in using the magic-middle-of-array locations, _tab_[0] cannot
always match _tab_[256] in all locales).  And it is wraparound where
'\xff' collides with -1 that explains WHY C99 documents the range to be
ints corresponding to unsigned char.

Side note: NetBSD has a bug in their ctype.h.
toupper(INT64_C(0x100000001)) is supposed to be the same as toupper(1),
since POSIX requires toupper() to be available as a function (and
permits a macro as well).  A function with an int parameter must
gracefully truncate a 64-bit input down to 32-bits, so the macro must do
the same; but the NetBSD macro uses all 64 bits to dereference into
garbage memory locations.  Fortunately, no one runs into the bug in real
life because no one really tries to pass 64-bit numbers to ctype macros.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]