gnustep-dev

Re: NSString bug with test and really dodgy patch.


From: Richard Frith-Macdonald
Subject: Re: NSString bug with test and really dodgy patch.
Date: Wed, 3 Oct 2012 08:53:09 +0100

On 3 Oct 2012, at 08:09, Wolfgang Lux wrote:

> Richard Frith-Macdonald wrote:
> 
>> We could probably adapt your patch to use precision as string length in those 
>> cases where it will work, but you can't catch all cases that way ... so 
>> maybe it's better if people find out as soon as possible that c-strings have 
>> to be nul terminated.
>> 
>> Sorry about this ... but it's a behavior inherited from the C stdio library 
>> and the POSIX etc. standards.  My own feeling is that format strings *ought* 
>> to provide some way of working with unterminated strings, but they just 
>> don't, so you have to copy the data into a big enough buffer, add the nul 
>> terminator, and use that buffer instead of the original data :-(
> 
> I don't think your description of the standards is correct. My copy of the 
> ANSI C'99 standard has this to say on the %s format specifier:
> "If the precision is specified, no more than that many characters are 
> written. If the precision is not specified or is greater than the size of the 
> array, the array shall contain a null character."
> With that specification, I'd say that Chris's code is correct. He uses an 
> array containing 50 bytes and uses precision 50, so the array shouldn't 
> require a nul terminator.

Oh, that's a different section of the documentation (I was reading the bit 
dealing with precision, and I only just found the bit you quote under the 's' 
conversion).  Which would mean there are apparent inconsistencies ... so I 
looked further (specifically at recent xopen documentation ... which really 
ought to be authoritative for modern software).

And ... that's different again ... the xopen docs make it clear that they are 
talking about *bytes* (so the current implementation is wrong) where other 
documentation talks about characters:

The argument shall be a pointer to an array of char. Bytes from the array shall 
be written up to (but not including) any terminating null byte. If the 
precision is specified, no more than that many bytes shall be written. If the 
precision is not specified or is greater than the size of the array, the 
application shall ensure that the array contains a null byte.
If an l (ell) qualifier is present, the argument shall be a pointer to an array 
of type wchar_t. Wide characters from the array shall be converted to 
characters (each as if by a call to the wcrtomb() function, with the conversion 
state described by an mbstate_t object initialized to zero before the first 
wide character is converted) up to and including a terminating null wide 
character. The resulting characters shall be written up to (but not including) 
the terminating null character (byte). If no precision is specified, the 
application shall ensure that the array contains a null wide character. If a 
precision is specified, no more than that many characters (bytes) shall be 
written (including shift sequences, if any), and the array shall contain a null 
wide character if, to equal the character sequence length given by the 
precision, the function would need to access a wide character one past the end 
of the array. In no case shall a partial character be written.

Interestingly, they are very specific about saying that the precision is a 
number of bytes rather than a number of characters (quite different from the 
older documentation I was looking at before) even in the case where the output 
is wide characters.  They even mention omitting the last character if it's a 
multibyte one and not all bytes would be permitted by the precision. 

Maybe we should update the code to try to match the modern standard, but in the 
context of GSFormat a byte-based output precision would be very 
counter-intuitive, since an NSString deals with UTF-16 and everyone expects the 
precision to give a number of 16-bit characters in the resulting NSString 
object.

So I'm not sure what to do ... the C standards have changed from working with 
characters to working with bytes (which is good), but we can't simply adopt 
that because it would break OSX compatibility (and people's reasonable 
expectations).

Perhaps what we need is what I suggested (as a complex/inefficient option) in 
an earlier email ... to parse the input string character by character and treat 
the precision as a limit on the number of characters we read from it.
Tests on OSX to reverse-engineer Apple's behavior are probably our best bet.

