bug-gnulib

Re: string types


From: ag
Subject: Re: string types
Date: Sun, 29 Dec 2019 19:13:39 +0200
User-agent: Mutt/1.12.1 (2019-06-15)

On Sun, Dec 29, at 10:19 Bruno Haible wrote:
> I agree with the goal. How to do it precisely, is an art however.

Ok, let's see what we have so far.

First, the base (easy): the maximum size that can be requested through malloc,
and that is PTRDIFF_MAX. Here we also have to forget about SIZE_MAX, as it is
not guaranteed that PTRDIFF_MAX equals SIZE_MAX.
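
To make the base concrete, here is a minimal sketch of an allocation wrapper
that refuses anything above PTRDIFF_MAX. The name alloc_checked is made up for
this mail, and setting errno to ENOMEM is my assumption, not something agreed
in the thread.

```c
#include <errno.h>   /* errno, ENOMEM */
#include <stdint.h>  /* PTRDIFF_MAX, SIZE_MAX, uintmax_t */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical helper: refuse any request larger than PTRDIFF_MAX, so
   that the size of every object it returns also fits in a ptrdiff_t.  */
static void *
alloc_checked (size_t n)
{
  /* Compare in uintmax_t to avoid a signed/unsigned mismatch.  */
  if ((uintmax_t) n > (uintmax_t) PTRDIFF_MAX)
    {
      errno = ENOMEM;   /* assumption: report it like an allocation failure */
      return NULL;
    }
  /* Ask for at least one byte so that a NULL result always means failure.  */
  return malloc (n ? n : 1);
}
```

A caller treats it like malloc: a NULL result means failure, and the result
is released with free.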

Second, the requirement on the returned value (easy): a signed type.
There is agreement that the introduced functions should return -1 on error,
otherwise the interface gets complicated, and we do not want complication.
So ptrdiff_t is adequate, since it is in standard C and is defined in
<stddef.h>.

The rest:

Catching out-of-bounds conditions (rather easy, and already implemented in
snprintf): the destination argument is followed by an argument with the
allocated size of the destination (whether on the stack or on the heap). Now,
snprintf uses size_t here, but (question) isn't this in contradiction with the
above? Probably not, but it's better to ask and de-confuse things (clarity is
a requirement: the semantics should be understandable by mere humans).
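
One way to read the size_t question, as a sketch only: the size argument can
stay size_t (as in snprintf) while the function rejects values above
PTRDIFF_MAX, so the ptrdiff_t return value can always hold the result. The
name str_copy and the choice to leave the destination untouched on failure are
assumptions made for illustration.

```c
#include <stddef.h>  /* ptrdiff_t, size_t */
#include <stdint.h>  /* PTRDIFF_MAX, uintmax_t */
#include <string.h>  /* strlen, memcpy */

/* Hypothetical function: copy SRC into DEST, which has room for DEST_SIZE
   bytes.  Return the number of bytes written, not counting the terminating
   NUL, or -1 if an argument is invalid or SRC does not fit.  */
static ptrdiff_t
str_copy (char *dest, size_t dest_size, const char *src)
{
  if (dest == NULL || src == NULL || dest_size == 0
      || (uintmax_t) dest_size > (uintmax_t) PTRDIFF_MAX)
    return -1;

  size_t len = strlen (src);
  if (len >= dest_size)
    return -1;                  /* would not fit: DEST is left untouched */

  memcpy (dest, src, len + 1);  /* the + 1 copies the terminating NUL */
  return (ptrdiff_t) len;
}
```

With a char buf[8], copying "abc" returns 3, while an 8-character source
returns -1 because there would be no room for the terminating NUL.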

Another concern: what if the destination is NULL? Should the internal
functions proceed by allocating a buffer of the requested size? What will they
do if the requested size is <= 0?
There are precedents here, like realpath(), which allocates a buffer that is
up to the user to free.
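
Just to have something concrete to argue about, here is a sketch of one
possible answer, following the realpath() precedent: a NULL destination means
"allocate for me", and a size <= 0 is an error. The name str_copy_alloc and
both of those decisions are assumptions, not a proposal that was agreed on.

```c
#include <stddef.h>  /* ptrdiff_t, size_t */
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy */

/* Hypothetical function: copy SRC into DEST (room for DEST_SIZE bytes).
   If DEST is NULL, allocate DEST_SIZE bytes; the caller must free them.
   Return the buffer that was written, or NULL on error.  */
static char *
str_copy_alloc (char *dest, ptrdiff_t dest_size, const char *src)
{
  if (src == NULL || dest_size <= 0)
    return NULL;                /* assumption: size <= 0 is an error */

  size_t len = strlen (src);
  if (len >= (size_t) dest_size)
    return NULL;                /* would not fit, nothing is allocated */

  char *out = dest != NULL ? dest : malloc ((size_t) dest_size);
  if (out == NULL)
    return NULL;                /* allocation failed */

  memcpy (out, src, len + 1);
  return out;
}
```

The cost of this design is that the caller has to remember whether the result
came from the heap: it must be freed only when NULL was passed as destination.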

Also: static internal variables are considered harmful. But sometimes it is
desirable to keep some data in a private place, protected, or handy to work
with without side effects. This is solved, however, by the new immutable
module.

Catching truncation (first priority maybe): one choice is to complicate the
interface a bit and return more values than -1, but this is rejected on the
perfectly legitimate assumption that humans are lazy, probably because they
have been exposed to try/catch (not bad if you ask me, but inappropriate for
C). The other thing that could be done is to return -1 and set errno according
to the error. But does such an error exist or not? If not, ETRUNC should be
introduced. Few programmers will take the risk of making their program depend
on something that is not standard, but perhaps they will (doubtful, though, at
this stage).
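
A sketch of the "return -1 and set errno" variant, under loud assumptions:
ETRUNC is not a standard errno value, so it is mapped here to ERANGE purely as
a stand-in, and the name str_copy_trunc is invented. The function writes what
fits, terminates the string, and then reports the truncation.

```c
#include <errno.h>   /* errno, ERANGE, EINVAL */
#include <stddef.h>  /* ptrdiff_t, size_t */
#include <string.h>  /* strlen, memcpy */

#ifndef ETRUNC
/* Hypothetical: ETRUNC is not a standard errno value.
   ERANGE is used here only as a stand-in.  */
# define ETRUNC ERANGE
#endif

/* Hypothetical function: copy as much of SRC as fits into DEST.
   Return the number of bytes written (without the NUL), or -1 with
   errno set to ETRUNC if the copy had to be truncated.  */
static ptrdiff_t
str_copy_trunc (char *dest, size_t dest_size, const char *src)
{
  if (dest == NULL || src == NULL || dest_size == 0)
    {
      errno = EINVAL;
      return -1;
    }

  size_t len = strlen (src);
  if (len >= dest_size)
    {
      memcpy (dest, src, dest_size - 1);  /* keep what fits */
      dest[dest_size - 1] = '\0';
      errno = ETRUNC;
      return -1;
    }

  memcpy (dest, src, len + 1);
  return (ptrdiff_t) len;
}
```

The caller keeps the simple "-1 means error" check and looks at errno only
when it cares about why.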

The other thing that is left is to check the returned value. Now, in
snprintf(3) there are notes about this and a method to detect truncation
(misty, though).

       The functions snprintf() and vsnprintf() do not write more than size
       bytes (including the terminating null byte ('\0')).  If the output was
       truncated due to this limit, then the return value is the number of
       characters (excluding the terminating null byte) which would have been
       written to the final string if enough space had been available.  Thus,
       a return value of size or more means that the output was truncated.
       (See also below under NOTES.)
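
In code, the rule from the passage above comes down to one comparison. This is
a minimal sketch; the helper name format_id and its format string are made up
for illustration.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/* Hypothetical helper: format "id-<n>" into BUF of SIZE bytes.
   Return 0 if the whole string fit, 1 if it was truncated,
   and -1 if snprintf itself reported an error.  */
static int
format_id (char *buf, size_t size, int id)
{
  int n = snprintf (buf, size, "id-%d", id);
  if (n < 0)
    return -1;
  /* n is the length the full string would have had, so
     "n >= size" means there was not enough room.  */
  return (size_t) n >= size;
}
```

With a buffer of 8 bytes, id 42 fits ("id-42"), while id 1234567 is truncated
to "id-1234" and the helper returns 1.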

"which would have been written?" why not always the bytes that had been written?

Ok, I got it after a break; it is still difficult to parse though, and for
what? We have to admit that this is a programmer error. [Sh|H]e should know
her strings. But we still want to help here. How? Three choices come to mind.

1.
Use a bitmap flag argument to control the function's behavior. This adds
verbosity, but at the same time allows extensibility. Which conditions could
be covered with that? Perhaps returning an error if the destination is NULL,
when the flag directs the function to return in this condition. Same with
the source. Very convenient, but still verbose, as you have to learn another
set of FLAGS.

2.
Introduce wrappers. Actually, wrappers may be used either way.
Or introduce a complete set of the same functions, suffixed with _un (to
denote unsafety, if _s (not sure) means safe).

3. The programmer knows best. Based on that, either continue with the
implementation as it is, or (where appropriate) use a fourth argument
for the requested number of bytes to be written. And sleep with a clear
conscience that you did the best you could. He should do the same.
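
To give the first choice some shape, here is a sketch of what a flags argument
could look like. The flag names, the function name str_copy_flags and the
exact conditions they control are all hypothetical.

```c
#include <stddef.h>  /* ptrdiff_t, size_t */
#include <string.h>  /* strlen, memcpy */

/* Hypothetical flags, combined with bitwise OR.  */
enum
{
  STR_ALLOW_TRUNC   = 1 << 0,  /* truncate instead of failing */
  STR_NULL_IS_EMPTY = 1 << 1   /* treat a NULL source as "" */
};

/* Hypothetical function: copy SRC into DEST (room for DEST_SIZE bytes),
   with the error policy selected by FLAGS.  Return the number of bytes
   written (without the NUL), or -1 on error.  */
static ptrdiff_t
str_copy_flags (char *dest, size_t dest_size, const char *src,
                unsigned int flags)
{
  if (src == NULL && (flags & STR_NULL_IS_EMPTY))
    src = "";
  if (dest == NULL || src == NULL || dest_size == 0)
    return -1;

  size_t len = strlen (src);
  if (len >= dest_size)
    {
      if (!(flags & STR_ALLOW_TRUNC))
        return -1;              /* strict mode: DEST is left untouched */
      len = dest_size - 1;      /* keep only what fits */
    }

  memcpy (dest, src, len);
  dest[len] = '\0';
  return (ptrdiff_t) len;
}
```

Passing 0 as flags gives the strict behavior; the wrappers of the second
choice could simply be thin functions that fix the flags.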

Now, what concerns me most is userspace and all these functions that
take a variable number of arguments and a format string. I was fighting
in my code to find a reliable way to know the actual number of bytes
produced by the sum of those arguments (as it can be really difficult to
catch some of the conditions described above). You also said at one point
that no one who does system programming will use this set of functions
(because of the overhead). We could go further and say: no one sane (sorry)
would want to format big strings. Such functions are very prone to errors,
but they are easy to work with. So what should we do with them? There is a
method to calculate the size beforehand (that is, before the buffer is
declared), and it is given in the printf(3) Linux man page.

  /* Determine the required size: p is NULL and size is 0 at this point */
  va_start(ap, fmt);
  size = vsnprintf(p, size, fmt, ap);
  va_end(ap);

So it parses the varargs twice. Plus, one compiler version (not 9*) gave a
warning with -fsanitize=undefined. Could we (users) do better? Can we rely on
something else? No, we can't. It's the only way. C strings are like this.
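
For reference, the whole measure-then-format dance can be sketched as follows.
It uses va_copy instead of a second va_start, which is my choice and not what
the man page example does; the name format_alloc is made up.

```c
#include <stdarg.h>  /* va_list, va_start, va_copy, va_end */
#include <stdio.h>   /* vsnprintf */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical helper: format into a freshly allocated string.
   Return the string (the caller frees it), or NULL on error.  */
static char *
format_alloc (const char *fmt, ...)
{
  va_list ap, ap2;
  va_start (ap, fmt);
  va_copy (ap2, ap);            /* second copy for the second pass */

  /* First pass: no buffer, only measure the required length.  */
  int n = vsnprintf (NULL, 0, fmt, ap);
  va_end (ap);

  char *p = NULL;
  if (n >= 0)
    {
      p = malloc ((size_t) n + 1);
      /* Second pass: actually write the string.  */
      if (p != NULL && vsnprintf (p, (size_t) n + 1, fmt, ap2) != n)
        {
          free (p);
          p = NULL;
        }
    }
  va_end (ap2);
  return p;
}
```

The arguments are still walked twice; the sketch only hides that behind one
call.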

So (just thinking aloud here), is it possible to introduce a function
that will handle this? Something like a growing buffer? But no: usually
such usage is with stack-allocated strings in function scope. Maybe
with some kind of recursion, when such a hypothetical function sees at
some point that the actual bytes exceed the allocated size. Sorry, as I
said, it's about user convenience and safety (at the same time), but as
was shown with the immutable string, perhaps there is a way with mmap
(I do not really know).
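
One shape such a hypothetical function could take, without mmap: try the
caller's (typically stack) buffer first, and fall back to the heap only when
the output does not fit. This is a sketch under assumptions; the name
format_fallback and the "free it only if it is not your buffer" convention
are invented here.

```c
#include <stdarg.h>  /* va_list, va_start, va_copy, va_end */
#include <stdio.h>   /* vsnprintf */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical helper: format into BUF (SIZE bytes) if the result fits,
   otherwise into a heap buffer of exactly the needed size.  BUF must be
   a valid buffer unless SIZE is 0.  Return the buffer holding the result
   (free it only if it differs from BUF), or NULL on error.  */
static char *
format_fallback (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap, ap2;
  va_start (ap, fmt);
  va_copy (ap2, ap);            /* kept for a possible second pass */

  int n = vsnprintf (buf, size, fmt, ap);
  va_end (ap);

  char *out = NULL;
  if (n >= 0 && (size_t) n < size)
    out = buf;                  /* it fit: no allocation at all */
  else if (n >= 0)
    {
      out = malloc ((size_t) n + 1);
      if (out != NULL && vsnprintf (out, (size_t) n + 1, fmt, ap2) != n)
        {
          free (out);
          out = NULL;
        }
    }
  va_end (ap2);
  return out;
}
```

The common case costs a single pass over the arguments; only the oversized
case pays for the second one.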

Lastly, since we were talking about assumptions and such: it's better to
think of them as guarantees. And if we really want to go ahead, perhaps
in a way that makes no provision for obsolete systems, or that cares in
this interface only for systems that provide these guarantees (perhaps
systems that were developed in this decade), then we can wrap this whole
interface in a big fat:

#if WE_WANT_TO_MOVE_ON
 ...
#else
continue with this you have, but i cannot help you as i want and i can
#endif

Bruno,
Starting from zero always gives a breath of energy. So if we really want
to move on, then the best that can be done is to do it the way you want
to do it, without any obligations to anyone.
It's always us and (for) us in the end. The art here is that, through us,
the outside of us will benefit at exactly the same time. This is called
dada, I believe.

> Bruno

Thanks,
 Αγαθοκλής
(you know the funny thing: Iggy, Nushrat-Fateh, the unknown to you
(but our great) Manolis, and you are my beloved idols. What a life!)


