[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions |
Date: |
Fri, 18 Jan 2013 19:06:28 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux) |
Eduardo Habkost <address@hidden> writes:
> On Fri, Jan 18, 2013 at 11:01:14AM +0100, Markus Armbruster wrote:
>> Eduardo Habkost <address@hidden> writes:
>>
>> > There are lots of duplicate parsing code using strto*() in QEMU, and
>> > most of that code is broken in one way or another. Even the visitors
>> > code have duplicate integer parsing code[1]. This introduces functions
>> > to help parsing unsigned int values: parse_uint() and parse_uint_full().
>> >
>> > Parsing functions for signed ints and floats will be submitted later.
>> >
>> > parse_uint_full() has all the checks made by opts_type_uint64() at
>> > opts-visitor.c:
>> >
>> > - Check for NULL (returns -EINVAL)
>> > - Check for negative numbers (returns -ERANGE)
>> > - Check for empty string (returns -EINVAL)
>> > - Check for overflow or other errno values set by strtoll() (returns
>> > -errno)
>> > - Check for end of string (reject invalid characters after number)
>> > (returns -EINVAL)
>> >
>> > parse_uint() does everything above except checking for the end of the
>> > string, so callers can continue parsing the remainder of string after
>> > the number.
>> >
>> > Unit tests included.
>> >
>> > [1] string-input-visitor.c:parse_int() could use the same parsing code
>> > used by opts-visitor.c:opts_type_int(), instead of duplicating that
>> > logic.
>> [...]
>> > diff --git a/util/cutils.c b/util/cutils.c
>> > index 80bb1dc..7f09251 100644
>> > --- a/util/cutils.c
>> > +++ b/util/cutils.c
>> > @@ -270,6 +270,85 @@ int64_t strtosz(const char *nptr, char **end)
>> > return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
>> > }
>> >
>> > +/* Try to parse an unsigned integer
>> > + *
>> > + * Error checks done by the function:
>> > + * - NULL pointer will return -EINVAL.
>> > + * - Empty strings will return -EINVAL.
>>
>> Same for strings containing only whitespace.
>
> Oh, you're right.
>
>>
>> > + * - Overflow errors or other errno values set by strtoull() will
>>
>> Extra space before 'set'.
>
> Oops.
>
>>
>> > + * return -errno (-ERANGE in case of overflow).
>> > + * - Differently from strtoull(), values starting with a minus sign are
>> > + * rejected (returning -ERANGE).
>> > + *
>> > + * Sets endptr to point to the first invalid character.
>>
>> Mention that unlike strtol() & friends, endptr must not be null?
>> Probably not necessary.
>
> I believe it is implicit if I document it as "Sets *endptr". But worth
> documenting as it is different from strtoull() behavior.
>
>>
>> The two ERANGE cases treat endptr differently: it's either set to point
>> just behind the out-of-range number, or to point to the beginning of the
>> out-of-range number. Ugh.
>
> This should be fixed in the newer version I sent.
>
>>
>> > + Callers may rely
>> > + * on *value and *endptr to be always set by the function, even in case of
>> > + * errors.
>>
>> You neglect to specify what gets assigned to *value in error cases.
>>
>
> Undefined. :-)
>
> Anyway, I agree it not very useful to document that "*value and *endptr
> will be always set" if it is undefined. I will try to reword it.
Yes, please.
I'm not fond of undefined behavior, unless there's a good reason.
If you don't want to assign a defined value, consider not assigning
anything.
>> Your list of error checks isn't quite complete. Here's my try:
>>
>> /**
>> * Parse unsigned integer
>> *
>> * @s: String to parse
>> * @value: Destination for parsed integer value
>> * @endptr: Destination for pointer to first character not consumed
>> * @base: integer base, between 2 and 36 inclusive, or 0
>> *
>> * Parsed syntax is just like strol()'s: arbitrary whitespace, a single
>> * optional '+' or '-', an optional "0x" if @base is 0 or 16, one or
>> * more digits.
>
> The newer version has '-' as _not_ part of the syntax.
>
>> *
>> * If @s is null, or @base is invalid, or @s doesn't start with an
>> * integer in the syntax above, set address@hidden to 0, address@hidden to
>> @s, and
>> * return -EINVAL.
>
> This matches the behavior of the newer version. But I would like to keep
> *value documented as undefined in case of errors.
>
>> *
>> * Set @endptr to point right beyond the parsed integer.
>> *
>> * If the integer is negative, set address@hidden to 0, and return -ERANGE.
>
> This isn't the case in the newer version.
>
>> * If the integer overflows unsigned long long, set address@hidden to
>> * ULLONG_MAX, and return -ERANGE.
>> * Else, set address@hidden to the parsed integer, and return 0.
>> */
>
> Thanks for taking the time to write this. I hate having to look up
> what's the right syntax to use in docstrings, so I often just write
> plain comments before functions. :-)
I don't particularly like GLib-style doc-strings, but it's what we use
when we use anything, which is rarely.
> I have a minor problem with the documentation above: as a developer, the
> most important question I have is: what's the difference between using
> parse_uint() and using strtoull() directly? But maybe it is a good
> thing: instead of referencing the strtoull() specification (and an
> implementation detail), now we have a well-defined and well-documented
> behavior.
I understand why you're asking for the difference to stroull(). Trouble
is strtoull() is complex, and often misunderstood.
>> > + *
>> > + * The 'base' parameter has the same meaning of 'base' on strtoull().
>> > + *
>> > + * Returns 0 on success, negative errno value on error.
>> > + */
>> > +int parse_uint(const char *s, unsigned long long *value, char **endptr,
>> > + int base)
>> > +{
>> > + int r = 0;
>> > + char *endp = (char *)s;
>> > + unsigned long long val = 0;
>> > +
>> > + if (!s) {
>> > + r = -EINVAL;
>> > + goto out;
>> > + }
>> > +
>> > + /* make sure we reject negative numbers: */
>> > + while (isspace((unsigned char)*s)) {
>> > + ++s; endp++;
>>
>> Mixing pre- and post-increment like that is ugly. Recommend to stick to
>> post-increment.
>
> Agreed. I blame the copy & paste I made from opts-visitor. Later I added
> the postfix increment without noticing how ugly it looked like
>
> Anyway, this was changed in the newer patch version.
>
>>
>> I'd set endp after the loop. Right before goto.
>
> Fixed in the newer version.
>
>>
>> Or simply change the specification to set *endptr to point beyond the
>> integer instead of to the '-'. I took the liberty to do exactly that in
>> my proposed function comment.
>
> The newer version was changed to return -EINVAL and set endptr to the
> beginning of the string.
>
>>
>> > + }
>> > + if (*s == '-') {
>> > + r = -ERANGE;
>> > + goto out;
>> > + }
>>
>> This creates the special case that made me go "ugh" above. Suggest to
>> drop the if here, and check right after strtoull() instead, as shown
>> below. That way, we get the same endptr behavior for all out-of-range
>> numbers.
>
> Is returning -EINVAL acceptable for you, as well? In your proposal we
> consider "-1234" part of the language but out-of-range. Returning
> -EINVAL, we consider "-1234" not part of the language, and invalid
> input. The newer version returns -EINVAL.
I prefer -ERANGE on underflow, for symmetry with parsing *signed*
integers. A similar parsing function for signed integers would surely
assign LLONG_MIN and return -ERANGE then.
A secondary, weak argument: caller can more easily distinguish the
failure modes:
* -EINVAL: @s invalid, @base invalid (both programming errors), or @s
doesn't start with an integer. I use "integer" in the mathematical
sense here.
* -ERANGE, *value == 0: integer underflows *value
* -ERANGE, *value == ULLONG_MAX: integer overflows *value.
If we lump underflow into the -EINVAL case, you have to check *endptr ==
'-' to find it.
The argument is weak, because I don't have a caller ready that actually
wants to find it.
>> > +
>> > + errno = 0;
>> > + val = strtoull(s, &endp, base);
>>
>> if (*s = '-') {
>> r = -ERANGE;
>> val = 0;
>> goto out;
>> }
>
> This has another bug: makes endptr point to the middle of the string in
> case we are parsing " xxx" or " -1234".
It makes endptr point right behind the integer, doesn't it?
When parsing " xxx", it points to the first 'x' (we skipped the spaces
already).
When parsing " -1234", it points behind '4'.
>> > + if (errno) {
>> > + r = -errno;
>> > + goto out;
>> > + }
>> > +
>> > + if (endp == s) {
>> > + r = -EINVAL;
>> > + goto out;
>> > + }
>> > +
>> > +out:
>> > + *value = val;
>> > + *endptr = endp;
>> > + return r;
>> > +}
>> > +
>> > +/* Try to parse an unsigned integer, making sure the whole string is
>> > parsed
>> > + *
>> > + * Have the same behavior of parse_uint(), but with an additional check
>> > + * for additional data after the parsed number (in that case, the function
>> > + * will return -EINVAL).
>>
>> What's assigned to *value then?
>
> Undefined. :-)
>
>
>>
>> > + */
>>
>> Alternatively, you could make into parse_uint() do that check when
>> endptr is null, and drop this function. Matter of taste.
>
> I considered doing that, but I believe it would be too subtle.
>
>>
>> > +int parse_uint_full(const char *s, unsigned long long *value, int base)
>> > +{
>> > + char *endp;
>> > + int r;
>> > +
>> > + r = parse_uint(s, value, &endp, base);
>> > + if (r < 0) {
>> > + return r;
>> > + }
>> > + if (*endp) {
>>
>> Answering my own question: the parsed integer is assigned to *value then.
>
> I prefer to document it as undefined. If you want to use the parsed
> integer even if there's additional data after it, you should use
> parse_int() instead.
Maybe.
What I want is a proper function contract. That includes how *value is
changed. *Especially* when it's an unspecified change.
>> > + return -EINVAL;
>> > + }
>> > +
>> > + return 0;
>> > +}
>> > +
>> > int qemu_parse_fd(const char *param)
>> > {
>> > int fd;
- Re: [Qemu-devel] [PATCH 1/8 v4] cutils: unsigned int parsing functions, (continued)
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Markus Armbruster, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions, Andreas Färber, 2013/01/18
- [Qemu-devel] [PATCH 1/8 v5] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8 v5] cutils: unsigned int parsing functions, Eric Blake, 2013/01/18
- [Qemu-devel] [PATCH 1/8 v6] cutils: unsigned int parsing functions, Eduardo Habkost, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8 v6] cutils: unsigned int parsing functions, Eric Blake, 2013/01/18
- Re: [Qemu-devel] [PATCH 1/8 v6] cutils: unsigned int parsing functions, Laszlo Ersek, 2013/01/23
- Re: [Qemu-devel] [PATCH 1/8] cutils: unsigned int parsing functions,
Markus Armbruster <=
Re: [Qemu-devel] [PATCH 0/8] -numa option parsing fixes (v3), Eric Blake, 2013/01/16
Re: [Qemu-devel] [PATCH for-1.4 0/8] -numa option parsing fixes (v3), Eduardo Habkost, 2013/01/28