
Re: sizeof in Macros. (Was: Specifying dependencies more clearly)

From: Alejandro Colomar
Subject: Re: sizeof in Macros. (Was: Specifying dependencies more clearly)
Date: Tue, 15 Nov 2022 13:56:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1

Hi Ralph,

As you noticed, if you don't address me in To or Cc, I may not notice the email. I try to keep up, but sometimes I miss one. Sorry.

On 11/10/22 10:31, Ralph Corderoy wrote:
Howdy Alejandro,

Okay, here we go for a rant.

Consider the cost of lost opportunities.

Since I wrote the code from memory, I had a few typos, but the idea
was there...

      typedef struct {
          size_t  length;
          u_char  *start;
      } str_t;

      #define length(s)   (sizeof(s) - 1)

      #define str_set(str, text)  do       \
      {                                    \
          (str)->length = length(test);    \
          (str)->start = (u_char *) text;  \
      } while (0)

And s/test/text/.

The lack of parentheses around ‘text’ in the assignment to ‘start’ looks

Yeah, I had a few typos. That's common for me when I write from memory and don't have a compiler to remind me of stupid bugs. :)

Of course, cowboy programmers don't need terminating NUL bytes

The Unix filesystem didn't terminate full-length filenames in the
sixteen-byte directory entries; two for the inode number, fourteen bytes
for the filename.

utmp(5) still has a lot of fields that don't have a NUL terminator, which is the reason I wrote this:


Not using terminating NULs allows substrings to be sliced from the
original without copying the bytes.

Certainly, it has its benefits, but it had better be accompanied by some way to get the compiler on your side so that it warns you before you write some important bugs.

And the use of that macro is typically for things like:

         str_t  str;
         str_set(&str, "foo");

And then some new programmer in the team writes a line of code
that's something like:

         str_set(&str, cond ? "someword" : "another");

Then testing reveals some issue if cond is true.  You see only
"somewor"; hmm.
I quickly realize that it is due to the ternary operator decaying
the array into a pointer, so sizeof measures the pointer instead
of the string.
My patch just changes

-#define length(s)   (sizeof(s) - 1)
+#define length(s)   (nitems(s) - 1)

(nitems() is defined to be the obvious sizeof division, called
ARRAY_SIZE(9) in the Linux kernel.)

The bug is the original macro str_set() doesn't document that its
‘text’ parameter must be a string constant.

I disagree. Comments tend to become obsolete from the moment they are written. My code usually has quite good documentation in the commit message that adds the macro[1], and if I later need to fix the macro, I take the time to make sure the commit message documents it completely again, whenever the changes are big enough to warrant a refresh. The code itself then goes in plain, with no comments, but with plenty of features that static analyzers and compilers can use to warn me.

 Macros I have to hand are more
explicit: ‘string constant s’.

     /* DIM gives the number of elements in the one-dimensional array a. */

That is obvious; I'd rather remove the comment.

     #define DIM(a) (sizeof (a) / sizeof (*(a)))

     /* LEN gives the strlen() of string constant s, excluding the
      * terminating NUL. */

That is obvious, and moreover, the compiler can't enforce it. Rather, I suggest the following.

     #define LEN(s) (sizeof (s) - 1)

[1]: This one I wrote just today, since nitems() apparently wasn't readable enough in that context.

 *      size_t sizeof_array(array);
 *      array   Array to be sized.
 *      Measure the size of an array --equivalent to sizeof(array)--.
 *      This macro has the benefit that it triggers the warning
 *      '-Wsizeof-pointer-div' when the argument is a pointer instead
 *      of an array, which can cause subtle bugs, sometimes impossible
 *      to detect through tests.
 *      int     *p;
 *      int     a[1];
 *      size_t  sp, sa;
 *      sp = sizeof_array(p);  // Warning: -Wsizeof-pointer-div
 *      sa = sizeof_array(a);  // OK
* <>

The above comment is really in the commit message, not in the code.

#define nitems(a)        (sizeof (a) / sizeof (*(a)))
#define sizeof_array(a)  (sizeof(a) + (0 * nitems(a)))

And then, the change for the infamous length() macro:

-#define length(s)  (sizeof(s) - 1)
+#define length(s)  (sizeof_array(s) - 1)

By passing a char pointer on what's presumably a 64-bit pointer machine,
P64, sizeof gives the pointer's eight bytes, one is knocked off as if
there were a terminating NUL, and the first seven bytes of "someword"

Your patch switches to nitems() and your description makes me think it's
like DIM() above.  That takes the eight bytes of the pointer given by
the first sizeof and divides it by one, doing nothing, as the second
sizeof gives the size of a u_char.  So we still arrive at eight, knock
one off for the NUL as you write ‘nitems(s) - 1’, giving seven.  So I
don't see why this wouldn't also give "somewor" instead of "someword".
How did it fix the problem?


It would have been caught much earlier: nitems() on a pointer triggers -Wsizeof-pointer-div, so the compiler flags the bug instead of testing having to find it. :)

Unless the value needs to be known at compile time, using strlen(3)
would work in all cases, save much human time at the cost of a little
accumulated machine time, and in the simple common case of a string
constant it will probably be evaluated by any optimising compiler into a

I agree. I bet this code has its origins in a time when compilers weren't that good at optimizing. Now that compilers also warn on the sizeof division, the sizeof-based macro is mainly good for guaranteeing that the length is a constant expression.



     typedef struct {
         size_t length;
         u_char *start;
     } str_t;

     #define str_set(str, text) \
         do { \
             (str)->length = strlen(text); \
             (str)->start = (u_char *)(text); \
         } while (0)
