bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gawk 3.1.5, patch for multibyte locales


From: Aharon Robbins
Subject: Re: gawk 3.1.5, patch for multibyte locales
Date: Sat, 05 Aug 2006 22:17:48 +0300

Andy,

Here's a longer response to your note.

> Date: Mon, 24 Jul 2006 16:03:07 -0400
> From: "Andrew J. Schorr" <address@hidden>
> Subject: Re: gawk 3.1.5, patch for multibyte locales
> To: Aharon Robbins <address@hidden>
> Cc: address@hidden
>
> Hi Arnold,
>
> On Sun, Jul 23, 2006 at 10:30:56PM +0300, Aharon Robbins wrote:
> > The problem with gawk and multibyte locales has come up enough that it's
> > time to post a patch.  This is extracted from my current code base.
> > It is untested but should do the trick.
>
> Thanks for posting this patch.  I have a few questions.
>
> In node.c:mk_number, you have this patch:
>
> @@ -344,11 +341,7 @@ mk_number(AWKNUM x, unsigned int flags)
>         r->stref = 1;
>         r->stptr = NULL;
>         r->stlen = 0;
> -#if defined MBS_SUPPORT
> -       r->wstptr = NULL;
> -       r->wstlen = 0;
> -       r->flags &= ~WSTRCUR;
> -#endif /* MBS_SUPPORT */
> +       free_wsptr(r);
>  #endif /* GAWKDEBUG */
>         return r;
>  }
>
> That seems to be a typo: it should be free_wstr, not free_wsptr
> (but doesn't really matter because it's inside #ifdef GAWKDEBUG).

Yes, it's a typo. Now fixed. Thanks.

> +/* free_wstr --- release the wide string part of a node */
> +
> +void
> +free_wstr(NODE *n)
> +{
> +
> +     if ((n->flags & WSTRCUR) != 0) {
> +             assert(n->wstptr != NULL);
> +             free(n->wstptr);
> +     }
> +     n->wstptr = NULL;
> +     n->wstlen = 0;
> +     n->flags &= ~WSTRCUR;
> +}
> +
>
> I'm wondering about the safety of this function.  It seems to me
> that this implementation would be safer:
>
> void
> free_wstr(NODE *n)
> {
>
>       if ((n->flags & WSTRCUR) != 0) {
>               assert(n->wstptr != NULL);
>               free(n->wstptr);
>               n->wstptr = NULL;
>               n->wstlen = 0;
>               n->flags &= ~WSTRCUR;
>       }
> }

That's not what I want. The code as I have it guarantees that the
wstptr and wslen elements will be clear, no matter what.

> My concern here relates to whether it is safe to access
> the wstptr field of the NODE union in the event that WSTRCUR
> was not set.  As I understand it, the wstptr and wstlen
> fields of the union are not defined unless the WSTRCUR flag
> is set.

That is true. But those fields are still there, and the point
here is to make sure there's nothing left around in them. I
call free_wstr() from every place that sets the regular n->stptr
field.

> If that flag is not set, then it seems to me that
> we have no idea whether the wstptr and wstlen parts of the union
> are actually currently allocated for this purpose, unless
> we presuppose that this function is never called on a NODE
> of the incorrect type (that might use the fields for another
> purpose).

Exactly.  The function should never be called except on a node
that is being used to hold a value.

> For example, as I found in my debugging, I think
> the exec_count (sub.nodep.reflags) overlaps with wstptr
> (sub.val.wsp) on 64-bit opteron.  So it seems dangerous to
> me to define free_wstr in such a way that it depends on the
> type of the NODE argument.  But if one takes care to call it
> only from safe places, then I suppose it's fine.

That's the current strategy.

> My most serious concern about this patch is how it handles
> the setting of wstptr strings in field variables.  My patch
> to the bug I found involved this fix to node.c:unref:
>
> @@ -508,20 +500,13 @@ unref(register NODE *tmp)
>                               return;
>                       }
>                       free(tmp->stptr);
> -#if defined MBS_SUPPORT
> -                     if (tmp->wstptr != NULL) {
> -                             assert((tmp->flags & WSTRCUR) != 0);
> -                             free(tmp->wstptr);
> -                     }
> -                     tmp->flags &= ~WSTRCUR;
> -                     tmp->wstptr = NULL;
> -                     tmp->wstlen = 0;
> -#endif
> +                     free_wstr(tmp);
>               }
>               freenode(tmp);
>               return;
>       }
>       if ((tmp->flags & FIELD) != 0) {
> +             free_wstr(tmp);
>               freenode(tmp);
>               return;
>       }
>
> I see that your patch does not include the latter change in unref
> (to free the wstr nodes with the FIELD flag set).  As a result,
> I suspect that your patch may leave a memory leak in place.  But
> this is masked by the changes that you made in field.c to add
> calls to free_wstr in various places.  Note that those calls
> will never actually free anything (because the NODE flags are
> always set prior to calling free_wstr, and the WSTRCUR is never
> set, so the free_wstr function merely sets wstptr to NULL).
> But they do mask out the memory leak, since the non-NULL wstptr
> that was being passed to unref is now getting nulled out by the
> calls to free_wstr when the NODE is subsequently allocated from
> the free list (and thus the assertion is not being triggered).
> Was there a problem with the free_wstr called I added to unref?
> That seems to me to be the easiest way to fix the memory leak, although
> one could take care to do this inside field.c by adding a call to
> free_wstr prior to each call to unref...
>
> When I run valgrind on 3.1.5 plus your patch, I see this:
>
> bash-2.05b$ (echo "test"; echo "foo") | LANG=en_US.UTF-8 valgrind 
> --leak-check=full ./gawk '{ if (substr($1,1,1) == substr($0,1,1))             
>                                             
>       print "substr matches"; sub(/foo/,"bar"); print nr++}'
> ==28750== Memcheck, a memory error detector for x86-linux.
> ==28750== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==28750== Using valgrind-2.4.1, a program supervision framework for x86-linux.
> ==28750== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==28750== For more details, rerun with: -v
> ==28750== 
> substr matches
> 0
> substr matches
> 1
> ==28750== 
> ==28750== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 1)
> ==28750== malloc/free: in use at exit: 17782 bytes in 91 blocks.
> ==28750== malloc/free: 265 allocs, 174 frees, 61091 bytes allocated.
> ==28750== For counts of detected errors, rerun with: -v
> ==28750== searching for pointers to 91 not-freed blocks.
> ==28750== checked 111856 bytes.
> ==28750== 
> ==28750== 88 bytes in 4 blocks are definitely lost in loss record 21 of 32
> ==28750==    at 0x1B903234: malloc (vg_replace_malloc.c:130)
> ==28750==    by 0x806C369: str2wstr (node.c:710)
> ==28750==    by 0x80577BC: do_substr (builtin.c:1387)
> ==28750==    by 0x807A38C: r_tree_eval (eval.c:991)
> ==28750==    by 0x807A47D: r_tree_eval (eval.c:1232)
> ==28750==    by 0x807A98B: eval_condition (eval.c:1355)
> ==28750==    by 0x8078BF3: interpret (eval.c:482)
> ==28750==    by 0x8078BC5: interpret (eval.c:477)
> ==28750==    by 0x8078B34: interpret (eval.c:456)
> ==28750==    by 0x80655DD: do_input (io.c:457)
> ==28750==    by 0x8069C01: main (main.c:595)
> ==28750== 
> ==28750== LEAK SUMMARY:
> ==28750==    definitely lost: 88 bytes in 4 blocks.
> ==28750==      possibly lost: 0 bytes in 0 blocks.
> ==28750==    still reachable: 17694 bytes in 87 blocks.
> ==28750==         suppressed: 0 bytes in 0 blocks.
> ==28750== Reachable blocks (those to which a pointer was found) are not shown.
> ==28750== To see them, rerun with: --show-reachable=yes
>
> This seems to confirm that memory is still being leaked.
>
> Regards,
> Andy

I see your point. On my 32-bit fedora core 2 system with valgrind 2.2, I don't
see any leaks with your test and my current code base. Even after commenting
out the calls to free_wstr in field.c and in node.c, I don't see leaks.

I will try this on a 64 bit system I have access to at work and also with a
newer version of valgrind.  I think that your addition in unref makes sense
but I need to poke at things a bit more. This all still requires a bit
more thought.

Thanks,

Arnold




reply via email to

[Prev in Thread] Current Thread [Next in Thread]