bug-gnu-utils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gawk 3.1.5, patch for multibyte locales


From: Aharon Robbins
Subject: Re: gawk 3.1.5, patch for multibyte locales
Date: Sun, 06 Aug 2006 23:05:31 +0300

I looked at this some more.  I duplicated the memory leak using valgrind
3.2.0 on a 64 bit system. Your patch to unref() is correct and my
changes in field.c aren't needed. I'm going to clean this up in
my files and also build the current valgrind on my 32 bit system.

Thanks,

Arnold

> Date: Mon, 24 Jul 2006 16:03:07 -0400
> From: "Andrew J. Schorr" <address@hidden>
> Subject: Re: gawk 3.1.5, patch for multibyte locales
> To: Aharon Robbins <address@hidden>
> Cc: address@hidden
>
> Hi Arnold,
>
> On Sun, Jul 23, 2006 at 10:30:56PM +0300, Aharon Robbins wrote:
> > The problem with gawk and multibyte locales has come up enough that it's
> > time to post a patch.  This is extracted from my current code base.
> > It is untested but should do the trick.
>
> Thanks for posting this patch.  I have a few questions.
>
> In node.c:mk_number, you have this patch:
>
> @@ -344,11 +341,7 @@ mk_number(AWKNUM x, unsigned int flags)
>         r->stref = 1;
>         r->stptr = NULL;
>         r->stlen = 0;
> -#if defined MBS_SUPPORT
> -       r->wstptr = NULL;
> -       r->wstlen = 0;
> -       r->flags &= ~WSTRCUR;
> -#endif /* MBS_SUPPORT */
> +       free_wsptr(r);
>  #endif /* GAWKDEBUG */
>         return r;
>  }
>
> That seems to be a typo: it should be free_wstr, not free_wsptr
> (but doesn't really matter because it's inside #ifdef GAWKDEBUG).
>
> +/* free_wstr --- release the wide string part of a node */
> +
> +void
> +free_wstr(NODE *n)
> +{
> +
> +     if ((n->flags & WSTRCUR) != 0) {
> +             assert(n->wstptr != NULL);
> +             free(n->wstptr);
> +     }
> +     n->wstptr = NULL;
> +     n->wstlen = 0;
> +     n->flags &= ~WSTRCUR;
> +}
> +
>
> I'm wondering about the safety of this function.  It seems to me
> that this implementation would be safer:
>
> void
> free_wstr(NODE *n)
> {
>
>       if ((n->flags & WSTRCUR) != 0) {
>               assert(n->wstptr != NULL);
>               free(n->wstptr);
>               n->wstptr = NULL;
>               n->wstlen = 0;
>               n->flags &= ~WSTRCUR;
>       }
> }
>
> My concern here relates to whether it is safe to access
> the wstptr field of the NODE union in the event that WSTRCUR
> was not set.  As I understand it, the wstptr and wstlen
> fields of the union are not defined unless the WSTRCUR flag
> is set.  If that flag is not set, then it seems to me that
> we have no idea whether the wstptr and wstlen parts of the union
> are actually currently allocated for this purpose, unless
> we presuppose that this function is never called on a NODE
> of the incorrect type (that might use the fields for another
> purpose).  For example, as I found in my debugging, I think
> the exec_count (sub.nodep.reflags) overlaps with wstptr
> (sub.val.wsp) on 64-bit opteron.  So it seems dangerous to
> me to define free_wstr in such a way that it depends on the
> type of the NODE argument.  But if one takes care to call it
> only from safe places, then I suppose it's fine.
>
> My most serious concern about this patch is how it handles
> the setting of wstptr strings in field variables.  My patch
> to the bug I found involved this fix to node.c:unref:
>
> @@ -508,20 +500,13 @@ unref(register NODE *tmp)
>                               return;
>                       }
>                       free(tmp->stptr);
> -#if defined MBS_SUPPORT
> -                     if (tmp->wstptr != NULL) {
> -                             assert((tmp->flags & WSTRCUR) != 0);
> -                             free(tmp->wstptr);
> -                     }
> -                     tmp->flags &= ~WSTRCUR;
> -                     tmp->wstptr = NULL;
> -                     tmp->wstlen = 0;
> -#endif
> +                     free_wstr(tmp);
>               }
>               freenode(tmp);
>               return;
>       }
>       if ((tmp->flags & FIELD) != 0) {
> +             free_wstr(tmp);
>               freenode(tmp);
>               return;
>       }
>
> I see that your patch does not include the latter change in unref
> (to free the wstr nodes with the FIELD flag set).  As a result,
> I suspect that your patch may leave a memory leak in place.  But
> this is masked by the changes that you made in field.c to add
> calls to free_wstr in various places.  Note that those calls
> will never actually free anything (because the NODE flags are
> always set prior to calling free_wstr, and the WSTRCUR is never
> set, so the free_wstr function merely sets wstptr to NULL).
> But they do mask out the memory leak, since the non-NULL wstptr
> that was being passed to unref is now getting nulled out by the
> calls to free_wstr when the NODE is subsequently allocated from
> the free list (and thus the assertion is not being triggered).
> Was there a problem with the free_wstr called I added to unref?
> That seems to me to be the easiest way to fix the memory leak, although
> one could take care to do this inside field.c by adding a call to
> free_wstr prior to each call to unref...
>
> When I run valgrind on 3.1.5 plus your patch, I see this:
>
> bash-2.05b$ (echo "test"; echo "foo") | LANG=en_US.UTF-8 valgrind 
> --leak-check=full ./gawk '{ if (substr($1,1,1) == substr($0,1,1))             
>                                             
>       print "substr matches"; sub(/foo/,"bar"); print nr++}'
> ==28750== Memcheck, a memory error detector for x86-linux.
> ==28750== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==28750== Using valgrind-2.4.1, a program supervision framework for x86-linux.
> ==28750== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==28750== For more details, rerun with: -v
> ==28750== 
> substr matches
> 0
> substr matches
> 1
> ==28750== 
> ==28750== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 1)
> ==28750== malloc/free: in use at exit: 17782 bytes in 91 blocks.
> ==28750== malloc/free: 265 allocs, 174 frees, 61091 bytes allocated.
> ==28750== For counts of detected errors, rerun with: -v
> ==28750== searching for pointers to 91 not-freed blocks.
> ==28750== checked 111856 bytes.
> ==28750== 
> ==28750== 88 bytes in 4 blocks are definitely lost in loss record 21 of 32
> ==28750==    at 0x1B903234: malloc (vg_replace_malloc.c:130)
> ==28750==    by 0x806C369: str2wstr (node.c:710)
> ==28750==    by 0x80577BC: do_substr (builtin.c:1387)
> ==28750==    by 0x807A38C: r_tree_eval (eval.c:991)
> ==28750==    by 0x807A47D: r_tree_eval (eval.c:1232)
> ==28750==    by 0x807A98B: eval_condition (eval.c:1355)
> ==28750==    by 0x8078BF3: interpret (eval.c:482)
> ==28750==    by 0x8078BC5: interpret (eval.c:477)
> ==28750==    by 0x8078B34: interpret (eval.c:456)
> ==28750==    by 0x80655DD: do_input (io.c:457)
> ==28750==    by 0x8069C01: main (main.c:595)
> ==28750== 
> ==28750== LEAK SUMMARY:
> ==28750==    definitely lost: 88 bytes in 4 blocks.
> ==28750==      possibly lost: 0 bytes in 0 blocks.
> ==28750==    still reachable: 17694 bytes in 87 blocks.
> ==28750==         suppressed: 0 bytes in 0 blocks.
> ==28750== Reachable blocks (those to which a pointer was found) are not shown.
> ==28750== To see them, rerun with: --show-reachable=yes
>
> This seems to confirm that memory is still being leaked.
>
> Regards,
> Andy




reply via email to

[Prev in Thread] Current Thread [Next in Thread]