[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: gawk 3.1.5, patch for multibyte locales
From: |
Aharon Robbins |
Subject: |
Re: gawk 3.1.5, patch for multibyte locales |
Date: |
Sat, 05 Aug 2006 22:17:48 +0300 |
Andy,
Here's a longer response to your note.
> Date: Mon, 24 Jul 2006 16:03:07 -0400
> From: "Andrew J. Schorr" <address@hidden>
> Subject: Re: gawk 3.1.5, patch for multibyte locales
> To: Aharon Robbins <address@hidden>
> Cc: address@hidden
>
> Hi Arnold,
>
> On Sun, Jul 23, 2006 at 10:30:56PM +0300, Aharon Robbins wrote:
> > The problem with gawk and multibyte locales has come up enough that it's
> > time to post a patch. This is extracted from my current code base.
> > It is untested but should do the trick.
>
> Thanks for posting this patch. I have a few questions.
>
> In node.c:mk_number, you have this patch:
>
> @@ -344,11 +341,7 @@ mk_number(AWKNUM x, unsigned int flags)
> r->stref = 1;
> r->stptr = NULL;
> r->stlen = 0;
> -#if defined MBS_SUPPORT
> - r->wstptr = NULL;
> - r->wstlen = 0;
> - r->flags &= ~WSTRCUR;
> -#endif /* MBS_SUPPORT */
> + free_wsptr(r);
> #endif /* GAWKDEBUG */
> return r;
> }
>
> That seems to be a typo: it should be free_wstr, not free_wsptr
> (but doesn't really matter because it's inside #ifdef GAWKDEBUG).
Yes, it's a typo. Now fixed. Thanks.
> +/* free_wstr --- release the wide string part of a node */
> +
> +void
> +free_wstr(NODE *n)
> +{
> +
> + if ((n->flags & WSTRCUR) != 0) {
> + assert(n->wstptr != NULL);
> + free(n->wstptr);
> + }
> + n->wstptr = NULL;
> + n->wstlen = 0;
> + n->flags &= ~WSTRCUR;
> +}
> +
>
> I'm wondering about the safety of this function. It seems to me
> that this implementation would be safer:
>
> void
> free_wstr(NODE *n)
> {
>
> if ((n->flags & WSTRCUR) != 0) {
> assert(n->wstptr != NULL);
> free(n->wstptr);
> n->wstptr = NULL;
> n->wstlen = 0;
> n->flags &= ~WSTRCUR;
> }
> }
That's not what I want. The code as I have it guarantees that the
wstptr and wslen elements will be clear, no matter what.
> My concern here relates to whether it is safe to access
> the wstptr field of the NODE union in the event that WSTRCUR
> was not set. As I understand it, the wstptr and wstlen
> fields of the union are not defined unless the WSTRCUR flag
> is set.
That is true. But those fields are still there, and the point
here is to make sure there's nothing left around in them. I
call free_wstr() from every place that sets the regular n->stptr
field.
> If that flag is not set, then it seems to me that
> we have no idea whether the wstptr and wstlen parts of the union
> are actually currently allocated for this purpose, unless
> we presuppose that this function is never called on a NODE
> of the incorrect type (that might use the fields for another
> purpose).
Exactly. The function should never be called except on a node
that is being used to hold a value.
> For example, as I found in my debugging, I think
> the exec_count (sub.nodep.reflags) overlaps with wstptr
> (sub.val.wsp) on 64-bit opteron. So it seems dangerous to
> me to define free_wstr in such a way that it depends on the
> type of the NODE argument. But if one takes care to call it
> only from safe places, then I suppose it's fine.
That's the current strategy.
> My most serious concern about this patch is how it handles
> the setting of wstptr strings in field variables. My patch
> to the bug I found involved this fix to node.c:unref:
>
> @@ -508,20 +500,13 @@ unref(register NODE *tmp)
> return;
> }
> free(tmp->stptr);
> -#if defined MBS_SUPPORT
> - if (tmp->wstptr != NULL) {
> - assert((tmp->flags & WSTRCUR) != 0);
> - free(tmp->wstptr);
> - }
> - tmp->flags &= ~WSTRCUR;
> - tmp->wstptr = NULL;
> - tmp->wstlen = 0;
> -#endif
> + free_wstr(tmp);
> }
> freenode(tmp);
> return;
> }
> if ((tmp->flags & FIELD) != 0) {
> + free_wstr(tmp);
> freenode(tmp);
> return;
> }
>
> I see that your patch does not include the latter change in unref
> (to free the wstr nodes with the FIELD flag set). As a result,
> I suspect that your patch may leave a memory leak in place. But
> this is masked by the changes that you made in field.c to add
> calls to free_wstr in various places. Note that those calls
> will never actually free anything (because the NODE flags are
> always set prior to calling free_wstr, and the WSTRCUR is never
> set, so the free_wstr function merely sets wstptr to NULL).
> But they do mask out the memory leak, since the non-NULL wstptr
> that was being passed to unref is now getting nulled out by the
> calls to free_wstr when the NODE is subsequently allocated from
> the free list (and thus the assertion is not being triggered).
> Was there a problem with the free_wstr called I added to unref?
> That seems to me to be the easiest way to fix the memory leak, although
> one could take care to do this inside field.c by adding a call to
> free_wstr prior to each call to unref...
>
> When I run valgrind on 3.1.5 plus your patch, I see this:
>
> bash-2.05b$ (echo "test"; echo "foo") | LANG=en_US.UTF-8 valgrind
> --leak-check=full ./gawk '{ if (substr($1,1,1) == substr($0,1,1))
>
> print "substr matches"; sub(/foo/,"bar"); print nr++}'
> ==28750== Memcheck, a memory error detector for x86-linux.
> ==28750== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
> ==28750== Using valgrind-2.4.1, a program supervision framework for x86-linux.
> ==28750== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
> ==28750== For more details, rerun with: -v
> ==28750==
> substr matches
> 0
> substr matches
> 1
> ==28750==
> ==28750== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 15 from 1)
> ==28750== malloc/free: in use at exit: 17782 bytes in 91 blocks.
> ==28750== malloc/free: 265 allocs, 174 frees, 61091 bytes allocated.
> ==28750== For counts of detected errors, rerun with: -v
> ==28750== searching for pointers to 91 not-freed blocks.
> ==28750== checked 111856 bytes.
> ==28750==
> ==28750== 88 bytes in 4 blocks are definitely lost in loss record 21 of 32
> ==28750== at 0x1B903234: malloc (vg_replace_malloc.c:130)
> ==28750== by 0x806C369: str2wstr (node.c:710)
> ==28750== by 0x80577BC: do_substr (builtin.c:1387)
> ==28750== by 0x807A38C: r_tree_eval (eval.c:991)
> ==28750== by 0x807A47D: r_tree_eval (eval.c:1232)
> ==28750== by 0x807A98B: eval_condition (eval.c:1355)
> ==28750== by 0x8078BF3: interpret (eval.c:482)
> ==28750== by 0x8078BC5: interpret (eval.c:477)
> ==28750== by 0x8078B34: interpret (eval.c:456)
> ==28750== by 0x80655DD: do_input (io.c:457)
> ==28750== by 0x8069C01: main (main.c:595)
> ==28750==
> ==28750== LEAK SUMMARY:
> ==28750== definitely lost: 88 bytes in 4 blocks.
> ==28750== possibly lost: 0 bytes in 0 blocks.
> ==28750== still reachable: 17694 bytes in 87 blocks.
> ==28750== suppressed: 0 bytes in 0 blocks.
> ==28750== Reachable blocks (those to which a pointer was found) are not shown.
> ==28750== To see them, rerun with: --show-reachable=yes
>
> This seems to confirm that memory is still being leaked.
>
> Regards,
> Andy
I see your point. On my 32-bit fedora core 2 system with valgrind 2.2, I don't
see any leaks with your test and my current code base. Even after commenting
out the calls to free_wstr in field.c and in node.c, I don't see leaks.
I will try this on a 64 bit system I have access to at work and also with a
newer version of valgrind. I think that your addition in unref makes sense
but I need to poke at things a bit more. This all still requires a bit
more thought.
Thanks,
Arnold
- Re: gawk 3.1.5, patch for multibyte locales,
Aharon Robbins <=