bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] fix for non-MBS locales


From: Paolo Bonzini
Subject: Re: [PATCH] fix for non-MBS locales
Date: Tue, 20 Apr 2010 15:15:37 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100301 Fedora/3.0.3-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.3

On 04/20/2010 02:39 PM, Jim Meyering wrote:
Aharon Robbins wrote:
> From 4a422ff749cb4bbed270776038adb9a5977f47b6 Mon Sep 17 00:00:00 2001
From: Arnold D. Robbins<address@hidden>
Date: Tue, 20 Apr 2010 13:45:53 +0300
Subject: [PATCH] Fix add_utf8_anychar to work in non-MBS situation

dfa.c (add_utf8_anychar): Bracket body in `#if MBS_SUPPORT'.
Reported by Anders Wallin, fixed by Arnold Robbins.
---
  src/dfa.c |    2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 7d39e5c..7ed3f2c 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1474,6 +1474,7 @@ addtok_wc (wint_t wc)
  static void
  add_utf8_anychar (void)
  {
+#if MBS_SUPPORT
    static const charclass utf8_classes[5] = {
        {  0,  0,  0,  0, ~0, ~0, 0, 0 },            /* 80-bf: non-lead bytes */
        { ~0, ~0, ~0, ~0, 0, 0, 0, 0 },              /* 00-7f: 1-byte sequence 
*/
@@ -1507,6 +1508,7 @@ add_utf8_anychar (void)
        addtok (CAT);
        addtok (OR);
      }
+#endif

Thanks for the patch.
However, I am reluctant to keep adding #if MBS_SUPPORT directives
and would prefer to remove the conditional that guards the
struct "dfa" member declarations.

What is the use case for compiling without MBS_SUPPORT?
The vast majority of people using reasonable portability targets
will have MBS_SUPPORT defined to 1.  It's defined like this:

   #if defined HAVE_WCSCOLL&&  defined HAVE_ISWCTYPE
   # define MBS_SUPPORT 1
   #else
   # define MBS_SUPPORT 0
   #endif

and gnulib documents that wcscoll and iswctype are missing only on Solaris
2.5.1 and Irix 5.3, and that on AIX and windows wchar_t is a 16-bit type,
and thus insufficient to represent all Unicode characters.

The point is that there is no harm in defining extra members
even on those fringe platforms, and doing that lets us avoid
the harm to readability of in-function #ifdefs.
Here's an alternate patch:

[BTW, along these lines, I have been sitting on a 30-cset series
that converts most of the remaining "#if MBS_SUPPORT" into regular
C (not cpp) tests of the MBS_SUPPORT symbol.  Maybe this will
provide the motivation to rebase and post it. ]

 From 231cbac8e386f2e3ab4e1e01afb66b8cc7d85bd9 Mon Sep 17 00:00:00 2001
From: Jim Meyering<address@hidden>
Date: Tue, 20 Apr 2010 14:31:45 +0200
Subject: [PATCH] dfa: don't #ifdef-out member declarations

* src/dfa.c (struct dfa): Remove "#if MBS_SUPPORT" guard that made
several member declarations conditional on this cpp definition.
(token): Likewise.
Reported by Anders Wallin.
---
  src/dfa.c |    4 ----
  1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 7d39e5c..c3165bd 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -212,7 +212,6 @@ typedef enum

    RPAREN,                     /* RPAREN never appears in the parse tree. */

-#if MBS_SUPPORT
    ANYCHAR,                     /* ANYCHAR is a terminal symbol that matches
                                    any multibyte (or single byte) characters.
                                    It is used only if MB_CUR_MAX>  1.  */
@@ -222,7 +221,6 @@ typedef enum

    WCHAR,                      /* Only returned by lex.  wctok contains
                                     the wide character representation.  */
-#endif /* MBS_SUPPORT */

    CSET                                /* CSET and (and any value greater) is a
                                     terminal symbol that matches any of a
@@ -306,7 +304,6 @@ struct dfa
    int nleaves;                        /* Number of leaves on the parse tree. 
*/
    int nregexps;                       /* Count of parallel regexps being built
                                     with dfaparse(). */
-#if MBS_SUPPORT
    unsigned int mb_cur_max;    /* Cached value of MB_CUR_MAX.  */
    int utf8_anychar_classes[5];        /* To lower ANYCHAR in UTF-8 locales.  
*/

@@ -336,7 +333,6 @@ struct dfa
    struct mb_char_classes *mbcsets;
    int nmbcsets;
    int mbcsets_alloc;
-#endif

    /* Fields filled by the state builder. */
    dfa_state *states;          /* States of the dfa. */
--
1.7.1.rc1.248.gcefbb

ACK.

Paolo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]