gnu-regexp-users

[Regexp] Re: GNU regexp for Java potential bug


From: Wes Biggs
Subject: [Regexp] Re: GNU regexp for Java potential bug
Date: Wed, 10 Sep 2003 23:17:11 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5b) Gecko/20030827

Hi Marc, it does look like a possible bug. I'm copying the users' list (maybe someone else has seen it) and I'll look into it.

The package is still maintained, though the "actively" bit is arguable. :-)

Wes

Marc Fraioli wrote:


Hi Wes,

I'm not sure if your Java regexp package is still actively maintained, but I thought it couldn't hurt to ask. I'm seeing a problem with an expression that matches across multiple lines: the package loses the first character of the first match. (If there are multiple matches, those after the first are fine; only the first one loses its initial character.) I'm creating the RE object with two options, RE.REG_DOT_NEWLINE and RE.REG_MULTILINE. The expression is:

([0-9]*.[0-9]*.[0-9]*) *([0-9]*:[0-9]*:.*?)$.*?^AMQ([0-9]*): *(.*?)^$

and a sample of the input I'm searching is:

09/10/03  09:47:35
AMQ9411: Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------


I slightly modified the RETest.java that comes with the package to use RE.REG_DOT_NEWLINE and RE.REG_MULTILINE as I am doing, and used it to test this, and I get output like this:

gnu.regexp version 1.1.4-dev
        Input Text: 09/10/03  09:47:35
AMQ9411: Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------

Regular Expression: ([0-9]*.[0-9]*.[0-9]*) *([0-9]*:[0-9]*:.*?)$.*?^AMQ([0-9]*): *(.*?)^$
     Compiled Form: (?:((?:(?:0-9))*.(?:(?:0-9))*.(?:(?:0-9))*)(?: )*((?:(?:0-9))*:(?:(?:0-9))*:(?:.)*?)$(?:.)*?^AMQ((?:(?:0-9))*):(?: )*((?:.)*?)^$)
    Minimum Length: 8
 isMatch() returns: false
   getAllMatches(): 1 matches
Match 0 (1,211): 9/10/03  09:47:35
AMQ9411: Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------

Match found from position 1 to position 211
Match was: '9/10/03  09:47:35
AMQ9411: Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
'
Subexpression #1: from position 1 to position 8
The subexpression matched this text: '9/10/03'
Subexpression #2: from position 10 to position 18
The subexpression matched this text: '09:47:35'
Subexpression #3: from position 22 to position 26
The subexpression matched this text: '9411'
Subexpression #4: from position 28 to position 211
The subexpression matched this text: 'Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
'
substitute(): 0<!--9/10/03  09:47:35
AMQ9411: Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
-->
substituteAll(): 0<!--9/10/03  09:47:35
AMQ9411: Repository manager ended normally.

EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
-->

You may notice that the date in the input text starts with '09', but what the library returns is just '9' (see subexpression #1). Have you by any chance run across this before?

        Thanks for any info,

        Marc
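
For anyone who wants a reference point while looking into this, here is a minimal cross-check sketch. It does not use gnu.regexp at all: it runs the same expression through the JDK's java.util.regex, with Pattern.DOTALL and Pattern.MULTILINE assumed to be the closest counterparts of RE.REG_DOT_NEWLINE and RE.REG_MULTILINE. The class name FirstMatchCheck, the helper firstMatch, and the reconstructed SAMPLE string are all made up for illustration. The two engines may disagree on where the match ends, so the sketch only looks at where the first match starts and what the first groups capture.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cross-check: same expression, but run through java.util.regex
// instead of gnu.regexp, to see where another engine starts the first match.
class FirstMatchCheck {
    // The expression from the report, copied unchanged.
    static final String EXPRESSION =
        "([0-9]*.[0-9]*.[0-9]*) *([0-9]*:[0-9]*:.*?)$.*?^AMQ([0-9]*): *(.*?)^$";

    // The sample input from the report, rebuilt with '\n' line endings
    // (the original line endings are an assumption).
    static final String SAMPLE =
        "09/10/03  09:47:35\n"
        + "AMQ9411: Repository manager ended normally.\n"
        + "\n"
        + "EXPLANATION:\n"
        + "The repository manager ended normally.\n"
        + "ACTION:\n"
        + "None.\n"
        + "-------------------------------------------------------------------------------\n";

    // Compiles the expression with DOTALL | MULTILINE (assumed analogues of
    // REG_DOT_NEWLINE | REG_MULTILINE) and returns the first match found.
    static Matcher firstMatch(String input) {
        Pattern p = Pattern.compile(EXPRESSION, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match found");
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = firstMatch(SAMPLE);
        System.out.println("Match starts at position " + m.start());
        System.out.println("Subexpression #1: '" + m.group(1) + "'");
        System.out.println("Subexpression #2: '" + m.group(2) + "'");
        System.out.println("Subexpression #3: '" + m.group(3) + "'");
    }
}
```

With java.util.regex the first match should begin at position 0 and subexpression #1 should keep the leading '0' of the date, which is the behaviour the report expected from gnu.regexp. This says nothing about the cause inside gnu.regexp; it only gives a baseline to compare against.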






