Hi Wes,
I'm not sure whether your Java regexp package is still actively
maintained, but I thought it couldn't hurt to ask. I'm seeing a problem
with an expression that matches across multiple lines: the package
loses the first character of the first match. If there are several
matches, all but the first are fine; only the first one is missing its
initial character. I'm creating the RE object with two options,
RE.REG_DOT_NEWLINE and RE.REG_MULTILINE. The expression is:
([0-9]*.[0-9]*.[0-9]*) *([0-9]*:[0-9]*:.*?)$.*?^AMQ([0-9]*): *(.*?)^$
and a sample of the input I'm searching is:
09/10/03 09:47:35
AMQ9411: Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
I slightly modified the RETest.java that comes with the package to use
RE.REG_DOT_NEWLINE and RE.REG_MULTILINE, as my own code does, ran it on
this input, and got output like this:
gnu.regexp version 1.1.4-dev
Input Text: 09/10/03 09:47:35
AMQ9411: Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
Regular Expression: ([0-9]*.[0-9]*.[0-9]*) *([0-9]*:[0-9]*:.*?)$.*?^AMQ([0-9]*): *(.*?)^$
Compiled Form: (?:((?:(?:0-9))*.(?:(?:0-9))*.(?:(?:0-9))*)(?: )*((?:(?:0-9))*:(?:(?:0-9))*:(?:.)*?)$(?:.)*?^AMQ((?:(?:0-9))*):(?: )*((?:.)*?)^$)
Minimum Length: 8
isMatch() returns: false
getAllMatches(): 1 matches
Match 0 (1,211): 9/10/03 09:47:35
AMQ9411: Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
Match found from position 1 to position 211
Match was: '9/10/03 09:47:35
AMQ9411: Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
'
Subexpression #1: from position 1 to position 8
The subexpression matched this text: '9/10/03'
Subexpression #2: from position 10 to position 18
The subexpression matched this text: '09:47:35'
Subexpression #3: from position 22 to position 26
The subexpression matched this text: '9411'
Subexpression #4: from position 28 to position 211
The subexpression matched this text: 'Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
'
substitute(): 0<!--9/10/03 09:47:35
AMQ9411: Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
-->
substituteAll(): 0<!--9/10/03 09:47:35
AMQ9411: Repository manager ended normally.
EXPLANATION:
The repository manager ended normally.
ACTION:
None.
-------------------------------------------------------------------------------
-->
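You may notice that the date in the input text starts with '09', but
the library returns just '9' (see subexpression #1). The reported match
starts at position 1 rather than 0, and the leading '0' is left outside
the replacement in the substitute() and substituteAll() output. Have you
by any chance run across this before?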
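For comparison, here is a minimal sketch (not gnu.regexp code) that runs the same expression through the standard java.util.regex package. It assumes Pattern.DOTALL and Pattern.MULTILINE are the closest equivalents of RE.REG_DOT_NEWLINE and RE.REG_MULTILINE, and the class name DateMatchCheck and method firstMatch are hypothetical. The sample text is also assumed to end with a blank line, because in java.util.regex a multiline ^ does not match at the very end of input after a final line terminator, so the trailing ^$ needs an empty line to land on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cross-check class: same expression, standard java.util.regex engine.
public class DateMatchCheck {
    static final String EXPR =
        "([0-9]*.[0-9]*.[0-9]*) *([0-9]*:[0-9]*:.*?)$.*?^AMQ([0-9]*): *(.*?)^$";

    // Sample input from above. Assumption: a trailing empty line is appended
    // so that ^$ has a blank line to match in java.util.regex.
    static final String INPUT =
        "09/10/03 09:47:35\n"
        + "AMQ9411: Repository manager ended normally.\n"
        + "EXPLANATION:\n"
        + "The repository manager ended normally.\n"
        + "ACTION:\n"
        + "None.\n"
        + "-------------------------------------------------------------------------------\n"
        + "\n";

    /** Returns {start, end, date, time, code} for the first match, or null if none. */
    static String[] firstMatch(String input) {
        // DOTALL lets '.' cross newlines; MULTILINE makes ^ and $ match at line breaks.
        Pattern p = Pattern.compile(EXPR, Pattern.DOTALL | Pattern.MULTILINE);
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            String.valueOf(m.start()), String.valueOf(m.end()),
            m.group(1), m.group(2), m.group(3)
        };
    }

    public static void main(String[] args) {
        String[] r = firstMatch(INPUT);
        if (r == null) {
            System.out.println("no match");
            return;
        }
        System.out.println("Match from " + r[0] + " to " + r[1]);
        System.out.println("Subexpression #1: '" + r[2] + "'");
        System.out.println("Subexpression #2: '" + r[3] + "'");
        System.out.println("Subexpression #3: '" + r[4] + "'");
    }
}
```

If the flag mapping above is right, this engine should report the match starting at position 0 with subexpression #1 equal to the full '09/10/03', which is the result one would also expect from RETest.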
Thanks for any info,
Marc