Re: [Regexp] parse out the tag's attribute and text fields from a xml node

gnu-regexp-users

From:	Wes Biggs
Subject:	Re: [Regexp] parse out the tag's attribute and text fields from a xml node
Date:	Tue, 11 Jun 2002 14:50:31 -0700 (PDT)

David,

First a caveat -- regular expressions may not be the
right tool for this job.  XML parsers are much more
finely tuned for doing exactly what you want to do. 
Check out the xerces package from xml.apache.org, or
for an even nicer API, the JDOM project at
www.jdom.org.
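
To illustrate the parser route, here is a minimal sketch. It uses the DOM parser bundled with current JDKs (javax.xml.parsers) rather than xerces or JDOM directly, and the class and method names (NodeParserSketch, nameAndText) are made up for this example, so treat it as one possible shape and not the recommended API of either project.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NodeParserSketch {
    // Hypothetical helper: returns { tag name, text content } of the root element.
    static String[] nameAndText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            // getTextContent() concatenates the text of all descendants,
            // so nested child elements do not break the lookup.
            return new String[] { root.getTagName(), root.getTextContent() };
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors all land here.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String[] parts = nameAndText("<address> 1210 West Dayton Street</address>");
        System.out.println(parts[0] + " -> [" + parts[1] + "]");
    }
}
```

Unlike a regular expression, this also copes with attributes and child elements without any change to the code.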

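But if you insist on using gnu.regexp to accomplish
this task, you'll need to use the "back-reference"
notation, \N where N is a digit from 0 to 9, to signify
"the text that matched subexpression N".  So your
expression is not far off: replace the \w+ in the
closing tag with \1, so that the closing tag has to
repeat the name captured by the first subexpression.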

^\<(\w+)\>([^<]+)\<\/\1\>$

That should work (with double escapes if you're coding
it up).  It won't, however, catch nested XML tags very
effectively as it's very hard to do indefinite
recursion in a regular expression.  For that you're
better off with a full-blown XML parser.
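
A rough sketch of the expression in code, with the backslashes doubled for a Java string literal. Note the assumptions: it uses java.util.regex from the JDK as a stand-in for gnu.regexp (the back-reference syntax is the same, but the classes are not), and the names BackrefSketch, NODE and parse are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefSketch {
    // The expression from the message; every backslash is doubled
    // because it sits inside a Java string literal.
    static final Pattern NODE =
        Pattern.compile("^\\<(\\w+)\\>([^<]+)\\<\\/\\1\\>$");

    // Hypothetical helper: returns { tag name, text }, or null when the
    // input is not a single flat element with matching open/close tags.
    static String[] parse(String xml) {
        Matcher m = NODE.matcher(xml);
        if (!m.matches()) {
            return null;
        }
        // Group 1 is the tag name, group 2 the text between the tags.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("<address> 1210 West Dayton Street</address>");
        System.out.println(parts[0] + " -> [" + parts[1] + "]");

        // \1 rejects a closing tag whose name differs from the opening tag.
        System.out.println(parse("<address>text</street>") == null);

        // Nested elements are not matched, because [^<]+ stops at the first '<'.
        System.out.println(parse("<a><b>x</b></a>") == null);
    }
}
```

The last two calls show the trade-off: the back-reference enforces matching tag names, but a nested element makes the whole match fail.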

Wes

--- "David L. Parker" <address@hidden>
wrote:
> I'm trying to write a generic regexp that will save
> the node name in one 
> variable and the text of the node in another
> variable?
> 
> XML node:
> <address> 1210 West Dayton Street</address>
> 
> /^\<(\w+)\>([^<]+)\<\/\w+\>$/
> 
> David 
