Regular Expressions

John Smith

I want to remove the first occurrence of an XML element from a string, e.g. turn
"<x><y>...</y><y>...</y></x>" into "<x><y>...</y></x>". The problem is that
removing the first match of "<y[ >].*</y>" runs on to the rightmost "</y>" and
gives "<x></x>". How can I make it take the leftmost match?

Thanks

Jon
 
shakah

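How about "<y[^<]*</y>"? The negated class cannot run past a "<", so the
match has to end at the first "</y>". It only works when the y element
contains no child tags.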
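
A minimal sketch of the negated-class idea in Java, assuming java.util.regex and a class written as "[^<]" (no space inside it) so that text containing blanks still matches. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // "<y", then any run of characters that are not "<", then "</y>".
    // Because "[^<]" can never consume a "<", the match cannot extend
    // past the first closing tag.
    private static final Pattern FIRST_Y = Pattern.compile("<y[^<]*</y>");

    static String removeFirstY(String input) {
        return FIRST_Y.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        // prints <x><y>c</y></x>
        System.out.println(removeFirstY("<x><y>a b</y><y>c</y></x>"));
    }
}
```

Note the limitation: if the first y element contains a child tag, this pattern silently skips it and removes a later y element instead.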
 
HK

You could try non-greedy operators, like "<y>.*?</y>". If
you feel adventurous, you could try my package monq.jfa at

http://www.ebi.ac.uk/Rebholz-srv/whatizit/software

which provides the "shortest-match" operator as well
as a description of the difference between shortest-match
and non-greedy matching. You would write

"<y>(.*</y>)!"

and the exclamation mark (shortest-match) works like
a "jump-to-first-occurrence".
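
For the plain non-greedy variant, here is a small sketch using the standard java.util.regex package (not monq.jfa, whose API is not shown here). It compares the greedy and the non-greedy quantifier on the same input; the class and method names are only illustrative.

```java
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Removes the first match of the given regex from the input.
    static String removeFirst(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        String input = "<x><y>1</y><y>2</y></x>";
        // Greedy: ".*" grabs as much as it can, so the match ends at the LAST "</y>".
        System.out.println(removeFirst("<y>.*</y>", input));   // prints <x></x>
        // Non-greedy: ".*?" grows one character at a time and stops at the FIRST "</y>".
        System.out.println(removeFirst("<y>.*?</y>", input));  // prints <x><y>2</y></x>
    }
}
```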

Harald.
 
Lasse Reichstein Nielsen

You can't, in full generality, if I understand what you want correctly.

You don't want to replace the *tag*, but the *element* (everything
from start tag to matching end tag).

One question you need to answer first is which x element counts as the first one in:

<x>AB<x>CD</x>EF</x>

The two nested matching pairs of x-start/end tags can both be seen as
the "first". One starts first, the other ends first.

In either case, finding matched pairs in a string where pairs can be
(arbitrarily) nested, is beyond the capability of regular expressions
(and into the realm of context free grammars).
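
To illustrate what the extra machinery looks like, here is a rough sketch that uses a regex only to find the individual start and end tags and keeps a depth counter to pair them up. It picks the element that *starts* first. It assumes well-formed input with no self-closing tags, comments or CDATA sections, and the class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedElementRemover {
    /** Removes the first-starting element named tag, including same-named elements nested in it. */
    static String removeFirstElement(String input, String tag) {
        String name = Pattern.quote(tag);
        // Group 1 is "/" for an end tag and empty for a start tag.
        Pattern tagPattern = Pattern.compile("<(/?)" + name + "\\b[^>]*>");
        Matcher m = tagPattern.matcher(input);
        int depth = 0;   // number of currently open elements with this name
        int start = -1;  // index where the outermost element begins
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                if (depth == 0) {
                    start = m.start();
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    // Found the end tag that matches the outermost start tag.
                    return input.substring(0, start) + input.substring(m.end());
                }
            }
        }
        return input; // no complete element found, leave the input unchanged
    }

    public static void main(String[] args) {
        // prints ab<x>c</x>
        System.out.println(removeFirstElement("a<x>AB<x>CD</x>EF</x>b<x>c</x>", "x"));
    }
}
```

The counter is exactly the part a regular expression cannot express: it has to remember an unbounded nesting depth.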

If you *know* that the element you are looking for can't be nested, i.e., you
have a less general problem, you can use non-greedy matching:

"<tagname\\b.*?</tagname>"

This matches from "<tagname" to the first following "</tagname>" (which
would be entirely wrong for the nested example above :).
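
A sketch of that pattern wrapped in a helper, assuming java.util.regex. The helper name, the use of Pattern.quote for the tag name and the DOTALL flag (so that "." also crosses line breaks) are my additions, not part of the pattern above.

```java
import java.util.regex.Pattern;

public class FirstElementRegex {
    // Removes the text from the first "<tag" up to the first following "</tag>".
    static String removeFirst(String input, String tag) {
        String name = Pattern.quote(tag);
        // "\\b" keeps "<y" from matching a longer name such as "<yy".
        Pattern p = Pattern.compile("<" + name + "\\b.*?</" + name + ">", Pattern.DOTALL);
        return p.matcher(input).replaceFirst("");
    }

    public static void main(String[] args) {
        // Non-nested case, prints <x><y>q</y></x>
        System.out.println(removeFirst("<x><y a=\"1\">p</y><y>q</y></x>", "y"));
        // Nested case, prints EF</x> (unbalanced: the outer start tag is
        // paired with the inner end tag)
        System.out.println(removeFirst("<x>AB<x>CD</x>EF</x>", "x"));
    }
}
```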

Good luck
/L
 
