brackets content regular expression

netimen · Oct 31, 2008

I have a text containing brackets (or what is the correct term for
'<' and '>'?). I'd like to match the text in the uppermost level of brackets.

So, I have something like: 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >
bbbbb'. How do I match the text between the uppermost brackets ( 1 aaa < t
bbb < a <tt > ff > > 2 )?

P.S. Sorry for my English.

Paul McGuire · Oct 31, 2008

To match opening and closing parens, delimiters, whatever (I refer to
these '<>' as "angle brackets" when talking about them in this
context, otherwise they are just "less than" and "greater than"), you
will need some kind of stack-based parser. You can write your own
without much trouble - there are built-ins in pyparsing that do most
of the work.
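As a rough illustration of such a stack-based parser, here is a plain-Python sketch (the function `parse_nested` and its token pattern are hypothetical, not part of pyparsing). It assumes whitespace-separated words, ignores text outside the brackets, raises ValueError on unbalanced '<' '>' pairs, and returns one nested list per top-level group:

```python
import re

# A token is either a single angle bracket or a run of characters that
# are neither whitespace nor brackets (a "word").
TOKEN_RE = re.compile(r"[<>]|[^\s<>]+")


def parse_nested(text):
    """Return the top-level <...> groups of text as nested lists of words."""
    stack = [[]]  # stack[0] collects the finished top-level groups
    for tok in TOKEN_RE.findall(text):
        if tok == "<":
            stack.append([])  # open a new, deeper group
        elif tok == ">":
            if len(stack) == 1:
                raise ValueError("unmatched '>'")
            group = stack.pop()
            stack[-1].append(group)  # attach the closed group to its parent
        elif len(stack) > 1:
            stack[-1].append(tok)  # a word inside some group
        # words outside any brackets are simply skipped
    if len(stack) != 1:
        raise ValueError("unmatched '<'")
    return stack[0]


text = 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'
print(parse_nested(text))
# [['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]
```

Each '<' pushes a new list and each '>' pops it into its parent. pyparsing's searchString additionally wraps each match in its own list, which is why its output has one more level of brackets.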

Here is the nestedExpr method:

[[['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]]

Note that the results show not the original nested text, but the
parsed words in a fully nested structure.

If all you want is the highest-level text, then you can wrap your
nestedExpr parser inside a call to originalTextFor:
[['< 1 aaa < t bbb < a <tt > ff > > 2 >']]

More on pyparsing at http://pyparsing.wikispaces.com.

-- Paul

Matimus · Oct 31, 2008

I think most people call them "angle brackets". Anyway, it should be
easy to just match the outermost brackets:
' 1 aaa < t bbb < a <tt > ff > > 2 '

In this case the regular expression is greedy by default, matching
the largest span possible (from the first '<' to the last '>'). Note,
however, that it won't work if you have something like this:
"<first> <second>".
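A minimal sketch of the greedy approach, assuming a pattern along the lines of `<(.*)>` (the exact expression from the post was not preserved in this archive):

```python
import re

text = 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'

# ".*" is greedy: it first runs to the end of the string, then backtracks
# only as far as needed to find a '>', so the match ends at the LAST '>'.
m = re.search(r"<(.*)>", text)
print(repr(m.group(1)))  # ' 1 aaa < t bbb < a <tt > ff > > 2 '

# The same greediness merges two sibling groups into a single match.
print(repr(re.search(r"<(.*)>", "<first> <second>").group(1)))  # 'first> <second'
```

Making the quantifier non-greedy (`<(.*?)>`) does not fix this: on the nested example it stops at the first '>' and captures ' 1 aaa < t bbb < a <tt '.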

Matt

netimen · Oct 31, 2008

Thanks, but I may have several top-level groups and want to match them
one by one:

text = "a < b < O > d > here starts a new group: < e < f > g >"

I want to match first " b < O > d " and then " e < f > g ", but not
" b < O > d > here starts a new group: < e < f > g ".

netimen · Oct 31, 2008

There may be different levels of nesting:

"a < b < O > d > here starts a new group: < 1 < e < f > g > 2 >
another group: < 3 >"

bearophileHUGS · Oct 31, 2008

netimen:

What other requirements do you have? If you list them all at once,
people will write you the code faster.

bye,
Bearophile

Pierre Quentel · Oct 31, 2008

Hi,

Regular expressions or pyparsing might be overkill for this problem;
you can use a simple algorithm that reads each character, increments a
counter when it finds a < and decrements it when it finds a >; when the
counter goes back to its initial value, you have reached the end of a
top-level group.

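Something like:

def top_level(txt):
    level = 0
    start = None  # index of the '<' that opened the current top-level group
    groups = []
    for i, car in enumerate(txt):
        if car == "<":
            level += 1
            if start is None:  # "is None", so a group at index 0 also works
                start = i
        elif car == ">":
            level -= 1
            if start is not None and level == 0:
                groups.append(txt[start+1:i])
                start = None
    return groups

print(top_level("a < b < O > d > here starts a new group: < 1 < e < f > g > 2 > another group: < 3 >"))
[' b < O > d ', ' 1 < e < f > g > 2 ', ' 3 ']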

Best,
Pierre

Matimus · Oct 31, 2008

As far as I know, you can't do that with regular expressions (by
definition, regular expressions aren't recursive). You can use a
regular expression to aid you, but there is no magic expression that
will give it to you for free.
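One way a regular expression can "aid" without doing the nesting itself, sketched under the assumption that only '<' and '>' matter: let `re.finditer` locate every bracket and keep a depth counter in Python. The generator name `top_level_groups` is hypothetical; unlike a bare counter, it also reports unbalanced input:

```python
import re

BRACKET_RE = re.compile(r"[<>]")


def top_level_groups(text):
    """Yield the contents of each top-level <...> group, in order.

    Raises ValueError when the brackets are unbalanced.
    """
    depth = 0
    start = None
    for m in BRACKET_RE.finditer(text):
        if m.group() == "<":
            if depth == 0:
                start = m.end()  # contents begin just after this '<'
            depth += 1
        else:
            if depth == 0:
                raise ValueError("unmatched '>' at index %d" % m.start())
            depth -= 1
            if depth == 0:
                yield text[start:m.start()]
    if depth != 0:
        # the outermost group that was opened never closed
        raise ValueError("unmatched '<' at index %d" % (start - 1))


text = "a < b < O > d > here starts a new group: < e < f > g >"
print(list(top_level_groups(text)))  # [' b < O > d ', ' e < f > g ']
```

The regex only finds the bracket positions; the nesting logic still lives in the counter, which is exactly the part a plain regular expression cannot express.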

In this case it is actually pretty easy to do it without regular
expressions at all:
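>>> text = "a < b < O > d > here starts a new group: < e < f > g >"
>>> def get_nested_strings(text, depth=0):
...     stack = []  # indices of the '<' brackets that are still open
...     for i, c in enumerate(text):
...         if c == '<':
...             stack.append(i)
...         elif c == '>':
...             start = stack.pop() + 1
...             # yield only groups whose '<' sat at the requested depth
...             if len(stack) == depth:
...                 yield text[start:i]
...
>>> for seg in get_nested_strings(text):
...     print(seg)
...
 b < O > d
 e < f > g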

Matt

netimen · Nov 1, 2008

Yeah, I know it's quite simple to do manually. I was just interested
in whether it could be done with regular expressions. Thank you anyway.
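For the record: Python's standard `re` module has no recursion, so it cannot handle unlimited nesting. If you can assume a maximum nesting depth, though, a pattern can be built for that depth by nesting a "text without brackets" class inside itself. A sketch, where the helper name `depth_limited_pattern` and the chosen limit are assumptions for illustration:

```python
import re


def depth_limited_pattern(max_depth):
    """Build a regex for top-level <...> groups nested at most max_depth deep.

    max_depth counts the outer brackets too: "< a >" has depth 1 and
    "< a < b > >" has depth 2.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    inner = r"[^<>]*"  # innermost level: plain text without brackets
    for _ in range(max_depth - 1):
        # plain text, then any number of "<inner>" groups, each followed by text
        inner = r"[^<>]*(?:<" + inner + r">[^<>]*)*"
    return re.compile("<(" + inner + ")>")


text = "a < b < O > d > here starts a new group: < 1 < e < f > g > 2 > another group: < 3 >"
print(depth_limited_pattern(3).findall(text))
# [' b < O > d ', ' 1 < e < f > g > 2 ', ' 3 ']
```

The limit has to hold for your data: with `depth_limited_pattern(2)` the middle group above is too deep, so `findall` skips its outer brackets and returns the inner ' e < f > g ' instead, without raising any error.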
