regular expression

gardsted · Nov 18, 2007

I just can't seem to get it:
I was having some trouble finding the first <REAPER_PROJECT in the following text with this regex:

Should these two approaches behave similarly?
I spent hours before I found the second one,
but then again, I'm not so smart...:

kind retards
jorgen / de mente
using python 2.5.1
-------------------------------------------
import re

TESTTXT="""<REAPER_PROJECT 0.1
<METRONOME 6 2.000000
SAMPLES "" ""

>

<TRACK
MAINSEND 1
<VOLENV2
ACT 1

>

<PANENV2
ACT 1

>
>
>

"""
print "The First approach - flags in finditer"
rex = re.compile(r'^<(?P<TAGNAME>[a-zA-Z0-9_]*)')
for i in rex.finditer(TESTTXT,re.MULTILINE):
    print i,i.groups()

print "The Second approach - flags in pattern "
rex = re.compile(r'(?m)^<(?P<TAGNAME>[a-zA-Z0-9_]*)')
for i in rex.finditer(TESTTXT):
    print i,i.groups()

Diez B. Roggisch · Nov 18, 2007

What the heck is that format? XML's retarded cousin living in the attic?

Ok, back to the problem then...

This works for me:

rex = re.compile(r'^<(?P<TAGNAME>[a-zA-Z0-9_]+)',re.MULTILINE)
for i in rex.finditer(TESTTXT):
    print i,i.groups()

However, you might think of getting rid of the ^, because otherwise you
_only_ get tags that begin at the very start of a line. And making the * a + in
the TAGNAME might also be better.
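Both suggestions can be checked on a small made-up snippet (the SAMPLE text and the `names` helper below are illustrative, and the code uses Python 3 syntax). The last pattern, `^[ \t]*<`, is an extra option not proposed in the thread: it keeps the line anchor but tolerates indentation, so it neither misses indented tags nor picks up a stray `<` inside data.

```python
import re

# Made-up snippet: a bare "<" line, an indented tag, and a "<" inside data.
SAMPLE = "<REAPER_PROJECT 0.1\n<\n  <TRACK\nNAME a<b\n>\n"

def names(pattern):
    """Return the TAGNAME of every match, searching in multiline mode."""
    return [m.group('TAGNAME') for m in re.finditer(pattern, SAMPLE, re.MULTILINE)]

star = names(r'^<(?P<TAGNAME>[a-zA-Z0-9_]*)')      # * also accepts an empty name
plus = names(r'^<(?P<TAGNAME>[a-zA-Z0-9_]+)')      # + needs at least one character
anywhere = names(r'<(?P<TAGNAME>[a-zA-Z0-9_]+)')   # no ^: any "<name" on any line
indented = names(r'^[ \t]*<(?P<TAGNAME>[a-zA-Z0-9_]+)')  # line start, indentation allowed

print(star)      # ['REAPER_PROJECT', '']
print(plus)      # ['REAPER_PROJECT']
print(anywhere)  # ['REAPER_PROJECT', 'TRACK', 'b']
print(indented)  # ['REAPER_PROJECT', 'TRACK']
```

With `*`, the lone `<` line yields an empty tag name; with `+` it is skipped. Dropping `^` finds the indented TRACK but also the `<b` buried in a data line.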

Diez

gardsted · Nov 18, 2007

Oops - got it - a compiled pattern's finditer has no flags argument ;-)
So rtfm, once again, jorgen!
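The catch: a compiled pattern's `finditer(string[, pos[, endpos]])` takes a start offset as its second argument. `re.MULTILINE` is just the integer 8, so the first approach quietly started searching at index 8, and since the pattern itself had no multiline flag, `^` could only match at the real start of the string, so nothing was found. A short Python 3 sketch (the TEXT and `tag_names` helper are made up for illustration) comparing this with the ways that do switch on multiline mode:

```python
import re

TEXT = "<REAPER_PROJECT 0.1\n<METRONOME 6 2.000000\n>\n<TRACK\n>\n>\n"
PATTERN = r'^<(?P<TAGNAME>[a-zA-Z0-9_]+)'

def tag_names(matches):
    return [m.group('TAGNAME') for m in matches]

plain = re.compile(PATTERN)
# Pattern.finditer(string, pos, endpos): re.MULTILINE (== 8) is taken as pos.
wrong = tag_names(plain.finditer(TEXT, re.MULTILINE))

# Three working ways to enable multiline mode:
compiled = re.compile(PATTERN, re.MULTILINE)           # flag at compile time
inline = re.compile('(?m)' + PATTERN)                  # inline flag in the pattern
a = tag_names(compiled.finditer(TEXT))
b = tag_names(inline.finditer(TEXT))
c = tag_names(re.finditer(PATTERN, TEXT, re.MULTILINE))  # module-level function does take flags

# Even with a multiline pattern, passing the flag as pos skips the first tag.
skipped = tag_names(compiled.finditer(TEXT, re.MULTILINE))

print(wrong)    # []
print(a)        # ['REAPER_PROJECT', 'METRONOME', 'TRACK']
print(skipped)  # ['METRONOME', 'TRACK']
```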

MonkeeSage · Nov 19, 2007

Diez B. Roggisch said:
What the heck is that format? XML's retarded cousin living in the attic?

ROFL... for some reason that makes me think of Weird Ed Edison from
Maniac Mansion, heh

gardsted · Nov 19, 2007

The retarded cousin - that's me!

I keep getting confused by the caret: sometimes it works, sometimes it's better with backslash-n.
Yes - retarded cousin, I guess.

The file format is a config track for a multitrack recording program, which I need to automate a bit.
I can start it from the command line and have it create a remix (using various VST and other effects).
Sometimes, however, we may have deleted the 'guitar.wav', and then we have to leave
that track out of the config file or the rendering won't work.

Since it seems that whitespace matters in the file, I wrote the following code to get me a tag.
It cost me a broken cup and coffee all over the kitchen tiles - temper!

I still don't understand why I have to use \n instead of ^ at the start of TAGCONTENTS and TAGEND.
But I can live with it!
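The reason, sketched in Python 3 with a made-up TEXT: `^` and `$` are zero-width, so they test a position without consuming any characters. In getLevel each TAGCONTENTS/TAGEND piece comes right after a `$`, and at that point the engine still sits in front of the newline. A `^` there cannot match, because the position is not yet after a newline; the `\n` itself has to be consumed first, and once it is, an extra `^` adds nothing.

```python
import re

TEXT = "<TRACK\n  NAME x\n>"

# "$" matches just BEFORE the newline; "^" matches just AFTER one.
# With nothing consuming the "\n" in between, the two can never line up.
anchors_only = re.search(r'(?m)^<TRACK$^  NAME', TEXT)
with_newline = re.search(r'(?m)^<TRACK$\n^  NAME', TEXT)
newline_only = re.search(r'(?m)^<TRACK\n  NAME', TEXT)

print(anchors_only)                                  # None
print(repr(with_newline.group()))                    # '<TRACK\n  NAME'
print(newline_only.group() == with_newline.group())  # True
```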

Thank you for your kind and humorous help!
kind retards
jorgen / de mente
www.myspace.com/dementedk
------------------------------------------------------------

import re

TESTTXT=open('003autoreaper.rpp').read() # whole file now

def getLevel(levl):
    # adjacent string literals are joined first, so % fills all three %d's
    rex = re.compile(
        r'(?m)'                                         # multiline
        r'(?P<TAGSTART>^ {%d}[<])'                      # the < character
        r'(?P<TAGNAME>[a-zA-Z0-9_]*)'                   # the tagname
        r'(?P<TAGDATA>[\S \t]*?$)'                      # the rest of the tagstart line
        r'(?P<TAGCONTENTS>(\n {%d}[^>][\S \t]*$){0,})'  # all the data coming before the >
        r'(?P<TAGEND>\n {%d}>[\S \t]*$)' % (levl, levl, levl)  # the > character
        )
    return rex

for i in getLevel(2).finditer(TESTTXT):
    myMatch = i.groupdict()
    print i.group('TAGNAME'), i.start('TAGSTART'), i.end('TAGEND')
    #print i.groups()
    if myMatch['TAGNAME'] == 'TRACK':
        #print i.groups()
        # pos/endpos arguments: look for level-6 tags only inside this TRACK
        for j in getLevel(6).finditer(TESTTXT, i.start('TAGSTART'), i.end('TAGEND')):
            myMatch2 = j.groupdict()
            #print j.groups()
            print j.group('TAGNAME'), j.start('TAGSTART'), j.end('TAGEND')
            if myMatch2['TAGNAME'] == 'SOURCE':
                for m in myMatch2:
                    print m, myMatch2[m]
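A Python 3 sketch of the end goal described above (leaving out a TRACK whose source file has been deleted), reusing the same level-based pattern. Everything beyond the pattern is hypothetical: it assumes two spaces of indentation per nesting level, a `FILE "..."` line inside each SOURCE block, and an `existing_files` set standing in for a real check such as `os.path.exists`. The SAMPLE project text is made up.

```python
import re

def get_level(level):
    # Same level-based pattern as getLevel() above, in Python 3 syntax.
    # Adjacent string literals are joined first, so % fills all three %d's.
    return re.compile(
        r'(?m)'
        r'(?P<TAGSTART>^ {%d}[<])'
        r'(?P<TAGNAME>[a-zA-Z0-9_]*)'
        r'(?P<TAGDATA>[\S \t]*?$)'
        r'(?P<TAGCONTENTS>(\n {%d}[^>][\S \t]*$){0,})'
        r'(?P<TAGEND>\n {%d}>[\S \t]*$)' % (level, level, level)
    )

# Assumption: each SOURCE block holds a line like: FILE "guitar.wav"
FILE_LINE = re.compile(r'(?m)^[ \t]*FILE "([^"]*)"')

def drop_missing_tracks(project, existing_files):
    """Return project text without level-2 TRACKs whose level-6 SOURCE
    names a file that is not in existing_files (hypothetical helper)."""
    kept = []
    last = 0
    for track in get_level(2).finditer(project):
        if track.group('TAGNAME') != 'TRACK':
            continue
        start, end = track.start('TAGSTART'), track.end('TAGEND')
        missing = False
        # pos/endpos restrict the inner search to this TRACK block
        for src in get_level(6).finditer(project, start, end):
            if src.group('TAGNAME') == 'SOURCE':
                found = FILE_LINE.search(src.group(0))
                if found and found.group(1) not in existing_files:
                    missing = True
        if missing:
            kept.append(project[last:start])
            last = end + 1  # also skip the newline after the closing ">"
    kept.append(project[last:])
    return ''.join(kept)

SAMPLE = '''<REAPER_PROJECT 0.1
  <TRACK
    NAME "guitar"
    <ITEM
      POSITION 0
      <SOURCE WAVE
        FILE "guitar.wav"
      >
    >
  >
  <TRACK
    NAME "bass"
  >
>
'''

print(drop_missing_tracks(SAMPLE, {"bass.wav"}))
```

Slicing the text around each unwanted match keeps every other byte of the file untouched, which matters if whitespace really is significant to the program reading it.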

Paul McGuire · Nov 19, 2007

Sorry about your coffee cup! Would you be interested in a pyparsing
rendition?

-- Paul

from pyparsing import *

def defineGrammar():
    ParserElement.setDefaultWhitespaceChars(" \t")

    ident = Word(alphanums+"_")
    LT,GT = map(Suppress,"<>")
    NL = LineEnd().suppress()

    real = Word(nums,nums+".")
    integer = Word(nums)
    quotedString = QuotedString('"')

    dataValue = real | integer | Word(alphas,alphanums) | quotedString
    dataDef = ident + ZeroOrMore(dataValue) + NL
    tagDef = Forward()
    tagDef << LT + ident + ZeroOrMore(dataValue) + NL + \
        Dict(ZeroOrMore(Group(dataDef) | Group(tagDef))) + GT + NL
    tagData = Dict(OneOrMore(Group(tagDef)))
    return tagData

results = defineGrammar().parseString(TESTTXT)
print( results.dump() )
print results.REAPER_PROJECT.TRACK.keys()
print results.REAPER_PROJECT.TRACK.PANENV2
print results.REAPER_PROJECT.TRACK.PANENV2.ACT

prints out:

[['REAPER_PROJECT', '0.1', ['METRONOME', '6', '2.000000', ['SAMPLES', '', '']], ['TRACK', ['MAINSEND', '1'], ['VOLENV2', ['ACT', '1']], ['PANENV2', ['ACT', '1']]]]]
- REAPER_PROJECT: ['0.1', ['METRONOME', '6', '2.000000', ['SAMPLES', '', '']], ['TRACK', ['MAINSEND', '1'], ['VOLENV2', ['ACT', '1']], ['PANENV2', ['ACT', '1']]]]
  - METRONOME: ['6', '2.000000', ['SAMPLES', '', '']]
    - SAMPLES: ['', '']
  - TRACK: [['MAINSEND', '1'], ['VOLENV2', ['ACT', '1']], ['PANENV2', ['ACT', '1']]]
    - MAINSEND: 1
    - PANENV2: [['ACT', '1']]
      - ACT: 1
    - VOLENV2: [['ACT', '1']]
      - ACT: 1
['PANENV2', 'MAINSEND', 'VOLENV2']
[['ACT', '1']]
1
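For comparison, a stdlib-only Python 3 sketch (not from the thread) that builds the same nested-list shape as the first line of that output, using a stack instead of a grammar. It assumes one item per line, values that are either bare or double-quoted (`shlex.split` would reject a stray apostrophe), and it ignores indentation entirely. TESTTXT here is the thread's sample with the blank lines dropped.

```python
import shlex

def parse_rpp(text):
    """Parse <TAG ...> ... > blocks into nested lists, one list per line."""
    root = []
    stack = [root]                       # stack[-1] is the block being filled
    for raw in text.splitlines():
        line = raw.strip()               # indentation is ignored here
        if not line:
            continue
        if line.startswith('<'):
            node = shlex.split(line[1:])  # tag name plus its values
            stack[-1].append(node)
            stack.append(node)            # following lines belong to this tag
        elif line == '>':
            if len(stack) == 1:
                raise ValueError("unbalanced '>'")
            stack.pop()
        else:
            stack[-1].append(shlex.split(line))  # e.g. SAMPLES "" "" -> ['SAMPLES', '', '']
    if len(stack) != 1:
        raise ValueError("unclosed tag")
    return root

TESTTXT = '''<REAPER_PROJECT 0.1
<METRONOME 6 2.000000
SAMPLES "" ""
>
<TRACK
MAINSEND 1
<VOLENV2
ACT 1
>
<PANENV2
ACT 1
>
>
>
'''

print(parse_rpp(TESTTXT))
```

It gives plain nested lists only; the named access Paul's version offers (`results.REAPER_PROJECT.TRACK.PANENV2`) would need an extra pass to build dictionaries.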

gardsted · Nov 19, 2007

Thank you, Paul - I am very interested.
In between drinking coffee and smashing coffee cups, I actually visited your site, and my
impression was: wow, if I could only take the time instead of struggling with this
'almost there' re thing!
I am not that good at it actually, but I'm working hard, not worrying about the cups too much...

I will now revisit pyparsing and learn!

I cheated a bit and read up on you here: http://www.oreillynet.com/pub/au/2557.

I live in a little Danish town, Svendborg, nice by the sea and all.
I learned steel construction in the 80's at the local shipyard
(now closed); much later (96-98) I received a very short education in
IT skills at a business school in Odense, the nearest city.
I spent the years 98-05 working for Maersk Data, later IBM.
From 05 onwards, independent.
Struggling hard to keep orders at a bare minimum,
I spend some of my spare time working with the elderly, and some of it
programming python for different purposes at home, and some of it playing
in the band: http://myspace.com/dementedk, and some of it combining the two.

So now you know more or less the same about me as I know about you.
Jorgen

Regular expression problem	13	Mar 10, 2013
re.sub() problem (regular expression)	1	Dec 14, 2007
Problem creating a regular expression to parse open-iscsi, iscsiadmoutput (help?)	5	Jun 13, 2013
Regular expression to structure HTML	11	Oct 2, 2009
Regular expression	1	Jun 20, 2008
regular expression extracting groups	3	Aug 10, 2008
Regular Expression for Finding and Deleting comments	1	Jan 4, 2011
Pathological regular expression	18	Apr 9, 2009
