regular expression

Discussion in 'Python' started by gardsted, Nov 18, 2007.

  1. gardsted

    gardsted Guest

    I just can't seem to get it:
    I was having some trouble finding the first <REAPER_PROJECT in the following with this regex:

    Should these two approaches behave similarly?
    I spent hours before I found the second one,
    but then again, I'm not so smart...:

    kind retards
    jorgen / de mente
    using python 2.5.1
    -------------------------------------------
    import re

    TESTTXT="""<REAPER_PROJECT 0.1
    <METRONOME 6 2.000000
    SAMPLES "" ""
    >

    <TRACK
    MAINSEND 1
    <VOLENV2
    ACT 1
    >

    <PANENV2
    ACT 1
    >
    >
    >

    """
    print "The First approach - flags in finditer"
    rex = re.compile(r'^<(?P<TAGNAME>[a-zA-Z0-9_]*)')
    for i in rex.finditer(TESTTXT,re.MULTILINE):
        print i,i.groups()

    print "The Second approach - flags in pattern "
    rex = re.compile(r'(?m)^<(?P<TAGNAME>[a-zA-Z0-9_]*)')
    for i in rex.finditer(TESTTXT):
        print i,i.groups()
    gardsted, Nov 18, 2007
    #1

  2. Diez B. Roggisch

    gardsted wrote:
    > Should these two approaches behave similarly?
    > [...]
    > TESTTXT="""<REAPER_PROJECT 0.1
    > <METRONOME 6 2.000000
    > [...]


    What the heck is that format? XML's retarded cousin living in the attic?

    Ok, back to the problem then...

    This works for me:

    rex = re.compile(r'^<(?P<TAGNAME>[a-zA-Z0-9_]+)',re.MULTILINE)
    for i in rex.finditer(TESTTXT):
        print i,i.groups()

    However, you might think of getting rid of the ^, because otherwise you
    _only_ get tags that start at the beginning of a line. And making the * a + in
    the TAGNAME might also be better.
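The three points above can be checked with a small sketch in Python 3 syntax (the thread itself uses Python 2.5). The helper `tag_names` and the extra sample string `other` are hypothetical, made up for illustration; TESTTXT is the thread's sample with the blank lines dropped. It shows that the flag belongs in `re.compile()`, that `^` with `re.MULTILINE` only finds tags at the start of a line, and that `+` rejects a bare `<` which `*` would accept with an empty name.

```python
import re

# The thread's sample text (blank lines dropped, no indentation).
TESTTXT = (
    '<REAPER_PROJECT 0.1\n<METRONOME 6 2.000000\nSAMPLES "" ""\n>\n'
    '<TRACK\nMAINSEND 1\n<VOLENV2\nACT 1\n>\n<PANENV2\nACT 1\n>\n>\n>\n'
)

def tag_names(pattern, text, flags=0):
    """Hypothetical helper: TAGNAME of every match of pattern in text."""
    rex = re.compile(pattern, flags)  # flags go to compile(), not finditer()
    return [m.group("TAGNAME") for m in rex.finditer(text)]

anchored = r"^<(?P<TAGNAME>[a-zA-Z0-9_]+)"

# Without MULTILINE, ^ only matches at the very start of the string.
print(tag_names(anchored, TESTTXT))                # ['REAPER_PROJECT']
# With MULTILINE, ^ matches at the start of every line.
print(tag_names(anchored, TESTTXT, re.MULTILINE))  # five tag names

# Hypothetical text: a '<' in mid-line, then a bare '<' on its own line.
other = "NOTES a<b\n<\n"
print(tag_names(anchored, other, re.MULTILINE))          # []
print(tag_names(r"<(?P<TAGNAME>[a-zA-Z0-9_]+)", other))  # ['b'] (no ^)
print(tag_names(r"^<(?P<TAGNAME>[a-zA-Z0-9_]*)", other, re.MULTILINE))  # ['']
```

The last line shows why `*` is risky: a line holding only `<` still "matches", with an empty tag name.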

    Diez
    Diez B. Roggisch, Nov 18, 2007
    #2

  3. gardsted

    gardsted Guest

    Oops - got it - there are no flags in finditer ;-)
    So rtfm, once again, jorgen!
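The slip can be reproduced in a short Python 3 sketch. A compiled pattern's `finditer(string, pos, endpos)` has no flags parameter, so `re.MULTILINE` (integer value 8) is quietly used as the start position; since `^` without MULTILINE only matches at the real beginning of the string, nothing is found. The module-level `re.finditer(pattern, string, flags=0)` is the variant that does accept flags, which may be where the confusion came from. The helper `names` and the short `text` are hypothetical.

```python
import re

text = "<REAPER_PROJECT 0.1\n<METRONOME 6 2.000000\n"
TAG = r"^<(?P<TAGNAME>[a-zA-Z0-9_]+)"

def names(matches):
    """Hypothetical helper: collect TAGNAME from an iterator of matches."""
    return [m.group("TAGNAME") for m in matches]

# Pattern.finditer(string, pos, endpos): the second argument is a position,
# so re.MULTILINE (== 8) makes the search start at index 8, and ^ (without
# MULTILINE) cannot match anywhere after index 0.
wrong = names(re.compile(TAG).finditer(text, re.MULTILINE))

# Equivalent fixes: flag in compile(), inline (?m), or module-level
# re.finditer(), which does take a flags argument.
fixed_compile = names(re.compile(TAG, re.MULTILINE).finditer(text))
fixed_inline = names(re.compile("(?m)" + TAG).finditer(text))
fixed_module = names(re.finditer(TAG, text, flags=re.MULTILINE))

print(int(re.MULTILINE), wrong)  # 8 []
print(fixed_compile)             # ['REAPER_PROJECT', 'METRONOME']
```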

    gardsted, Nov 18, 2007
    #3
  4. MonkeeSage

    MonkeeSage Guest

    On Nov 18, 3:54 pm, "Diez B. Roggisch" <> wrote:

    > What the heck is that format? XML's retarded cousin living in the attic?


    ROFL...for some reason that makes me think of Weird Ed Edison from
    Maniac Mansion, heh ;)
    MonkeeSage, Nov 19, 2007
    #4
  5. gardsted

    gardsted Guest

    The retarded cousin - that's me!

    I keep getting confused by the caret - sometimes it works - sometimes it's better with backslash-n
    Yes - retarded cousin, I guess.

    The file format is a config file for a multitrack recording program, which I need to automate a bit.
    I can start it from the command line and have it create a remix (using various VST and other effects).
    Sometimes, however, we may have deleted the 'guitar.wav' and thus have to leave
    out that track from the config-file or the rendering won't work.
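For the deleted-'guitar.wav' case, one option that does not depend on exact indentation is to count nesting depth line by line. This is a minimal Python 3 sketch under stated assumptions, not a checked implementation for this file format: the helper `drop_tracks_with` is hypothetical, and so is the `<SOURCE WAVE` / `FILE "..."` layout in the sample, which merely assumes the track's file name appears somewhere inside its `<TRACK ... >` block. It also assumes each `<TAG` opens on its own line and a line holding only `>` closes it, as in TESTTXT above.

```python
def drop_tracks_with(text, needle):
    """Hypothetical helper: return text without <TRACK> blocks mentioning needle.

    Assumes '<NAME ...' opens a block on its own line and a line that is
    just '>' closes it; indentation is ignored.
    """
    out = []      # lines that are kept
    block = None  # lines of the TRACK block being collected, if any
    depth = 0     # nesting depth inside that TRACK block
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if block is None:
            if stripped.split()[:1] == ["<TRACK"]:
                block, depth = [line], 1  # start collecting a TRACK block
            else:
                out.append(line)
            continue
        block.append(line)
        if stripped.startswith("<"):
            depth += 1                    # a nested block inside the track
        elif stripped == ">":
            depth -= 1
            if depth == 0:                # the whole TRACK block is collected
                if not any(needle in ln for ln in block):
                    out.extend(block)     # keep tracks that don't mention needle
                block = None
    if block is not None:
        raise ValueError("unterminated <TRACK block")
    return "".join(out)


# Hypothetical project text; the SOURCE/FILE lines are an assumed layout.
PROJECT = (
    '<REAPER_PROJECT 0.1\n'
    '  <TRACK\n'
    '    NAME guitar\n'
    '    <SOURCE WAVE\n'
    '      FILE "guitar.wav"\n'
    '    >\n'
    '  >\n'
    '  <TRACK\n'
    '    NAME bass\n'
    '    <SOURCE WAVE\n'
    '      FILE "bass.wav"\n'
    '    >\n'
    '  >\n'
    '>\n'
)

print(drop_tracks_with(PROJECT, '"guitar.wav"'))
```

Only the bass track survives; the outer `<REAPER_PROJECT` line and its closing `>` are passed through untouched.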

    Since it seems 'whitespace matters' in the file, I wrote the following code to get me a tag.
    It cost me a broken cup and coffee all over the kitchen tiles - temper!

    I still don't understand why I have to use \n instead of ^ at the start of TAGCONTENTS and TAGEND.
    But I can live with it!
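A likely reason, shown in a small Python 3 sketch with a made-up `sample` string: `^` and `$` are zero-width, so after `$` the match position is still just before the newline, while `(?m)^` can only succeed just after one. Something has to consume the `\n` itself, which is what the `\n` at the start of TAGCONTENTS and TAGEND does.

```python
import re

sample = "<TRACK\nMAINSEND 1\n>"

# (?m)^ matches only at the start of the string or right after a "\n";
# (?m)$ matches right before a "\n". Neither consumes the newline itself.
no_newline = re.search(r"(?m)^<TRACK$^MAINSEND", sample)
with_newline = re.search(r"(?m)^<TRACK$\nMAINSEND", sample)
both_anchors = re.search(r"(?m)^<TRACK$\n^MAINSEND", sample)

print(no_newline)                  # None: after $ we are still before the "\n"
print(repr(with_newline.group()))  # '<TRACK\nMAINSEND' once \n is consumed
print(both_anchors is not None)    # True: ^ is fine right after the consumed \n
```

So `$\n^` works as well as plain `\n`; the `^` just becomes redundant once the newline is matched explicitly.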

    Thank you for your kind and humorous help!
    kind retards
    jorgen / de mente
    www.myspace.com/dementedk
    ------------------------------------------------------------

    import re

    TESTTXT=open('003autoreaper.rpp').read() # whole file now

    def getLevel(levl):
        rex = re.compile(
            r'(?m)' # multiline
            r'(?P<TAGSTART>^ {%d}[<])' # the < character
            r'(?P<TAGNAME>[a-zA-Z0-9_]*)' # the tagname
            r'(?P<TAGDATA>[\S \t]*?$)' # the rest of the tagstart line
            r'(?P<TAGCONTENTS>(\n {%d}[^>][\S \t]*$){0,})' # all the data coming before the >
            r'(?P<TAGEND>\n {%d}>[\S \t]*$)' %(levl,levl,levl) # the > character
            )
        return rex

    for i in getLevel(2).finditer(TESTTXT):
        myMatch = i.groupdict()
        print i.group('TAGNAME'),i.start('TAGSTART'), i.end('TAGEND')
        #print i.groups()
        if myMatch['TAGNAME'] == 'TRACK':
            #print i.groups()
            for j in getLevel(6).finditer(TESTTXT,i.start('TAGSTART'), i.end('TAGEND')):
                myMatch2 = j.groupdict()
                #print j.groups()
                print j.group('TAGNAME'),j.start('TAGSTART'), j.end('TAGEND')
                if myMatch2['TAGNAME'] == 'SOURCE':
                    for m in myMatch2:
                        print m, myMatch2[m]
    gardsted, Nov 19, 2007
    #5
  6. Paul McGuire

    Paul McGuire Guest

    Sorry about your coffee cup! Would you be interested in a pyparsing
    rendition?

    -- Paul


    from pyparsing import *

    def defineGrammar():
        ParserElement.setDefaultWhitespaceChars(" \t")

        ident = Word(alphanums+"_")
        LT,GT = map(Suppress,"<>")
        NL = LineEnd().suppress()

        real = Word(nums,nums+".")
        integer = Word(nums)
        quotedString = QuotedString('"')

        dataValue = real | integer | Word(alphas,alphanums) | quotedString
        dataDef = ident + ZeroOrMore(dataValue) + NL
        tagDef = Forward()
        tagDef << LT + ident + ZeroOrMore(dataValue) + NL + \
            Dict(ZeroOrMore(Group(dataDef) | Group(tagDef))) + GT + NL
        tagData = Dict(OneOrMore(Group(tagDef)))
        return tagData

    results = defineGrammar().parseString(TESTTXT)
    print( results.dump() )
    print results.REAPER_PROJECT.TRACK.keys()
    print results.REAPER_PROJECT.TRACK.PANENV2
    print results.REAPER_PROJECT.TRACK.PANENV2.ACT


    prints out:

    [['REAPER_PROJECT', '0.1', ['METRONOME', '6', '2.000000', ['SAMPLES',
    '', '']], ['TRACK', ['MAINSEND', '1'], ['VOLENV2', ['ACT', '1']],
    ['PANENV2', ['ACT', '1']]]]]
    - REAPER_PROJECT: ['0.1', ['METRONOME', '6', '2.000000', ['SAMPLES',
    '', '']], ['TRACK', ['MAINSEND', '1'], ['VOLENV2', ['ACT', '1']],
    ['PANENV2', ['ACT', '1']]]]
    - METRONOME: ['6', '2.000000', ['SAMPLES', '', '']]
    - SAMPLES: ['', '']
    - TRACK: [['MAINSEND', '1'], ['VOLENV2', ['ACT', '1']], ['PANENV2',
    ['ACT', '1']]]
    - MAINSEND: 1
    - PANENV2: [['ACT', '1']]
    - ACT: 1
    - VOLENV2: [['ACT', '1']]
    - ACT: 1
    ['PANENV2', 'MAINSEND', 'VOLENV2']
    [['ACT', '1']]
    1
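For comparison, the same nesting can be sketched without pyparsing by swapping the grammar for a plain line-by-line stack. This is a different technique from the grammar above and only a minimal Python 3 sketch: the helper `parse_blocks` is hypothetical, it assumes one `<TAG`, `>` or data line per line, and it uses `shlex.split` so that `""` becomes an empty string (which also means backslashes outside quotes may be treated as escapes).

```python
import shlex

def parse_blocks(text):
    """Hypothetical helper: parse '<NAME values' ... '>' blocks into nested lists."""
    root = []       # top-level blocks are collected here
    stack = [root]  # stack[-1] is the list currently receiving children
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                          # skip blank lines
        if line == ">":
            if len(stack) == 1:
                raise ValueError("unexpected '>'")
            stack.pop()                       # close the innermost block
        elif line.startswith("<"):
            node = shlex.split(line[1:])      # tag name followed by its values
            stack[-1].append(node)
            stack.append(node)                # following lines nest inside it
        else:
            stack[-1].append(shlex.split(line))  # a plain data line
    if len(stack) != 1:
        raise ValueError("unclosed block")
    return root


TESTTXT = (
    '<REAPER_PROJECT 0.1\n<METRONOME 6 2.000000\nSAMPLES "" ""\n>\n'
    '<TRACK\nMAINSEND 1\n<VOLENV2\nACT 1\n>\n<PANENV2\nACT 1\n>\n>\n>\n'
)
print(parse_blocks(TESTTXT))
```

On the thread's sample this yields the same nested list as the first line of the pyparsing output, though without the named-key access that `Dict` provides.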
    Paul McGuire, Nov 19, 2007
    #6
  7. gardsted

    gardsted Guest

    Paul McGuire wrote:
    > Sorry about your coffee cup! Would you be interested in a pyparsing
    > rendition?
    > [...]


    Thank you, Paul - I am very interested.
    In between drinking coffee and smashing coffee cups, I actually visited your site, and my
    impression was: wow, if only I could take the time instead of struggling with this
    'almost there' re thing!
    I am not that good at it actually, but I am working hard and not worrying about the cups too much...

    I will now revisit pyparsing and learn!

    I cheated a bit on you and read this: http://www.oreillynet.com/pub/au/2557.

    I live in a little Danish town, Svendborg, nice by the sea and all.
    I learned steel construction in the 80s at the local shipyard
    (now closed); much later (96-98) I received a very short education in
    IT skills at a business school in Odense, the nearest city.
    I spent the years 98-05 working for Maersk Data, later IBM.
    From 05 onwards I have been independent.
    Struggling hard to keep orders at a bare minimum,
    I spend some of my spare time working with the elderly, and some of it
    programming python for different purposes at home, and some of it playing
    in the band: http://myspace.com/dementedk, and some of it combining the two.

    So now you know about as much about me as I know about you.
    Jorgen
    gardsted, Nov 19, 2007
    #7

Similar Threads
  1. Keith-Earl
    Replies:
    1
    Views:
    440
    Mary Chipman
    Jun 15, 2004
  2. VSK
    Replies:
    2
    Views:
    2,267
  3. =?iso-8859-1?B?bW9vcJk=?=

    Matching abitrary expression in a regular expression

    =?iso-8859-1?B?bW9vcJk=?=, Dec 1, 2005, in forum: Java
    Replies:
    8
    Views:
    829
    Alan Moore
    Dec 2, 2005
  4. GIMME
    Replies:
    3
    Views:
    11,917
    vforvikash
    Dec 29, 2008
  5. Noman Shapiro
    Replies:
    0
    Views:
    219
    Noman Shapiro
    Jul 17, 2013