matching patterns after regex?

Discussion in 'Python' started by Martin, Aug 12, 2009.

  1. Martin

    Martin Guest

    Hi,

    I have a string (see below) and ideally I would like to pull out the
    decimal number which follows the bounding coordinate information. For
    example, ideally from this string I would return...

    s = '\nGROUP = ARCHIVEDMETADATA\n
    GROUPTYPE = MASTERGROUP\n\n GROUP =
    BOUNDINGRECTANGLE\n\n OBJECT =
    NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n
    VALUE = 19.9999999982039\n END_OBJECT =
    NORTHBOUNDINGCOORDINATE\n\n OBJECT =
    SOUTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n
    VALUE = 9.99999999910197\n END_OBJECT =
    SOUTHBOUNDINGCOORDINATE\n\n OBJECT =
    EASTBOUNDINGCOORDINATE\n NUM_VAL = 1\n
    VALUE = 10.6506458717851\n END_OBJECT =
    EASTBOUNDINGCOORDINATE\n\n OBJECT =
    WESTBOUNDINGCOORDINATE\n NUM_VAL = 1\n
    VALUE = 4.3188348375893e-15\n END_OBJECT
    = WESTBOUNDINGCOORDINATE\n\n END_GROUP


    NORTHBOUNDINGCOORDINATE = 19.9999999982039
    SOUTHBOUNDINGCOORDINATE = 9.99999999910197
    EASTBOUNDINGCOORDINATE = 10.6506458717851
    WESTBOUNDINGCOORDINATE = 4.3188348375893e-15

    So far I have only managed to extract the numbers by doing
    re.findall(r"[\d.]*\d", s), which returns

    ['1',
    '19.9999999982039',
    '1',
    '9.99999999910197',
    '1',
    '10.6506458717851',
    '1',
    '4.3188348375893',
    '15',
    etc.

    Now the first problem that I can see is that my pattern chops off
    the "e-15" part, and I am not sure how to allow for that in my
    pattern. Does anyone have any suggestions as to how I could also
    match this? Ideally I would have a statement which printed the
    number between the two bounding coordinate strings, for example:

    NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n
    VALUE = 19.9999999982039\n END_OBJECT =
    NORTHBOUNDINGCOORDINATE\n\n

    Something that matched "NORTHBOUNDINGCOORDINATE" and printed the
    decimal number before it hit the next string
    "NORTHBOUNDINGCOORDINATE". But I am not sure how to do this. Any
    suggestions would be appreciated.
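    On the first question: one way to keep the exponent is to make it an optional trailing group. This is a minimal sketch, assuming the values are plain decimal literals (optional sign, optional fraction, optional e/E exponent); NUMBER and the sample string are made up for illustration and are not taken from Martin's data file:

```python
import re

# Optional sign, digits with an optional fractional part (or a bare fraction
# such as ".5"), then an optional exponent such as "e-15" or "E+03".
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

sample = "NUM_VAL = 1\nVALUE = 4.3188348375893e-15\n"

old = re.findall(r"[\d.]*\d", sample)  # the pattern from the question
new = re.findall(NUMBER, sample)

print(old)  # ['1', '4.3188348375893', '15']
print(new)  # ['1', '4.3188348375893e-15']
print([float(x) for x in new])
```

    float() accepts the captured strings as they are, so the exponent survives the conversion.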

    Many thanks

    Martin
     
    Martin, Aug 12, 2009
    #1

  2. Bernard

    Bernard Guest

    On 12 Aug, 06:15, Martin <> wrote:
    > Something that matched "NORTHBOUNDINGCOORDINATE" and printed the
    > decimal number before it hit the next string
    > "NORTHBOUNDINGCOORDINATE". But I am not sure how to do this. Any
    > suggestions would be appreciated.


    Hey Martin,

    Here's a regex I've just tested: (\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)

    The first group captures the whateverBOUNDINGCOORDINATE name and the
    second group captures the value.
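    One detail about the ".*" part of patterns like this: "." does not match a newline unless re.DOTALL is passed (whereas "\s" does), and under DOTALL a greedy ".*" runs as far as it can before backtracking. A small sketch on a made-up two-object string; text, greedy and lazy are illustrative names, and the patterns are simplified variants rather than Bernard's exact regex:

```python
import re

# A made-up fragment with two objects, laid out like the data in the thread.
text = (
    "OBJECT = NORTHBOUNDINGCOORDINATE\nVALUE = 19.9999999982039\n"
    "OBJECT = SOUTHBOUNDINGCOORDINATE\nVALUE = 9.99999999910197\n"
)

greedy = r"(\w+COORDINATE).*VALUE = (\S+)"
lazy = r"(\w+COORDINATE).*?VALUE = (\S+)"

# "." does not match "\n" by default, so ".*" cannot reach the VALUE line.
print(re.findall(greedy, text))  # []
# With DOTALL, the greedy ".*" runs to the LAST "VALUE", pairing NORTH with
# the SOUTH value.
print(re.findall(greedy, text, re.DOTALL))
# The lazy ".*?" stops at the first "VALUE" after each name.
print(re.findall(lazy, text, re.DOTALL))
```

    Because each pattern has two groups, findall returns one (name, value) tuple per match.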

    Please provide some more entries if you'd like me to test my regex
    some more :)

    cheers

    Bernard
     
    Bernard, Aug 12, 2009
    #2

  3. Martin

    Martin Guest

    On Aug 12, 12:53 pm, Bernard <> wrote:
    > Here's a regex I've just tested: (\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)


    Thanks Bernard, it doesn't seem to be working for me...

    I tried

    re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)

    Is that what you meant? Apologies if not; as written, it results in a
    syntax error:

    In [557]: re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
    ------------------------------------------------------------
    File "<ipython console>", line 1
    re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
    ^
    SyntaxError: unexpected character after line continuation character

    Thanks
     
    Martin, Aug 12, 2009
    #3
  4. Steven D'Aprano

    On Wed, 12 Aug 2009 05:12:22 -0700, Martin wrote:

    > I tried
    >
    > re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)


    You need to put quotes around strings.

    In this case, because you're using regular expressions, you should use a
    raw string:

    re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)

    will probably work.
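    The SyntaxError above comes from Python itself: without quotes the pattern is parsed as code, and a backslash outside a string literal is a line-continuation character. Inside a normal (non-raw) string some escapes are also translated before re ever sees them, "\b" being the classic case. A small sketch; source, compiled_ok and line are illustrative names:

```python
import re

# Typed without quotes, the pattern is parsed as Python code; the "\" before
# "w" is read as a line-continuation character, so compiling fails.
source = r"re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)"
try:
    compile(source, "<example>", "eval")
    compiled_ok = True
except SyntaxError as exc:
    compiled_ok = False
    print("SyntaxError:", exc.msg)

line = "VALUE = 19.9999999982039"

# In a normal string "\b" is a backspace character, not a word boundary.
print(re.findall("\bVALUE\b", line))   # []
# In a raw string the backslash reaches the regex engine unchanged.
print(re.findall(r"\bVALUE\b", line))  # ['VALUE']
```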





    --
    Steven
     
    Steven D'Aprano, Aug 12, 2009
    #4
  5. Martin

    Martin Guest

    On Aug 12, 1:23 pm, Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> wrote:
    > You need to put quotes around strings.
    >
    > In this case, because you're using regular expressions, you should use a
    > raw string:
    >
    > re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)
    >
    > will probably work.


    Thanks, I see.

    So I tried it, and if I use it as it is, it matches the first instance:

    In [594]: re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)
    Out[594]: [('NORTHBOUNDINGCOORDINATE', '1')]

    So I adjusted the first part of the regex, on the basis that I could
    then substitute SOUTH etc. for NORTH.

    In [595]: re.findall(r"(NORTHBOUNDINGCOORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)
    Out[595]: [('NORTHBOUNDINGCOORDINATE', '1')]

    But in both cases it doesn't return the decimal value; instead it returns
    the value that comes after "NUM_VAL =" rather than "VALUE =". Why?
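    A sketch of a pattern that pairs each OBJECT name with the VALUE that follows it. It assumes the data has the layout quoted in the first post, with real newlines and arbitrary padding around "="; the sample string is rebuilt by hand (ROW and SAMPLE_VALUES are hypothetical helpers), so this is not tested against the original file:

```python
import re

# Hypothetical sample rebuilt from the first post. Assumption: the real data
# contains actual newlines and variable padding around "=".
ROW = (
    "    OBJECT                 = {0}\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = {1}\n"
    "    END_OBJECT             = {0}\n\n"
)
SAMPLE_VALUES = [
    ("NORTHBOUNDINGCOORDINATE", "19.9999999982039"),
    ("SOUTHBOUNDINGCOORDINATE", "9.99999999910197"),
    ("EASTBOUNDINGCOORDINATE", "10.6506458717851"),
    ("WESTBOUNDINGCOORDINATE", "4.3188348375893e-15"),
]
s = (
    "\nGROUP                  = BOUNDINGRECTANGLE\n\n"
    + "".join(ROW.format(name, value) for name, value in SAMPLE_VALUES)
    + "  END_GROUP\n"
)

# Optional sign, digits with optional fraction, optional exponent ("e-15").
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# "\bOBJECT" cannot match inside "END_OBJECT" ("_" is a word character, so
# there is no word boundary before "O"); with re.DOTALL the lazy ".*?" may
# cross newlines but stops at the first VALUE after each name.
pattern = re.compile(
    r"\bOBJECT\s*=\s*(\w+BOUNDINGCOORDINATE).*?\bVALUE\s*=\s*(" + NUMBER + ")",
    re.DOTALL,
)

coords = {name: float(value) for name, value in pattern.findall(s)}
for name, value in coords.items():
    print(name, "=", value)
```

    Anchoring on the OBJECT line, rather than on any word ending in COORDINATE, keeps the END_OBJECT lines from starting a second, mismatched pairing.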
     
    Martin, Aug 12, 2009
    #5
  6. Martin

    Martin Guest

    On Aug 12, 1:42 pm, Martin <> wrote:
    > But in both cases it doesn't return the decimal value rather the value
    > that comes after NUM_VAL = , rather than VALUE = ?


    I think I kind of got that to work... but I am clearly not quite
    understanding how it works, as I tried to use it again to match
    something else.

    In this case I want to print the values 0.000000 and 2223901.039333
    from a string like this...

    YDim=1200\n\t\tUpperLeftPointMtrs=(0.000000,2223901.039333)\n\t\t

    I tried the following, which I thought would match the statement and
    print the decimal number after the equals sign:

    re.findall(r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)", s)

    where s is the string
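    A quick check of why that attempt finds nothing, assuming s holds real newline and tab characters. "\s" right after "=" requires a whitespace character, but in this line "=" is followed by "1" or "(", and "(" is not in [\d\.\w-] either. \w+ also needs at least one word character before UpperLeftPointMtrs, whereas the data has a tab there. Finally, the trailing "*" makes the name group optional, so the pattern can fire after any "=":

```python
import re

# The sample line from the post, with real newline and tab characters.
s = "YDim=1200\n\t\tUpperLeftPointMtrs=(0.000000,2223901.039333)\n\t\t"
attempt = r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)"

# "\s" after "=" demands whitespace, but here "=" is followed by "1" or "(":
# no match at all.
print(re.findall(attempt, s))  # []

# The "*" makes the name group optional, so the pattern matches after ANY
# "=" followed by whitespace; a group that did not take part comes back as "".
print(re.findall(attempt, "YDim= 1200"))  # [('', '1200')]
```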

    Many thanks for the help
     
    Martin, Aug 12, 2009
    #6
  7. Bernard

    Bernard Guest

    On 12 Aug, 12:43, Martin <> wrote:
    > re.findall(r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)", s)


    You have to do it with 2 groups in the same regex:

    regex = r"UpperLeftPointMtrs=\(([\d\.]+),([\d\.]+)"

    The first group is the number before the comma and the second one is the number after it :)
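    Bernard's pattern can be checked against the sample line. Here it is restated with the standard re.VERBOSE flag so each part carries a comment; [\d.] is equivalent to his [\d\.], since "." is literal inside a character class. This sketch assumes the coordinates never carry a sign or an exponent, which holds for the sample line but may not for other files:

```python
import re

s = "YDim=1200\n\t\tUpperLeftPointMtrs=(0.000000,2223901.039333)\n\t\t"

# Bernard's pattern, restated with re.VERBOSE so each part can carry a comment
# (unescaped whitespace and "#" comments in the pattern are ignored).
point = re.compile(
    r"""
    UpperLeftPointMtrs=\(   # the key, the equals sign and the opening parenthesis
    ([\d.]+)                # group 1: the number before the comma
    ,                       # the separating comma
    ([\d.]+)                # group 2: the number after the comma
    """,
    re.VERBOSE,
)

m = point.search(s)
if m is not None:
    x, y = (float(g) for g in m.groups())
    print(x, y)
```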

    You should probably learn how to play with regexes.
    I personally use a visual tool called RX Toolkit[1] that comes with
    Komodo IDE.

    [1] http://docs.activestate.com/komodo/4.4/regex.html
     
    Bernard, Aug 12, 2009
    #7
  8. Mark Lawrence

    Bernard wrote:
    > You should probably learn how to play with regexes.
    > I personally use a visual tool called RX Toolkit[1] that comes with
    > Komodo IDE.
    >
    > [1] http://docs.activestate.com/komodo/4.4/regex.html

    Haven't tried it myself, but how about this?
    http://re-try.appspot.com/

    --
    Kindest regards.

    Mark Lawrence.
     
    Mark Lawrence, Aug 12, 2009
    #8
  9. Martin

    Martin Guest

    On Aug 12, 10:29 pm, Mark Lawrence <> wrote:
    > Haven't tried it myself but how about this? http://re-try.appspot.com/


    Thanks Mark and Bernard. I have managed to get it working, and I
    appreciate the help with understanding the syntax. The web links are
    also very useful; I'll give them a go.

    Martin
     
    Martin, Aug 13, 2009
    #9