How to form a dict out of a string using regex?

Discussion in 'Python' started by Satyajit Sarangi, Jun 15, 2011.

  1. data = "GEOMETRYCOLLECTION (POINT (-8.9648437500000000
    -4.1308593750000000), POINT (2.0214843750000000 -2.6367187500000000),
    POINT (-1.4062500000000000 -11.1621093750000000), POINT
    (-11.9531250000000000,-10.8984375000000000), POLYGON
    ((-21.6210937500000000 1.8457031250000000,2.4609375000000000
    2.1972656250000000, -18.9843750000000000 -3.6914062500000000,
    -22.6757812500000000 -3.3398437500000000, -22.1484375000000000
    -2.6367187500000000, -21.6210937500000000
    1.8457031250000000)),LINESTRING (-11.9531250000000000
    11.3378906250000000, 7.7343750000000000 11.5136718750000000,
    12.3046875000000000 2.5488281250000000, 12.2167968750000000
    1.6699218750000000, 14.5019531250000000 3.9550781250000000))"

    This is my string.
    How do I traverse through it and form 3 dicts of Point, Polygon and
    Linestring containing the coordinates?
     
    Satyajit Sarangi, Jun 15, 2011

  2. Mel, Guest

    Satyajit Sarangi wrote:

    >
    >
    > data = "GEOMETRYCOLLECTION (POINT (-8.9648437500000000
    > -4.1308593750000000), POINT (2.0214843750000000 -2.6367187500000000),
    > POINT (-1.4062500000000000 -11.1621093750000000), POINT
    > (-11.9531250000000000,-10.8984375000000000), POLYGON
    > ((-21.6210937500000000 1.8457031250000000,2.4609375000000000
    > 2.1972656250000000, -18.9843750000000000 -3.6914062500000000,
    > -22.6757812500000000 -3.3398437500000000, -22.1484375000000000
    > -2.6367187500000000, -21.6210937500000000
    > 1.8457031250000000)),LINESTRING (-11.9531250000000000
    > 11.3378906250000000, 7.7343750000000000 11.5136718750000000,
    > 12.3046875000000000 2.5488281250000000, 12.2167968750000000
    > 1.6699218750000000, 14.5019531250000000 3.9550781250000000))"
    >
    > This is my string.
    > How do I traverse through it and form 3 dicts of Point, Polygon and
    > Linestring containing the coordinates?


    Except for those space-separated number pairs, it could be a job for some
    well-crafted classes (e.g. `class GEOMETRYCOLLECTION ...`, `class POINT
    ....`) and eval.

    My approach would be to use a loop with regexes to recognize the leading
    element and pick out its arguments, then use the string split and strip
    methods beyond that point. Like (untested):

    import re

    # group 1: keyword, group 2: text inside the brackets, group 3: the rest
    recognizer = re.compile(
        r'[\s,]*(POINT|POLYGON|LINESTRING)\s*\(+(.*?)\)+(.*)', re.DOTALL)
    # regex is not good with nested brackets,
    # so kill off outer nested brackets..
    s1 = 'GEOMETRYCOLLECTION ('
    if data.startswith(s1):
        data = data[len(s1):-1]

    while data:
        match = recognizer.match(data)
        if not match:
            break  # nothing usable in data
        ## now the matched groups will be:
        ## 1: the keyword
        ## 2: the arguments inside the smallest bracketed sequence
        ## 3: the rest of data
        ## so use str.split and str.strip to pull out the individual arguments,
        ## and lastly
        data = match.group(3)

    This is all from memory. I might have got some details wrong in recognizer.
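
    For reference, the loop-and-regex idea above could be fleshed out roughly
    as follows. This is only a sketch: the sample string is shortened, the
    helper name parse_geometries is made up for illustration, and it assumes
    each POLYGON has a single ring (no holes) and plain decimal numbers.

```python
import re

# Shortened sample in the same shape as the string from the question.
data = ("GEOMETRYCOLLECTION (POINT (-8.96 -4.13), POINT (2.02 -2.63), "
        "POINT (-11.95,-10.89), "
        "POLYGON ((-21.62 1.84,2.46 2.19, -18.98 -3.69, -21.62 1.84)),"
        "LINESTRING (-11.95 11.33, 7.73 11.51))")

# Keyword, one or more "(", a run of text without brackets, one or more ")".
ITEM = re.compile(r'(POINT|POLYGON|LINESTRING)\s*\(+([^()]*)\)+')
# Plain decimal numbers only (no exponent notation).
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')


def parse_geometries(text):
    """Group coordinate lists by geometry keyword (hypothetical helper)."""
    result = {'POINT': [], 'POLYGON': [], 'LINESTRING': []}
    for keyword, args in ITEM.findall(text):
        numbers = [float(n) for n in NUMBER.findall(args)]
        # Pair consecutive numbers as (x, y) coordinates.
        coords = list(zip(numbers[0::2], numbers[1::2]))
        result[keyword].append(coords)
    return result


shapes = parse_geometries(data)
```

    Pulling out the numbers with a second regex, rather than splitting on
    commas, sidesteps the fact that one of the points in the original string
    separates its two values with a comma instead of a space.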

    Mel.
     
    Mel, Jun 15, 2011

  3. Terry Reedy, Guest

    On 6/15/2011 10:42 AM, Satyajit Sarangi wrote:
    >
    >
    > data = "GEOMETRYCOLLECTION (POINT (-8.9648437500000000
    > -4.1308593750000000), POINT (2.0214843750000000 -2.6367187500000000),
    > POINT (-1.4062500000000000 -11.1621093750000000), POINT
    > (-11.9531250000000000,-10.8984375000000000), POLYGON
    > ((-21.6210937500000000 1.8457031250000000,2.4609375000000000
    > 2.1972656250000000, -18.9843750000000000 -3.6914062500000000,
    > -22.6757812500000000 -3.3398437500000000, -22.1484375000000000
    > -2.6367187500000000, -21.6210937500000000
    > 1.8457031250000000)),LINESTRING (-11.9531250000000000
    > 11.3378906250000000, 7.7343750000000000 11.5136718750000000,
    > 12.3046875000000000 2.5488281250000000, 12.2167968750000000
    > 1.6699218750000000, 14.5019531250000000 3.9550781250000000))"
    >
    > This is my string.


    Is this what you are given by an unchangeable external source, or can you
    get something a bit better? One object per line would make the problem
    pretty simple, with no regex required.
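
    To illustrate, suppose the source could emit one object per line, with
    the keyword first and comma-separated coordinate pairs after it. That
    layout is purely an assumption for this sketch, as is the helper name
    parse_line; only str methods are needed then.

```python
# Assumed (hypothetical) one-object-per-line layout, not the real input.
text = """POINT -8.96 -4.13
LINESTRING -11.95 11.33, 7.73 11.51
"""


def parse_line(line):
    """Split 'KEYWORD x y, x y, ...' into (keyword, [(x, y), ...])."""
    keyword, _, rest = line.partition(' ')
    coords = [tuple(float(v) for v in pair.split())
              for pair in rest.split(',')]
    return keyword, coords


records = [parse_line(line) for line in text.splitlines() if line.strip()]
```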

    > How do I traverse through it and form 3 dicts of Point, Polygon and
    > Linestring containing the coordinates?


    Dicts map keys to values. I do not see any key values above. It looks
    like you really want three sets.
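
    One possible shape for that, sketched with made-up, already-parsed
    items: a single dict whose keys are the geometry names and whose values
    are sets of coordinate tuples. The sample values and the name by_type
    are illustrative only. Tuples are used because set members must be
    hashable, which lists are not.

```python
from collections import defaultdict

# Hypothetical already-parsed items: (keyword, tuple of (x, y) pairs).
items = [
    ("POINT", ((-8.96, -4.13),)),
    ("POINT", ((2.02, -2.63),)),
    ("POINT", ((2.02, -2.63),)),  # duplicate, collapses inside the set
    ("LINESTRING", ((-11.95, 11.33), (7.73, 11.51))),
]

# The geometry name is the key; the value is a set of coordinate tuples.
by_type = defaultdict(set)
for keyword, coords in items:
    by_type[keyword].add(coords)
```

    If the order of the objects or duplicates matter, a list per key would
    be the better fit than a set.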


    --
    Terry Jan Reedy
     
    Terry Reedy, Jun 15, 2011
