pyparsing and svg

Discussion in 'Python' started by Donn Ingle, Nov 8, 2007.

  1. Donn Ingle

    Donn Ingle Guest

    Hi - I have been trying, but I need some help:
    Here's my test code to parse an SVG path element, can anyone steer me right?

    d1="""
    M 209.12237 , 172.2415
    L 286.76739 , 153.51369
    L 286.76739 , 275.88534
    L 209.12237 , 294.45058
    L 209.12237 , 172.2415
    z """

    #Try it with no line breaks
    d1="""M 209.12237,172.2415 L 286.76739,153.51369 L 286.76739,275.88534 L 209.12237,294.45058 L 209.12237,172.2415 z """

    #Try it with no spaces
    d1="""M209.12237,172.2415L286.76739,153.51369L286.76739,275.88534L209.12237,294.45058L209.12237,172.2415z"""

    #For later, has more commands
    d2="""
    M 269.78326 , 381.27104
    C 368.52151 , 424.27023
    90.593578 , -18.581883
    90.027729 , 129.28708
    C 89.461878 , 277.15604
    171.04501 , 338.27184
    269.78326 , 381.27104
    z """

    ## word :: M, L, C, Z
    ## number :: group of numbers
    ## dot :: "."
    ## comma :: ","
    ## couple :: number dot number comma number dot number
    ## command :: word
    ##
    ## phrase :: command couple

    from pyparsing import Word, Literal, alphas, nums, Optional, OneOrMore

    command = Word("MLCZ")
    comma = Literal(",")
    dot = Literal(".")
    float = nums + dot + nums
    couple = float + comma + float

    phrase = OneOrMore(command + OneOrMore(couple | ( couple + couple ) ) )

    print phrase

    print phrase.parseString(d1.upper())

    Thanks
    \d
    Donn Ingle, Nov 8, 2007

  2. Paul McGuire

    Paul McGuire Guest

    On Nov 8, 3:14 am, Donn Ingle wrote:

    > float = nums + dot + nums


    Should be:

    float = Combine(Word(nums) + dot + Word(nums))

    nums is a string that defines the set of numeric digits for composing
    Word instances. nums is not an expression by itself.

    For that matter, I see in your later tests that some values have a
    leading minus sign, so you should really go with:

    float = Combine(Optional("-") + Word(nums) + dot + Word(nums))
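
A minimal sketch of that expression in use, assuming pyparsing is installed. The name `number` is an illustrative choice (not from the original code) so that the builtin float() is not shadowed and can still be used to convert the matched string.

```python
from pyparsing import Combine, Optional, Word, nums

# "number" is a hypothetical name used here instead of "float",
# so the builtin float() stays available for conversion.
dot = "."
number = Combine(Optional("-") + Word(nums) + dot + Word(nums))

# Combine joins the sign, digits, dot and digits into one token string.
tokens = number.parseString("-18.581883")
print(tokens[0])         # -18.581883 (a single string token)
print(float(tokens[0]))  # -18.581883 (converted with the builtin)
```

Without Combine the same input would come back as four separate tokens ("-", "18", ".", "581883"), and spaces between the pieces would be accepted.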



    Some other comments:

    1. Read up on the Word class; you are not using it quite right.

    command = Word("MLCZ")

    will work with your test set, but it is not the form I would choose.
    Word(characterstring) will match any "word" made up of the characters
    in the input string. So Word("MLCZ") will match
    M
    L
    C
    Z
    MM
    LC
    MCZL
    MMLCLLZCZLLM

    I would suggest instead using:

    command = Literal("M") | "L" | "C" | "Z"

    or

    command = oneOf("M L C Z")

    2. Change comma to

    comma = Literal(",").suppress()

    The comma is important to the parsing process, but the ',' token is
    not much use in the returned set of matched tokens, so get rid of it
    (by using suppress).

    3. Group your expressions, such as

    couple = Group(float + comma + float)

    It will really simplify getting at the resulting parsed tokens.


    4. What is the purpose of (couple + couple)? This is sufficient:

    phrase = OneOrMore(command + Group(OneOrMore(couple)) )

    (Note use of Group to return the coord pairs as a sublist.)


    5. Results names!

    phrase = OneOrMore(command("command") + Group(OneOrMore(couple))("coords"))

    will allow you to access these fields by name instead of by index.
    This will make your parser code *way* more readable.
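
Putting the five suggestions together, a complete parser could look roughly like the sketch below. Two details are assumptions that go beyond the comments above: ZeroOrMore is used for the coordinates so that a bare "Z" (which has no coordinate pairs) still matches, and each command is wrapped in its own Group (the hypothetical `segment` name) so that the results names are kept per command rather than only for the last one parsed.

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Optional, Word,
                       ZeroOrMore, nums, oneOf)

command = oneOf("M L C Z")
comma = Literal(",").suppress()          # needed for parsing, dropped from output
number = Combine(Optional("-") + Word(nums) + "." + Word(nums))
couple = Group(number + comma + number)  # one "x,y" pair as a sublist

# Assumption: one Group per command, so "command" and "coords" can be read
# from each segment; ZeroOrMore lets "Z" match with no coordinates.
segment = Group(command("command") + Group(ZeroOrMore(couple))("coords"))
path = OneOrMore(segment)

d1 = ("M 209.12237,172.2415 L 286.76739,153.51369 L 286.76739,275.88534 "
      "L 209.12237,294.45058 L 209.12237,172.2415 z")

# parseAll=True makes the parser fail loudly instead of stopping early.
segments = path.parseString(d1.upper(), parseAll=True)
for seg in segments:
    points = [(float(x), float(y)) for x, y in seg.coords]
    print(seg.command, points)
```

The same grammar should also accept the d2 test string, since a "C" command is simply followed by three coordinate pairs.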


    -- Paul
    Paul McGuire, Nov 8, 2007

  3. Arkanes

    Arkanes Guest

    Paul McGuire wrote:
    > On Nov 8, 3:14 am, Donn Ingle wrote:
    >
    >> float = nums + dot + nums
    >
    > Should be:
    >
    > float = Combine(Word(nums) + dot + Word(nums))
    >
    > nums is a string that defines the set of numeric digits for composing
    > Word instances. nums is not an expression by itself.
    >
    > For that matter, I see in your later tests that some values have a
    > leading minus sign, so you should really go with:
    >
    > float = Combine(Optional("-") + Word(nums) + dot + Word(nums))


    I have a working path data parser (in pyparsing) at
    http://code.google.com/p/wxpsvg.

    Parsing the numeric values initially gave me a lot of trouble - I
    translated the BNF in the spec literally and there was a *ton* of
    backtracking going on with every numeric value. I ended up using a more
    generous grammar, and letting Python's float() reject invalid values.
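
As a rough illustration of the "generous grammar" idea (this is not the code from wxpsvg, just a sketch under assumed names), a number can be matched as any run of digit, sign, dot and exponent characters, with a parse action that hands the string to float() and turns a ValueError into a parse failure. `loose_number` and `to_float` are hypothetical names.

```python
from pyparsing import ParseException, Word, nums

def to_float(s, loc, toks):
    """Convert the matched text with float(); reject it if float() does."""
    try:
        return float(toks[0])
    except ValueError:
        raise ParseException(s, loc, "invalid number: %r" % toks[0])

# Deliberately permissive: accepts things like "1.2.3" at the matching
# stage and relies on float() to weed them out afterwards.
loose_number = Word(nums + "+-.eE").setParseAction(to_float)

print(loose_number.parseString("-18.581883")[0])  # -18.581883
print(loose_number.parseString("1.5e3")[0])       # 1500.0
```

One known limitation of this sketch: SVG allows numbers to run together without a separator (for example "1.5-2.5"), and a single Word would swallow that as one invalid token, so a real grammar needs to be a little stricter about where signs may appear.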

    I couldn't get repeating path elements (like M 100 100 200 200, which is
    the same as M 100 100 M 200 200) working right in the grammar, so I
    expand those with post-processing.
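
The post-processing step can be sketched without pyparsing at all. The function below is an illustration, not the wxpsvg code; it assumes the parser has already produced (command, flat argument list) tuples, and `ARG_COUNTS` and `expand_repeats` are hypothetical names. It repeats the same command for every extra argument set, as described above; note that the SVG spec treats extra pairs after a moveto as implicit lineto commands, so a full implementation may want to special-case "M".

```python
# Number of numeric arguments per command (only the subset used in this thread).
ARG_COUNTS = {"M": 2, "L": 2, "C": 6, "Z": 0}

def expand_repeats(segments):
    """Split (command, args) tuples so each carries exactly one argument set."""
    expanded = []
    for command, args in segments:
        size = ARG_COUNTS[command]
        if size == 0:
            if args:
                raise ValueError("%s takes no arguments" % command)
            expanded.append((command, []))
            continue
        if not args or len(args) % size != 0:
            raise ValueError("bad argument count for %s: %d" % (command, len(args)))
        # Emit one copy of the command per complete argument set.
        for start in range(0, len(args), size):
            expanded.append((command, args[start:start + size]))
    return expanded

print(expand_repeats([("M", [100, 100, 200, 200]), ("Z", [])]))
# [('M', [100, 100]), ('M', [200, 200]), ('Z', [])]
```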

    The parser itself can be seen at
    http://wxpsvg.googlecode.com/svn/trunk/svg/pathdata.py

    Arkanes, Nov 8, 2007
