Re: pyparsing wrong output

Discussion in 'Python' started by Gabriel Genellina, Feb 13, 2010.

  1. On Fri, 12 Feb 2010 10:41:40 -0300, Eknath Venkataramani
    <> wrote:

    > I am trying to write a parser in pyparsing.
    > Help Me. http://paste.pocoo.org/show/177078/ is the code and this is
    > input file: http://paste.pocoo.org/show/177076/ .
    > I get output as:
    > <generator object at 0xb723b80c>


    There is nothing wrong with pyparsing here. scanString() returns a
    generator, like this:

    py> g = (x for x in range(20) if x % 3 == 1)
    py> g
    <generator object <genexpr> at 0x00E50D78>

    A generator is like a function that may be suspended and restarted. It
    yields one value at a time, and only runs when you ask for the next value:

    py> next(g)
    1
    py> next(g)
    4

    You may use a `for` loop to consume the generator:

    py> for i in g:
    ....     print i
    ....
    7
    10
    13
    16
    19

    Once it has run to exhaustion, asking for more elements always raises
    StopIteration:

    py> next(g)
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    StopIteration
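For anyone following along on Python 3, where print is a function, here is a minimal self-contained sketch of the same session. It also shows two ways to drain a generator without tripping over StopIteration: list(), and the default argument to next().

```python
# Same generator expression as in the session above: yields 1, 4, 7, ..., 19
g = (x for x in range(20) if x % 3 == 1)

first = next(g)       # 1
second = next(g)      # 4
rest = list(g)        # consumes everything that is left

# The generator is now exhausted; next() with a default returns the
# default instead of raising StopIteration.
done = next(g, None)

# Without a default, next() on an exhausted generator raises.
try:
    next(g)
except StopIteration:
    exhausted = True
else:
    exhausted = False

print(first, second, rest, done, exhausted)
```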

    Try something like this:

    results = x.scanString(filedata)
    for result in results:
        print result

    See http://docs.python.org/tutorial/classes.html#generators
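A small sketch of what that loop actually receives, assuming a current pyparsing install: each item that scanString() yields is a (tokens, start, end) triple, and a grammar that never matches produces an empty generator, so the loop body simply never runs. The `integer` expression and the sample strings are made up for illustration.

```python
from pyparsing import Word, nums

# Hypothetical grammar for illustration: a run of digits
integer = Word(nums)

text = "a 12 b 345 c"

# scanString() returns a generator; iterating it yields one
# (tokens, start, end) triple per match found anywhere in the text.
matches = [(tokens[0], start, end) for tokens, start, end in integer.scanString(text)]
print(matches)

# A grammar that matches nothing yields nothing at all.
no_matches = list(integer.scanString("no digits here"))
print(no_matches)
```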

    --
    Gabriel Genellina
     
    Gabriel Genellina, Feb 13, 2010

  2. Paul McGuire

    On Feb 12, 6:41 pm, "Gabriel Genellina" <>
    wrote:
    > On Fri, 12 Feb 2010 10:41:40 -0300, Eknath Venkataramani
    > <> wrote:
    >
    > > I am trying to write a parser in pyparsing.
    > > Help Me. http://paste.pocoo.org/show/177078/ is the code and this is
    > > input file: http://paste.pocoo.org/show/177076/ .
    > > I get output as:
    > > <generator object at 0xb723b80c>

    >
    > There is nothing wrong with pyparsing here. scanString() returns a  
    > generator, like this:
    >
    > py> g = (x for x in range(20) if x % 3 == 1)
    > py> g
    > <generator object <genexpr> at 0x00E50D78>
    >

    Unfortunately, your grammar doesn't match the input text, so your
    generator never yields anything.

    I think you are taking a sort of brute-force approach to this problem,
    and you need to think a little more abstractly. You can't just pick a
    fragment, write an expression for it, then do the next one, and then
    stitch them all together. Well, you *can*, but it helps to think both
    abstractly and concretely at the same time.

    With the exception of your one key "\'", this is a pretty basic
    recursive grammar. Recursive grammars are a little complicated to
    start with, so I'll begin with a non-recursive part, and I'll work
    bottom-up (or inside-out).

    Let's start by looking at these items:

    count => 8,
    baajaar => 0.87628353,
    kiraae => 0.02341598,
    lii => 0.02178813,
    adr => 0.01978462,
    gyiimn => 0.01765590,

    Each item has a name (which you called "eng", so I'll keep that
    expression), a '=>', and *something*. In the end, we won't care
    about the '=>' strings: they aren't part of the keys or the
    associated values, just delimiters. They matter during parsing, but
    afterwards we can drop them, which is what Suppress does. So we'll
    start with a pyparsing expression for this:

    keyval = eng + Suppress('=>') + something

    Sometimes the something is an integer, and sometimes it's a
    floating-point number. I'll define a more general integer expression
    than your original number, and a separate expression for a real number:

    integer = Combine(Optional('-') + Word(nums))
    realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

    When we combine these two, we need to be careful to check for a
    realnum before an integer, so that we don't accidentally parse just
    the leading "3" of "3.1415" as an integer.

    something = realnum | integer
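To see why the order matters, here is a minimal sketch using the two expressions above (assuming pyparsing is installed). In pyparsing, '|' builds a MatchFirst, which takes the first alternative that matches; '^' builds an Or, which tries every alternative and keeps the longest match.

```python
from pyparsing import Combine, Optional, Word, nums

integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

# MatchFirst: alternatives are tried left to right, first match wins.
wrong_order = integer | realnum   # integer matches "3" and stops at the '.'
right_order = realnum | integer

print(wrong_order.parseString("3.1415").asList())
print(right_order.parseString("3.1415").asList())
print(right_order.parseString("-8").asList())

# Or: every alternative is tried and the longest match is kept,
# so the order no longer matters (at some extra parsing cost).
longest = integer ^ realnum
print(longest.parseString("3.1415").asList())
```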

    So now we can parse this fragment using a delimitedList expression
    (which takes care of the intervening commas, and also suppresses them
    from the results):

    filedata = """
    count => 8,
    baajaar => 0.87628353,
    kiraae => 0.02341598,
    lii => 0.02178813,
    adr => 0.01978462,
    gyiimn => 0.01765590,"""
    print delimitedList(keyval).parseString(filedata)

    Gives:
    ['count', '8', 'baajaar', '0.87628353', 'kiraae', '0.02341598',
    'lii', '0.02178813', 'adr', '0.01978462', 'gyiimn', '0.01765590']
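A self-contained version of this step, runnable under Python 3 with pyparsing installed. Since the definition of eng lives in the original paste and isn't shown in this thread, `eng = Word(alphas)` is an assumed stand-in, and the input is trimmed to three lines.

```python
from pyparsing import Combine, Optional, Suppress, Word, alphas, delimitedList, nums

# Hypothetical stand-in for the original poster's "eng" key expression
eng = Word(alphas)

integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))
something = realnum | integer          # try the longer form first

# '=>' is required while parsing but dropped from the results
keyval = eng + Suppress('=>') + something

filedata = """
count => 8,
baajaar => 0.87628353,
kiraae => 0.02341598,"""

# delimitedList consumes (and suppresses) the commas between items;
# parseString() without parseAll simply leaves the trailing comma unparsed.
flat = delimitedList(keyval).parseString(filedata).asList()
print(flat)
```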

    Right off the bat, we see that we want a little more structure to
    these results, so that the keys and values are grouped naturally by
    the parser. The easy way to do this is with Group, as in:

    keyval = Group(eng + Suppress('=>') + something)

    With this one change, we now get:

    [['count', '8'], ['baajaar', '0.87628353'],
    ['kiraae', '0.02341598'], ['lii', '0.02178813'],
    ['adr', '0.01978462'], ['gyiimn', '0.01765590']]
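The same sketch with Group added, again assuming `eng = Word(alphas)` as a stand-in and pyparsing installed. Once each pair is its own sub-list, the plain Python dict() constructor can turn the result into a dictionary, although the values are still strings at this point.

```python
from pyparsing import Combine, Group, Optional, Suppress, Word, alphas, delimitedList, nums

eng = Word(alphas)  # hypothetical stand-in for the original poster's "eng"
integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))
something = realnum | integer

# Group keeps each key together with its value in a sub-list
keyval = Group(eng + Suppress('=>') + something)

filedata = """
count => 8,
baajaar => 0.87628353,
kiraae => 0.02341598,"""

pairs = delimitedList(keyval).parseString(filedata).asList()
print(pairs)

# Each sub-list is a [key, value] pair, so dict() accepts the list directly
table = dict(pairs)
print(table["baajaar"])
```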

    Now we need to add the recursive part of your grammar. A nested input
    looks like:

    confident => {
        count => 4,
        trans => {
            ashhvsht => 0.75100505,
            phraarmnbh => 0.08341708,
        },
    },

    So in addition to integers and reals, our "something" could also be a
    nested list of keyvals:

    something = realnum | integer | (lparen + delimitedList(keyval) + rparen)

    This is *almost* right, with just a few tweaks:
    - the list of keyvals may have a comma after the last item before the
    closing '}'
    - we really want to suppress the opening and closing braces (lparen
    and rparen)
    - for similar structure reasons, we'll enclose the list of keyvals in
    a Group to retain the data hierarchy

    lparen,rparen = map(Suppress, "{}")
    something = (realnum | integer |
                 Group(lparen + delimitedList(keyval) + Optional(',') + rparen))

    The recursion problem is that we have defined keyval using something,
    and something using keyval. You can't do that directly in Python,
    because a name must be assigned before it can be used. So we use the
    pyparsing class Forward to forward-declare something:

    something = Forward()
    keyval = Group(eng + Suppress('=>') + something)

    To supply the actual definition of the Forward, we use the '<<' shift
    operator:

    something << (realnum | integer |
                  Group(lparen + delimitedList(keyval) + Optional(',') +
                        rparen))

    Our grammar now looks like:

    lparen,rparen = map(Suppress, "{}")

    something = Forward()
    keyval = Group(eng + Suppress('=>') + something)
    something << (realnum | integer |
                  Group(lparen + delimitedList(keyval) + Optional(',') +
                        rparen))

    To parse your entire input file, use delimitedList(keyval):

    results = delimitedList(keyval).parseString(filedata)
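Here is a minimal, self-contained sketch of the recursive grammar, run against the nested fragment quoted earlier (assuming pyparsing is installed and `eng = Word(alphas)` as a stand-in). It uses '<<=', the in-place form of '<<', which sidesteps the operator-precedence trap where `fwd << a | b` is read as `(fwd << a) | b`; with plain '<<', the parentheses around the alternatives are what prevent that. The ',' strings left by the unsuppressed Optional(',') show up in the result.

```python
from pyparsing import (Combine, Forward, Group, Optional, Suppress, Word,
                       alphas, delimitedList, nums)

eng = Word(alphas)  # hypothetical stand-in for the original poster's "eng"
integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))
lparen, rparen = map(Suppress, "{}")

# Declare "something" first so keyval can refer to it...
something = Forward()
keyval = Group(eng + Suppress('=>') + something)
# ...then fill in its definition, which refers back to keyval.
# '<<=' assigns the whole right-hand side; the parentheses only allow the line break.
something <<= (realnum | integer |
               Group(lparen + delimitedList(keyval) + Optional(',') + rparen))

filedata = """
confident => {
    count => 4,
    trans => {
        ashhvsht => 0.75100505,
        phraarmnbh => 0.08341708,
    },
},"""

results = delimitedList(keyval).parseString(filedata)
print(results.asList())
```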

    (There is one problem: one of your key names is "\'". I don't know if
    this is a typo or intentional. If you need to accommodate even this
    as a key name, just change your definition of eng to
    Word(alphas + r"\'").)
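A quick check of that definition (a sketch, assuming pyparsing is installed). r"\'" is a raw string holding two characters, a backslash and an apostrophe, so a key made of those characters now parses; the trade-off is that they are also accepted inside ordinary keys.

```python
from pyparsing import Word, alphas

# Letters plus the two characters backslash and apostrophe
eng = Word(alphas + r"\'")

print(eng.parseString(r"\'").asList())   # the odd key from the input file
print(eng.parseString("count").asList())
# Side effect: apostrophes and backslashes are now legal anywhere in a key
print(eng.parseString("don't").asList())
```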

    Now if I parse your original string, I get (using the pprint module to
    format the results):

    [['markets',
      [['count', '8'],
       ['trans',
        [['baajaar', '0.87628353'],
         ['kiraae', '0.02341598'],
         ['lii', '0.02178813'],
         ['adr', '0.01978462'],
         ['gyiimn', '0.01765590'],
         ['baaaaromn', '0.01765590'],
         ['sdk', '0.01728024'],
         ['kaanuun', '0.00613574'],
         ',']],
       ',']],
     ['confident',
      [['count', '4'],
       ['trans',
        [['ashhvsht', '0.75100505'],
         ['phraarmnbh', '0.08341708'],
         ['athmvishhvaas', '0.08090452'],
         ['milte', '0.03768845'],
         ['utnii', '0.02110553'],
         ['anaa', '0.01432161'],
         ['jitne', '0.01155779'],
         ',']],
       ',']],
     ['consumers',
      [['count', '34'],
       ['trans',
        [['upbhokhtaaomn', '0.48493883'],
         ['upbhokhtaa', '0.27374792'],
         ['zrurtomn', '0.02753605'],
         ['suuchnaa', '0.02707965'],
         ['ghraahkomn', '0.02580174'],
         ['ne', '0.02574089'],
         ["\\'", '0.01947301'],
         ['jnmt', '0.01527414'],
         ',']],
       ',']]]

    But there is one more card up pyparsing's sleeve. Just as your
    original parser used "english" to apply a results name to your keys,
    it would be nice if our parser would return not a list of key-value
    pairs, but an actual dict-like object. Pyparsing's Dict class
    enhances the results in just this way. Use Dict to wrap our
    repetitive structures, and it will automatically define results names
    for us, reading the first element of each group as the key, and the
    remaining items in the group as the value:

    something << (realnum | integer |
                  Dict(lparen + delimitedList(keyval) +
                       Optional(',').suppress() + rparen))

    results = Dict(delimitedList(keyval)).parseString(filedata)
    print results.dump()

    Gives this hierarchical structure:

    - confident:
      - count: 4
      - trans:
        - anaa: 0.01432161
        - ashhvsht: 0.75100505
        - athmvishhvaas: 0.08090452
        - jitne: 0.01155779
        - milte: 0.03768845
        - phraarmnbh: 0.08341708
        - utnii: 0.02110553
    - consumers:
      - count: 34
      - trans:
        - \': 0.01947301
        - ghraahkomn: 0.02580174
        - jnmt: 0.01527414
        - ne: 0.02574089
        - suuchnaa: 0.02707965
        - upbhokhtaa: 0.27374792
        - upbhokhtaaomn: 0.48493883
        - zrurtomn: 0.02753605
    - markets:
      - count: 8
      - trans:
        - adr: 0.01978462
        - baaaaromn: 0.01765590
        - baajaar: 0.87628353
        - gyiimn: 0.01765590
        - kaanuun: 0.00613574
        - kiraae: 0.02341598
        - lii: 0.02178813
        - sdk: 0.01728024
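A self-contained sketch of the Dict version on the short nested fragment quoted earlier, assuming pyparsing is installed and `eng = Word(alphas)` as a stand-in. Because Optional(',') is suppressed here, no stray ',' tokens reach Dict, and nested values can be looked up by key at every level.

```python
from pyparsing import (Combine, Dict, Forward, Group, Optional, Suppress, Word,
                       alphas, delimitedList, nums)

eng = Word(alphas)  # hypothetical stand-in for the original poster's "eng"
integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))
lparen, rparen = map(Suppress, "{}")

something = Forward()
keyval = Group(eng + Suppress('=>') + something)
# Dict reads the first element of each Group as a key and the rest as its
# value; suppressing the optional trailing comma keeps ',' tokens out of it.
something <<= (realnum | integer |
               Dict(lparen + delimitedList(keyval) +
                    Optional(',').suppress() + rparen))

filedata = """
confident => {
    count => 4,
    trans => {
        ashhvsht => 0.75100505,
        phraarmnbh => 0.08341708,
    },
},"""

results = Dict(delimitedList(keyval)).parseString(filedata)
print(results.dump())

# Nested lookups by key, as with a dict
print(results["confident"]["count"])
print(results["confident"]["trans"]["phraarmnbh"])
print(sorted(results["confident"].keys()))
```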

    You can access these fields by name like dict elements:

    print results.keys()
    print results["confident"].keys()
    print results["confident"]["trans"]["jitne"]

    If the names are valid Python identifiers (which "\'" is *not*), you
    can access their fields like attributes of an object:

    print results.confident.trans.jitne
    for k in results.keys():
        print k, results[k].count

    Prints:

    ['confident', 'markets', 'consumers']
    ['count', 'trans']
    0.01155779
    0.01155779
    confident 4
    markets 8
    consumers 34
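If the results need to leave pyparsing (to be stored or serialized, say), asDict() in current pyparsing releases converts the named results recursively into ordinary Python dicts. This is a sketch assuming pyparsing is installed and `eng = Word(alphas)` as a stand-in for the original key expression, run on the short nested fragment quoted earlier; json from the standard library is used only to pretty-print the result.

```python
import json

from pyparsing import (Combine, Dict, Forward, Group, Optional, Suppress, Word,
                       alphas, delimitedList, nums)

eng = Word(alphas)  # hypothetical stand-in for the original poster's "eng"
integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))
lparen, rparen = map(Suppress, "{}")

something = Forward()
keyval = Group(eng + Suppress('=>') + something)
something <<= (realnum | integer |
               Dict(lparen + delimitedList(keyval) +
                    Optional(',').suppress() + rparen))

filedata = """
confident => {
    count => 4,
    trans => {
        ashhvsht => 0.75100505,
        phraarmnbh => 0.08341708,
    },
},"""

results = Dict(delimitedList(keyval)).parseString(filedata)

# Attribute-style access works for names that are valid Python identifiers
print(results.confident.trans.ashhvsht)

# asDict() builds ordinary nested dicts (the values are still strings)
plain = results.asDict()
print(json.dumps(plain, indent=2))
```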

    I've posted the full program at http://pyparsing.pastebin.com/f1d0e2182.

    Welcome to pyparsing!

    -- Paul
     
    Paul McGuire, Feb 13, 2010