Accessing a parse tree

Discussion in 'Python' started by Clarendon, Apr 16, 2009.

  1. Clarendon

    Clarendon Guest

    Hello!

    I need a program that accesses a parse tree based on the designated
    words (terminals) within the tree. For instance, in:

    I came a long way in changing my habit.

    (ROOT
    (S
    (NP (PRP I))
    (VP (VBD came)
    (NP (DT a) (JJ long) (NN way))
    (PP (IN in)
    (S
    (VP (VBG changing)
    (NP (PRP$ my) (NN habit))))))


    the designated words are "a long way". I need the program to recognize
    how many parentheses there are after them. Currently two: NN way)).
    Then I need it to see how many parentheses there are before it.
    Currently there are two as well: (NP (DT. Then the program should that
    the designated wordssee are followed by (PP (IN in) and then by (S
    (VP (VBG.

    I looked at the NLTK Tree class but it does not seem to have a method
    that works with designated words. Is there some kind of tree navigator
    that does something like this? If I need to write one myself, I would
    appreciate any tips about where to start.

    Thanks.
    Clarendon, Apr 16, 2009
    #1
    1. Advertising

  2. Clarendon

    John Machin Guest

    On Apr 17, 8:55 am, Clarendon <> wrote:
    > Hello!
    >
    > I need a program that accesses a parse tree based on the designated
    > words (terminals) within the tree. For instance, in:
    >
    > I came a long way in changing my habit.
    >
    > (ROOT
    >   (S
    >     (NP (PRP I))
    >     (VP (VBD came)
    >       (NP (DT a) (JJ long) (NN way))
    >       (PP (IN in)
    >         (S
    >           (VP (VBG changing)
    >             (NP (PRP$ my) (NN habit))))))
    >
    > the designated words are "a long way". I need the program to recognize
    > how many parentheses there are after them. Currently two: NN way)).
    > Then I need it to see how many parentheses there are before it.
    > Currently there are two as well: (NP (DT.


    Why is the answer not (S (VP (NP (DT ? You may need to explain what
    you mean by "before" and "after" ... also the parentheses are an
    artifact of this particular method of representing a parse tree. What
    in general terms are you trying to do?

    > Then the program should


    some text is missing here

    > that
    > the designated wordssee are followed by (PP (IN in) and then by  (S


    what is "wordssee"?

    > (VP (VBG.
    >
    > I looked at the NLTK Tree class but it does not seem to have a method
    > that works with designated words. Is there some kind of tree navigator
    > that does something like this? If I need to write one myself, I would
    > appreciate any tips about where to start.


    Having a clear statement of requirements would make a good start.
    John Machin, Apr 17, 2009
    #2
    1. Advertising

  3. Clarendon

    Aaron Brady Guest

    On Apr 16, 10:16 pm, John Machin <> wrote:
    > On Apr 17, 8:55 am, Clarendon <> wrote:
    >
    >
    >
    > > Hello!

    >
    > > I need a program that accesses a parse tree based on the designated
    > > words (terminals) within the tree. For instance, in:

    >
    > > I came a long way in changing my habit.

    >
    > > (ROOT
    > >   (S
    > >     (NP (PRP I))
    > >     (VP (VBD came)
    > >       (NP (DT a) (JJ long) (NN way))
    > >       (PP (IN in)
    > >         (S
    > >           (VP (VBG changing)
    > >             (NP (PRP$ my) (NN habit))))))

    >
    > > the designated words are "a long way". I need the program to recognize
    > > how many parentheses there are after them. Currently two: NN way)).
    > > Then I need it to see how many parentheses there are before it.
    > > Currently there are two as well: (NP (DT.

    >
    > Why is the answer not (S (VP (NP (DT ? You may need to explain what
    > you mean by "before" and "after" ... also the parentheses are an
    > artifact of this particular method of representing a parse tree. What
    > in general terms are you trying to do?


    It sounds like the OP wants a 'flatten' function: a tuple of tuple A,
    where tuple A is ( category, word, ref to original, parent index,
    depth ). Unproduced:

    >>> x= flatten( sentence_representation )
    >>> x[ :9 ]

    ( 'ROOT', None, <object>, None, 0 )
    ( 'S', None, <object>, 0, 1 )
    ( 'NP', None, <object>, 1, 2 )
    ( 'PRP', 'I', <object>, 2, 3 )
    ( 'VP', None, <object>, 1, 2 )
    ( 'VBD', 'came', <object>, 4, 2 )
    ( 'NP', None, <object>, 4, 3 )
    ( 'DT', 'a', <object>, 6, 4 )
    ( 'JJ', 'long', <object>, 6, 4 )
    ( 'NN', 'way', <object>, 6, 4 )

    I'm not certain it's accurate. The depth field is redundant but may
    be useful to have.
    Aaron Brady, Apr 17, 2009
    #3
  4. Clarendon

    Clarendon Guest

    Thank you very much for this information. It seems to point me to the
    right direction. However, I do not fully understand the flatten
    function and its output. Some indices seem to be inaccurate. I tried
    to find this function at nltk.tree.Tree.flatten, but it returns a
    flattened tree, not a tuple.

    So your flatten function must be a different one, and it's not one of
    the builtins, either. Could you tell me where I can find the
    documentation about this flatten function?
    Clarendon, Apr 17, 2009
    #4
  5. Clarendon

    Clarendon Guest

    Dear John Machin

    So sorry about the typo. It should be: "the program should *see* that
    the designated *words* are..."

    "a long way" has two parentheses to the left -- (VP (DT -- before it
    hits a separate group -- VBD came). If there are three parenthesis,
    for instance (NP, this will means that what follows "a long way" is a
    modifier of "a long way", as illustrated below:

    (VP (VBD came)
    (NP
    (NP (DT a) (JJ long) (NN way))
    (PP (IN in)

    So I am trying to capture the grammatical relations following the
    designated words.
    Clarendon, Apr 17, 2009
    #5
  6. Clarendon

    Aaron Brady Guest

    On Apr 17, 4:03 am, Clarendon <> wrote:
    > Thank you very much for this information. It seems to point me to the
    > right direction. However, I do not fully understand the flatten
    > function and its output. Some indices seem to be inaccurate. I tried
    > to find this function at nltk.tree.Tree.flatten, but it returns a
    > flattened tree, not a tuple.
    >
    > So your flatten function must be a different one, and it's not one of
    > the builtins, either. Could you tell me where I can find the
    > documentation about this flatten function?


    No, it is a different one. I don't even have it. We'd have to write
    it.

    The indices weren't included in the flattened tree, but if you're
    writing it, it can.

    0: ( 'ROOT', None, <object>, None --no parent--, 0 )
    1: ( 'S', None, <object>, 0 --parent is 'ROOT'--, 1 )
    2: ( 'NP', None, <object>, 1 --parent is 'S'--, 2 )
    3: ( 'PRP', 'I', <object>, 2 --parent is 'NP'--, 3 )
    4: ( 'VP', None, <object>, 1 --parent is 'S', 2 )
    5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, 2 )

    I screwed up the 'depth' field on #5. It should be:
    5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, **3** )

    Otherwise I'm not sure what you mean by 'indices seem to be
    inaccurate'. I'm still not completely sure though. After all, I did
    it by hand, not by program.

    If your package comes with a flatten function, it would be a good
    place to start. Flatten functions can get hairy. What is its code,
    and what is its output?

    Here's an example:

    >>> a= [ 'p', [ [ 'q', 'r' ], 's', 't' ], 'u' ]
    >>> a

    ['p', [['q', 'r'], 's', 't'], 'u']
    >>> def flatten( x ):

    .... for y in x:
    .... if isinstance( y, list ):
    .... for z in flatten( y ):
    .... yield z
    .... else:
    .... yield y
    ....
    >>> list( flatten( a ) )

    ['p', 'q', 'r', 's', 't', 'u']
    Aaron Brady, Apr 17, 2009
    #6
  7. Clarendon

    John Machin Guest

    On 17/04/2009 7:32 PM, Clarendon wrote:
    > Dear John Machin


    I presume that you replied to me instead of the list accidentally.

    >
    > So sorry about the typo. It should be: "the program should *see* that
    > the designated *words* are..."
    >
    > "a long way" has two parentheses to the left -- (VP (DT -- before it
    > hits a separate group -- VBD came).


    Like I said, the parentheses are an artifact of one particular visual
    representation of the parse tree. Your effort at clarification has
    introduced new unexplained terminology ("separate group"). BTW if you
    plan to persist with parentheses, you might at least display the tree in
    a somewhat more consistent fashion, discovering in the process that you
    are two parentheses short:
    (ROOT
    (S
    (NP
    (PRP I)
    )
    (VP
    (VBD came)
    (NP
    (DT a)
    (JJ long)
    (NN way)
    )
    (PP
    (IN in)
    (S
    (VP
    (VBG changing)
    (NP
    (PRP$ my)
    (NN habit)
    )
    )
    )
    )
    )
    Now look at this:
    ROOT
    S
    NP
    PRP I
    VP
    VBD came
    NP
    DT a
    JJ long
    NN way
    PP
    IN in
    etc etc
    No parentheses, and no loss of information.

    In fact if you keep the parentheses and lose all whitespace except a
    space between each node-type an a terminal word, you'll see that the
    parenthesis notation is just one way of serialising the tree.

    You have a tree structure, with the parsed information built on top of
    the words (terminals). A very quick flip through the NLTK tutorial gave
    me the impression that it would be highly unlikely not to have all you
    need -- and a bazillion other things, which is probably why you can't
    find what you want :) I certainly saw having parents mentioned as an option

    Suggestions:
    1. Get a pencil and a piece of paper, write "ROOT" at the top in the
    centre, and write "I came a long way in ......" spaced across the
    bottom. Fill in the parse tree.
    2. Express your requirement in terms of moving around the tree,
    following pointers to parent, left/elder sibling (if any), right/younger
    sibling (if any), and children. E.g. the 3 parse nodes for "a long way"
    are "DT JJ NN" and their parent is "NP". NP's left sibling is a VBD node
    ("came") and its right sibling is a PP ("in .....")
    3. Then have another look at the NLTK docs
    4. Ask questions on the NLTK mailing list.

    HTH,
    John
    John Machin, Apr 17, 2009
    #7
  8. Clarendon

    Aaron Brady Guest

    On Apr 17, 8:22 am, John Machin <> wrote:
    > On 17/04/2009 7:32 PM, Clarendon wrote:
    >
    > > Dear John Machin

    >
    > I presume that you replied to me instead of the list accidentally.
    >
    >
    >
    > > So sorry about the typo. It should be: "the program should *see* that
    > > the designated *words* are..."

    >
    > > "a long way" has two parentheses to the left -- (VP (DT -- before it
    > > hits a separate group -- VBD came).

    >

    snip
    > need -- and a bazillion other things, which is probably why you can't
    > find what you want :) I certainly saw having parents mentioned as an option


    Is it opt-in or opt-out?
    Aaron Brady, Apr 17, 2009
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. asahin

    build control tree, parse aspx page

    asahin, Sep 2, 2005, in forum: ASP .Net
    Replies:
    0
    Views:
    1,125
    asahin
    Sep 2, 2005
  2. Stub

    B tree, B+ tree and B* tree

    Stub, Nov 12, 2003, in forum: C Programming
    Replies:
    3
    Views:
    10,110
  3. Chad Whitacre

    parse tree has symbols not in the grammar?

    Chad Whitacre, Apr 27, 2005, in forum: Python
    Replies:
    0
    Views:
    329
    Chad Whitacre
    Apr 27, 2005
  4. Replies:
    19
    Views:
    1,120
    Daniel Vallstrom
    Mar 15, 2005
  5. 7stud --

    optparse: parse v. parse! ??

    7stud --, Feb 20, 2008, in forum: Ruby
    Replies:
    3
    Views:
    183
    7stud --
    Feb 20, 2008
Loading...

Share This Page