Accessing a parse tree

C

Clarendon

Hello!

I need a program that accesses a parse tree based on the designated
words (terminals) within the tree. For instance, in:

I came a long way in changing my habit.

(ROOT
(S
(NP (PRP I))
(VP (VBD came)
(NP (DT a) (JJ long) (NN way))
(PP (IN in)
(S
(VP (VBG changing)
(NP (PRP$ my) (NN habit))))))


the designated words are "a long way". I need the program to recognize
how many parentheses there are after them. Currently two: NN way)).
Then I need it to see how many parentheses there are before it.
Currently there are two as well: (NP (DT. Then the program should that
the designated wordssee are followed by (PP (IN in) and then by (S
(VP (VBG.

I looked at the NLTK Tree class but it does not seem to have a method
that works with designated words. Is there some kind of tree navigator
that does something like this? If I need to write one myself, I would
appreciate any tips about where to start.

Thanks.
 
J

John Machin

Hello!

I need a program that accesses a parse tree based on the designated
words (terminals) within the tree. For instance, in:

I came a long way in changing my habit.

(ROOT
  (S
    (NP (PRP I))
    (VP (VBD came)
      (NP (DT a) (JJ long) (NN way))
      (PP (IN in)
        (S
          (VP (VBG changing)
            (NP (PRP$ my) (NN habit))))))

the designated words are "a long way". I need the program to recognize
how many parentheses there are after them. Currently two: NN way)).
Then I need it to see how many parentheses there are before it.
Currently there are two as well: (NP (DT.

Why is the answer not (S (VP (NP (DT ? You may need to explain what
you mean by "before" and "after" ... also the parentheses are an
artifact of this particular method of representing a parse tree. What
in general terms are you trying to do?
Then the program should

some text is missing here
that
the designated wordssee are followed by (PP (IN in) and then by  (S

what is "wordssee"?
(VP (VBG.

I looked at the NLTK Tree class but it does not seem to have a method
that works with designated words. Is there some kind of tree navigator
that does something like this? If I need to write one myself, I would
appreciate any tips about where to start.

Having a clear statement of requirements would make a good start.
 
A

Aaron Brady

Why is the answer not (S (VP (NP (DT ? You may need to explain what
you mean by "before" and "after" ... also the parentheses are an
artifact of this particular method of representing a parse tree. What
in general terms are you trying to do?

It sounds like the OP wants a 'flatten' function: a tuple of tuple A,
where tuple A is ( category, word, ref to original, parent index,
depth ). Unproduced:
x= flatten( sentence_representation )
x[ :9 ]
( 'ROOT', None, <object>, None, 0 )
( 'S', None, <object>, 0, 1 )
( 'NP', None, <object>, 1, 2 )
( 'PRP', 'I', <object>, 2, 3 )
( 'VP', None, <object>, 1, 2 )
( 'VBD', 'came', <object>, 4, 2 )
( 'NP', None, <object>, 4, 3 )
( 'DT', 'a', <object>, 6, 4 )
( 'JJ', 'long', <object>, 6, 4 )
( 'NN', 'way', <object>, 6, 4 )

I'm not certain it's accurate. The depth field is redundant but may
be useful to have.
 
C

Clarendon

Thank you very much for this information. It seems to point me to the
right direction. However, I do not fully understand the flatten
function and its output. Some indices seem to be inaccurate. I tried
to find this function at nltk.tree.Tree.flatten, but it returns a
flattened tree, not a tuple.

So your flatten function must be a different one, and it's not one of
the builtins, either. Could you tell me where I can find the
documentation about this flatten function?
 
C

Clarendon

Dear John Machin

So sorry about the typo. It should be: "the program should *see* that
the designated *words* are..."

"a long way" has two parentheses to the left -- (VP (DT -- before it
hits a separate group -- VBD came). If there are three parenthesis,
for instance (NP, this will means that what follows "a long way" is a
modifier of "a long way", as illustrated below:

(VP (VBD came)
(NP
(NP (DT a) (JJ long) (NN way))
(PP (IN in)

So I am trying to capture the grammatical relations following the
designated words.
 
A

Aaron Brady

Thank you very much for this information. It seems to point me to the
right direction. However, I do not fully understand the flatten
function and its output. Some indices seem to be inaccurate. I tried
to find this function at nltk.tree.Tree.flatten, but it returns a
flattened tree, not a tuple.

So your flatten function must be a different one, and it's not one of
the builtins, either. Could you tell me where I can find the
documentation about this flatten function?

No, it is a different one. I don't even have it. We'd have to write
it.

The indices weren't included in the flattened tree, but if you're
writing it, it can.

0: ( 'ROOT', None, <object>, None --no parent--, 0 )
1: ( 'S', None, <object>, 0 --parent is 'ROOT'--, 1 )
2: ( 'NP', None, <object>, 1 --parent is 'S'--, 2 )
3: ( 'PRP', 'I', <object>, 2 --parent is 'NP'--, 3 )
4: ( 'VP', None, <object>, 1 --parent is 'S', 2 )
5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, 2 )

I screwed up the 'depth' field on #5. It should be:
5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, **3** )

Otherwise I'm not sure what you mean by 'indices seem to be
inaccurate'. I'm still not completely sure though. After all, I did
it by hand, not by program.

If your package comes with a flatten function, it would be a good
place to start. Flatten functions can get hairy. What is its code,
and what is its output?

Here's an example:
a= [ 'p', [ [ 'q', 'r' ], 's', 't' ], 'u' ]
a ['p', [['q', 'r'], 's', 't'], 'u']
def flatten( x ):
.... for y in x:
.... if isinstance( y, list ):
.... for z in flatten( y ):
.... yield z
.... else:
.... yield y
....['p', 'q', 'r', 's', 't', 'u']
 
J

John Machin

Dear John Machin

I presume that you replied to me instead of the list accidentally.
So sorry about the typo. It should be: "the program should *see* that
the designated *words* are..."

"a long way" has two parentheses to the left -- (VP (DT -- before it
hits a separate group -- VBD came).

Like I said, the parentheses are an artifact of one particular visual
representation of the parse tree. Your effort at clarification has
introduced new unexplained terminology ("separate group"). BTW if you
plan to persist with parentheses, you might at least display the tree in
a somewhat more consistent fashion, discovering in the process that you
are two parentheses short:
(ROOT
(S
(NP
(PRP I)
)
(VP
(VBD came)
(NP
(DT a)
(JJ long)
(NN way)
)
(PP
(IN in)
(S
(VP
(VBG changing)
(NP
(PRP$ my)
(NN habit)
)
)
)
)
)
Now look at this:
ROOT
S
NP
PRP I
VP
VBD came
NP
DT a
JJ long
NN way
PP
IN in
etc etc
No parentheses, and no loss of information.

In fact if you keep the parentheses and lose all whitespace except a
space between each node-type an a terminal word, you'll see that the
parenthesis notation is just one way of serialising the tree.

You have a tree structure, with the parsed information built on top of
the words (terminals). A very quick flip through the NLTK tutorial gave
me the impression that it would be highly unlikely not to have all you
need -- and a bazillion other things, which is probably why you can't
find what you want :) I certainly saw having parents mentioned as an option

Suggestions:
1. Get a pencil and a piece of paper, write "ROOT" at the top in the
centre, and write "I came a long way in ......" spaced across the
bottom. Fill in the parse tree.
2. Express your requirement in terms of moving around the tree,
following pointers to parent, left/elder sibling (if any), right/younger
sibling (if any), and children. E.g. the 3 parse nodes for "a long way"
are "DT JJ NN" and their parent is "NP". NP's left sibling is a VBD node
("came") and its right sibling is a PP ("in .....")
3. Then have another look at the NLTK docs
4. Ask questions on the NLTK mailing list.

HTH,
John
 
A

Aaron Brady

I presume that you replied to me instead of the list accidentally.




snip
need -- and a bazillion other things, which is probably why you can't
find what you want :) I certainly saw having parents mentioned as an option

Is it opt-in or opt-out?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,573
Members
45,046
Latest member
Gavizuho

Latest Threads

Top