Accessing a parse tree

Clarendon · Apr 16, 2009

Hello!

I need a program that accesses a parse tree based on the designated
words (terminals) within the tree. For instance, in:

I came a long way in changing my habit.

(ROOT
(S
(NP (PRP I))
(VP (VBD came)
(NP (DT a) (JJ long) (NN way))
(PP (IN in)
(S
(VP (VBG changing)
(NP (PRP$ my) (NN habit))))))

the designated words are "a long way". I need the program to recognize
how many parentheses there are after them. Currently two: NN way)).
Then I need it to see how many parentheses there are before it.
Currently there are two as well: (NP (DT. Then the program should that
the designated wordssee are followed by (PP (IN in) and then by (S
(VP (VBG.

I looked at the NLTK Tree class but it does not seem to have a method
that works with designated words. Is there some kind of tree navigator
that does something like this? If I need to write one myself, I would
appreciate any tips about where to start.

Thanks.

John Machin · Apr 17, 2009

Hello!

I need a program that accesses a parse tree based on the designated
words (terminals) within the tree. For instance, in:

I came a long way in changing my habit.

(ROOT
(S
(NP (PRP I))
(VP (VBD came)
(NP (DT a) (JJ long) (NN way))
(PP (IN in)
(S
(VP (VBG changing)
(NP (PRP$ my) (NN habit))))))

the designated words are "a long way". I need the program to recognize
how many parentheses there are after them. Currently two: NN way)).
Then I need it to see how many parentheses there are before it.
Currently there are two as well: (NP (DT.

Why is the answer not (S (VP (NP (DT ? You may need to explain what
you mean by "before" and "after" ... also the parentheses are an
artifact of this particular method of representing a parse tree. What
in general terms are you trying to do?

Then the program should

some text is missing here

that
the designated wordssee are followed by (PP (IN in) and then by (S

what is "wordssee"?

(VP (VBG.

I looked at the NLTK Tree class but it does not seem to have a method
that works with designated words. Is there some kind of tree navigator
that does something like this? If I need to write one myself, I would
appreciate any tips about where to start.

Having a clear statement of requirements would make a good start.

Aaron Brady · Apr 17, 2009

Why is the answer not (S (VP (NP (DT ? You may need to explain what
you mean by "before" and "after" ... also the parentheses are an
artifact of this particular method of representing a parse tree. What
in general terms are you trying to do?

It sounds like the OP wants a 'flatten' function: a tuple of tuple A,
where tuple A is ( category, word, ref to original, parent index,
depth ). Unproduced:

x= flatten( sentence_representation )
x[ :9 ]

Click to expand...

Click to expand...

( 'ROOT', None, <object>, None, 0 )
( 'S', None, <object>, 0, 1 )
( 'NP', None, <object>, 1, 2 )
( 'PRP', 'I', <object>, 2, 3 )
( 'VP', None, <object>, 1, 2 )
( 'VBD', 'came', <object>, 4, 2 )
( 'NP', None, <object>, 4, 3 )
( 'DT', 'a', <object>, 6, 4 )
( 'JJ', 'long', <object>, 6, 4 )
( 'NN', 'way', <object>, 6, 4 )

I'm not certain it's accurate. The depth field is redundant but may
be useful to have.

Clarendon · Apr 17, 2009

Thank you very much for this information. It seems to point me to the
right direction. However, I do not fully understand the flatten
function and its output. Some indices seem to be inaccurate. I tried
to find this function at nltk.tree.Tree.flatten, but it returns a
flattened tree, not a tuple.

So your flatten function must be a different one, and it's not one of
the builtins, either. Could you tell me where I can find the
documentation about this flatten function?

Clarendon · Apr 17, 2009

Dear John Machin

So sorry about the typo. It should be: "the program should *see* that
the designated *words* are..."

"a long way" has two parentheses to the left -- (VP (DT -- before it
hits a separate group -- VBD came). If there are three parenthesis,
for instance (NP, this will means that what follows "a long way" is a
modifier of "a long way", as illustrated below:

(VP (VBD came)
(NP
(NP (DT a) (JJ long) (NN way))
(PP (IN in)

So I am trying to capture the grammatical relations following the
designated words.

Aaron Brady · Apr 17, 2009

Thank you very much for this information. It seems to point me to the
right direction. However, I do not fully understand the flatten
function and its output. Some indices seem to be inaccurate. I tried
to find this function at nltk.tree.Tree.flatten, but it returns a
flattened tree, not a tuple.

So your flatten function must be a different one, and it's not one of
the builtins, either. Could you tell me where I can find the
documentation about this flatten function?

No, it is a different one. I don't even have it. We'd have to write
it.

The indices weren't included in the flattened tree, but if you're
writing it, it can.

0: ( 'ROOT', None, <object>, None --no parent--, 0 )
1: ( 'S', None, <object>, 0 --parent is 'ROOT'--, 1 )
2: ( 'NP', None, <object>, 1 --parent is 'S'--, 2 )
3: ( 'PRP', 'I', <object>, 2 --parent is 'NP'--, 3 )
4: ( 'VP', None, <object>, 1 --parent is 'S', 2 )
5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, 2 )

I screwed up the 'depth' field on #5. It should be:
5: ( 'VBD', 'came', <object>, 4 --parent is 'VP'--, **3** )

Otherwise I'm not sure what you mean by 'indices seem to be
inaccurate'. I'm still not completely sure though. After all, I did
it by hand, not by program.

If your package comes with a flatten function, it would be a good
place to start. Flatten functions can get hairy. What is its code,
and what is its output?

Here's an example:

a= [ 'p', [ [ 'q', 'r' ], 's', 't' ], 'u' ]
a ['p', [['q', 'r'], 's', 't'], 'u']
def flatten( x ):

Click to expand...

Click to expand...

.... for y in x:
.... if isinstance( y, list ):
.... for z in flatten( y ):
.... yield z
.... else:
.... yield y
....['p', 'q', 'r', 's', 't', 'u']

John Machin · Apr 17, 2009

Dear John Machin

I presume that you replied to me instead of the list accidentally.

So sorry about the typo. It should be: "the program should *see* that
the designated *words* are..."

"a long way" has two parentheses to the left -- (VP (DT -- before it
hits a separate group -- VBD came).

Like I said, the parentheses are an artifact of one particular visual
representation of the parse tree. Your effort at clarification has
introduced new unexplained terminology ("separate group"). BTW if you
plan to persist with parentheses, you might at least display the tree in
a somewhat more consistent fashion, discovering in the process that you
are two parentheses short:
(ROOT
(S
(NP
(PRP I)
)
(VP
(VBD came)
(NP
(DT a)
(JJ long)
(NN way)
)
(PP
(IN in)
(S
(VP
(VBG changing)
(NP
(PRP$ my)
(NN habit)
)
)
)
)
)
Now look at this:
ROOT
S
NP
PRP I
VP
VBD came
NP
DT a
JJ long
NN way
PP
IN in
etc etc
No parentheses, and no loss of information.

In fact if you keep the parentheses and lose all whitespace except a
space between each node-type an a terminal word, you'll see that the
parenthesis notation is just one way of serialising the tree.

You have a tree structure, with the parsed information built on top of
the words (terminals). A very quick flip through the NLTK tutorial gave
me the impression that it would be highly unlikely not to have all you
need -- and a bazillion other things, which is probably why you can't
find what you want

I certainly saw having parents mentioned as an option

Suggestions:
1. Get a pencil and a piece of paper, write "ROOT" at the top in the
centre, and write "I came a long way in ......" spaced across the
bottom. Fill in the parse tree.
2. Express your requirement in terms of moving around the tree,
following pointers to parent, left/elder sibling (if any), right/younger
sibling (if any), and children. E.g. the 3 parse nodes for "a long way"
are "DT JJ NN" and their parent is "NP". NP's left sibling is a VBD node
("came") and its right sibling is a PP ("in .....")
3. Then have another look at the NLTK docs
4. Ask questions on the NLTK mailing list.

HTH,
John

Aaron Brady · Apr 17, 2009

I presume that you replied to me instead of the list accidentally.

snip
need -- and a bazillion other things, which is probably why you can't
find what you want I certainly saw having parents mentioned as an option

Is it opt-in or opt-out?

Splitting Tree	2	Dec 2, 2012
How to convert an string like this into a tree Object?	7	Sep 15, 2009
How to parse a sentence using grammars provided by nltk?	0	Aug 12, 2010
Trying to parse a HUGE(1gb) xml file	41	Dec 20, 2010
For Peer Review	1	Apr 2, 2010
Radix Tree	1	Nov 7, 2005
Parsing HTML into a tree	1	Apr 25, 2007
is there an easy way to parse a nested list ?	2	Mar 15, 2009

Accessing a parse tree

Clarendon

John Machin

Aaron Brady

Clarendon

Clarendon

Aaron Brady

John Machin

Aaron Brady

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads