Recommended data structure

Ike

I have a seemingly unusual data structure and I was hoping someone could
suggest how best to "hold" it in memory, in Java.

Essentially, I am parsing swaths of text into sentences, where each word is
labelled with its part of speech (noun, verb, etc.), and each sentence is then
"diagrammed," much as you may have done in grade school. Not diagrammed on
the screen, per se, but broken down into a tree (I am actually using a
JTree for this at the moment), where, say, the sentence branches into both a
subject and a predicate, and under the predicate branch there is a verb,
and under the verb branch there may be one or more 'qualifiers'; in the
case of a verb, a qualifier would be an adverb.

The tree can therefore, depending on how much text is input, become huge.

I also need to traverse and search the tree, e.g., find all sentences that
use a certain noun, etc.

As I said, I presently hold this in a JTree, and cannot do that for long. I
also would like to be able to store the tree from one session to the next.

How would a seasoned Java programmer approach this? Thank you, Ike
 
Phillip Lord

I would look for someone who has already implemented this for me.

You might want to have a look at GATE. It already implements things
like part-of-speech taggers, and probably some functionality for
searching over the results.

http://gate.ac.uk/


Cheers

Phil
 
ABoyne

Ike> The tree can therefore, depending on how much text is input, become
Ike> huge.
Ike> ... the tree ...
Ike> ... the tree from one session ...
Ike> How would a seasoned Java programmer approach this? Thank you, Ike

XML?
 
Andy Fish

ABoyne said:

ABoyne> XML?

Unfortunately, if the tree is large, manipulating it in memory as XML
will be far more memory-hungry than other, simpler data structures.

I would go for a combined approach. Create the structure in memory using
"common sense" objects, e.g. one for a sentence, with instance variables
pointing to its verb phrase, noun phrase, etc., so that you build up a map of
the sentences in memory.
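As a rough illustration of such "common sense" objects, here is a minimal sketch. All the names in it (SentenceModelSketch, Word, Sentence, PartOfSpeech, qualifiedBy, uses) are made up for this example, and it assumes a sentence reduces to one subject and one verb, each with optional qualifiers; a real parser would need a richer model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: plain objects instead of JTree nodes.
class SentenceModelSketch {

    enum PartOfSpeech { NOUN, VERB, ADJECTIVE, ADVERB }

    /** A word plus the words that qualify it (e.g. adverbs under a verb). */
    static final class Word {
        final String text;
        final PartOfSpeech pos;
        final List<Word> qualifiers = new ArrayList<>();

        Word(String text, PartOfSpeech pos) {
            this.text = text;
            this.pos = pos;
        }

        /** Adds a qualifier and returns this word so calls can be chained. */
        Word qualifiedBy(Word qualifier) {
            qualifiers.add(qualifier);
            return this;
        }
    }

    /** Simplifying assumption: one subject and one verb per sentence. */
    static final class Sentence {
        final Word subject;
        final Word verb;

        Sentence(Word subject, Word verb) {
            this.subject = subject;
            this.verb = verb;
        }

        /** True if the given word appears anywhere in this sentence with that part of speech. */
        boolean uses(String text, PartOfSpeech pos) {
            return matches(subject, text, pos) || matches(verb, text, pos);
        }

        // Depth-first walk over a word and its qualifiers.
        private static boolean matches(Word w, String text, PartOfSpeech pos) {
            if (w.pos == pos && w.text.equalsIgnoreCase(text)) {
                return true;
            }
            for (Word q : w.qualifiers) {
                if (matches(q, text, pos)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Builds "The brown dog barked loudly" by hand. */
    static Sentence example() {
        Word subject = new Word("dog", PartOfSpeech.NOUN)
                .qualifiedBy(new Word("brown", PartOfSpeech.ADJECTIVE));
        Word verb = new Word("barked", PartOfSpeech.VERB)
                .qualifiedBy(new Word("loudly", PartOfSpeech.ADVERB));
        return new Sentence(subject, verb);
    }

    public static void main(String[] args) {
        Sentence s = example();
        System.out.println(s.uses("dog", PartOfSpeech.NOUN));      // true
        System.out.println(s.uses("loudly", PartOfSpeech.ADVERB)); // true
        System.out.println(s.uses("cat", PartOfSpeech.NOUN));      // false
    }
}
```

Objects like these carry only the data you need, so they should be considerably lighter than Swing tree nodes, and a JTree can still be built from them on demand for display.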

Where you need efficient access to, say, all sentences containing a certain
noun, back up the main data structure with additional indexes, say a HashMap
where the key is a noun and the value is an ArrayList of all the sentences
that use it. There is a trade-off between the memory required and the work
involved in maintaining the index on one side, and speed of access on the other.
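A minimal sketch of such a noun index might look like the following. The class and method names (NounIndexSketch, Sentence, add, sentencesUsing) are invented for illustration, and the sentence is assumed to already carry the list of nouns found by the tagger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical secondary index: noun -> sentences that use it.
class NounIndexSketch {

    /** Stand-in for the real sentence object; assumes the nouns are already extracted. */
    static final class Sentence {
        final String text;
        final List<String> nouns;

        Sentence(String text, List<String> nouns) {
            this.text = text;
            this.nouns = nouns;
        }
    }

    private final Map<String, List<Sentence>> byNoun = new HashMap<>();

    /** Registers a sentence under each distinct noun it contains (case-insensitive). */
    void add(Sentence s) {
        Set<String> seen = new HashSet<>();
        for (String noun : s.nouns) {
            String key = noun.toLowerCase(Locale.ROOT);
            if (seen.add(key)) { // avoid listing a sentence twice for a repeated noun
                byNoun.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
            }
        }
    }

    /** Returns the sentences using the noun, or an empty list if there are none. */
    List<Sentence> sentencesUsing(String noun) {
        return byNoun.getOrDefault(noun.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        NounIndexSketch index = new NounIndexSketch();
        index.add(new Sentence("The dog chased the cat.", List.of("dog", "cat")));
        index.add(new Sentence("The dog slept.", List.of("dog")));
        System.out.println(index.sentencesUsing("dog").size()); // 2
        System.out.println(index.sentencesUsing("cat").size()); // 1
    }
}
```

The lookup is a single hash probe instead of a walk over every tree, at the cost of keeping the index in step with the main structure whenever sentences are added or removed.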

For storing the map and reloading it later, you can either use Java
serialization or convert it to XML. The former is almost certainly a lot
less work and will take up less space, but with XML you will have the
advantage of being able to process it with other tools.
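For the serialization route, a minimal sketch could look like this. The Node class and the save/load helpers are hypothetical names for this example; the only requirement from Java itself is that every class in the object graph implements java.io.Serializable.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of saving a tree between sessions with Java serialization.
class TreePersistenceSketch {

    /** Minimal tree node; every class in the saved graph must be Serializable. */
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;

        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Writes the whole tree reachable from root to the given file. */
    static void save(Node root, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(root);
        }
    }

    /** Reads a tree previously written by save. */
    static Node load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Node) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node sentence = new Node("sentence");
        sentence.children.add(new Node("subject"));
        sentence.children.add(new Node("predicate"));

        Path file = Files.createTempFile("parse-tree", ".ser");
        try {
            save(sentence, file);
            Node restored = load(file);
            System.out.println(restored.label + " has "
                    + restored.children.size() + " children");
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

One caveat worth keeping in mind: the serialized form is tied to the class definitions, so changing the classes between sessions can make old files unreadable, which is one more argument for XML if the format has to outlive the code.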

If the amount of data is huge, store it in an RDBMS. At the expense of speed,
you will be able to use SQL queries to do the analysis.
 
