Recommended data structure

Discussion in 'Java' started by Ike, Jun 7, 2004.

  1. Ike

    Ike Guest

    I have a seemingly unusual data structure and I was hoping someone could
    suggest how best to "hold" it in memory, in Java.

    Essentially, I am parsing swaths of text, into sentences, where each word is
    labelled with its part of speech (noun, verb, etc), and the sentence then
    "diagrammed," much as you may have done in grade school. Not diagrammed on
    the screen, per se, but, broken down into a tree (I am actually using a
    JTree for this at the moment), where, say, the sentence branches into both a
    subject and a predicate, and under the predicate branch, there is a verb,
    and under the verb branch, there may be one or more 'qualifiers' - in the
    case of a verb, a qualifier would be an adverb.

    The tree can therefore, depending on how much text is input, become huge.

    I also need to traverse and search the tree, e.g., find all sentences that
    use a certain noun, etc.

    As I said, I presently hold this in a JTree, and cannot do that for long. I
    also would like to be able to store the tree from one session to the next.

    How would a seasoned Java programmer approach this? Thank you, Ike
     
    Ike, Jun 7, 2004
    #1

  2. Phillip Lord

    Phillip Lord Guest

    >>>>> "Ike" == Ike writes:

    Ike> I have a seemingly unusual data structure and I was hoping
    Ike> someone could suggest how best to "hold" it in memory, in Java.

    Ike> Essentially, I am parsing swaths of text, into sentences, where
    Ike> each word is labelled with its part of speech (noun, verb,
    Ike> etc), and the sentence then "diagrammed," much as you may have
    Ike> done in grade school. Not diagrammed on the screen, per se,
    Ike> but, broken down into a tree (I am actually using a JTree for
    Ike> this at the moment), where, say, the sentence branches into
    Ike> both a subject and a predicate, and under the predicate branch,
    Ike> there is a verb, and under the verb branch, there may be one or
    Ike> more 'qualifiers' - in the case of a verb, a qualifier would be
    Ike> an adverb.

    Ike> The tree can therefore, depending on how much text is input,
    Ike> become huge.

    Ike> I also need to traverse and search the tree, e.g., find all
    Ike> sentences that use a certain noun, etc.

    Ike> As I said, I presently hold this in a JTree, and cannot do that
    Ike> for long. I also would like to be able to store the tree from
    Ike> one session to the next.

    Ike> How would a seasoned Java programmer approach this? Thank you,
    Ike> Ike


    I would look for someone who has already implemented this.

    You might want to have a look at GATE. It already implements things
    like parts-of-speech taggers, and probably some functionality for
    search over it.

    http://gate.ac.uk/


    Cheers

    Phil
     
    Phillip Lord, Jun 7, 2004
    #2

  3. ABoyne

    ABoyne Guest


    > The tree can therefore, depending on how much text is input, become
    > huge.
    >
    > .... the tree,....
    > ....the tree from one session...
    > How would a seasoned Java programmer approach this? Thank you, Ike


    XML?
     
    ABoyne, Jun 7, 2004
    #3
  4. Andy Fish

    Andy Fish Guest

    "ABoyne" wrote in message
    news:Xns9501B29AEFF9ABOYNEUKIBMCOM@9.20.142.8...
    >
    > > The tree can therefore, depending on how much text is input, become
    > > huge.
    > >
    > > .... the tree,....
    > > ....the tree from one session...
    > > How would a seasoned Java programmer approach this? Thank you, Ike
    >
    > XML?


    Unfortunately, if the tree is large, manipulating it in memory as XML
    will be far more memory-hungry than simpler data structures.

    I would go for a combined approach. Create the structure in memory using
    "common sense" objects, e.g. one for a sentence, with instance variables
    pointing to its verb phrase, noun phrase, etc. That way you build a map of
    the sentences in memory.
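
    A minimal sketch of such "common sense" objects, under stated assumptions:
    the class names (NounPhrase, VerbPhrase, Sentence) and their fields are
    hypothetical, chosen to mirror the subject/predicate/qualifier breakdown
    described in the question, and are not taken from any existing library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Subject branch: a head noun plus any adjectives qualifying it (hypothetical layout). */
final class NounPhrase {
    final String noun;
    final List<String> adjectives;

    NounPhrase(String noun, List<String> adjectives) {
        this.noun = noun;
        // Defensive copy so the phrase cannot change after construction.
        this.adjectives = Collections.unmodifiableList(new ArrayList<>(adjectives));
    }
}

/** Predicate branch: a verb plus any adverbs qualifying it (hypothetical layout). */
final class VerbPhrase {
    final String verb;
    final List<String> adverbs;

    VerbPhrase(String verb, List<String> adverbs) {
        this.verb = verb;
        this.adverbs = Collections.unmodifiableList(new ArrayList<>(adverbs));
    }
}

/** One diagrammed sentence, with instance variables pointing to its two branches. */
final class Sentence {
    final String text;
    final NounPhrase subject;
    final VerbPhrase predicate;

    Sentence(String text, NounPhrase subject, VerbPhrase predicate) {
        this.text = text;
        this.subject = subject;
        this.predicate = predicate;
    }

    /** True if the subject noun matches, ignoring case. */
    boolean usesNoun(String noun) {
        return subject.noun.equalsIgnoreCase(noun);
    }
}

class SentenceDemo {
    public static void main(String[] args) {
        Sentence s = new Sentence(
                "The old dog barked loudly.",
                new NounPhrase("dog", Arrays.asList("old")),
                new VerbPhrase("barked", Arrays.asList("loudly")));
        System.out.println(s.subject.noun + " / " + s.predicate.verb + " " + s.predicate.adverbs);
    }
}
```

    Plain objects like these carry only the data the analysis needs, whereas
    a JTree also carries Swing model and view state for every node.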

    Where you need efficient access to, say, all sentences containing a certain
    noun, back up the main data structure with other indexes, say a HashMap
    where the key is a noun and the value is an ArrayList of all the sentences
    that use it. There will be a trade-off between the memory required and the
    work involved in maintaining the index vs. the speed of access.
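
    The noun index could be sketched roughly as follows. NounIndex and its
    method names are hypothetical, and the index is generic in the sentence
    type so that plain strings can stand in for sentence objects here.
    Lower-casing the key is an added assumption, not something the thread
    specifies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Secondary index: noun -> every sentence that uses it (hypothetical helper). */
final class NounIndex<S> {
    private final Map<String, List<S>> byNoun = new HashMap<>();

    /** Records that the sentence uses the noun; call once per noun as each sentence is parsed. */
    void add(String noun, S sentence) {
        byNoun.computeIfAbsent(key(noun), k -> new ArrayList<>()).add(sentence);
    }

    /** Returns a read-only view of the sentences using the noun, or an empty list. */
    List<S> sentencesUsing(String noun) {
        List<S> hits = byNoun.get(key(noun));
        if (hits == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(hits);
    }

    /** Assumption: lookups should ignore case, so keys are lower-cased. */
    private static String key(String noun) {
        return noun.toLowerCase(Locale.ROOT);
    }
}

class NounIndexDemo {
    public static void main(String[] args) {
        NounIndex<String> index = new NounIndex<>();
        index.add("dog", "The dog barked loudly.");
        index.add("dog", "A dog slept.");
        index.add("cat", "The cat ran.");
        System.out.println(index.sentencesUsing("dog"));
    }
}
```

    The lookup is a single hash probe instead of a walk over every tree, at
    the cost of one extra list entry per noun occurrence and the need to
    update the index whenever a sentence is added or removed.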

    For storing the map and reloading it later, you can either use Java
    serialization or convert it to XML. The former is almost certainly a lot
    less work and will take up less space, but with XML you will have the
    advantage of being able to process it with other tools.
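
    As a rough illustration of the serialization route, the sketch below
    writes a list of sentence objects to a file and reads it back.
    StoredSentence and SentenceStore are hypothetical names, and the two
    string fields are a simplified stand-in for a full sentence structure.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parsed sentence; every field must itself be serializable. */
final class StoredSentence implements Serializable {
    private static final long serialVersionUID = 1L;

    final String subjectNoun;
    final String verb;

    StoredSentence(String subjectNoun, String verb) {
        this.subjectNoun = subjectNoun;
        this.verb = verb;
    }
}

/** Saves and reloads the in-memory sentences between sessions. */
final class SentenceStore {
    static void save(List<StoredSentence> sentences, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            // Copy into an ArrayList so the written object is known to be serializable.
            out.writeObject(new ArrayList<>(sentences));
        }
    }

    @SuppressWarnings("unchecked")
    static List<StoredSentence> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<StoredSentence>) in.readObject();
        }
    }
}

class SentenceStoreDemo {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sentences", ".ser");
        List<StoredSentence> sentences = new ArrayList<>();
        sentences.add(new StoredSentence("dog", "barked"));

        SentenceStore.save(sentences, file);
        List<StoredSentence> reloaded = SentenceStore.load(file);
        System.out.println(reloaded.get(0).subjectNoun + " " + reloaded.get(0).verb);
        Files.delete(file);
    }
}
```

    One design point to keep in mind: the saved file is tied to the class
    layout, so changing the fields later can make old files unreadable
    unless serialVersionUID and compatibility are managed deliberately.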

    If the amount of data is huge, store it in an RDBMS. At the expense of
    speed, you will be able to use SQL commands to do the analysis.
     
    Andy Fish, Jun 7, 2004
    #4