initialize data structure or read text file

Discussion in 'Java' started by Joan, Aug 29, 2005.

  1. Joan

    Joan Guest

    Basic idea of program is to input a stock ticker symbol,
    like YHOO for Yahoo, and to return the name of the company.
    If I was going to do it a lot, then I suspect a hash of some
    kind would have fast retrieval of the info. However, if I expect
    to use it only a few times (one time initialization though)
    a hash would take up a long time compared to just reading
    the file a few times. Does anyone have experience to share?
    TIA
    Joan, Aug 29, 2005
    #1
    1. Advertising

  2. Joan

    Eric Sosman Guest

    Joan wrote:
    > Basic idea of program is to input a stock ticker symbol,
    > like YHOO for Yahoo, and to return the name of the company.
    > If I was going to do it a lot, then I suspect a hash of some
    > kind would have fast retrieval of the info. However, if I expect
    > to use it only a few times (one time initialization though)
    > a hash would take up a long time compared to just reading
    > the file a few times. Does anyone have experience to share?


    A 2GHz CPU clock ticks once every 0.5 nanoseconds.

    One 10-ms I/O operation takes 10,000,000 nanoseconds.

    ... but the only real answer is: Measure, measure,
    measure! "Premature optimization is the root of all evil."

    --
    Eric Sosman, Aug 29, 2005
    #2
    1. Advertising

  3. Joan

    Joan Guest

    "Eric Sosman" <> wrote in message
    news:df00ja$jtm$...
    >
    >
    > Joan wrote:
    >> Basic idea of program is to input a stock ticker symbol,
    >> like YHOO for Yahoo, and to return the name of the company.
    >> If I was going to do it a lot, then I suspect a hash of some
    >> kind would have fast retrieval of the info. However, if I
    >> expect
    >> to use it only a few times (one time initialization though)
    >> a hash would take up a long time compared to just reading
    >> the file a few times. Does anyone have experience to share?

    >
    > A 2GHz CPU clock ticks once every 0.5 nanoseconds.
    >
    > One 10-ms I/O operation takes 10,000,000 nanoseconds.
    >
    > ... but the only real answer is: Measure, measure,
    > measure! "Premature optimization is the root of all evil."
    >
    > --
    >
    >


    Thanks, but I was hoping to avoid having to program it twice.
    So I think I will go for what python calls a dictionary. Since I
    have strings I don't have to write an equals method.
    Joan, Aug 30, 2005
    #3
  4. Joan

    Roedy Green Guest

    On Mon, 29 Aug 2005 15:59:39 -0500, "Joan" <> wrote
    or quoted :

    >Basic idea of program is to input a stock ticker symbol,
    >like YHOO for Yahoo, and to return the name of the company.
    >If I was going to do it a lot, then I suspect a hash of some
    >kind would have fast retrieval of the info. However, if I expect
    >to use it only a few times (one time initialization though)
    >a hash would take up a long time compared to just reading
    >the file a few times. Does anyone have experience to share?


    A HashTable will always be faster than reading a file. Opening a file
    an a huge production.

    The alterative for short lists is a simple array you search, perhaps
    using binary search.

    A HashMap or HashTable is no big deal. It is very quick no matter how
    big it grows. It is low maintenance. It automatically grows as
    needed. It is a well understood programming idiom. About the only
    problem with them is efficiently initialising them in static init
    code. The brute force code generates a bloated class file.

    I have used three techniques for trimming that intialisation code down
    to size:

    1. serialising the HashMap, and loading it as a resource.

    2. writing the data out as Java source code for two arrays, key and
    value, that are then used to initialise the HashTable with a static
    init loop.

    3. loading the key-value pairs from a resource, a properties file or
    something very like a properties file.

    See http://mindprod.com/jgloss/hashtable.html
    http://mindprod.com/jgloss/hashcode.html
    http://mindprod.com/hashmap.html

    The key to speed is getting a decent hashCode implementation.

    At some point I should see if I can invent another sort of HashMap
    where you feed it objects with embedded keys, rather than separate
    key/value pairs. Then you could initialise them with an array of
    objects that implement a Keyed interface.





    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Aug 30, 2005
    #4
  5. Joan

    Joan Guest

    "Roedy Green" <> wrote in message
    news:...
    > On Mon, 29 Aug 2005 15:59:39 -0500, "Joan"
    > <> wrote
    > or quoted :
    >
    >>Basic idea of program is to input a stock ticker symbol,
    >>like YHOO for Yahoo, and to return the name of the company.
    >>If I was going to do it a lot, then I suspect a hash of some
    >>kind would have fast retrieval of the info. However, if I
    >>expect
    >>to use it only a few times (one time initialization though)
    >>a hash would take up a long time compared to just reading
    >>the file a few times. Does anyone have experience to share?

    >
    > A HashTable will always be faster than reading a file. Opening
    > a file
    > an a huge production.
    >
    > The alterative for short lists is a simple array you search,
    > perhaps
    > using binary search.
    >
    > A HashMap or HashTable is no big deal. It is very quick no
    > matter how
    > big it grows. It is low maintenance. It automatically grows
    > as
    > needed. It is a well understood programming idiom. About the
    > only
    > problem with them is efficiently initialising them in static
    > init
    > code. The brute force code generates a bloated class file.
    >
    > I have used three techniques for trimming that intialisation
    > code down
    > to size:
    >
    > 1. serialising the HashMap, and loading it as a resource.
    >
    > 2. writing the data out as Java source code for two arrays, key
    > and
    > value, that are then used to initialise the HashTable with a
    > static
    > init loop.
    >
    > 3. loading the key-value pairs from a resource, a properties
    > file or
    > something very like a properties file.
    >
    > See http://mindprod.com/jgloss/hashtable.html
    > http://mindprod.com/jgloss/hashcode.html
    > http://mindprod.com/hashmap.html
    >
    > The key to speed is getting a decent hashCode implementation.
    >
    > At some point I should see if I can invent another sort of
    > HashMap
    > where you feed it objects with embedded keys, rather than
    > separate
    > key/value pairs. Then you could initialise them with an array
    > of
    > objects that implement a Keyed interface.
    >


    Thanks for the discussion Roedy. What I ended up with is quite
    simple.
    (leaving out the Try/Catch stuff and opening the file)

    // here is how I init the tree
    TreeMap<String, String> tmNYSE = new TreeMap<String, String>();
    while ((line = in.readLine()) != null) {
    String[] t = line.split(":");
    tmNYSE.put(t[1], t[0]);
    }
    // here is how I use it
    // user types key into a text field
    String key = jTextField1.getText().toUpperCase();
    // get value from the tree
    String value = tmNYSE.get(key);
    // that's it -- and it runs quite fast.
    Joan, Aug 30, 2005
    #5
  6. Joan

    Roedy Green Guest

    On Mon, 29 Aug 2005 22:23:37 -0500, "Joan" <> wrote
    or quoted :

    >TreeMap<String, String> tmNYSE = new TreeMap<String, String>();


    Initialising a map from a file is not going to hurt you. But reading
    the entire file every lookup could be problem.

    TreeMaps are when you need to alphabetically traverse the keys, or use
    imprecise keys and you get the entry just before or after. If you
    have exact keys, HashMap is faster and smaller.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
    Roedy Green, Aug 30, 2005
    #6
  7. Joan

    Eric Sosman Guest

    Joan wrote:
    > "Eric Sosman" <> wrote in message
    > news:df00ja$jtm$...
    >
    >>
    >>Joan wrote:
    >>
    >>>Basic idea of program is to input a stock ticker symbol,
    >>>like YHOO for Yahoo, and to return the name of the company.
    >>>If I was going to do it a lot, then I suspect a hash of some
    >>>kind would have fast retrieval of the info. However, if I
    >>>expect
    >>>to use it only a few times (one time initialization though)
    >>>a hash would take up a long time compared to just reading
    >>>the file a few times. Does anyone have experience to share?

    >>
    >> A 2GHz CPU clock ticks once every 0.5 nanoseconds.
    >>
    >> One 10-ms I/O operation takes 10,000,000 nanoseconds.
    >>
    >> ... but the only real answer is: Measure, measure,
    >>measure! "Premature optimization is the root of all evil."
    >>
    >>--
    >>
    >>

    >
    >
    > Thanks, but I was hoping to avoid having to program it twice.
    > So I think I will go for what python calls a dictionary. Since I
    > have strings I don't have to write an equals method.


    Presumably your program does something in addition to
    looking up the ticker symbol. All those other bits and
    pieces can stay the same; the only piece you need to "program
    twice" is the chunk that translates symbols to names. That
    is, the rest of the program just calls a method like

    String symbolToName(String symbol) { ... }

    .... and the only thing you need to change is the way this
    method is implemented. The rest of the program remains
    the same.

    Even for this one method, "program it twice" is an
    overstatement. Both the read-the-file-each-time approach
    and the read-file-to-load-HashMap approach share the same
    file-reading and -parsing code. The first approach requires
    "extra" code to rewind before (or after) each search and to
    stop reading once the desired item is found; the second
    requires a couple of lines to manage the HashMap. Two
    minutes' work, tops. Five if you're reading the Javadoc
    over a 28000 bps dialup connection.

    --
    Eric Sosman, Aug 30, 2005
    #7
  8. Joan

    Joan Guest

    "Roedy Green" <> wrote in message
    news:...
    > On Mon, 29 Aug 2005 22:23:37 -0500, "Joan"
    > <> wrote
    > or quoted :
    >
    >>TreeMap<String, String> tmNYSE = new TreeMap<String, String>();

    >
    > Initialising a map from a file is not going to hurt you. But
    > reading
    > the entire file every lookup could be problem.
    >
    > TreeMaps are when you need to alphabetically traverse the keys,
    > or use
    > imprecise keys and you get the entry just before or after. If
    > you
    > have exact keys, HashMap is faster and smaller.


    I don't see this imprecise feature to TreeMap. If I spell a
    ticker wrong
    it doesn't give me a nearby value it gives me 'null'. The file is
    alphabetically
    organized so I was thinking about a binary search, but that level
    of
    detail in TreeMap doesn't seem to be accessible to the
    programmer.
    Joan, Aug 30, 2005
    #8
  9. Joan wrote:
    >
    > "Roedy Green" <> wrote in message
    > news:...
    >> TreeMaps are when you need to alphabetically traverse the keys, or use
    >> imprecise keys and you get the entry just before or after. If you
    >> have exact keys, HashMap is faster and smaller.

    >
    >
    > I don't see this imprecise feature to TreeMap. If I spell a ticker wrong
    > it doesn't give me a nearby value it gives me 'null'. The file is
    > alphabetically
    > organized so I was thinking about a binary search, but that level of
    > detail in TreeMap doesn't seem to be accessible to the programmer.


    I wouldn't think you'd need that level of access. Since the TreeMap
    implementation keeps the keys ordered, I would expect the containsKey
    method to do a search at least as efficiently as a binary search.

    BK
    Babu Kalakrishnan, Aug 31, 2005
    #9
  10. Joan

    Oliver Wong Guest

    "Joan" <> wrote in message
    news:...
    > The file is alphabetically
    > organized so I was thinking about a binary search, but that level of
    > detail in TreeMap doesn't seem to be accessible to the programmer.


    From the javadocs
    (http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html):

    Red-Black tree based implementation of the SortedMap interface. This class
    guarantees that the map will be in ascending key order, sorted according to
    the natural order for the key's class (see Comparable), or by the comparator
    provided at creation time, depending on which constructor is used.

    This implementation provides guaranteed log(n) time cost for the
    containsKey, get, put and remove operations. Algorithms are adaptations of
    those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.

    - Oliver
    Oliver Wong, Aug 31, 2005
    #10
  11. Joan

    Joan Guest

    "Oliver Wong" <> wrote in message
    news:vRnRe.252008$on1.66289@clgrps13...
    > "Joan" <> wrote in message
    > news:...
    >> The file is alphabetically
    >> organized so I was thinking about a binary search, but that
    >> level of
    >> detail in TreeMap doesn't seem to be accessible to the
    >> programmer.

    >
    > From the javadocs
    > (http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html):
    >
    > Red-Black tree based implementation of the SortedMap interface.
    > This class guarantees that the map will be in ascending key
    > order, sorted according to the natural order for the key's
    > class (see Comparable), or by the comparator provided at
    > creation time, depending on which constructor is used.
    >
    > This implementation provides guaranteed log(n) time cost for
    > the containsKey, get, put and remove operations. Algorithms are
    > adaptations of those in Cormen, Leiserson, and Rivest's
    > Introduction to Algorithms.
    >
    > - Oliver


    Thanks for the reference to the Cormen book. $76.00 at Amazon is
    a little
    steep for me right now, but my local library has the first
    edition (1990) so
    I'll go look at that to start.

    Here's an interesting tidbit. The authors page for the book
    http://theory.lcs.mit.edu/~clr/
    states, regarding the first edition, "First published in 1990."
    I cut/paste that quote so I would get it exactly right.
    So I wonder when the first edition was second published ;-)
    Joan, Aug 31, 2005
    #11
  12. Joan

    Joan Guest

    <snip>
    <toppost>
    <report>
    <title>
    ERRATA 1.2
    </title>
    <length>
    27 pages
    </length
    </report>
    <comment>
    The errata file for the first edition is 27 pages.
    </comment>
    </toppost>
    </snip>

    > Thanks for the reference to the Cormen book. $76.00 at Amazon
    > is a little
    > steep for me right now, but my local library has the first
    > edition (1990) so
    > I'll go look at that to start.
    >
    > Here's an interesting tidbit. The authors page for the book
    > http://theory.lcs.mit.edu/~clr/
    > states, regarding the first edition, "First published in 1990."
    > I cut/paste that quote so I would get it exactly right.
    > So I wonder when the first edition was second published ;-)
    Joan, Aug 31, 2005
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ramprasad A Padmanabhan

    how to initialize a structure

    Ramprasad A Padmanabhan, Oct 29, 2003, in forum: C Programming
    Replies:
    6
    Views:
    19,920
  2. Chad
    Replies:
    16
    Views:
    846
    Realos Osis
    Dec 14, 2005
  3. Replies:
    3
    Views:
    1,519
  4. Replies:
    2
    Views:
    636
    John Harrison
    Nov 20, 2005
  5. meisterbartsch
    Replies:
    2
    Views:
    751
    meisterbartsch
    Jun 12, 2007
Loading...

Share This Page