initialize data structure or read text file

J

Joan

Basic idea of program is to input a stock ticker symbol,
like YHOO for Yahoo, and to return the name of the company.
If I was going to do it a lot, then I suspect a hash of some
kind would have fast retrieval of the info. However, if I expect
to use it only a few times (one time initialization though)
a hash would take up a long time compared to just reading
the file a few times. Does anyone have experience to share?
TIA
 
E

Eric Sosman

Joan said:
Basic idea of program is to input a stock ticker symbol,
like YHOO for Yahoo, and to return the name of the company.
If I was going to do it a lot, then I suspect a hash of some
kind would have fast retrieval of the info. However, if I expect
to use it only a few times (one time initialization though)
a hash would take up a long time compared to just reading
the file a few times. Does anyone have experience to share?

A 2GHz CPU clock ticks once every 0.5 nanoseconds.

One 10-ms I/O operation takes 10,000,000 nanoseconds.

... but the only real answer is: Measure, measure,
measure! "Premature optimization is the root of all evil."
 
J

Joan

Eric Sosman said:
A 2GHz CPU clock ticks once every 0.5 nanoseconds.

One 10-ms I/O operation takes 10,000,000 nanoseconds.

... but the only real answer is: Measure, measure,
measure! "Premature optimization is the root of all evil."

Thanks, but I was hoping to avoid having to program it twice.
So I think I will go for what python calls a dictionary. Since I
have strings I don't have to write an equals method.
 
R

Roedy Green

Basic idea of program is to input a stock ticker symbol,
like YHOO for Yahoo, and to return the name of the company.
If I was going to do it a lot, then I suspect a hash of some
kind would have fast retrieval of the info. However, if I expect
to use it only a few times (one time initialization though)
a hash would take up a long time compared to just reading
the file a few times. Does anyone have experience to share?

A HashTable will always be faster than reading a file. Opening a file
an a huge production.

The alterative for short lists is a simple array you search, perhaps
using binary search.

A HashMap or HashTable is no big deal. It is very quick no matter how
big it grows. It is low maintenance. It automatically grows as
needed. It is a well understood programming idiom. About the only
problem with them is efficiently initialising them in static init
code. The brute force code generates a bloated class file.

I have used three techniques for trimming that intialisation code down
to size:

1. serialising the HashMap, and loading it as a resource.

2. writing the data out as Java source code for two arrays, key and
value, that are then used to initialise the HashTable with a static
init loop.

3. loading the key-value pairs from a resource, a properties file or
something very like a properties file.

See http://mindprod.com/jgloss/hashtable.html
http://mindprod.com/jgloss/hashcode.html
http://mindprod.com/hashmap.html

The key to speed is getting a decent hashCode implementation.

At some point I should see if I can invent another sort of HashMap
where you feed it objects with embedded keys, rather than separate
key/value pairs. Then you could initialise them with an array of
objects that implement a Keyed interface.
 
J

Joan

Roedy Green said:
A HashTable will always be faster than reading a file. Opening
a file
an a huge production.

The alterative for short lists is a simple array you search,
perhaps
using binary search.

A HashMap or HashTable is no big deal. It is very quick no
matter how
big it grows. It is low maintenance. It automatically grows
as
needed. It is a well understood programming idiom. About the
only
problem with them is efficiently initialising them in static
init
code. The brute force code generates a bloated class file.

I have used three techniques for trimming that intialisation
code down
to size:

1. serialising the HashMap, and loading it as a resource.

2. writing the data out as Java source code for two arrays, key
and
value, that are then used to initialise the HashTable with a
static
init loop.

3. loading the key-value pairs from a resource, a properties
file or
something very like a properties file.

See http://mindprod.com/jgloss/hashtable.html
http://mindprod.com/jgloss/hashcode.html
http://mindprod.com/hashmap.html

The key to speed is getting a decent hashCode implementation.

At some point I should see if I can invent another sort of
HashMap
where you feed it objects with embedded keys, rather than
separate
key/value pairs. Then you could initialise them with an array
of
objects that implement a Keyed interface.

Thanks for the discussion Roedy. What I ended up with is quite
simple.
(leaving out the Try/Catch stuff and opening the file)

// here is how I init the tree
TreeMap<String, String> tmNYSE = new TreeMap<String, String>();
while ((line = in.readLine()) != null) {
String[] t = line.split(":");
tmNYSE.put(t[1], t[0]);
}
// here is how I use it
// user types key into a text field
String key = jTextField1.getText().toUpperCase();
// get value from the tree
String value = tmNYSE.get(key);
// that's it -- and it runs quite fast.
 
R

Roedy Green

TreeMap<String, String> tmNYSE = new TreeMap<String, String>();

Initialising a map from a file is not going to hurt you. But reading
the entire file every lookup could be problem.

TreeMaps are when you need to alphabetically traverse the keys, or use
imprecise keys and you get the entry just before or after. If you
have exact keys, HashMap is faster and smaller.
 
E

Eric Sosman

Joan said:
Thanks, but I was hoping to avoid having to program it twice.
So I think I will go for what python calls a dictionary. Since I
have strings I don't have to write an equals method.

Presumably your program does something in addition to
looking up the ticker symbol. All those other bits and
pieces can stay the same; the only piece you need to "program
twice" is the chunk that translates symbols to names. That
is, the rest of the program just calls a method like

String symbolToName(String symbol) { ... }

.... and the only thing you need to change is the way this
method is implemented. The rest of the program remains
the same.

Even for this one method, "program it twice" is an
overstatement. Both the read-the-file-each-time approach
and the read-file-to-load-HashMap approach share the same
file-reading and -parsing code. The first approach requires
"extra" code to rewind before (or after) each search and to
stop reading once the desired item is found; the second
requires a couple of lines to manage the HashMap. Two
minutes' work, tops. Five if you're reading the Javadoc
over a 28000 bps dialup connection.
 
J

Joan

Roedy Green said:
Initialising a map from a file is not going to hurt you. But
reading
the entire file every lookup could be problem.

TreeMaps are when you need to alphabetically traverse the keys,
or use
imprecise keys and you get the entry just before or after. If
you
have exact keys, HashMap is faster and smaller.

I don't see this imprecise feature to TreeMap. If I spell a
ticker wrong
it doesn't give me a nearby value it gives me 'null'. The file is
alphabetically
organized so I was thinking about a binary search, but that level
of
detail in TreeMap doesn't seem to be accessible to the
programmer.
 
B

Babu Kalakrishnan

Joan said:
I don't see this imprecise feature to TreeMap. If I spell a ticker wrong
it doesn't give me a nearby value it gives me 'null'. The file is
alphabetically
organized so I was thinking about a binary search, but that level of
detail in TreeMap doesn't seem to be accessible to the programmer.

I wouldn't think you'd need that level of access. Since the TreeMap
implementation keeps the keys ordered, I would expect the containsKey
method to do a search at least as efficiently as a binary search.

BK
 
O

Oliver Wong

Joan said:
The file is alphabetically
organized so I was thinking about a binary search, but that level of
detail in TreeMap doesn't seem to be accessible to the programmer.

From the javadocs
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html):

Red-Black tree based implementation of the SortedMap interface. This class
guarantees that the map will be in ascending key order, sorted according to
the natural order for the key's class (see Comparable), or by the comparator
provided at creation time, depending on which constructor is used.

This implementation provides guaranteed log(n) time cost for the
containsKey, get, put and remove operations. Algorithms are adaptations of
those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.

- Oliver
 
J

Joan

Oliver Wong said:
From the javadocs
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html):

Red-Black tree based implementation of the SortedMap interface.
This class guarantees that the map will be in ascending key
order, sorted according to the natural order for the key's
class (see Comparable), or by the comparator provided at
creation time, depending on which constructor is used.

This implementation provides guaranteed log(n) time cost for
the containsKey, get, put and remove operations. Algorithms are
adaptations of those in Cormen, Leiserson, and Rivest's
Introduction to Algorithms.

- Oliver

Thanks for the reference to the Cormen book. $76.00 at Amazon is
a little
steep for me right now, but my local library has the first
edition (1990) so
I'll go look at that to start.

Here's an interesting tidbit. The authors page for the book
http://theory.lcs.mit.edu/~clr/
states, regarding the first edition, "First published in 1990."
I cut/paste that quote so I would get it exactly right.
So I wonder when the first edition was second published ;-)
 
J

Joan

<snip>
<toppost>
<report>
<title>
ERRATA 1.2
</title>
<length>
27 pages
</length
</report>
<comment>
The errata file for the first edition is 27 pages.
</comment>
</toppost>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top