initialize data structure or read text file

Joan · Aug 29, 2005

Basic idea of program is to input a stock ticker symbol,
like YHOO for Yahoo, and to return the name of the company.
If I was going to do it a lot, then I suspect a hash of some
kind would have fast retrieval of the info. However, if I expect
to use it only a few times (one time initialization though)
a hash would take up a long time compared to just reading
the file a few times. Does anyone have experience to share?
TIA

Eric Sosman · Aug 29, 2005

Joan said:
Basic idea of program is to input a stock ticker symbol,
like YHOO for Yahoo, and to return the name of the company.
If I was going to do it a lot, then I suspect a hash of some
kind would have fast retrieval of the info. However, if I expect
to use it only a few times (one time initialization though)
a hash would take up a long time compared to just reading
the file a few times. Does anyone have experience to share?

A 2GHz CPU clock ticks once every 0.5 nanoseconds.

One 10-ms I/O operation takes 10,000,000 nanoseconds.

... but the only real answer is: Measure, measure,
measure! "Premature optimization is the root of all evil."

Joan · Aug 30, 2005

Eric Sosman said:
A 2GHz CPU clock ticks once every 0.5 nanoseconds.

One 10-ms I/O operation takes 10,000,000 nanoseconds.

... but the only real answer is: Measure, measure,
measure! "Premature optimization is the root of all evil."

Thanks, but I was hoping to avoid having to program it twice.
So I think I will go for what python calls a dictionary. Since I
have strings I don't have to write an equals method.

Roedy Green · Aug 30, 2005

Basic idea of program is to input a stock ticker symbol,
like YHOO for Yahoo, and to return the name of the company.
If I was going to do it a lot, then I suspect a hash of some
kind would have fast retrieval of the info. However, if I expect
to use it only a few times (one time initialization though)
a hash would take up a long time compared to just reading
the file a few times. Does anyone have experience to share?

A HashTable will always be faster than reading a file. Opening a file
an a huge production.

The alterative for short lists is a simple array you search, perhaps
using binary search.

A HashMap or HashTable is no big deal. It is very quick no matter how
big it grows. It is low maintenance. It automatically grows as
needed. It is a well understood programming idiom. About the only
problem with them is efficiently initialising them in static init
code. The brute force code generates a bloated class file.

I have used three techniques for trimming that intialisation code down
to size:

1. serialising the HashMap, and loading it as a resource.

2. writing the data out as Java source code for two arrays, key and
value, that are then used to initialise the HashTable with a static
init loop.

3. loading the key-value pairs from a resource, a properties file or
something very like a properties file.

See http://mindprod.com/jgloss/hashtable.html
http://mindprod.com/jgloss/hashcode.html
http://mindprod.com/hashmap.html

The key to speed is getting a decent hashCode implementation.

At some point I should see if I can invent another sort of HashMap
where you feed it objects with embedded keys, rather than separate
key/value pairs. Then you could initialise them with an array of
objects that implement a Keyed interface.

Joan · Aug 30, 2005

Roedy Green said:
A HashTable will always be faster than reading a file. Opening
a file
an a huge production.

The alterative for short lists is a simple array you search,
perhaps
using binary search.

A HashMap or HashTable is no big deal. It is very quick no
matter how
big it grows. It is low maintenance. It automatically grows
as
needed. It is a well understood programming idiom. About the
only
problem with them is efficiently initialising them in static
init
code. The brute force code generates a bloated class file.

I have used three techniques for trimming that intialisation
code down
to size:

1. serialising the HashMap, and loading it as a resource.

2. writing the data out as Java source code for two arrays, key
and
value, that are then used to initialise the HashTable with a
static
init loop.

3. loading the key-value pairs from a resource, a properties
file or
something very like a properties file.

See http://mindprod.com/jgloss/hashtable.html
http://mindprod.com/jgloss/hashcode.html
http://mindprod.com/hashmap.html

The key to speed is getting a decent hashCode implementation.

At some point I should see if I can invent another sort of
HashMap
where you feed it objects with embedded keys, rather than
separate
key/value pairs. Then you could initialise them with an array
of
objects that implement a Keyed interface.

Thanks for the discussion Roedy. What I ended up with is quite
simple.
(leaving out the Try/Catch stuff and opening the file)

// here is how I init the tree
TreeMap<String, String> tmNYSE = new TreeMap<String, String>();
while ((line = in.readLine()) != null) {
String[] t = line.split(":");
tmNYSE.put(t[1], t[0]);
}
// here is how I use it
// user types key into a text field
String key = jTextField1.getText().toUpperCase();
// get value from the tree
String value = tmNYSE.get(key);
// that's it -- and it runs quite fast.

Roedy Green · Aug 30, 2005

TreeMap<String, String> tmNYSE = new TreeMap<String, String>();

Initialising a map from a file is not going to hurt you. But reading
the entire file every lookup could be problem.

TreeMaps are when you need to alphabetically traverse the keys, or use
imprecise keys and you get the entry just before or after. If you
have exact keys, HashMap is faster and smaller.

Eric Sosman · Aug 30, 2005

Joan said:
Thanks, but I was hoping to avoid having to program it twice.
So I think I will go for what python calls a dictionary. Since I
have strings I don't have to write an equals method.

Presumably your program does something in addition to
looking up the ticker symbol. All those other bits and
pieces can stay the same; the only piece you need to "program
twice" is the chunk that translates symbols to names. That
is, the rest of the program just calls a method like

String symbolToName(String symbol) { ... }

.... and the only thing you need to change is the way this
method is implemented. The rest of the program remains
the same.

Even for this one method, "program it twice" is an
overstatement. Both the read-the-file-each-time approach
and the read-file-to-load-HashMap approach share the same
file-reading and -parsing code. The first approach requires
"extra" code to rewind before (or after) each search and to
stop reading once the desired item is found; the second
requires a couple of lines to manage the HashMap. Two
minutes' work, tops. Five if you're reading the Javadoc
over a 28000 bps dialup connection.

Joan · Aug 30, 2005

Roedy Green said:
Initialising a map from a file is not going to hurt you. But
reading
the entire file every lookup could be problem.

TreeMaps are when you need to alphabetically traverse the keys,
or use
imprecise keys and you get the entry just before or after. If
you
have exact keys, HashMap is faster and smaller.

I don't see this imprecise feature to TreeMap. If I spell a
ticker wrong
it doesn't give me a nearby value it gives me 'null'. The file is
alphabetically
organized so I was thinking about a binary search, but that level
of
detail in TreeMap doesn't seem to be accessible to the
programmer.

Babu Kalakrishnan · Aug 31, 2005

Joan said:
I don't see this imprecise feature to TreeMap. If I spell a ticker wrong
it doesn't give me a nearby value it gives me 'null'. The file is
alphabetically
organized so I was thinking about a binary search, but that level of
detail in TreeMap doesn't seem to be accessible to the programmer.

I wouldn't think you'd need that level of access. Since the TreeMap
implementation keeps the keys ordered, I would expect the containsKey
method to do a search at least as efficiently as a binary search.

BK

Oliver Wong · Aug 31, 2005

Joan said:
The file is alphabetically
organized so I was thinking about a binary search, but that level of
detail in TreeMap doesn't seem to be accessible to the programmer.

From the javadocs
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html):

Red-Black tree based implementation of the SortedMap interface. This class
guarantees that the map will be in ascending key order, sorted according to
the natural order for the key's class (see Comparable), or by the comparator
provided at creation time, depending on which constructor is used.

This implementation provides guaranteed log(n) time cost for the
containsKey, get, put and remove operations. Algorithms are adaptations of
those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.

- Oliver

Joan · Aug 31, 2005

Oliver Wong said:
From the javadocs
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html):

Red-Black tree based implementation of the SortedMap interface.
This class guarantees that the map will be in ascending key
order, sorted according to the natural order for the key's
class (see Comparable), or by the comparator provided at
creation time, depending on which constructor is used.

This implementation provides guaranteed log(n) time cost for
the containsKey, get, put and remove operations. Algorithms are
adaptations of those in Cormen, Leiserson, and Rivest's
Introduction to Algorithms.

- Oliver

Thanks for the reference to the Cormen book. $76.00 at Amazon is
a little
steep for me right now, but my local library has the first
edition (1990) so
I'll go look at that to start.

Here's an interesting tidbit. The authors page for the book
http://theory.lcs.mit.edu/~clr/
states, regarding the first edition, "First published in 1990."
I cut/paste that quote so I would get it exactly right.
So I wonder when the first edition was second published ;-)

Joan · Aug 31, 2005

<snip>
<toppost>
<report>
<title>
ERRATA 1.2
</title>
<length>
27 pages
</length
</report>
<comment>
The errata file for the first edition is 27 pages.
</comment>
</toppost>

read and write stock prices to file using arrays	6	Apr 9, 2010
Data saving in condition of changing reality	0	Apr 29, 2022
Find and count strings of text from multiple files	17	Dec 16, 2021
Sorting Structure Data within Vector	1	Mar 27, 2013
File Read Performance Issue	17	Aug 17, 2013
read data from a text file	1	Feb 25, 2011
Reading a file into a data structure....	8	Oct 13, 2011
I'm tempted to quit out of frustration	1	Aug 13, 2023

initialize data structure or read text file

Joan

Eric Sosman

Joan

Roedy Green

Joan

Roedy Green

Eric Sosman

Joan

Babu Kalakrishnan

Oliver Wong

Joan

Joan

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads