efficient data structure and pointer question ...

J

John Galt

Hi All,

Suppose I have a TreeMap, and I want to update one of the values in
it.
I would do:

TreeMap tree;
Object count = tree.get(Object word);
count = count + 1; // assume that "+" is defined somehow
tree.put(Object word, count);

My Questions:

1. My question is, assuming that get() and put() each run in lg(n)
time, won't this be inefficient as opposed to how it could be done in
C, say:

struct node *p;
p = tree_search(char *word); // only one lookup, returns node pointer
p->count++;

2. So ultimately, can it be done as its done in C? How can "pointers"
(however they exist) be used in Java?

3. I am using a TreeMap to build an index to find the words that occur
in the most documents - in some large set of documents. Is there a
more efficient data structure in Java?

thanks in advance,
John
 
C

Chris Smith

John said:
I would do:

TreeMap tree;
Object count = tree.get(Object word);
count = count + 1; // assume that "+" is defined somehow
tree.put(Object word, count);
Okay...

1. My question is, assuming that get() and put() each run in lg(n)
time, won't this be inefficient as opposed to how it could be done in
C, say:

struct node *p;
p = tree_search(char *word); // only one lookup, returns node pointer
p->count++;

Yes and no. It will generally take twice as long, since the search is
repeated twice. That's a constant factor, but no change in the time-
complexity of the code.
2. So ultimately, can it be done as its done in C? How can "pointers"
(however they exist) be used in Java?

Sure it can:

Node p = tree.get(word);
p.count++;

(Of course, for all the normal reasons, you're better off encapsulating
Node instead of exposing a public field. I was just providing a direct
transliteration of the C code. In reality, the last line would probably
be p.bumpCount() or something like that.)
3. I am using a TreeMap to build an index to find the words that occur
in the most documents - in some large set of documents. Is there a
more efficient data structure in Java?

A HashMap is generally more efficient than a TreeMap if you don't need
ordering.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
B

Barry White

John said:
Hi All,

Suppose I have a TreeMap, and I want to update one of the values in
it.
I would do:

TreeMap tree;
Object count = tree.get(Object word);
count = count + 1; // assume that "+" is defined somehow
tree.put(Object word, count);

My Questions:

1. My question is, assuming that get() and put() each run in lg(n)
time, won't this be inefficient as opposed to how it could be done in
C, say:

Once you 'get' the object you don't have to 'put' it back again. Get
returns a reference to the object which you can then modify, it does not
remove the object from the data structure.
struct node *p;
p = tree_search(char *word); // only one lookup, returns node pointer
p->count++;

2. So ultimately, can it be done as its done in C? How can "pointers"
(however they exist) be used in Java?

3. I am using a TreeMap to build an index to find the words that occur
in the most documents - in some large set of documents. Is there a
more efficient data structure in Java?

A HashMap is indeed more effient with order 1 time complexity but it
does carry a space overhead with it. A HashMap with 0.6 load factor will
occupy roughly 40% more memory than a TreeMap.

How valuable is memory? How often will you need to perform get/put
operations? It's your choice..

might be useful:
http://java.sun.com/developer/JDCTechTips/2002/tt0709.html

Barry
 
J

John Galt

First, apologies for any previous double-posts -- my news reader needs
some serious pepto..
TreeMap tree;
Object count = tree.get(Object word);
count = count + 1; // assume that "+" is defined somehow
tree.put(Object word, count);
[...]

Once you 'get' the object you don't have to 'put' it back again. Get
returns a reference to the object which you can then modify, it does not
remove the object from the data structure.

Okay, but 'count' is a Short. Because Short doesn't have a setter, I
have to do:

count = new Short ( count.shortValue()+1 );

In this case, I still have to do a put, right?

Some more questions:
- If I write a class myShort with a single short as a member field,
and with ctor, setter, getter, will it be smaller/faster than Short?
- (I nuked the TreeMap in favor of a HashMap -- thanks to the two
responders to my OP) The HashMap now has 200K words in it. I have to
sort them and write them out to a file. I plan to iterate over the
HashMap and put each element into a TreeSet and then iterate over the
(sorted) TreeSet. I figure this will take the same time -- nlg(n) --
as avg-case quicksort. Is there a better way?
- I want to use PrintWriter(BufferedWriter(FileWriter))) to write out
the file one line at a time. Is this the most efficient way for
line-at-a-time IO? Bumping up the default buffer-size of the
BufferedWriter would give how much more performance?

thanks in advance,
John
 
C

Chris Smith

John said:
Okay, but 'count' is a Short. Because Short doesn't have a setter, I
have to do:

count = new Short ( count.shortValue()+1 );

In this case, I still have to do a put, right?

Yep. But you could use a different kind of Object other than Short, and
make it mutable.
Some more questions:
- If I write a class myShort with a single short as a member field,
and with ctor, setter, getter, will it be smaller/faster than Short?

It won't be smaller, but it will be about the same size. It will
certainly be faster to do the task above, since you won't need to do the
put(Object) call at the end. It won't be any faster for normal uses of
the object, but not slower either. For concurrency, though, you'll need
to synchronize access to the object, which will make it substantially
slower.
- (I nuked the TreeMap in favor of a HashMap -- thanks to the two
responders to my OP) The HashMap now has 200K words in it. I have to
sort them and write them out to a file. I plan to iterate over the
HashMap and put each element into a TreeSet and then iterate over the
(sorted) TreeSet. I figure this will take the same time -- nlg(n) --
as avg-case quicksort. Is there a better way?

Nope, n*log(n) is the best you'll do. If you needed them sorted,
though, iot's probably best to stick with TreeMap, depending on how
frequently you need the sorted copy.
- I want to use PrintWriter(BufferedWriter(FileWriter))) to write out
the file one line at a time. Is this the most efficient way for
line-at-a-time IO? Bumping up the default buffer-size of the
BufferedWriter would give how much more performance?

It's certainly the easiest way to do line-at-a-time I/O. You could
probably work out something faster, but first you'd want to determine
that this is a performance-sensitive task in your application.

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top