Help on how to save/load this data structure?

Kevin

Hi Guys,
I am wondering if anyone has suggestions on how to code the following
data structure and requirements:

The story:

1) There is a large amount of log data, stored line by line as text.
Each line has a line ID (integer). Basically, we can think of each line
as the log for one point in time: say, one line is added to the log
each second. There are more than 10 million lines in total.
2) There are a large number of possible events (say 200K, each
identified by an event ID). When an event occurs, it generates a value
in the log data. Since events can occur concurrently, one line of data
may contain many values.

The abstract data structure:

It is required that one event ID (integer) maps to many line IDs
(integers): the lines in which this event occurs.
If the total size were small, we could use a naïve approach: store all
the IDs in a hash map, with the event ID as key and an ArrayList (or a
Hashtable, since the line IDs do not need to be in order) as value,
where each item in the ArrayList is a line ID (Integer).
There are some tricks that save memory, such as a customized int array
instead of Integer objects (8 bytes or more each), etc. But at the
sizes mentioned above, these tricks are not enough.
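For reference, the naïve in-memory version described above might look roughly like the sketch below. It uses current Java syntax (generics, lambdas) rather than Java 1.4, and the class and method names (`IntList`, `EventIndex`, `record`, `linesFor`) are made up for illustration. The growable int array stands in for the "customized array" idea, so no Integer object is created per line ID.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Growable int list (hypothetical helper): avoids one Integer object per line ID. */
final class IntList {
    private int[] data = new int[4];
    private int size;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);   // double the capacity when full
        }
        data[size++] = value;
    }

    int[] toArray() {
        return Arrays.copyOf(data, size);           // trimmed copy of the stored values
    }
}

/** Naive in-memory index: event ID -> line IDs in which the event occurs. */
final class EventIndex {
    private final Map<Integer, IntList> linesByEvent = new HashMap<>();

    /** Remember that the given event produced a value on the given line. */
    void record(int eventId, int lineId) {
        linesByEvent.computeIfAbsent(eventId, k -> new IntList()).add(lineId);
    }

    /** All line IDs for one event, in insertion order; empty if the event is unknown. */
    int[] linesFor(int eventId) {
        IntList list = linesByEvent.get(eventId);
        return list == null ? new int[0] : list.toArray();
    }
}
```

Whether this fits in memory depends on the total number of (event, line) pairs, which the description above does not state.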

The required operations on the data:

The application needs to build a data structure that supports these
two operations:
1) Given an event ID, find all the line IDs of that event.
2) Given a group of event IDs, find all the line IDs of the group
(basically a "union" of the set of line IDs of each event ID).
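Operation 2 can be sketched independently of how the per-event lists are stored. The version below is one possible approach, not something proposed in the thread: it marks line IDs in a `java.util.BitSet`, which for about 10 million lines needs roughly 1.25 MB. The class name `LineUnion` is hypothetical, and the code assumes line IDs are non-negative and well below `Integer.MAX_VALUE`.

```java
import java.util.BitSet;

final class LineUnion {
    /**
     * Union of several line-ID lists (one list per event ID in the group).
     * The result is sorted ascending and contains no duplicates.
     */
    static int[] union(int[][] lineIdLists) {
        BitSet seen = new BitSet();
        for (int[] list : lineIdLists) {
            for (int lineId : list) {
                seen.set(lineId);                   // assumption: lineId >= 0
            }
        }
        int[] result = new int[seen.cardinality()];
        int i = 0;
        // Walk the set bits in ascending order.
        for (int id = seen.nextSetBit(0); id >= 0; id = seen.nextSetBit(id + 1)) {
            result[i++] = id;
        }
        return result;
    }
}
```

For example, `LineUnion.union(new int[][] {{3, 1}, {1, 5}})` yields `{1, 3, 5}`.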

Any ideas on how to build such a big structure? I do not think there is
any way to fit it all into memory (I think the maximum heap size for
Java 1.4 on win32 is about 1.3G). If we can swap some of it out to a
file and read it in only when needed, how should we construct the
structure so we can do the job efficiently? Or will it be better
(faster) if we put all the IDs into a database table and use SQL to get
them?
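One way the file-backed idea could be laid out, as a rough sketch only: write each event's line IDs as one contiguous block in a data file, and keep just a small directory (event ID to file offset and count) in memory. With about 200K events the directory is small even though the line IDs are not. Everything here (`DiskEventIndex`, `append`, `linesFor`) is hypothetical, and the sketch assumes each event's full list is already grouped together when it is written, which for real log data would need a grouping or sorting pass first.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical file-backed index: line IDs on disk, only the directory in memory. */
final class DiskEventIndex implements AutoCloseable {
    private final RandomAccessFile file;
    // eventId -> {byte offset of the block in the file, number of line IDs in it}
    private final Map<Integer, long[]> directory = new HashMap<>();

    DiskEventIndex(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Appends the complete line-ID list of one event as a single block. */
    void append(int eventId, int[] lineIds) throws IOException {
        long offset = file.length();
        ByteBuffer buf = ByteBuffer.allocate(lineIds.length * 4);
        buf.asIntBuffer().put(lineIds);             // 4 bytes per int, big-endian
        file.seek(offset);
        file.write(buf.array());
        directory.put(eventId, new long[] {offset, lineIds.length});
    }

    /** Reads one event's block back from disk; empty if the event is unknown. */
    int[] linesFor(int eventId) throws IOException {
        long[] entry = directory.get(eventId);
        if (entry == null) {
            return new int[0];
        }
        int count = (int) entry[1];
        byte[] bytes = new byte[count * 4];
        file.seek(entry[0]);
        file.readFully(bytes);
        int[] lineIds = new int[count];
        ByteBuffer.wrap(bytes).asIntBuffer().get(lineIds);
        return lineIds;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The directory in this sketch lives only in memory, so it would have to be saved separately (or rebuilt) to reuse the data file across runs.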

Thanks a lot and you have a great day. :)

By the way, is there a faster way to write/read a large number of ints
to and from a file? Some days ago I did a test using
ObjectOutputStream's writeInt(); if I remember right, it took about 3
seconds to write 10^7 ints to a file, which resulted in a file of about
38M.
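A common alternative for raw ints is a `DataOutputStream` wrapped around a `BufferedOutputStream`, which writes exactly 4 bytes per int with no serialization header. The sketch below shows that pattern with a length prefix so the file can be read back; the class name `IntFile` and the 64 KB buffer size are arbitrary choices, and whether it beats the 3 seconds measured above would need to be timed on the actual machine.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

final class IntFile {
    private static final int BUFFER_SIZE = 1 << 16; // 64 KB, an arbitrary choice

    /** Writes the array length followed by every value, 4 bytes each. */
    static void write(File file, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), BUFFER_SIZE))) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    /** Reads back an array written by write(). */
    static int[] read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), BUFFER_SIZE))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        }
    }
}
```

At 4 bytes per int, 10^7 ints come to 40,000,000 bytes, which matches the roughly 38M file size reported above.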
 
Wendy Smoak

Kevin said:
Or will it be better (faster) if we put
all the IDs into a database table and use SQL to get them?

I think you answered your own question. :)
 
Kevin

I have never programmed SQL in Java before. Would issuing SQL calls be
slow? I have the feeling that a large number of SQL calls will be slow
(especially since I can only find a normal, not super fast, machine for
the DB server).
 
Kevin

By the way, I don't mind whether we use a database or not. But it seems
the end user would like a "stand-alone" program, so using a database
will make him kind of unhappy. :-(
 
Robert Mischke

Kevin said:
By the way, I don't mind whether we use a database or not. But it seems
the end user would like a "stand-alone" program, so using a database
will make him kind of unhappy. :-(

There are "embedded" databases which don't require a separate server,
for example http://hsqldb.sourceforge.net/ .

To your original question: Yes, I think a database is the way to go -
that's what databases exist for :)
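As a rough illustration of the database route, both operations can go through plain JDBC with a single query per lookup, so the number of SQL calls stays small. The sketch below assumes a hypothetical table `event_line(event_id INTEGER, line_id INTEGER)`; the table, column, class, and method names are invented for this example, and how the `Connection` is obtained depends on the database (embedded or not).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class EventLineQueries {
    /**
     * Builds the query for a group of event IDs against the hypothetical
     * table event_line(event_id, line_id). DISTINCT gives the "union".
     */
    static String unionSql(int eventCount) {
        if (eventCount < 1) {
            throw new IllegalArgumentException("need at least one event ID");
        }
        StringBuilder sql = new StringBuilder(
                "SELECT DISTINCT line_id FROM event_line WHERE event_id IN (");
        for (int i = 0; i < eventCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");       // one placeholder per event ID
        }
        return sql.append(")").toString();
    }

    /** Runs one query for the whole group; a single event ID is just a group of one. */
    static List<Integer> linesFor(Connection conn, int[] eventIds) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(unionSql(eventIds.length))) {
            for (int i = 0; i < eventIds.length; i++) {
                ps.setInt(i + 1, eventIds[i]);      // JDBC parameters are 1-based
            }
            List<Integer> lineIds = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lineIds.add(rs.getInt(1));
                }
            }
            return lineIds;
        }
    }
}
```

An index on `event_id` would normally be needed for such lookups to be fast; very large groups may also hit a database-specific limit on the number of parameters.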

Robert
 
