what's better way to store a million keys in mem?

J

John_Woo

Hi,

A application, needs to check whether a user already logined, by
looking at static var in memory.

I don't have good idea. what's doing is, use Hashmap <not sure
Hashtable is better interms of inserting/removing/finding> to store ID
<string> -- key and status <character> -- value.

question:
what the better way to implement this goal, it should support up to 1
million of keys, in terms of high inserting/removing/finding, and
happened more frequently.
 
D

dsjoblom

John_Woo said:
Hi,

A application, needs to check whether a user already logined, by
looking at static var in memory.

I don't have good idea. what's doing is, use Hashmap <not sure
Hashtable is better interms of inserting/removing/finding> to store ID
<string> -- key and status <character> -- value.

question:
what the better way to implement this goal, it should support up to 1
million of keys, in terms of high inserting/removing/finding, and
happened more frequently.

If you *must* keep track of users by keeping user data in memory, this
is probably one of the quickest solutions. But it will eat up a lot of
memory.

If we assume an overhead of 8 bytes per object and 4 byte references, a
HashMap will use /at least/ 28 bytes per entry and that does not
include any data you put in. So if we assume your keys and values are
say 50 bytes on average, a HashMap will use /at least/ around 80
megabytes for 1 million entries. And that can probably be safely
rounded up to about 100 MB. If you run a 64-bit machine, you are even
worse off. While this may not sound too much, all of this memory is
memory that the main part of your application can't use.

HashMaps (by design) are also notorious for thrashing the CPU cache,
because of bad locality of data, and this is made even more painful in
java, because everything in the map is *also* a reference which
augments the problem.

I'd give a real database a try for this. There is of course a bit of
overhead, but you use a lot less memory and I guess it will also be
faster. Of course, you should benchmark and compare any solutions
suggested.

Regards,
Daniel Sjöblom
 
M

Mark Space

If you *must* keep track of users by keeping user data in memory, this
is probably one of the quickest solutions. But it will eat up a lot of
memory.

And it won't be stored in memory anyway. Every system now-a-days uses
virtual memory. Which means that memory that isn't used much is
off-load to disk, and only loaded back when needed. So your data will
be stored on disk anyway.

A database is a good idea. Maybe check out file locking as an
alternative. For example, have a list of users and passwords in a file.
Maybe call it /etc/passwd or something. Each time a user logs in,
lock the first byte of his/her entry in the /etc/passwd file. This may
take more memory and time than a Java hash map but who knows, sometimes
these low level system operations are highly optimized so, on the other
hand, you also might come out ahead.

Off the top of my head, I think the other way to do this is to write a
very efficient program in C, then let Java talk to it over a socket or
something.

Final thoughts: one million users is A LOT. I mean, REALLY A LOT.
There are only 40,000 or so user ports on a Unix system with TCP/IP.
Have you very carefully investigated the requirements of this system you
are building? I'm starting to seem more than just "a database" or "use
C." I think you'll need to build out some kind of load balancing array,
and that's non-trivial. Even a couple thousand users logged in at any
given time is going to stress out a single system greatly. Better think
this through carefully if it's important...
 
M

Mark Space

Mark said:
And it won't be stored in memory anyway. Every system now-a-days uses
virtual memory. Which means that memory that isn't used much is
off-load to disk, and only loaded back when needed. So your data will
be stored on disk anyway.

A database is a good idea. Maybe check out file locking as an
alternative. For example, have a list of users and passwords in a file.
Maybe call it /etc/passwd or something. Each time a user logs in, lock
the first byte of his/her entry in the /etc/passwd file. This may take

Oops, John I think I got confuzzled with your previous post. Ignore the
stuff about locking users. Everything else about one million users still
applies I think, at least in principle.
 
V

Vincent Cantin

I'd give a real database a try for this. There is of course a bit of
overhead, but you use a lot less memory and I guess it will also be
faster. Of course, you should benchmark and compare any solutions
suggested.

I also think that you should use a database. You can use a ORM
(Object-Relation Mapping) like the javax.persistence API of JEE5 or
hibernate directly, for example.
 
S

steve

Hi,

A application, needs to check whether a user already logined, by
looking at static var in memory.

I don't have good idea. what's doing is, use Hashmap <not sure
Hashtable is better interms of inserting/removing/finding> to store ID
<string> -- key and status <character> -- value.

question:
what the better way to implement this goal, it should support up to 1
million of keys, in terms of high inserting/removing/finding, and
happened more frequently.

this will not scale with a single CPU, you will need multiple cpu's and
servers , if you tend to have a million users. and if you use multiple
servers you cannot store the keys in memory, unless you share the memory
between processors.
so therefore you are looking at some sort of database or shared disk storage.
something like oracle , will allow you to have multiple physical servers &
cpu accessing the same database, as if it is a single entity, you can even
partition the database into areas of say 100,000 so that the loading on the
diskdrives is partitioned and spread evenly between the different cpu's.



Steve
 
J

John_Woo

steve said:
this will not scale with a single CPU, you will need multiple cpu's and
servers , if you tend to have a million users. and if you use multiple
servers you cannot store the keys in memory, unless you share the memory
between processors.
so therefore you are looking at some sort of database or shared disk storage.
something like oracle , will allow you to have multiple physical servers &
cpu accessing the same database, as if it is a single entity, you can even
partition the database into areas of say 100,000 so that the loading on the
diskdrives is partitioned and spread evenly between the different cpu's.



Steve

Thanks Steve, that sounds.
Currently we have multi-cpus, may use multi-server later, but still no
database designed
for holding the login-status.

Can u tell more about how to use shared disk in java, how to test it?
 
T

Thomas Hawtin

John_Woo said:
A application, needs to check whether a user already logined, by
looking at static var in memory.

I don't have good idea. what's doing is, use Hashmap <not sure
Hashtable is better interms of inserting/removing/finding> to store ID
<string> -- key and status <character> -- value.

question:
what the better way to implement this goal, it should support up to 1
million of keys, in terms of high inserting/removing/finding, and
happened more frequently.

If you arrange your IDs to be sequential, then you could use an
AtomicIntegerArray. That would have little overhead over the minimum
possible memory requirement. If the IDs are not sequential, the a linear
probing hash map would make sense. Either find an open source
implementation or write your own. A lot easier and faster than using C
or a database.

Tom Hawtin
 
C

Chris Uppal

Thomas said:
If you arrange your IDs to be sequential, then you could use an
AtomicIntegerArray. That would have little overhead over the minimum
possible memory requirement. If the IDs are not sequential, the a linear
probing hash map would make sense. Either find an open source
implementation or write your own. A lot easier and faster than using C
or a database.

I think the latter would be better even if the IDs are sequential -- the whole
million are unlikely to be logged in at once.

-- chris
 
T

Thomas Hawtin

Chris said:
I think the latter would be better even if the IDs are sequential -- the whole
million are unlikely to be logged in at once.

If it's only one status byte per user, then we are only talking 1 MB -
tiny. Using AtomicIntegerArray in that way is relatively easy (so long
as you know what you are doing) and removes problems related to locking.

Tom Hawtin
 
C

Chris Uppal

Thomas Hawtin wrote:

[me:]
If it's only one status byte per user, then we are only talking 1 MB -
tiny.

Or even less if you use 1-bit per user. That way you can run the whole system
on the CTO's mobile phone ;-)

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top