Memory efficient way to store strings in hash maps using

朝の木

Hi. I have recently written an article about the problem of storing
millions of short strings in hash maps and other containers. The
problem is the memory overhead of std::string. In the article I
therefore discuss using boost::array (std::array) as the keys and
values of hash maps, across several hash map implementations. This is
nothing advanced, but the benchmark results may be interesting.
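
To make the idea concrete, here is a minimal sketch of a fixed-size array used as a hash map key. It is not the code from the article: the names FixedKey, make_key, FixedKeyHash and count_words are made up for illustration, the key width of 8 is an arbitrary choice, and FNV-1a is just one convenient hash (std::hash has no specialization for std::array).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fixed-width key: up to N characters, zero-padded.
template <std::size_t N>
using FixedKey = std::array<char, N>;

// Copy a short string into a zero-padded fixed-size array.
template <std::size_t N>
FixedKey<N> make_key(const std::string& s) {
    assert(s.size() <= N && "string does not fit into the fixed-size key");
    FixedKey<N> key{};  // value-initialized: all bytes are zero
    std::memcpy(key.data(), s.data(), s.size());
    return key;
}

// FNV-1a over the raw bytes of the key (an assumption, any byte hash works).
template <std::size_t N>
struct FixedKeyHash {
    std::size_t operator()(const FixedKey<N>& k) const noexcept {
        std::uint64_t h = 1469598103934665603ull;
        for (char c : k) {
            h ^= static_cast<unsigned char>(c);
            h *= 1099511628211ull;
        }
        return static_cast<std::size_t>(h);
    }
};

using Key8 = FixedKey<8>;
using Map8 = std::unordered_map<Key8, int, FixedKeyHash<8>>;

// Usage helper: count how often each (short) word occurs.
inline Map8 count_words(const std::vector<std::string>& words) {
    Map8 counts;
    for (const std::string& w : words) {
        ++counts[make_key<8>(w)];
    }
    return counts;
}
```

The key carries no pointer, size or capacity field, so each key costs exactly N bytes inside the map node. The price is a hard length limit, and strings that differ only in trailing '\0' bytes map to the same key.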

I have just started blogging, so please take a look at my articles.
Most of them are related to efficiency and memory consumption when
storing string data in containers. Any feedback (here or there) will
motivate me to continue blogging.

Articles so far:
* Huge unordered hash maps (and threading)
* Debugging in C++
* Std::string on several unordered hash map implementations. Benchmark.
* Memory overhead of an std::string
* Memory efficient way to store strings in hash maps using boost::array

The blog url: http://jovislab.com/blog/

There are no ads on my blog :)

Thanks in advance for your comments!
 

Marc

朝の木 said:
Hi. I have recently written an article about problem of storing
millions of short strings in hash maps, as well as in other
containers. The problem is the memory overhead of std::string.

Assuming you are using libstdc++, did you try other implementations of
strings, in particular ones using the short-string optimization
technique? There should be one in ext/vstring.h called __vstring.
Other libraries (libcxx for instance) have something like that by
default.
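
A quick way to check whether a given standard library applies the short-string optimization, and up to which length, is to test whether the character buffer lies inside the string object itself. The sketch below is only an illustration: stored_inline and sso_threshold are made-up helper names, and the probe limit of 64 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// True if the character buffer lives inside the string object itself,
// i.e. no separate heap block holds the characters of this value.
inline bool stored_inline(const std::string& s) {
    const char* first = reinterpret_cast<const char*>(&s);
    const char* last = first + sizeof(std::string);
    const char* buf = s.data();
    // std::less gives a well-defined order even for unrelated pointers.
    std::less<const char*> before;
    return !before(buf, first) && before(buf, last);
}

// Longest length (up to `limit`) that is still stored inline.
// Returns 0 if the implementation never stores characters inline.
inline std::size_t sso_threshold(std::size_t limit = 64) {
    assert(limit > 0 && "need at least one length to probe");
    std::size_t longest = 0;
    for (std::size_t len = 1; len <= limit; ++len) {
        const std::string s(len, 'x');
        if (!stored_inline(s)) break;
        longest = len;
    }
    return longest;
}
```

A result of 0 means every non-empty string pays for a separate allocation, which is the case discussed in this thread.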
 

asanoki

On Wednesday, 18 April 2012 06:07:28 UTC+9, Marc wrote:
Assuming you are using libstdc++, did you try other implementations of
strings, in particular ones using the short-string optimization
technique? There should be one in ext/vstring.h called __vstring.
Other libraries (libcxx for instance) have something like that by
default.

Thanks for the hint. I have just checked it. Still the overhead is big.
G++ -O2 32bit, sys 3bit, 1000000 strings on the heap. Virtual memory in KB:
length   __vstring   std::string
     1       34552         42340
     4       34552         42340
     8       34552         50260
    16       58048         58048
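
To read these numbers per string, divide the virtual memory by the number of strings. A small sketch of that arithmetic, assuming 1 KB means 1024 bytes (bytes_per_string is a made-up helper name):

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the figures are in units of 1024 bytes.
// Returns the average number of bytes of virtual memory per string.
inline double bytes_per_string(std::size_t virtual_kb, std::size_t count) {
    assert(count > 0 && "count must be non-zero");
    return static_cast<double>(virtual_kb) * 1024.0 /
           static_cast<double>(count);
}
```

For example, bytes_per_string(34552, 1000000) is about 35.4 and bytes_per_string(58048, 1000000) is about 59.4. These figures include the baseline memory of the process, so they are upper bounds on the cost of a single string.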

Regards.
 

Juha Nieminen

朝の木 said:
Hi. I have recently written an article about problem of storing
millions of short strings

Storing large amounts of strings (or other similar data) in an efficient
manner (in terms of both memory usage and access times) is a very non-trivial
problem, and there are dozens and dozens of data containers and algorithms
dedicated to that precise problem.

If one needs that kind of efficiency (and one understands even a tiny bit
about data containers, algorithms, efficiency, and how the standard data
containers work), one wouldn't be using std::string for this in the first
place. (If it uses short string optimization, it may help a bit, but it's
defeated immediately when the strings are even one character longer than
the optimization threshold. Even in that case they would still consume
more memory and be slower than an advanced, very specialized data container
designed for this exact purpose.)
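
As a rough illustration of what such a container can look like, here is a sketch of one of the simplest layouts: an append-only pool that keeps all characters in a single buffer plus one 32-bit end offset per string. StringPool and its members are made-up names, and the 32-bit offsets assume the total payload stays below 4 GB. Real specialized containers (tries and the like) go much further than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical append-only string pool: one shared character buffer,
// one 32-bit end offset per stored string, no per-string allocation.
class StringPool {
public:
    // Appends `s` and returns the index under which it can be read back.
    std::size_t add(const std::string& s) {
        chars_.insert(chars_.end(), s.begin(), s.end());
        ends_.push_back(static_cast<std::uint32_t>(chars_.size()));
        return ends_.size() - 1;
    }

    // Returns a copy of the string stored at index `i`.
    std::string get(std::size_t i) const {
        assert(i < ends_.size() && "index out of range");
        const std::size_t begin = (i == 0) ? 0 : ends_[i - 1];
        return std::string(chars_.begin() + begin, chars_.begin() + ends_[i]);
    }

    std::size_t size() const { return ends_.size(); }

    // Bytes used by characters and offsets (ignores vector slack capacity).
    std::size_t payload_bytes() const {
        return chars_.size() + ends_.size() * sizeof(std::uint32_t);
    }

private:
    std::vector<char> chars_;          // all characters, back to back
    std::vector<std::uint32_t> ends_;  // end offset of each string
};
```

The per-string overhead here is 4 bytes regardless of length, so there is no threshold beyond which the cost suddenly jumps.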
 
