rope vs. string

J

James Aguilar

Hey all,

I have a program that runs the Manber-Myers suffix array search algorithm
with longest common prefix values. I need an efficient way to represent a
sequential array of characters to be passed into the algorithm. Here are
the features I need for the representation:

* Must be randomly accessible in constant time
* Must be capable of efficiently storing strings of more than 10 million
characters

Here are some plusses:

* Dereferencing an iterator should be fast, not just constant time, but
quick constant time, since it will be happening often (at least (size of
pattern) + log(size of text) times)
* Incrementing an iterator should be similarly fast

It would also be useful if this representation provided an efficient method
of extracting all the characters for the problem from the standard input
(again, no less than 5 million characters, randomly distributed among A, C,
G, and T).

Now, here's the question: which is better for this problem, a string or a
rope. If a rope would be best, can you offer any guidance as to ways to
make the use of the rope as fast and painless as possible (if that is the
preferable representation).

- JFA1
 
I

Ivan Vecerina

Hi James,
I have a program that runs the Manber-Myers suffix array search algorithm
with longest common prefix values. I need an efficient way to represent a
sequential array of characters to be passed into the algorithm. Here are
the features I need for the representation:

* Must be randomly accessible in constant time
* Must be capable of efficiently storing strings of more than 10 million
characters

Here are some plusses:

* Dereferencing an iterator should be fast, not just constant time, but
quick constant time, since it will be happening often (at least (size of
pattern) + log(size of text) times)
* Incrementing an iterator should be similarly fast

It would also be useful if this representation provided an efficient
method of extracting all the characters for the problem from the standard
input (again, no less than 5 million characters, randomly distributed
among A, C, G, and T).

Now, here's the question: which is better for this problem, a string or a
rope. If a rope would be best, can you offer any guidance as to ways to
make the use of the rope as fast and painless as possible (if that is the
preferable representation).

The implementation of rope::iterator is somewhat more complex than for
string (where it is basically a pointer), and will introduce an overhead.

The only true advantage of a rope (i.e. the non-standard rope class from
the SGI STL) is that it supports efficient *editing* of a small portion
of the string (not just replacing, but also inserting/deleting/splicing
subsequences within the original string). Unless this is relevant to
what you need to do, there is no point in using rope instead of string.

The caveat with std::string, depending on how the platform you use
implements the standard library, is to try to avoid unnecessary copies
of the data structure (be careful to pass the strings as references
whenever possible, to avoid object copies in case your implementation
does not use reference-counting).

In case top performance is critical, passing the input as a file
and mapping it into memory (with platform-specific mmap/MapViewOfFile),
will most likely be the fastest approach at run-time.

Note also that your algorithm could be templated on the iterator type,
allowing it to be used on any type of source data range.


I hope this helps,
Ivan
 
J

James Aguilar

Oh! Thanks for the help. I didn't realize the rope was not in -the- STL.
I got the sense from the site that I was reading that it was. If it's not
standard, I'm not going to mess with it, as it is only for a homework
assignment. Thank you so much for your advice nonetheless.

- JFA1
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top