Text manipulation while keeping original position offsets.

Kai Schlamp

Hi.

I need to manipulate large strings (deleting and adding the deleted
chars again, moving chars around), but still want to remember the
original position offsets. For example, if the word "computer" starts at
offset 133 in the original text and is then moved to position 244, I still
want to know that it was originally at position 133.
The ugliest (and most resource-hungry) solution would be to store for
every character its original position plus its position change. There
are surely better solutions, but also more complex ones.
Are there any good text manipulation libraries that have a solution to
my problem? I don't want to reinvent the wheel.

Regards,
Kai
 
Roedy Green

You probably need a list of bookmarks for the interesting places. Every
time you make a modification, you update your bookmark offsets.
Most modification rules will be of the form:

if offset >= a, increment by aa; if offset < a, increment by bb
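
A minimal sketch of that bookmark idea, assuming the text lives in a StringBuilder. The class BookmarkedText and its method names are made up for illustration and do not come from any library. Each bookmark keeps the offset it was registered at, plus a current offset that is shifted on every insert or delete. A bookmark that falls inside a deleted range is clamped to the start of the gap, which is just one possible policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from any library: edits a StringBuilder and
// keeps bookmark offsets in sync with each modification.
class BookmarkedText {
    private final StringBuilder text;
    private final List<Integer> original = new ArrayList<>(); // offset when bookmarked
    private final List<Integer> current = new ArrayList<>();  // offset after edits

    BookmarkedText(String initial) {
        text = new StringBuilder(initial);
    }

    // Registers an offset to track and returns a handle for later lookups.
    int bookmark(int offset) {
        original.add(offset);
        current.add(offset);
        return current.size() - 1;
    }

    int originalOffset(int handle) { return original.get(handle); }

    int currentOffset(int handle) { return current.get(handle); }

    // Rule: every bookmark at or after the insertion point moves right.
    void insert(int at, String s) {
        text.insert(at, s);
        for (int i = 0; i < current.size(); i++) {
            int o = current.get(i);
            if (o >= at) current.set(i, o + s.length());
        }
    }

    // Rule: bookmarks after the deleted range [from, to) move left;
    // bookmarks inside the range collapse to its start.
    void delete(int from, int to) {
        text.delete(from, to);
        for (int i = 0; i < current.size(); i++) {
            int o = current.get(i);
            if (o >= to) current.set(i, o - (to - from));
            else if (o > from) current.set(i, from);
        }
    }

    @Override
    public String toString() { return text.toString(); }
}
```

Updating every bookmark costs time proportional to the number of bookmarks per edit, so this only suits a modest number of tracked positions.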
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Out of 135 criminals, including robbers and rapists, 118 admitted that when they were children they burned, hanged and stabbed domestic animals."
~ Ogonyok Magazine 1979.
 
Tom Anderson

I don't know of any libraries which do this.

If i was going to write something, i'd use a string representation that
was like enfilades or ropes:

http://en.wikipedia.org/wiki/Enfilade_(Xanadu)
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

And then look for a way to hang 'original position' information on blocks
of characters in that structure. Conceptually, it's similar to your null
hypothesis of putting an original position on every character, but when
you have blocks of characters that were contiguous in the original, you
can optimise by only storing a single original position, for the start of
the block.
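
A rough sketch of the block idea, using a flat list of pieces rather than a real rope or enfilade tree. TrackedText, Piece and every method name here are invented for illustration. Each piece carries the original offset of its first character, or -1 for text that was never in the original, so a run that was contiguous in the original costs one integer no matter how long it is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a flat piece list. A real rope would use a balanced tree.
class TrackedText {
    // A run of characters; origin is the original offset of its first
    // character, or -1 if the run was inserted later.
    private static final class Piece {
        final String chars;
        final int origin;
        Piece(String chars, int origin) { this.chars = chars; this.origin = origin; }
    }

    private final List<Piece> pieces = new ArrayList<>();

    TrackedText(String original) {
        if (!original.isEmpty()) pieces.add(new Piece(original, 0));
    }

    // Ensures a piece boundary exists at pos; returns the index of the piece starting there.
    private int splitAt(int pos) {
        if (pos < 0) throw new IndexOutOfBoundsException("pos " + pos);
        int start = 0;
        for (int i = 0; i < pieces.size(); i++) {
            Piece p = pieces.get(i);
            int end = start + p.chars.length();
            if (pos == start) return i;
            if (pos < end) {
                int cut = pos - start;
                int tailOrigin = p.origin < 0 ? -1 : p.origin + cut;
                pieces.set(i, new Piece(p.chars.substring(0, cut), p.origin));
                pieces.add(i + 1, new Piece(p.chars.substring(cut), tailOrigin));
                return i + 1;
            }
            start = end;
        }
        if (pos == start) return pieces.size();
        throw new IndexOutOfBoundsException("pos " + pos);
    }

    void insert(int pos, String s) {
        pieces.add(splitAt(pos), new Piece(s, -1));
    }

    void delete(int from, int to) {
        int first = splitAt(from);
        int last = splitAt(to);
        pieces.subList(first, last).clear();
    }

    // Moves [from, to) so that it starts at dest, where dest is measured
    // in the text as it looks after the range has been cut out.
    void move(int from, int to, int dest) {
        int first = splitAt(from);
        int last = splitAt(to);
        List<Piece> moved = new ArrayList<>(pieces.subList(first, last));
        pieces.subList(first, last).clear();
        pieces.addAll(splitAt(dest), moved);
    }

    // Original offset of the character now at pos, or -1 for inserted text.
    int originalOffsetAt(int pos) {
        int start = 0;
        for (Piece p : pieces) {
            int end = start + p.chars.length();
            if (pos >= start && pos < end) {
                return p.origin < 0 ? -1 : p.origin + (pos - start);
            }
            start = end;
        }
        throw new IndexOutOfBoundsException("pos " + pos);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Piece p : pieces) sb.append(p.chars);
        return sb.toString();
    }
}
```

Every operation here walks the list, so it is linear in the number of pieces. The tree structures linked above are what you would reach for once the piece count gets large.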

In fact, i think the canonical enfilades work by storing indices into an
original text, rather than characters, so you may find they give you what
you want for free. It's hard to tell, because everything from Xanadu is
poorly documented and shrouded in strange language.

tom
 
markspace

I just happened to look at the Java Swing text component tutorials
recently. There's an Undo/Redo manager available, so you might be able to
implement some sort of Memento pattern for your strings.

<http://java.sun.com/docs/books/tutorial/uiswing/components/generaltext.html>

Also, in older JVMs (before Java 7 update 6), substring works by adjusting
an offset and length while sharing the original immutable buffer; newer
JVMs copy the characters instead. In other words, this code:

String s1 = "This is my new computer.";
s1 = s1.substring( 15 );

s1 now equals "computer." (if I counted right) but the original buffer
with the full string is still there. All that's happened is the index
of the string has been bumped up to 15 and the length has been decreased
accordingly.

In other words, if you just save the original string, you aren't
actually doing anything ugly or resource hungry.


String s1 = "This is my new computer.";
Stack savedStrings = new Stack();
savedStrings.push( s1 );
s1 = s1.substring( 15 );

This costs very little, and your original string is still there for
searching. Adding a memento to this might be relatively easy. You
might even be able to just save the new and the old values, without an
explicit "operation" to transform them.

Note that I really should have used a Deque and generics in the example
above; I'm just trying to be super clear in case you aren't up to speed on
Java collections.
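
For reference, here is roughly what the Deque-and-generics version could look like, wrapped in a small history class. StringHistory and its methods are names invented for this sketch. It only saves previous values, as in the Stack example, and does not record which operation produced each one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical memento-style holder: remembers every earlier value of a string.
class StringHistory {
    private final Deque<String> saved = new ArrayDeque<>();
    private String current;

    StringHistory(String initial) {
        current = initial;
    }

    String current() { return current; }

    // Saves the current value on the stack, then replaces it.
    void set(String next) {
        saved.push(current);
        current = next;
    }

    // Restores the most recently saved value; returns false if nothing was saved.
    boolean undo() {
        if (saved.isEmpty()) return false;
        current = saved.pop();
        return true;
    }

    // The oldest saved value, i.e. the untouched original text.
    String original() {
        return saved.isEmpty() ? current : saved.peekLast();
    }
}
```

Calling set(current().substring(15)) on "This is my new computer." leaves the full sentence available through original(), and undo() brings it back as the current value.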

<http://en.wikipedia.org/wiki/Memento_pattern>
 
