Efficient storage of a temporary string

Randy Kramer · Mar 2, 2005

Background: In order to do the parsing I've talked about in another thread, in
many circumstances I need to know the number of spaces before and after the
current token. I'm trying to think about efficient ways to do that. One
might be to do a preprocessing pass through the text to figure out how many
spaces separate the various tokens, then store the tokens and the spaces
between them in a temporary in-memory data structure. Otherwise I'll need a
way to backtrack from the found position of some token to find how many
spaces separate it from the previous token.
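
To make the backtracking alternative concrete, here is a minimal sketch of counting the spaces on either side of a token once its position is known. The helper name spaces_around and the sample text are assumptions made for illustration, not something from the parser discussed in the other thread.

```ruby
# Count the spaces immediately before and after a token at a known position.
# `spaces_around` is a hypothetical helper used only for this illustration.
def spaces_around(text, pos, token_length)
  before = 0
  # Walk left from the token start while we keep seeing spaces.
  before += 1 while pos - before - 1 >= 0 && text[pos - before - 1] == " "

  after = 0
  finish = pos + token_length
  # Walk right from the token end while we keep seeing spaces.
  after += 1 while text[finish + after] == " "

  [before, after]
end

text = "intro\n   * Level 1\n"
pos = text.index("*")
p spaces_around(text, pos, 1)  # => [3, 1]
```

This avoids the temporary structure entirely, at the cost of rescanning a few characters for every token that needs the counts.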

In another thread I asked about streams. In this thread I want to ask about
an efficient way to store the intermediate result if I do a preprocessing
pass.

What I envision as a result of the preprocessing pass is a new representation
of the file where all spaces or groups of spaces are replaced by counts,
giving a list of "tokens" and the numbers of spaces between those tokens or
between a token and the last/next newline. For example, with the TWiki
marked up text:

This is a two level bulleted list:
   * Level 1
      * Level 2

The result I'd see is something like this:

bof,0,"This is a two level bulleted list:",0,\n,3,*,1,"Level 1",0,
\n,6,*,1,"Level 2",eof

Aside: I don't necessarily have to break everything down into tokens of a
single word (I didn't in the above), but it might end up being easier.

What makes the most sense as temporary storage of that result? My guess is an
array, which will expand throughout the prescan process (unless I preallocate
an array of an appropriate size--can I do that in Ruby?), and then be
destroyed after the main processing pass. (I'll probably do the main
processing pass by essentially incrementing my way through that array.)
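
As a rough illustration of stepping through such an array, the sketch below assumes the alternating token/count layout from the example result (tokens at even indexes, space counts at odd indexes). The array contents and variable names are made up for the example.

```ruby
# Hypothetical stream in the token/count layout sketched above:
# even indexes hold tokens, odd indexes hold the spaces that follow.
stream = ["bof", 0, "Intro:", 0, "\n", 3, "*", 1, "Level 1", 0, "\n", 0, "eof"]

bullets = []
0.step(stream.length - 1, 2) do |i|
  token  = stream[i]
  before = i.zero? ? 0 : stream[i - 1]  # spaces preceding this token
  after  = stream[i + 1] || 0           # spaces following it (none after eof)
  bullets << [before, after] if token == "*"
end

p bullets  # => [[3, 1]]
```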

Is there a better approach?

(Aside: At some point I may rewrite the method to do this preprocessing pass
in C.)

Randy Kramer

Robert Klemme · Mar 2, 2005

Randy Kramer said:
Background: In order to do the parsing I've talked about in another thread, in
many circumstances I need to know the number of spaces before and after the
current token. I'm trying to think about efficient ways to do that. One
might be to do a preprocessing pass through the text to figure out how many
spaces separate the various tokens, then store the tokens and the spaces
between them in a temporary in-memory data structure. Otherwise I'll need a
way to backtrack from the found position of some token to find how many
spaces separate it from the previous token.

In another thread I asked about streams. In this thread I want to ask about
an efficient way to store the intermediate result if I do a preprocessing
pass.

What I envision as a result of the preprocessing pass is a new representation
of the file where all spaces or groups of spaces are replaced by counts,
giving a list of "tokens" and the numbers of spaces between those tokens or
between a token and the last/next newline. For example, with the TWiki
marked up text:

This is a two level bulleted list:
   * Level 1
      * Level 2

The result I'd see is something like this:

bof,0,"This is a two level bulleted list:",0,\n,3,*,1,"Level 1",0,
\n,6,*,1,"Level 2",eof

Aside: I don't necessarily have to break everything down into tokens of a
single word (I didn't in the above), but it might end up being easier.

What makes the most sense as temporary storage of that result? My guess is an
array, which will expand throughout the prescan process (unless I preallocate
an array of an appropriate size--can I do that in Ruby),

Yes, you can:

Array.new(10)
=> [nil, nil, nil, nil, nil, nil, nil, nil, nil, nil]

But I'd do that only if the array allocation / reallocation proves to be a
performance bottleneck.
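
One way to check whether it is a bottleneck is to time both styles with the standard Benchmark module. This is only a sketch: the element count N is an arbitrary number picked for illustration, and the timings will vary by machine and Ruby version.

```ruby
require "benchmark"

N = 200_000  # arbitrary size chosen for illustration

# Grow the array one element at a time.
grown = nil
t_grow = Benchmark.realtime do
  grown = []
  N.times { |i| grown << i }
end

# Preallocate N slots, then fill them by index.
filled = nil
t_fill = Benchmark.realtime do
  filled = Array.new(N)
  N.times { |i| filled[i] = i }
end

puts format("append: %.4fs  preallocated: %.4fs", t_grow, t_fill)
```

If the two numbers are close for realistic input sizes, plain appending is probably the simpler choice.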

and then be destroyed after the main processing pass. (I'll probably do the
main processing pass by essentially incrementing my way through that array.)

Is there a better approach?

(Aside: At some point I may rewrite the method to do this preprocessing pass
in C.)

Does this help?

s = <<EOF
This is a two level bulleted list:
   * Level 1
      * Level 2
EOF

a=[]; s.scan %r{"[^"]*"|\S+|\n|\s+}xo do |m| a << (/\A\s+\z/ =~ m ? m.length : m ) end
=> "This is a two level bulleted list:\n   * Level 1\n      * Level 2\n"
a
=> ["This", 1, "is", 1, "a", 1, "two", 1, "level", 1, "bulleted", 1,
"list:", 1, 3, "*", 1, "Level", 1, "1", 1, 6, "*", 1, "Level",
1, "2", 1]

The quoting part of the regexp can be improved to accept escaped quotes
inside a string as well as single quotes, but I guess you get the picture.
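
A possible shape for that improvement is sketched below. The constant name TOKEN and the sample line are assumptions for illustration. Note one caveat: a stray apostrophe at the start of a word (as in 'tis) would be treated as the opening of a single-quoted string if a later apostrophe closes it.

```ruby
# Tokenizer pattern with backslash escapes allowed inside quoted strings.
# TOKEN is a hypothetical name; the /x flag permits the layout and comments.
TOKEN = %r{
    "(?:\\.|[^"\\])*"   # double-quoted string, backslash escapes allowed
  | '(?:\\.|[^'\\])*'   # single-quoted string, same rule
  | \S+                 # any other run of non-whitespace
  | \n                  # a newline on its own
  | \s+                 # a run of other whitespace
}x

line = %q{say "a \"quoted\" word" and 'it\'s' done}

# As in the scan above, whitespace runs are stored as their length.
tokens = line.scan(TOKEN).map { |m| m =~ /\A\s+\z/ ? m.length : m }
puts tokens
```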

Also, you can do any type of conversion on the matched string in the block
before you insert the match into the array. If you use grouping in the
regexp, then you probably can use that for discrimination of the action to
be taken.
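
A small sketch of that idea, using one capture group per kind of token so the block can decide what to store. It assumes the input contains only spaces and newlines as whitespace (tabs would be skipped by this pattern), and it keeps newlines as a distinct :newline marker instead of counting them as whitespace of length 1.

```ruby
text = "This is:\n   * Level 1\n"

result = []
# One capture group per token kind; exactly one group is non-nil per match.
text.scan(/(\n)|( +)|(\S+)/) do |newline, spaces, word|
  if newline
    result << :newline        # keep line breaks as a distinct marker
  elsif spaces
    result << spaces.length   # store a run of spaces as its length
  else
    result << word            # store any other token as text
  end
end

p result
# => ["This", 1, "is:", :newline, 3, "*", 1, "Level", 1, "1", :newline]
```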

Kind regards

robert
