data structure that preserving insertion order

O

Owner

Is there any data structure that preserving insertion order
in C?

I need to process unicode mapping and unlike ascii which has

128 codes, unicode has more than roughly 10,000 letters.

I'm looking for..

Data structure that handles large data and preserving inser-

tion order.
 
P

Peter Nilsson

Owner said:
Is there any data structure that preserving insertion order

Yes, lots of them. In terms of the 'usual' data structures,
pretty much all of them except trees, graphs and hash tables.

Data structures generally aren't language dependant. It doesn't
sound like you have a C issue.
I need to process unicode mapping and unlike ascii which has
128 codes, unicode has more than roughly 10,000 letters.

So store unicode characters in whatever array, list, etc...
you would normally store a sequence of characters.
I'm looking for..

Data structure that handles large data and preserving inser-
tion order.

What's the actual problem?

If you're building yet another text editor, but one that will
open a 100MB text file and still run in reasonable time when
the user inserts a line at the beginning, then you just need a
container to house the lines, and manage each line separately.
 
O

Owner

Yes, lots of them. In terms of the 'usual' data structures,
pretty much all of them except trees, graphs and hash tables.


Data structures generally aren't language dependant. It doesn't
sound like you have a C issue.


So store unicode characters in whatever array, list, etc...
you would normally store a sequence of characters.


What's the actual problem?

If you're building yet another text editor, but one that will
open a 100MB text file and still run in reasonable time when
the user inserts a line at the beginning, then you just need a
container to house the lines, and manage each line separately.

If you can tell me one data structure, I can

research with the clue on the web. but I have no idea

except trees. even if I heard trees (like binary tree,

b-tree..) I dont' know how to relate it to editor or

big data in actual code. but I'm learning step by step.

I already posted similar but shorter post at

comp.programming and it was unproductive.
 
P

Paul N

Is there any data structure that preserving insertion order
in C?

I need to process unicode mapping and unlike ascii which has

128 codes, unicode has more than roughly 10,000 letters.

I'm looking for..

Data structure that handles large data and preserving inser-

tion order.

I'm afraid you still haven't said enough for us to be able to help
you. When you say "preserving insertion order", what are you planning
on doing to the data that might mess up the order?

For instance, if I do the following:

char buff[10];
char *p = buff;
*(p++) = 'H';
*(p++) = 'e';
*(p++) = 'l';
*(p++) = 'l';
*(p++) = 'o';
*p = 0;

printf(buff);

then this should print all the characters in the order they were
inserted. Presumably you mean something more than this.
 
O

Owner

Is there any data structure that preserving insertion order
in C?

I need to process unicode mapping and unlike ascii which has

128 codes, unicode has more than roughly 10,000 letters.

I'm looking for..

Data structure that handles large data and preserving inser-

tion order.

I'm afraid you still haven't said enough for us to be able to help
you. When you say "preserving insertion order", what are you planning
on doing to the data that might mess up the order?

For instance, if I do the following:

char buff[10];
char *p = buff;
*(p++) = 'H';
*(p++) = 'e';
*(p++) = 'l';
*(p++) = 'l';
*(p++) = 'o';
*p = 0;

printf(buff);

then this should print all the characters in the order they were
inserted. Presumably you mean something more than this.

Yes exactly, preserving order like that but with large data

like unicode mapping. data structure preserving the inorder

of character( character has its own value according

character set but when inserted the character in some

sort of data structure, it preserves inorder ) with able

to search the character efficiently.
 
B

Ben Bacarisse

Owner said:
Yes exactly, preserving order like that but with large data
like unicode mapping. data structure preserving the inorder
of character( character has its own value according
character set but when inserted the character in some
sort of data structure, it preserves inorder ) with able
to search the character efficiently.

I'm still not sure what you want. In particular what is "Unicode
mapping"? Presumably it maps Unicode code points to ... what?

Anyway, any data structure that supports the mapping you want can be
made to record the insertion order, simply by maintaining a list at the
same time. This so simple that I image it is not what you want but I
can't tell. Why not say what the actual problem is that needs to be
solved?
 
O

Owner

I'm still not sure what you want. In particular what is "Unicode
mapping"? Presumably it maps Unicode code points to ... what?

Anyway, any data structure that supports the mapping you want can be
made to record the insertion order, simply by maintaining a list at the
same time. This so simple that I image it is not what you want but I
can't tell. Why not say what the actual problem is that needs to be
solved?

Trying to build tr unix tool that supports unicode.

ascii is 128 characters so a array can hold characters.

unicode is a little over 10,000 characters.

thought a binary tree could be right structure. but then

I need preserve order which character goes in first.

Thank you for replies by the way.
 
B

Ben Bacarisse

Owner said:
Trying to build tr unix tool that supports unicode.
ascii is 128 characters so a array can hold characters.
unicode is a little over 10,000 characters.
thought a binary tree could be right structure. but then
I need preserve order which character goes in first.

Why? I'd use something like a hash table. I don't see any need to
remember the insertion order. What's your reasoning?
 
O

Owner

Why? I'd use something like a hash table. I don't see any need to
remember the insertion order. What's your reasoning?

If you command something like "tr hashtable empttable"

replacing same length word hashtable with empttable.

I thought inorder needs to be preserved to replace the word,

no?
 
K

Kai-Uwe Bux

Owner said:
If you command something like "tr hashtable empttable"

replacing same length word hashtable with empttable.

I thought inorder needs to be preserved to replace the word,

no?

But would

tr abcd efgh

not just transliterate as follows:

a --> e
b --> f
c --> g
d --> h

_every_ a becomes an e, _every_ b becomes an f, ... . I.e.,

asdbdea --> eshfhee

For that, you don't need to know the order:

tr abcd efgh

behaves exactly like

tr badc fehg

At least, that is how tr behaves on my Unix box. Moreover, that is what the
info page tells me.


Best,

Kai-Uwe Bux
 
O

Owner

You're off by at least a factor of 10. There are 109,242 assigned graphical
code points in the latest Unicode standard.

Unicode in my primary language and what my system is using is

a little over 10,000 character set.

unicode is only 16-bit when I change locale setting in the

beginning of the program.
 
J

James Kuyper

On 04/29/2011 01:35 PM, Owner wrote:
....
If you command something like "tr hashtable empttable"

Those arguments tell tr to read from the standard input, perform the
following conversions, and write the result to the standard output:

* replace every 'h' with an 'e'
* replace every 'a' with an 'm'
* replace every 's' with a 'p'
* replace every 'h' with a 't' (which is a bit confusing)
etc.

I seriously doubt that this is what you thought it would do.
I thought inorder needs to be preserved to replace the word,

The order of the elements in the first set relative to the second set is
very important. But

tr abc xyz

performs exactly the same transformation as

tr bca yzx
 
J

James Kuyper

Unicode in my primary language and what my system is using is

Unicode is not a language. It's "a computing industry standard for the
consistent encoding, representation and handling of text ... the latest
version of Unicode consists of a repertoire of more than 109,000
characters covering 93 scripts, a set of code charts for visual
reference, an encoding methodology and set of standard character
encodings, an enumeration of character properties such as upper and
lower case, a set of reference data computer files, and a number of
related items, such as character properties, rules for normalization,
decomposition, collation, rendering, and bidirectional display order
(for the correct display of text containing both right-to-left scripts,
such as Arabic and Hebrew, and left-to-right scripts)."
<http://en.wikipedia.org/wiki/Unicode>

What you probably mean is that your computer system natively uses one of
the Unicode standard character encodings.
a little over 10,000 character set.

unicode is only 16-bit when I change locale setting in the

beginning of the program.

That sounds like you might be talking about either UCS-2 or UTF-16
encodings, but it could be a bad description of something else.

The best answer to your question depends upon which encoding you're
talking about - please look for documentation telling you what it is.
 
G

Gene

Unicode in my primary language and what my system is using is

a little over 10,000 character set.

unicode is only 16-bit when I change locale setting in the

beginning of the program.

It sounds like all you need is a dictionary data structure.
Dictionaries map one string to another. Here you have the special
case of mapping single characters to single characters. For the
command tr abcd efgh, the algorithm is just:

1. Build a dictionary D.
2. Insert into D the mappings a->e, b->f, c->g, d->h.
3. For each input character X,
if some mapping X->Y is in D,
print Y
else
print X

Single character dictionaries can be implemented as simple lookup
arrays as has been mentioned as long as the character set is not too
large; 10,000 would certainly be fine. But if your tr sets are small,
most of the entries will be trivial mappings to themselves. Hash
tables, BSTs, skip lists, and bit-wise tries all ought to work fine,
too. They have the advantage of storing only the non-identity
mappings as shown above. Because small data structures fit in cache
memory, these might actually be faster than the lookup table.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,022
Latest member
MaybelleMa

Latest Threads

Top