data structure that preserving insertion order

Owner · Apr 29, 2011

Is there any data structure that preserving insertion order
in C?

I need to process unicode mapping and unlike ascii which has

128 codes, unicode has more than roughly 10,000 letters.

I'm looking for..

Data structure that handles large data and preserving inser-

tion order.

Peter Nilsson · Apr 29, 2011

Owner said:
Is there any data structure that preserving insertion order

Yes, lots of them. In terms of the 'usual' data structures,
pretty much all of them except trees, graphs and hash tables.

in C?

Data structures generally aren't language dependant. It doesn't
sound like you have a C issue.

I need to process unicode mapping and unlike ascii which has
128 codes, unicode has more than roughly 10,000 letters.

So store unicode characters in whatever array, list, etc...
you would normally store a sequence of characters.

I'm looking for..

Data structure that handles large data and preserving inser-
tion order.

What's the actual problem?

If you're building yet another text editor, but one that will
open a 100MB text file and still run in reasonable time when
the user inserts a line at the beginning, then you just need a
container to house the lines, and manage each line separately.

Owner · Apr 29, 2011

Yes, lots of them. In terms of the 'usual' data structures,
pretty much all of them except trees, graphs and hash tables.

Data structures generally aren't language dependant. It doesn't
sound like you have a C issue.

So store unicode characters in whatever array, list, etc...
you would normally store a sequence of characters.

What's the actual problem?

If you're building yet another text editor, but one that will
open a 100MB text file and still run in reasonable time when
the user inserts a line at the beginning, then you just need a
container to house the lines, and manage each line separately.

If you can tell me one data structure, I can

research with the clue on the web. but I have no idea

except trees. even if I heard trees (like binary tree,

b-tree..) I dont' know how to relate it to editor or

big data in actual code. but I'm learning step by step.

I already posted similar but shorter post at

comp.programming and it was unproductive.

Paul N · Apr 29, 2011

Is there any data structure that preserving insertion order
in C?

I need to process unicode mapping and unlike ascii which has

128 codes, unicode has more than roughly 10,000 letters.

I'm looking for..

Data structure that handles large data and preserving inser-

tion order.

I'm afraid you still haven't said enough for us to be able to help
you. When you say "preserving insertion order", what are you planning
on doing to the data that might mess up the order?

For instance, if I do the following:

char buff[10];
char *p = buff;
*(p++) = 'H';
*(p++) = 'e';
*(p++) = 'l';
*(p++) = 'l';
*(p++) = 'o';
*p = 0;

printf(buff);

then this should print all the characters in the order they were
inserted. Presumably you mean something more than this.

Owner · Apr 29, 2011

Is there any data structure that preserving insertion order
in C?

I need to process unicode mapping and unlike ascii which has

128 codes, unicode has more than roughly 10,000 letters.

I'm looking for..

Data structure that handles large data and preserving inser-

tion order.

Click to expand...

I'm afraid you still haven't said enough for us to be able to help
you. When you say "preserving insertion order", what are you planning
on doing to the data that might mess up the order?

For instance, if I do the following:

char buff[10];
char *p = buff;
*(p++) = 'H';
*(p++) = 'e';
*(p++) = 'l';
*(p++) = 'l';
*(p++) = 'o';
*p = 0;

printf(buff);

then this should print all the characters in the order they were
inserted. Presumably you mean something more than this.

Yes exactly, preserving order like that but with large data

like unicode mapping. data structure preserving the inorder

of character( character has its own value according

character set but when inserted the character in some

sort of data structure, it preserves inorder ) with able

to search the character efficiently.

Ben Bacarisse · Apr 29, 2011

Owner said:
Yes exactly, preserving order like that but with large data
like unicode mapping. data structure preserving the inorder
of character( character has its own value according
character set but when inserted the character in some
sort of data structure, it preserves inorder ) with able
to search the character efficiently.

I'm still not sure what you want. In particular what is "Unicode
mapping"? Presumably it maps Unicode code points to ... what?

Anyway, any data structure that supports the mapping you want can be
made to record the insertion order, simply by maintaining a list at the
same time. This so simple that I image it is not what you want but I
can't tell. Why not say what the actual problem is that needs to be
solved?

Owner · Apr 29, 2011

I'm still not sure what you want. In particular what is "Unicode
mapping"? Presumably it maps Unicode code points to ... what?

Anyway, any data structure that supports the mapping you want can be
made to record the insertion order, simply by maintaining a list at the
same time. This so simple that I image it is not what you want but I
can't tell. Why not say what the actual problem is that needs to be
solved?

Trying to build tr unix tool that supports unicode.

ascii is 128 characters so a array can hold characters.

unicode is a little over 10,000 characters.

thought a binary tree could be right structure. but then

I need preserve order which character goes in first.

Thank you for replies by the way.

Ben Bacarisse · Apr 29, 2011

Owner said:
Trying to build tr unix tool that supports unicode.
ascii is 128 characters so a array can hold characters.
unicode is a little over 10,000 characters.
thought a binary tree could be right structure. but then
I need preserve order which character goes in first.

Why? I'd use something like a hash table. I don't see any need to
remember the insertion order. What's your reasoning?

Owner · Apr 29, 2011

Why? I'd use something like a hash table. I don't see any need to
remember the insertion order. What's your reasoning?

If you command something like "tr hashtable empttable"

replacing same length word hashtable with empttable.

I thought inorder needs to be preserved to replace the word,

no?

Kai-Uwe Bux · Apr 29, 2011

Owner said:
If you command something like "tr hashtable empttable"

replacing same length word hashtable with empttable.

I thought inorder needs to be preserved to replace the word,

no?

But would

tr abcd efgh

not just transliterate as follows:

a --> e
b --> f
c --> g
d --> h

_every_ a becomes an e, _every_ b becomes an f, ... . I.e.,

asdbdea --> eshfhee

For that, you don't need to know the order:

tr abcd efgh

behaves exactly like

tr badc fehg

At least, that is how tr behaves on my Unix box. Moreover, that is what the
info page tells me.

Best,

Kai-Uwe Bux

Owner · Apr 29, 2011

At least, that is how tr behaves on my Unix box. Moreover, that is what the
info page tells me.

Best,

Kai-Uwe Bux

Ah! Thank you

Owner · Apr 29, 2011

You're off by at least a factor of 10. There are 109,242 assigned graphical
code points in the latest Unicode standard.

Unicode in my primary language and what my system is using is

a little over 10,000 character set.

unicode is only 16-bit when I change locale setting in the

beginning of the program.

James Kuyper · Apr 30, 2011

On 04/29/2011 01:35 PM, Owner wrote:
....

If you command something like "tr hashtable empttable"

Those arguments tell tr to read from the standard input, perform the
following conversions, and write the result to the standard output:

* replace every 'h' with an 'e'
* replace every 'a' with an 'm'
* replace every 's' with a 'p'
* replace every 'h' with a 't' (which is a bit confusing)
etc.

I seriously doubt that this is what you thought it would do.

I thought inorder needs to be preserved to replace the word,

The order of the elements in the first set relative to the second set is
very important. But

tr abc xyz

performs exactly the same transformation as

tr bca yzx

James Kuyper · Apr 30, 2011

Unicode in my primary language and what my system is using is

Unicode is not a language. It's "a computing industry standard for the
consistent encoding, representation and handling of text ... the latest
version of Unicode consists of a repertoire of more than 109,000
characters covering 93 scripts, a set of code charts for visual
reference, an encoding methodology and set of standard character
encodings, an enumeration of character properties such as upper and
lower case, a set of reference data computer files, and a number of
related items, such as character properties, rules for normalization,
decomposition, collation, rendering, and bidirectional display order
(for the correct display of text containing both right-to-left scripts,
such as Arabic and Hebrew, and left-to-right scripts)."
<http://en.wikipedia.org/wiki/Unicode>

What you probably mean is that your computer system natively uses one of
the Unicode standard character encodings.

a little over 10,000 character set.

unicode is only 16-bit when I change locale setting in the

beginning of the program.

That sounds like you might be talking about either UCS-2 or UTF-16
encodings, but it could be a bad description of something else.

The best answer to your question depends upon which encoding you're
talking about - please look for documentation telling you what it is.

Gene · Apr 30, 2011

Unicode in my primary language and what my system is using is

a little over 10,000 character set.

unicode is only 16-bit when I change locale setting in the

beginning of the program.

It sounds like all you need is a dictionary data structure.
Dictionaries map one string to another. Here you have the special
case of mapping single characters to single characters. For the
command tr abcd efgh, the algorithm is just:

1. Build a dictionary D.
2. Insert into D the mappings a->e, b->f, c->g, d->h.
3. For each input character X,
if some mapping X->Y is in D,
print Y
else
print X

Single character dictionaries can be implemented as simple lookup
arrays as has been mentioned as long as the character set is not too
large; 10,000 would certainly be fine. But if your tr sets are small,
most of the entries will be trivial mappings to themselves. Hash
tables, BSTs, skip lists, and bit-wise tries all ought to work fine,
too. They have the advantage of storing only the non-identity
mappings as shown above. Because small data structures fit in cache
memory, these might actually be faster than the lookup table.

Data saving in condition of changing reality	0	Apr 29, 2022
Name that data structure!	17	Dec 22, 2008
Constness of pointed to data from constant structure	1	Jul 20, 2009
An algorithm to generate fake but realistic graph data that trends	0	Nov 21, 2013
Data Retrieval/Insertion Data Structure	3	Jul 1, 2004
Ideal data structure for nested list format?	29	Sep 4, 2010
lookup data structure that returns the previous element as well	5	Dec 9, 2003
two interesting data structure/algorithm questions	3	Nov 16, 2007

data structure that preserving insertion order

Owner

Peter Nilsson

Owner

Paul N

Owner

Ben Bacarisse

Owner

Ben Bacarisse

Owner

Kai-Uwe Bux

Owner

Owner

James Kuyper

James Kuyper

Gene

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads