code review / efficient lookup techniques...

J

James

BTW, this is homework. I hope I am not out of line here...


I am in the process of learning C++. I have been given a task to complete
which involves mapping strings to a value. The minimum length of a string is
4 characters and only uses lowercase alphabetic characters. It needs to work
with more than just ASCII, (e.g., EBCDIC) I need to do this mapping without
using any standard containers. I immediately thought of using a 4 dimintinal
array of values to do a very fast lookup:


#define DEPTH (UCHAR_MAX + 1)

void* array[DEPTH][DEPTH][DEPTH][DEPTH];


However, this uses way to much memory. So, I thought of using the following
table:


#define DEPTH 27

void* array[DEPTH][DEPTH][DEPTH][DEPTH];


Which should handle all lowercase letters. However, I need to translate the
character value to an index into this array. So, here is what I have come up
with so far:




#include <iostream>
#include <climits>
#include <cctype>
#include <cstring>
#include <cassert>




class translator
{
static unsigned char g_chars[UCHAR_MAX + 1];
static char const* const g_alpha;


public:
translator()
{
for (int i = 0; i <= UCHAR_MAX; ++i)
{
if (isalpha(i) && islower(i))
{
for (int a = 0; a < 26; ++a)
{
if (i == g_alpha[a])
{
g_chars = a;
break;
}
}
}
}
}


public:
inline int operator [] (char c) const
{
return g_chars[(unsigned int)c];
}
};

unsigned char
translator::g_chars[UCHAR_MAX + 1] = { 0 };

char const* const
translator::g_alpha = "abcdefghijklmnopqrstuvwxyz";

static translator g_translator;




template<typename TR, typename T>
class cmap
{
typedef TR (*fp_command) (T);


struct table
{
fp_command m_fp[27][27][27][27];
};


private:
table m_table;


public:
cmap(fp_command fp_default)
{
for (std::size_t i1 = 0; i1 < 27; ++i1)
{
for (std::size_t i2 = 0; i2 < 27; ++i2)
{
for (std::size_t i3 = 0; i3 < 27; ++i3)
{
for (std::size_t i4 = 0; i4 < 27; ++i4)
{
m_table.m_fp[i1][i2][i3][i4] = fp_default;
}
}
}
}
}


private:
inline fp_command& prv_find(char const* name)
{
assert(strlen(name) >= 4);

return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];
}

inline fp_command prv_find(char const* name) const
{
assert(strlen(name) >= 4);

return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];
}


public:
void add(char const* name, fp_command cmd)
{
prv_find(name) = cmd;
}


inline TR exec(char const* name, T P) const
{
return prv_find(name)(P);
}
};




void
command_north(char const* str)
{
std::cout << "void command_north(char const* "
<< str
<< ")"
<< std::endl;
}


void
command_south(char const* str)
{
std::cout << "void command_south(char const* "
<< str
<< ")"
<< std::endl;
}


void
command_east(char const* str)
{
std::cout << "void command_east(char const* "
<< str
<< ")"
<< std::endl;
}

void
command_west(char const* str)
{
std::cout << "void command_west(char const* "
<< str
<< ")"
<< std::endl;
}

void
command_attack(char const* str)
{
std::cout << "void command_attack(char const* "
<< str
<< ")"
<< std::endl;
}

void
command_unknown(char const* str)
{
std::cout << "void command_unknown(char const* "
<< str
<< ")"
<< std::endl;
}




int main()
{
{
static cmap<void, char const*> cm_normal(command_unknown);
static cmap<void, char const*> cm_reverse(command_unknown);

cm_normal.add("north", command_north);
cm_normal.add("south", command_south);
cm_normal.add("east", command_east);
cm_normal.add("west", command_west);
cm_normal.add("attack", command_attack);

cm_reverse.add("north", command_south);
cm_reverse.add("south", command_north);
cm_reverse.add("east", command_west);
cm_reverse.add("west", command_east);
cm_reverse.add("attack", command_attack);

cm_normal.exec("north", "Message1");
cm_normal.exec("south", "Message2");
cm_normal.exec("east", "Message3");
cm_normal.exec("west", "Message4");
cm_normal.exec("west", "Message5");
cm_normal.exec("weild", "Message6");
cm_normal.exec("attack", "Big Monster!");
cm_normal.exec("fight", "Message7");
cm_normal.exec("1234xx566", "Message8");
cm_normal.exec("zzzz", "Message9");

std::cout << "----------------------------" << std::endl;

cm_reverse.exec("north", "Message1");
cm_reverse.exec("south", "Message2");
cm_reverse.exec("east", "Message3");
cm_reverse.exec("west", "Message4");
cm_reverse.exec("west", "Message5");
cm_reverse.exec("weild", "Message6");
cm_reverse.exec("attack", "Big Monster!");
cm_reverse.exec("fight", "Message7");
cm_reverse.exec("1234xx566", "Message8");
cm_reverse.exec("zzzz", "Message9");
}

return 0;
}




It seems to work, but I am a but uneasy... What do you all think of this? Am
I going about solving the problem in an ass-backwards manner?



Thanks.



BTW, can you think of a faster way to perform the lookup process???
 
C

Chris M. Thomasson

James said:
BTW, this is homework. I hope I am not out of line here...


I am in the process of learning C++. I have been given a task to complete
which involves mapping strings to a value. The minimum length of a string
is 4 characters and only uses lowercase alphabetic characters. It needs to
work with more than just ASCII, (e.g., EBCDIC) I need to do this mapping
without using any standard containers. I immediately thought of using a 4
dimintinal array of values to do a very fast lookup:
[...]

4 dimensional

of course!

ARGH.
 
C

Chris M. Thomasson

James said:
BTW, this is homework. I hope I am not out of line here...


I am in the process of learning C++. I have been given a task to complete
which involves mapping strings to a value. The minimum length of a string
is 4 characters and only uses lowercase alphabetic characters.

I should clarify... The minimum length of a string is 4, and the maximum
characters evaluated are 4. So the following commands are going to map to
identical functions:


hello
hell
hell1234



It needs to work with more than just ASCII, (e.g., EBCDIC) I need to do
this mapping without using any standard containers. I immediately thought
of using a 4 dimintinal array of values to do a very fast lookup:
[...]
 
J

James

Chris M. Thomasson said:
James said:
BTW, this is homework. I hope I am not out of line here...


I am in the process of learning C++. I have been given a task to complete
which involves mapping strings to a value. The minimum length of a string
is 4 characters and only uses lowercase alphabetic characters.

I should clarify... The minimum length of a string is 4, and the maximum
characters evaluated are 4. So the following commands are going to map to
identical functions:


hello
hell
hell1234



It needs to work with more than just ASCII, (e.g., EBCDIC) I need to do
this mapping without using any standard containers. I immediately thought
of using a 4 dimintinal array of values to do a very fast lookup:
[...]


OOPS! The last two posts were using the wrong account. Sorry Chris! I am
Chris' cousin and am learning C++ at the local community college.

;^(...
 
J

James

Paul N said:
BTW, this is homework. I hope I am not out of line here...

I am in the process of learning C++. I have been given a task to complete
which involves mapping strings to a value. The minimum length of a string
is
4 characters and only uses lowercase alphabetic characters. It needs to
work
with more than just ASCII, (e.g., EBCDIC) I need to do this mapping
without
using any standard containers. I immediately thought of using a 4
dimintinal
array of values to do a very fast lookup:

#define DEPTH (UCHAR_MAX + 1)

void* array[DEPTH][DEPTH][DEPTH][DEPTH];

However, this uses way to much memory.

Do you *need* to do it quickly, or is this a requirement that you have
added yourself to the task you were set?

I am going for some extra credits...

;^)



If speed is not an issue, I would be inclined to write a routine that
converts each of the letters into a number and then multiply up. For
instance, set a to be the value of the first letter, b the second, c
the third and d the fourth according to the arrangement

'a' = 1, 'b' = 2, ... 'z' = 26, 0 if string already finished

and then compute

value = a + b * 27 + c * 27*27 + d * 27*27*27

The converting function is pretty quick in ASCII (just a subtraction,
as long as you have some string left) and only a little slower for
EBCDIC.

Hope that helps.

I am not sure I understand. Are you suggesting to use the computed value as
a key in a binary tree or something?
 
J

James

James said:
Paul N said:
BTW, this is homework. I hope I am not out of line here...

I am in the process of learning C++. I have been given a task to
complete
which involves mapping strings to a value. The minimum length of a
string is
4 characters and only uses lowercase alphabetic characters. It needs to
work
with more than just ASCII, (e.g., EBCDIC) I need to do this mapping
without
using any standard containers. I immediately thought of using a 4
dimintinal
array of values to do a very fast lookup:

#define DEPTH (UCHAR_MAX + 1)

void* array[DEPTH][DEPTH][DEPTH][DEPTH];

However, this uses way to much memory.

Do you *need* to do it quickly, or is this a requirement that you have
added yourself to the task you were set?

I am going for some extra credits...

;^)

I guess you could say that the technique I used is basically equal to a
(UCHAR_MAX + 1)-ary trie.
 
P

Paul N

BTW, this is homework. I hope I am not out of line here...

I am in the process of learning C++. I have been given a task to complete
which involves mapping strings to a value. The minimum length of a string is
4 characters and only uses lowercase alphabetic characters. It needs to work
with more than just ASCII, (e.g., EBCDIC) I need to do this mapping without
using any standard containers. I immediately thought of using a 4 dimintinal
array of values to do a very fast lookup:

#define DEPTH (UCHAR_MAX + 1)

void* array[DEPTH][DEPTH][DEPTH][DEPTH];

However, this uses way to much memory.

Do you *need* to do it quickly, or is this a requirement that you have
added yourself to the task you were set?

If speed is not an issue, I would be inclined to write a routine that
converts each of the letters into a number and then multiply up. For
instance, set a to be the value of the first letter, b the second, c
the third and d the fourth according to the arrangement

'a' = 1, 'b' = 2, ... 'z' = 26, 0 if string already finished

and then compute

value = a + b * 27 + c * 27*27 + d * 27*27*27

The converting function is pretty quick in ASCII (just a subtraction,
as long as you have some string left) and only a little slower for
EBCDIC.

Hope that helps.
Paul.
 
T

Thomas J. Gritzan

James said:
BTW, this is homework. I hope I am not out of line here...


I am in the process of learning C++. I have been given a task to
complete which involves mapping strings to a value. The minimum length
of a string is 4 characters and only uses lowercase alphabetic
characters. It needs to work with more than just ASCII, (e.g., EBCDIC) I
need to do this mapping without using any standard containers. I
immediately thought of using a 4 dimintinal array of values to do a very
fast lookup: [...]
BTW, can you think of a faster way to perform the lookup process???

I don't know if it is faster, but you could use a binary search on a
sorted array:

1) build an array of (key, value) pairs
2) sort the array using std::lexicographical_compare, strcmp or a
hand-written loop
3) search the lookup string in the array with binary search using above
compare function.

http://en.wikipedia.org/wiki/Binary_search_algorithm
 
J

James Kanze


[...]
Why 27? (instead of 26)

For the final '\0' when the string is less than 4 characters.
He could save a little memory by making the first dimension 26.
If he implemented it as a true trie, he would probably save a
lot more, since a lot of the sub-tries would probably be empty;
a dynamically constructed trie would definitely be a legitimate
answer to the problem, and would probably be rather fast. In
which case, something like:

struct TrieNode
{
void* action;
TrieNode* next[16];
};

would do the trick.
 
J

James

Paavo Helde said:
Such a 4-dimensional array is quite large, and according to your example,
very sparse.

If I understand you correctly, yes I agree; storing 5 objects in a memory
hog of a data-structure is wasteful indeed.



Accessing it in random fashion will probably cause a lot of
cache misses, so it is not given that this is the fastest lookup
possible.

I admit that is a cop out, however I don't really want to get into the
"hardware level" jest yet. However, AFAICT, there would be an upper bound on
the number of cache misses per-call to 'cmap::prv_find()'? The size of the
data-structure is static and that's that, so to speak.



A tag of 4 characters/bytes immediately suggests that you could interpret
those 4 characters as 32-bit unsigned integers (unsigned is needed, as
signed integers can have trap values).

Do you mean something like:




typedef unsigned int uint32_type;


typedef char assert_s
[
sizeof(uint32_type) == sizeof(char) * 4
&& sizeof(uint32_type) * CHAR_BIT == 32
? 1 : -1
];


typedef char cmd_buffer[4];


union cmd
{
cmd_buffer buffer;
uint32_type whole;
};




then decoding 4 chars at once like:


union cmd c;

char const* const string = "command";

c.whole = *((*uint32_type)string);


?



(This should work for EBCDIC as
well as far as both your source code and the initial data for
initializing the map are in the same codepage.) Thus the mapping is
simplified to integer-to-value. One could hold these in a sorted array
and perform a binary search (or even a linear search in an unsorted array
if the number of entries is small, placing the more frequent commands in
the beginning).

Please correct me if I am way off base here but I think the complexity for
the simple and naive algorithm I posted should be O(8) complexity for insert
and lookup operations. This tells me that storing a large amount of items in
the array will have no impact on the complexity whatsoever. I cannot exactly
claim that for a binary search whose complexity parameters can be effected
by N.



Below are just some stylistic comments...
[...]

int main()
{
{

Why extra braces?

This is probably going to sound retarded, but I personally like to
encapsulate everything in main within an extra scope in order to ensure that
all objects are destructed before I actually return from main. Imagine I
wanted to print a final message that occurs after all messages from the
object dtors are printed:


#include <iostream>

struct foo
{
~foo()
{
std::cout << "~foo()" << std::endl;
}
};


int main()
{
foo f;
std::cout << "the program is finished" << std::endl;
return 0;
}


If I want the output to be like:


~foo()
the program is finished



I might do something like:

int main()
{
{
foo f;
}

std::cout << "the program is finished" << std::endl;
return 0;
}




Does that make any sense at all?

;^o



Why static?

They blow the stack on a Windows platform. BTW, I don't really see a need
for dynamic memory allocation here.




[...]
Given that the number of used entries in the map is 5 as in your example,
it is quite certain that anything which uses a compact map (of size about
20 bytes) will be faster than your solution (using more than 500000
bytes) (of course assuming there is any kind of memory caching present in
the system).

I fully intend to insert many command strings with random names assigned to
random command functions. I have to turn this in on Monday. IMVVVHO, the
sparse 256-ary array based tree should work fine for the large data-set.



But as this is homework, the correctness of the solution
should be what matters, and the speed should not be an issue at all (as
long as the demo completes before the professor gets bored ;-).

LOL!

The only reason why I am doing this is so I can get some extra credit. I am
interested in getting a good grade, and plan on becoming at least a fairly
decent programmer. Perhaps I can even make some $$$ in the future.
 
J

James

Paavo Helde said:
James said:
I admit that is a cop out, however I don't really want to get into the
"hardware level" jest yet. However, AFAICT, there would be an upper
bound on the number of cache misses per-call to 'cmap::prv_find()'?
The size of the data-structure is static and that's that, so to speak.

Sure there is an upper bound. This does not mean that the approach is the
fastest. The big-O calculations only make sense if N is large, with N=5
they are meaningless.
Agreed.



Do you mean something like:
[...]

Why so complex?

Ummm, I don't know. That's just what popped out...

;^/

What I had in mind whas something like

char const* const s = "command";

typedef unsigned char uc; // for brevity

uint32_t x = (uc(s[0])<<24) | (uc(s[1])<<16) | (uc(s[2])<<16) | uc(s[3]);



[...]
This is probably going to sound retarded, but I personally like to
encapsulate everything in main within an extra scope in order to
ensure that all objects are destructed before I actually return from
main. Imagine I wanted to print a final message that occurs after all
messages from the object dtors are printed:

OK, but you did not have that final printout in the original example.

It's a habit.



OK, but for the homework only. In real life you cannot really use static
data, as this is hostile to multi-threading.


I am NOT a threading expert in any way shape or form, however AFAICT, the
following pseudo-code would be 100% thread-safe, and does not need any
synchronization primitives:




// [snip translator impl code]

static translator const* g_translator = NULL;


// [snip cmap impl code]


typedef cmap<void, char const*> cmap_type;


static cmap_type const* g_cmdmap = NULL;


#define ARRAY_DEPTH(t) (sizeof(t) / sizeof(t[0]))


static char const* g_cmds[] =
{
"north",
"east",
"west",
"south",
"wield"
};


extern "C"
void* pthread_entry(void* state)
{
for (;;)
{
g_commands->exec(g_cmds[rand() % ARRAY_DEPTH(g_cmds)]);
}

return NULL;
}


int main()
{
{
static translator const l_translator;
static cmap_type cmdmap(command_unknown);


// make the above visible to the outside world...
g_translator = &l_translator;
g_cmdmap = &cmdmap;


// add all of the commands in g_cmds to the cmdmap


// create threads


// join threads
}

return 0;
}




All mutations to shared data are done before the threads are created. All of
the static data-structures are fully initialized before the threads are
created. Threads perform read-only accessed to the command map and it's
translator.


Am I missing anything obvious here?



[...]
BTW, can you think of a faster way to perform the lookup process???

Given that the number of used entries in the map is 5 as in your
example, it is quite certain that anything which uses a compact map
(of size about 20 bytes) will be faster than your solution (using
more than 500000 bytes) (of course assuming there is any kind of
memory caching present in the system).

I fully intend to insert many command strings with random names
assigned to random command functions. I have to turn this in on
Monday. IMVVVHO, the sparse 256-ary array based tree should work fine
for the large data-set.

Yes, if N is large, the big-O rules should kick in. If N is >100000, then
certainly your data structure is optimal.

That's what I was hoping for.



I can understand that. Your solution is quite fine in my mind, as a
homework.

Thank you! :^)



I would not bet though that you can get some extra credit by
that. If I were a professor I would probably like it more if you
implemented a custom binary search function, instead of creating large
data arrays containing mostly emptiness.

Humm... I still have time to change it to a binary search. In fact, since I
can use 32-bit unsigned integers, I think the following hashed tree
implementation I came up with should work out quite well:

http://groups.google.com/group/comp.programming/browse_frm/thread/b6394b5bb7c59ac1

What do you think of that? Should I use it instead? Humm...



Especially the motivation is
very vague - you claim this is the fastest approach, without doing any
measurements to back up your claim. I would at least make a prototype
with std::map and compare with your approach, to see if the "O(8)" claim
is justified.

You are 100% correct. I should test it against a std::map. BTW, the reason
why I use O(8) is because there are 4 access to the translator table, and 4
access to the 4D table in the command map regardless of how many items are
in the "container".



Note that in real life your solution has little chances. In a real
program nobody wants to have a component which cannot be multithreaded
and takes 1000 times more memory than necessary, even if might work a bit
faster than the alternatives.

AFAICT, I do think it can be multi-threaded ___IF___ all the commands are
inserted ___BEFORE___ any threads are created. In fact, well, I hope I don't
make a massive retarded fool out of my self here, but I think you could add
commands to the map while threads are reading it by using an atomic swap.
The data-structure is static such that it does not grow or shrink, and the
memory cells are the size of a function pointer. So, if the function pointer
is the size of a void* pointer, and I have atomic swap that can operate on
pointers, then an update to the map could look like this:


fp_command cmap::add(char const* name, fp_command cmd)
{
fp_command& cell = prv_find(name);

return atomic_swap(&cell, cmd);
}


This would overwrite commands that already exist and return the previous
value. If this is unacceptable, you could might be able to use atomic
compare and swap:

bool cmap::add_cas(char const* name, fp_command cmp, fp_command cmd)
{
fp_command& cell = prv_find(name);

return atomic_cas(&cell, cmp, cmd);
}



I have no idea if it will actually work the way I think it should work or
not! I am most likely missing something important.

YIKES!



It seems you have misunderstood the goal of
professional programming.

Oh crap!

;^(...



It is usually not about producing the fastest
code possible - it is all about finishing the project in the schedule,
with code having acceptable correctness, speed, size, ease-of-usage,
etc., for the end user, plus acceptable in-house maintainability.

I will definitely keep those wise words in mind. Thanks Paavo for all your
valuable input, I really do appreciate it!


:^)
 
C

Chris M. Thomasson

James Kanze said:
[...]
#define DEPTH 27
Why 27? (instead of 26)

For the final '\0' when the string is less than 4 characters.
He could save a little memory by making the first dimension 26.
If he implemented it as a true trie, he would probably save a
lot more, since a lot of the sub-tries would probably be empty;
a dynamically constructed trie would definitely be a legitimate
answer to the problem, and would probably be rather fast. In
which case, something like:

struct TrieNode
{
void* action;
TrieNode* next[16];
};

would do the trick.

I am seriously thinking about using the following tree algorithm:

http://groups.google.com/group/comp.programming/browse_frm/thread/b6394b5bb7c59ac1
 
J

James

DAMN!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


My cousin is going to kill me. He let me borrow his laptop so I can complete
this damn college course. It's loaded with some of his programming examples
and code sketches. I think I will just use a different newsreader. BTW, any
suggestions on a decent news reader? I don't want to send news out on the
wrong Outlook account again.
 
C

Chris M. Thomasson

AFAICT, I do think it can be multi-threaded ___IF___ all the commands are
inserted ___BEFORE___ any threads are created. In fact, well, I hope I
don't make a massive retarded fool out of my self here, but I think you
could add commands to the map while threads are reading it by using an
atomic swap. The data-structure is static such that it does not grow or
shrink, and the memory cells are the size of a function pointer. So, if
the function pointer is the size of a void* pointer, and I have atomic
swap that can operate on pointers, then an update to the map could look
like this:


fp_command cmap::add(char const* name, fp_command cmd)
{
fp_command& cell = prv_find(name);

return atomic_swap(&cell, cmd);
}


This would overwrite commands that already exist and return the previous
value. If this is unacceptable, you could might be able to use atomic
compare and swap:

bool cmap::add_cas(char const* name, fp_command cmp, fp_command cmd)
{
fp_command& cell = prv_find(name);

return atomic_cas(&cell, cmp, cmd);
}



I have no idea if it will actually work the way I think it should work or
not! I am most likely missing something important.

YIKES!

James, you should focus on your class instead of multi-threading. However,
to my amazement, memory visibility aside, you're idea would actually work!

Wow!
 
C

Chris M. Thomasson

James said:
DAMN!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


My cousin is going to kill me. He let me borrow his laptop so I can
complete this damn college course. It's loaded with some of his
programming examples and code sketches. I think I will just use a
different newsreader. BTW, any suggestions on a decent news reader? I
don't want to send news out on the wrong Outlook account again.


I want my damn laptop back!!! I coming over to you're house to thrash you.


LOL, just kidding James.

;^D



You can go ahead and delete my Outlook account if you want. BTW, I use the
PAN newsreader on Linux and it works fairly well.
 
C

Chris M. Thomasson

James said:
OOPS! The last two posts were using the wrong account. Sorry Chris! I am
Chris' cousin and am learning C++ at the local community college.

You better get an A+!

;^)
 
J

James

James Kanze said:
typedef unsigned int uint32_type;
typedef char assert_s
[
sizeof(uint32_type) == sizeof(char) * 4
&& sizeof(uint32_type) * CHAR_BIT == 32
? 1 : -1
];
typedef char cmd_buffer[4];
union cmd
{
cmd_buffer buffer;
uint32_type whole;
};
then decoding 4 chars at once like:
union cmd c;
char const* const string = "command";
c.whole = *((*uint32_type)string);

More likely, he was thinking of something like:

uint_fast32_t i = ((c[0] & 0xFF) << 24)
| ((c[1] & 0xFF) << 16)
| ((c[2] & 0xFF) << 8)
| ((c[3] & 0xFF) );

This is guaranteed to work on any machine.

Yeah. I don't quite know why I used a union.



There is no such thing as O(8). The complexity is O(1).

Yikes! I was errounsly counting each diminsion as a seperate access:

return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];

Humm... Okay. I count 4 separate accesses into the g_translator table. I
also count a single access into the m_table.m_fp table. Why would that not
be O(5)? I guess I need to break out of that line of thinking.


Thanks James.



But
the constant overhead is high due to the lack of locality. It's
a relatively poor space/speed trade-off, given that on modern
machines, excess space results in a noticeable slow-down.
Better solutions exist.


Array access is O(1), regardless of the size of the array. But
that's purely a theoretical figure. In practice, if the array
is very sparsely populated, the program will run slower,
compared to other algorithms.


Binary search is O(lg n). For small n (less than a couple of
hundred), the difference is not significant. (In the tests I've
done, binary search on a sorted vector or std::map beats hash
coding up to somewhere between 150 and 200 entries. Even though
hash coding is O(1) and the binary search or std::map are O(ln
n).

have you tries using std::map as buckets in a static hash map?


struct my_map
{
std::map<...> m_buckets[DEPTH];
};

?



Big O complexity is only significant when the number of
elements becomes large. (How large depends on the difference in
the big O factor. The difference between constant and quadratic
can become significant very quickly. The difference between
constant an logrithmic does't become significant until you
really do have a lot of entries.

I am going to populate the table with very many of entries (e.g., hundreds
of thousands).



Below are just some stylistic comments...
int main()
{
{
Why extra braces?
This is probably going to sound retarded, but I personally
like to encapsulate everything in main within an extra scope
in order to ensure that all objects are destructed before I
actually return from main.

That's interesting. Generally (in actual applications---not
necessarily is homework type projects), main will only be a
couple of function calls, maybe one to check the command line
arguments, and one to do the actual work; in the case of a Unix
like filter, you might put the loop over the filenames in main,
but that's about it.

Rather than extra braces, I'd just put the processing itself in
a separate function. (But this is perhaps more a production
code consideration. If you don't have command line options or
arguments, configuration files, etc., it's probably overkill.)

Do you think it's a bad habit of mine to add the extra braces? A separate
function would most certainly work very well.



Imagine I wanted to print a final message that occurs after
all messages from the object dtors are printed:

You can't, because you don't know the order the destructors of
static objects will be called:).
Touché!

:^)



[...]
static cmap<void, char const*> cm_normal(command_unknown);
static cmap<void, char const*> cm_reverse(command_unknown);
Why static?
They blow the stack on a Windows platform.

You can change the default stack size at link time, if that's
all that's bothering you.
BTW, I don't really see a need for dynamic memory allocation
here.

For large tables, it doesn't hurt anything. It costs the same
to allocate int[5] and int[5000000]. In the first case, that
cost has to be amortised over accessing 5 elements, which means
it may be significant. In the second, it's amortised over
accessing 5000000 elements, which means that it's practically
nothing.

I clearly need to change my line of thinking wrt this issue. When I polish
this up I will post the code again so you call all see if I have learned
anything...

;^)



[...]
BTW, can you think of a faster way to perform the lookup
process???
Given that the number of used entries in the map is 5 as in
your example, it is quite certain that anything which uses a
compact map (of size about 20 bytes) will be faster than
your solution (using more than 500000 bytes) (of course
assuming there is any kind of memory caching present in the
system).
I fully intend to insert many command strings with random
names assigned to random command functions. I have to turn
this in on Monday. IMVVVHO, the sparse 256-ary array based
tree should work fine for the large data-set.

You mean a 4*27 array, don't you. 4*256 probably won't fit into
memory.

I made a stupid mistake! I was thinking of the translator array which is 256
elements wide.


Humm... I do still think of the 4*27 array as being similar to a 4-level
27-are trie. Am I off my rocker James?!

;^/



If you're looking for the simplest solution, then linear search
is fine. If the number of entries because too large, you can
either sort the array and use binary search, or switch to a trie
or a hash table. (I'd probably use a trie, but I've a certain
experience with this sort of code, and it probably wouldn't take
me more than an hour to implement it. If you've never done
anything similar before, it could easily take a day or two, once
you've fully understood the algorithm.)

I created this tree with this homework problem in mind:

http://groups.google.com/group/comp.programming/browse_frm/thread/b6394b5bb7c59ac1

Do you think I should use it instead of the 4D array?



If making money is your goal, the commercial side generally pays
a lot better:). (Bill Gate's original Basic used a linear
search in the name table. It was, in fact, recommended to use
the most frequently used variables at the beginning of the
program, so they would be at the front of the table.)

:^)






BTW, thank you for all of your expert advise James! I have been lurking
around this news groups for a while and you're input is invaluable, IMVHO of
course.

;^)
 
J

James

James said:
There is no such thing as O(8). The complexity is O(1).

Yikes! I was errounsly counting each diminsion as a seperate access:

return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];

Humm... Okay. I count 4 separate accesses into the g_translator table.


That could be rewritten as:


size_t d1 = g_translator[name[0]];
size_t d2 = g_translator[name[1]];
size_t d3 = g_translator[name[2]];
size_t d4 = g_translator[name[3]];


return m_table.m_fp[d1][d2][d3][d4];





Actually, there are 4 accesses to the name array:



char c1 = name[0];
char c2 = name[1];
char c3 = name[2];
char c4 = name[3];

size_t d1 = g_translator[c1];
size_t d2 = g_translator[c2];
size_t d3 = g_translator[c3];
size_t d4 = g_translator[c4];

return m_table.m_fp[d1][d2][d3][d4];




Can I legitimately claim that the code above is O(1)?



Please help me clear my ignorance!

;^(




[...]
 
C

Chris M. Thomasson

Chris M. Thomasson said:
James, you should focus on your class instead of multi-threading. However,
to my amazement, memory visibility aside, you're idea would actually work!

Wow!

Just to clarify, you will need to use acquire/release semantics if there is
any data that is dependant on the insertion. For instance, if thread `A'
created datum `D' and inserted a command to the map that gave other threads
access to `D', then you would need to ensure that D is rendered into a fully
visible and coherent state before it becomes visible to threads other than
`A'.
 
C

Chris M. Thomasson

James said:
Do you think it's a bad habit of mine to add the extra braces? A separate
function would most certainly work very well.

I sometimes do the exact same thing. So, no I don't think it's a bad
habit...

lol

[...]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top