code review / efficient lookup techniques...

Discussion in 'C++' started by James, Nov 10, 2009.

  1. James

    James Guest

    BTW, this is homework. I hope I am not out of line here...


    I am in the process of learning C++. I have been given a task to complete
    which involves mapping strings to a value. The minimum length of a string is
    4 characters and only uses lowercase alphabetic characters. It needs to work
    with more than just ASCII, (e.g., EBCDIC) I need to do this mapping without
    using any standard containers. I immediately thought of using a 4 dimintinal
    array of values to do a very fast lookup:


    #define DEPTH (UCHAR_MAX + 1)

    void* array[DEPTH][DEPTH][DEPTH][DEPTH];


    However, this uses way to much memory. So, I thought of using the following
    table:


    #define DEPTH 27

    void* array[DEPTH][DEPTH][DEPTH][DEPTH];


    Which should handle all lowercase letters. However, I need to translate the
    character value to an index into this array. So, here is what I have come up
    with so far:




    #include <iostream>
    #include <climits>
    #include <cctype>
    #include <cstring>
    #include <cassert>




    class translator
    {
    static unsigned char g_chars[UCHAR_MAX + 1];
    static char const* const g_alpha;


    public:
    translator()
    {
    for (int i = 0; i <= UCHAR_MAX; ++i)
    {
    if (isalpha(i) && islower(i))
    {
    for (int a = 0; a < 26; ++a)
    {
    if (i == g_alpha[a])
    {
    g_chars = a;
    break;
    }
    }
    }
    }
    }


    public:
    inline int operator [] (char c) const
    {
    return g_chars[(unsigned int)c];
    }
    };

    unsigned char
    translator::g_chars[UCHAR_MAX + 1] = { 0 };

    char const* const
    translator::g_alpha = "abcdefghijklmnopqrstuvwxyz";

    static translator g_translator;




    template<typename TR, typename T>
    class cmap
    {
    typedef TR (*fp_command) (T);


    struct table
    {
    fp_command m_fp[27][27][27][27];
    };


    private:
    table m_table;


    public:
    cmap(fp_command fp_default)
    {
    for (std::size_t i1 = 0; i1 < 27; ++i1)
    {
    for (std::size_t i2 = 0; i2 < 27; ++i2)
    {
    for (std::size_t i3 = 0; i3 < 27; ++i3)
    {
    for (std::size_t i4 = 0; i4 < 27; ++i4)
    {
    m_table.m_fp[i1][i2][i3][i4] = fp_default;
    }
    }
    }
    }
    }


    private:
    inline fp_command& prv_find(char const* name)
    {
    assert(strlen(name) >= 4);

    return m_table.m_fp
    [g_translator[name[0]]]
    [g_translator[name[1]]]
    [g_translator[name[2]]]
    [g_translator[name[3]]];
    }

    inline fp_command prv_find(char const* name) const
    {
    assert(strlen(name) >= 4);

    return m_table.m_fp
    [g_translator[name[0]]]
    [g_translator[name[1]]]
    [g_translator[name[2]]]
    [g_translator[name[3]]];
    }


    public:
    void add(char const* name, fp_command cmd)
    {
    prv_find(name) = cmd;
    }


    inline TR exec(char const* name, T P) const
    {
    return prv_find(name)(P);
    }
    };




    void
    command_north(char const* str)
    {
    std::cout << "void command_north(char const* "
    << str
    << ")"
    << std::endl;
    }


    void
    command_south(char const* str)
    {
    std::cout << "void command_south(char const* "
    << str
    << ")"
    << std::endl;
    }


    void
    command_east(char const* str)
    {
    std::cout << "void command_east(char const* "
    << str
    << ")"
    << std::endl;
    }

    void
    command_west(char const* str)
    {
    std::cout << "void command_west(char const* "
    << str
    << ")"
    << std::endl;
    }

    void
    command_attack(char const* str)
    {
    std::cout << "void command_attack(char const* "
    << str
    << ")"
    << std::endl;
    }

    void
    command_unknown(char const* str)
    {
    std::cout << "void command_unknown(char const* "
    << str
    << ")"
    << std::endl;
    }




    int main()
    {
    {
    static cmap<void, char const*> cm_normal(command_unknown);
    static cmap<void, char const*> cm_reverse(command_unknown);

    cm_normal.add("north", command_north);
    cm_normal.add("south", command_south);
    cm_normal.add("east", command_east);
    cm_normal.add("west", command_west);
    cm_normal.add("attack", command_attack);

    cm_reverse.add("north", command_south);
    cm_reverse.add("south", command_north);
    cm_reverse.add("east", command_west);
    cm_reverse.add("west", command_east);
    cm_reverse.add("attack", command_attack);

    cm_normal.exec("north", "Message1");
    cm_normal.exec("south", "Message2");
    cm_normal.exec("east", "Message3");
    cm_normal.exec("west", "Message4");
    cm_normal.exec("west", "Message5");
    cm_normal.exec("weild", "Message6");
    cm_normal.exec("attack", "Big Monster!");
    cm_normal.exec("fight", "Message7");
    cm_normal.exec("1234xx566", "Message8");
    cm_normal.exec("zzzz", "Message9");

    std::cout << "----------------------------" << std::endl;

    cm_reverse.exec("north", "Message1");
    cm_reverse.exec("south", "Message2");
    cm_reverse.exec("east", "Message3");
    cm_reverse.exec("west", "Message4");
    cm_reverse.exec("west", "Message5");
    cm_reverse.exec("weild", "Message6");
    cm_reverse.exec("attack", "Big Monster!");
    cm_reverse.exec("fight", "Message7");
    cm_reverse.exec("1234xx566", "Message8");
    cm_reverse.exec("zzzz", "Message9");
    }

    return 0;
    }




    It seems to work, but I am a but uneasy... What do you all think of this? Am
    I going about solving the problem in an ass-backwards manner?



    Thanks.



    BTW, can you think of a faster way to perform the lookup process???
     
    James, Nov 10, 2009
    #1
    1. Advertising

  2. "James" <> wrote in message news:hde75o$rt5$...
    > BTW, this is homework. I hope I am not out of line here...
    >
    >
    > I am in the process of learning C++. I have been given a task to complete
    > which involves mapping strings to a value. The minimum length of a string
    > is 4 characters and only uses lowercase alphabetic characters. It needs to
    > work with more than just ASCII, (e.g., EBCDIC) I need to do this mapping
    > without using any standard containers. I immediately thought of using a 4
    > dimintinal array of values to do a very fast lookup:

    [...]

    4 dimensional

    of course!

    ARGH.
     
    Chris M. Thomasson, Nov 10, 2009
    #2
    1. Advertising

  3. "James" <> wrote in message news:hde75o$rt5$...
    > BTW, this is homework. I hope I am not out of line here...
    >
    >
    > I am in the process of learning C++. I have been given a task to complete
    > which involves mapping strings to a value. The minimum length of a string
    > is 4 characters and only uses lowercase alphabetic characters.


    I should clarify... The minimum length of a string is 4, and the maximum
    characters evaluated are 4. So the following commands are going to map to
    identical functions:


    hello
    hell
    hell1234




    > It needs to work with more than just ASCII, (e.g., EBCDIC) I need to do
    > this mapping without using any standard containers. I immediately thought
    > of using a 4 dimintinal array of values to do a very fast lookup:

    [...]
     
    Chris M. Thomasson, Nov 11, 2009
    #3
  4. James

    James Guest

    "Chris M. Thomasson" <> wrote in message
    news:hdeehj$5m3$...
    >
    > "James" <> wrote in message news:hde75o$rt5$...
    >> BTW, this is homework. I hope I am not out of line here...
    >>
    >>
    >> I am in the process of learning C++. I have been given a task to complete
    >> which involves mapping strings to a value. The minimum length of a string
    >> is 4 characters and only uses lowercase alphabetic characters.

    >
    > I should clarify... The minimum length of a string is 4, and the maximum
    > characters evaluated are 4. So the following commands are going to map to
    > identical functions:
    >
    >
    > hello
    > hell
    > hell1234
    >
    >
    >
    >
    >> It needs to work with more than just ASCII, (e.g., EBCDIC) I need to do
    >> this mapping without using any standard containers. I immediately thought
    >> of using a 4 dimintinal array of values to do a very fast lookup:

    > [...]



    OOPS! The last two posts were using the wrong account. Sorry Chris! I am
    Chris' cousin and am learning C++ at the local community college.

    ;^(...
     
    James, Nov 11, 2009
    #4
  5. James

    James Guest

    "Paul N" <> wrote in message
    news:...
    > On 10 Nov, 23:29, "James" <> wrote:
    >> BTW, this is homework. I hope I am not out of line here...
    >>
    >> I am in the process of learning C++. I have been given a task to complete
    >> which involves mapping strings to a value. The minimum length of a string
    >> is
    >> 4 characters and only uses lowercase alphabetic characters. It needs to
    >> work
    >> with more than just ASCII, (e.g., EBCDIC) I need to do this mapping
    >> without
    >> using any standard containers. I immediately thought of using a 4
    >> dimintinal
    >> array of values to do a very fast lookup:
    >>
    >> #define DEPTH (UCHAR_MAX + 1)
    >>
    >> void* array[DEPTH][DEPTH][DEPTH][DEPTH];
    >>
    >> However, this uses way to much memory.

    >
    > Do you *need* to do it quickly, or is this a requirement that you have
    > added yourself to the task you were set?


    I am going for some extra credits...

    ;^)




    > If speed is not an issue, I would be inclined to write a routine that
    > converts each of the letters into a number and then multiply up. For
    > instance, set a to be the value of the first letter, b the second, c
    > the third and d the fourth according to the arrangement
    >
    > 'a' = 1, 'b' = 2, ... 'z' = 26, 0 if string already finished
    >
    > and then compute
    >
    > value = a + b * 27 + c * 27*27 + d * 27*27*27
    >
    > The converting function is pretty quick in ASCII (just a subtraction,
    > as long as you have some string left) and only a little slower for
    > EBCDIC.
    >
    > Hope that helps.


    I am not sure I understand. Are you suggesting to use the computed value as
    a key in a binary tree or something?
     
    James, Nov 11, 2009
    #5
  6. James

    James Guest

    "James" <> wrote in message news:hdfcol$c3d$...
    > "Paul N" <> wrote in message
    > news:...
    >> On 10 Nov, 23:29, "James" <> wrote:
    >>> BTW, this is homework. I hope I am not out of line here...
    >>>
    >>> I am in the process of learning C++. I have been given a task to
    >>> complete
    >>> which involves mapping strings to a value. The minimum length of a
    >>> string is
    >>> 4 characters and only uses lowercase alphabetic characters. It needs to
    >>> work
    >>> with more than just ASCII, (e.g., EBCDIC) I need to do this mapping
    >>> without
    >>> using any standard containers. I immediately thought of using a 4
    >>> dimintinal
    >>> array of values to do a very fast lookup:
    >>>
    >>> #define DEPTH (UCHAR_MAX + 1)
    >>>
    >>> void* array[DEPTH][DEPTH][DEPTH][DEPTH];
    >>>
    >>> However, this uses way to much memory.

    >>
    >> Do you *need* to do it quickly, or is this a requirement that you have
    >> added yourself to the task you were set?

    >
    > I am going for some extra credits...
    >
    > ;^)


    I guess you could say that the technique I used is basically equal to a
    (UCHAR_MAX + 1)-ary trie.
     
    James, Nov 11, 2009
    #6
  7. James

    Paul N Guest

    On 10 Nov, 23:29, "James" <> wrote:
    > BTW, this is homework. I hope I am not out of line here...
    >
    > I am in the process of learning C++. I have been given a task to complete
    > which involves mapping strings to a value. The minimum length of a string is
    > 4 characters and only uses lowercase alphabetic characters. It needs to work
    > with more than just ASCII, (e.g., EBCDIC) I need to do this mapping without
    > using any standard containers. I immediately thought of using a 4 dimintinal
    > array of values to do a very fast lookup:
    >
    > #define DEPTH (UCHAR_MAX + 1)
    >
    > void* array[DEPTH][DEPTH][DEPTH][DEPTH];
    >
    > However, this uses way to much memory.


    Do you *need* to do it quickly, or is this a requirement that you have
    added yourself to the task you were set?

    If speed is not an issue, I would be inclined to write a routine that
    converts each of the letters into a number and then multiply up. For
    instance, set a to be the value of the first letter, b the second, c
    the third and d the fourth according to the arrangement

    'a' = 1, 'b' = 2, ... 'z' = 26, 0 if string already finished

    and then compute

    value = a + b * 27 + c * 27*27 + d * 27*27*27

    The converting function is pretty quick in ASCII (just a subtraction,
    as long as you have some string left) and only a little slower for
    EBCDIC.

    Hope that helps.
    Paul.
     
    Paul N, Nov 11, 2009
    #7
  8. James schrieb:
    > BTW, this is homework. I hope I am not out of line here...
    >
    >
    > I am in the process of learning C++. I have been given a task to
    > complete which involves mapping strings to a value. The minimum length
    > of a string is 4 characters and only uses lowercase alphabetic
    > characters. It needs to work with more than just ASCII, (e.g., EBCDIC) I
    > need to do this mapping without using any standard containers. I
    > immediately thought of using a 4 dimintinal array of values to do a very
    > fast lookup:

    [...]
    > BTW, can you think of a faster way to perform the lookup process???


    I don't know if it is faster, but you could use a binary search on a
    sorted array:

    1) build an array of (key, value) pairs
    2) sort the array using std::lexicographical_compare, strcmp or a
    hand-written loop
    3) search the lookup string in the array with binary search using above
    compare function.

    http://en.wikipedia.org/wiki/Binary_search_algorithm

    --
    Thomas
     
    Thomas J. Gritzan, Nov 11, 2009
    #8
  9. James

    James Kanze Guest

    On Nov 11, 11:12 pm, Paavo Helde <> wrote:
    > "James" <> wrote innews:hde75o$rt5$:


    [...]
    > > #define DEPTH 27


    > Why 27? (instead of 26)


    For the final '\0' when the string is less than 4 characters.
    He could save a little memory by making the first dimension 26.
    If he implemented it as a true trie, he would probably save a
    lot more, since a lot of the sub-tries would probably be empty;
    a dynamically constructed trie would definitely be a legitimate
    answer to the problem, and would probably be rather fast. In
    which case, something like:

    struct TrieNode
    {
    void* action;
    TrieNode* next[16];
    };

    would do the trick.

    --
    James Kanze
     
    James Kanze, Nov 12, 2009
    #9
  10. James

    James Guest

    "Paavo Helde" <> wrote in message
    news:Xns9CC1C54A5C97paavo256@216.196.109.131...
    > "James" <> wrote in news:hde75o$rt5$:
    >
    >> BTW, this is homework. I hope I am not out of line here...
    >>
    >>
    >> I am in the process of learning C++. I have been given a task to
    >> complete which involves mapping strings to a value. The minimum length
    >> of a string is 4 characters and only uses lowercase alphabetic
    >> characters. It needs to work with more than just ASCII, (e.g., EBCDIC)
    >> I need to do this mapping without using any standard containers. I
    >> immediately thought of using a 4 dimintinal array of values to do a
    >> very fast lookup:

    >
    > Such a 4-dimensional array is quite large, and according to your example,
    > very sparse.


    If I understand you correctly, yes I agree; storing 5 objects in a memory
    hog of a data-structure is wasteful indeed.




    > Accessing it in random fashion will probably cause a lot of
    > cache misses, so it is not given that this is the fastest lookup
    > possible.


    I admit that is a cop out, however I don't really want to get into the
    "hardware level" jest yet. However, AFAICT, there would be an upper bound on
    the number of cache misses per-call to 'cmap::prv_find()'? The size of the
    data-structure is static and that's that, so to speak.




    > A tag of 4 characters/bytes immediately suggests that you could interpret
    > those 4 characters as 32-bit unsigned integers (unsigned is needed, as
    > signed integers can have trap values).


    Do you mean something like:




    typedef unsigned int uint32_type;


    typedef char assert_s
    [
    sizeof(uint32_type) == sizeof(char) * 4
    && sizeof(uint32_type) * CHAR_BIT == 32
    ? 1 : -1
    ];


    typedef char cmd_buffer[4];


    union cmd
    {
    cmd_buffer buffer;
    uint32_type whole;
    };




    then decoding 4 chars at once like:


    union cmd c;

    char const* const string = "command";

    c.whole = *((*uint32_type)string);


    ?




    > (This should work for EBCDIC as
    > well as far as both your source code and the initial data for
    > initializing the map are in the same codepage.) Thus the mapping is
    > simplified to integer-to-value. One could hold these in a sorted array
    > and perform a binary search (or even a linear search in an unsorted array
    > if the number of entries is small, placing the more frequent commands in
    > the beginning).


    Please correct me if I am way off base here but I think the complexity for
    the simple and naive algorithm I posted should be O(8) complexity for insert
    and lookup operations. This tells me that storing a large amount of items in
    the array will have no impact on the complexity whatsoever. I cannot exactly
    claim that for a binary search whose complexity parameters can be effected
    by N.




    > Below are just some stylistic comments...


    [...]


    >> int main()
    >> {
    >> {

    >
    > Why extra braces?


    This is probably going to sound retarded, but I personally like to
    encapsulate everything in main within an extra scope in order to ensure that
    all objects are destructed before I actually return from main. Imagine I
    wanted to print a final message that occurs after all messages from the
    object dtors are printed:


    #include <iostream>

    struct foo
    {
    ~foo()
    {
    std::cout << "~foo()" << std::endl;
    }
    };


    int main()
    {
    foo f;
    std::cout << "the program is finished" << std::endl;
    return 0;
    }


    If I want the output to be like:


    ~foo()
    the program is finished



    I might do something like:

    int main()
    {
    {
    foo f;
    }

    std::cout << "the program is finished" << std::endl;
    return 0;
    }




    Does that make any sense at all?

    ;^o




    >> static cmap<void, char const*> cm_normal(command_unknown);
    >> static cmap<void, char const*> cm_reverse(command_unknown);

    >
    > Why static?


    They blow the stack on a Windows platform. BTW, I don't really see a need
    for dynamic memory allocation here.




    [...]

    >> BTW, can you think of a faster way to perform the lookup process???

    >
    > Given that the number of used entries in the map is 5 as in your example,
    > it is quite certain that anything which uses a compact map (of size about
    > 20 bytes) will be faster than your solution (using more than 500000
    > bytes) (of course assuming there is any kind of memory caching present in
    > the system).


    I fully intend to insert many command strings with random names assigned to
    random command functions. I have to turn this in on Monday. IMVVVHO, the
    sparse 256-ary array based tree should work fine for the large data-set.




    > But as this is homework, the correctness of the solution
    > should be what matters, and the speed should not be an issue at all (as
    > long as the demo completes before the professor gets bored ;-).


    LOL!

    The only reason why I am doing this is so I can get some extra credit. I am
    interested in getting a good grade, and plan on becoming at least a fairly
    decent programmer. Perhaps I can even make some $$$ in the future.
     
    James, Nov 12, 2009
    #10
  11. James

    James Guest

    "Paavo Helde" <> wrote in message
    news:Xns9CC26B60A5203paavo256@216.196.109.131...
    > "James" <> wrote in news:hdip65$283$:
    >> "Paavo Helde" <> wrote in message
    >> news:Xns9CC1C54A5C97paavo256@216.196.109.131...
    >>> "James" <> wrote in news:hde75o$rt5$:

    [...]
    >>> Accessing it in random fashion will probably cause a lot of
    >>> cache misses, so it is not given that this is the fastest lookup
    >>> possible.

    >>
    >> I admit that is a cop out, however I don't really want to get into the
    >> "hardware level" jest yet. However, AFAICT, there would be an upper
    >> bound on the number of cache misses per-call to 'cmap::prv_find()'?
    >> The size of the data-structure is static and that's that, so to speak.

    >
    > Sure there is an upper bound. This does not mean that the approach is the
    > fastest. The big-O calculations only make sense if N is large, with N=5
    > they are meaningless.


    Agreed.




    >>> A tag of 4 characters/bytes immediately suggests that you could
    >>> interpret those 4 characters as 32-bit unsigned integers (unsigned is
    >>> needed, as signed integers can have trap values).

    >>
    >> Do you mean something like:

    [...]

    >> ?

    >
    > Why so complex?


    Ummm, I don't know. That's just what popped out...

    ;^/


    > What I had in mind whas something like
    >
    > char const* const s = "command";
    >
    > typedef unsigned char uc; // for brevity
    >
    > uint32_t x = (uc(s[0])<<24) | (uc(s[1])<<16) | (uc(s[2])<<16) | uc(s[3]);





    [...]
    >>>> int main()
    >>>> {
    >>>> {
    >>>
    >>> Why extra braces?

    >>
    >> This is probably going to sound retarded, but I personally like to
    >> encapsulate everything in main within an extra scope in order to
    >> ensure that all objects are destructed before I actually return from
    >> main. Imagine I wanted to print a final message that occurs after all
    >> messages from the object dtors are printed:

    >
    > OK, but you did not have that final printout in the original example.


    It's a habit.




    >>>> static cmap<void, char const*> cm_normal(command_unknown);
    >>>> static cmap<void, char const*> cm_reverse(command_unknown);
    >>>
    >>> Why static?

    >>
    >> They blow the stack on a Windows platform. BTW, I don't really see a
    >> need for dynamic memory allocation here.
    >>

    >
    > OK, but for the homework only. In real life you cannot really use static
    > data, as this is hostile to multi-threading.



    I am NOT a threading expert in any way shape or form, however AFAICT, the
    following pseudo-code would be 100% thread-safe, and does not need any
    synchronization primitives:




    // [snip translator impl code]

    static translator const* g_translator = NULL;


    // [snip cmap impl code]


    typedef cmap<void, char const*> cmap_type;


    static cmap_type const* g_cmdmap = NULL;


    #define ARRAY_DEPTH(t) (sizeof(t) / sizeof(t[0]))


    static char const* g_cmds[] =
    {
    "north",
    "east",
    "west",
    "south",
    "wield"
    };


    extern "C"
    void* pthread_entry(void* state)
    {
    for (;;)
    {
    g_commands->exec(g_cmds[rand() % ARRAY_DEPTH(g_cmds)]);
    }

    return NULL;
    }


    int main()
    {
    {
    static translator const l_translator;
    static cmap_type cmdmap(command_unknown);


    // make the above visible to the outside world...
    g_translator = &l_translator;
    g_cmdmap = &cmdmap;


    // add all of the commands in g_cmds to the cmdmap


    // create threads


    // join threads
    }

    return 0;
    }




    All mutations to shared data are done before the threads are created. All of
    the static data-structures are fully initialized before the threads are
    created. Threads perform read-only accessed to the command map and it's
    translator.


    Am I missing anything obvious here?




    >> [...]
    >>
    >>>> BTW, can you think of a faster way to perform the lookup process???
    >>>
    >>> Given that the number of used entries in the map is 5 as in your
    >>> example, it is quite certain that anything which uses a compact map
    >>> (of size about 20 bytes) will be faster than your solution (using
    >>> more than 500000 bytes) (of course assuming there is any kind of
    >>> memory caching present in the system).

    >>
    >> I fully intend to insert many command strings with random names
    >> assigned to random command functions. I have to turn this in on
    >> Monday. IMVVVHO, the sparse 256-ary array based tree should work fine
    >> for the large data-set.

    >
    > Yes, if N is large, the big-O rules should kick in. If N is >100000, then
    > certainly your data structure is optimal.


    That's what I was hoping for.




    >>> But as this is homework, the correctness of the solution
    >>> should be what matters, and the speed should not be an issue at all
    >>> (as long as the demo completes before the professor gets bored ;-).

    >>
    >> LOL!
    >>
    >> The only reason why I am doing this is so I can get some extra credit.
    >> I am interested in getting a good grade, and plan on becoming at least
    >> a fairly decent programmer. Perhaps I can even make some $$$ in the
    >> future.

    >
    > I can understand that. Your solution is quite fine in my mind, as a
    > homework.


    Thank you! :^)




    > I would not bet though that you can get some extra credit by
    > that. If I were a professor I would probably like it more if you
    > implemented a custom binary search function, instead of creating large
    > data arrays containing mostly emptiness.


    Humm... I still have time to change it to a binary search. In fact, since I
    can use 32-bit unsigned integers, I think the following hashed tree
    implementation I came up with should work out quite well:

    http://groups.google.com/group/comp.programming/browse_frm/thread/b6394b5bb7c59ac1

    What do you think of that? Should I use it instead? Humm...




    > Especially the motivation is
    > very vague - you claim this is the fastest approach, without doing any
    > measurements to back up your claim. I would at least make a prototype
    > with std::map and compare with your approach, to see if the "O(8)" claim
    > is justified.


    You are 100% correct. I should test it against a std::map. BTW, the reason
    why I use O(8) is because there are 4 access to the translator table, and 4
    access to the 4D table in the command map regardless of how many items are
    in the "container".




    > Note that in real life your solution has little chances. In a real
    > program nobody wants to have a component which cannot be multithreaded
    > and takes 1000 times more memory than necessary, even if might work a bit
    > faster than the alternatives.


    AFAICT, I do think it can be multi-threaded ___IF___ all the commands are
    inserted ___BEFORE___ any threads are created. In fact, well, I hope I don't
    make a massive retarded fool out of my self here, but I think you could add
    commands to the map while threads are reading it by using an atomic swap.
    The data-structure is static such that it does not grow or shrink, and the
    memory cells are the size of a function pointer. So, if the function pointer
    is the size of a void* pointer, and I have atomic swap that can operate on
    pointers, then an update to the map could look like this:


    fp_command cmap::add(char const* name, fp_command cmd)
    {
    fp_command& cell = prv_find(name);

    return atomic_swap(&cell, cmd);
    }


    This would overwrite commands that already exist and return the previous
    value. If this is unacceptable, you could might be able to use atomic
    compare and swap:

    bool cmap::add_cas(char const* name, fp_command cmp, fp_command cmd)
    {
    fp_command& cell = prv_find(name);

    return atomic_cas(&cell, cmp, cmd);
    }



    I have no idea if it will actually work the way I think it should work or
    not! I am most likely missing something important.

    YIKES!




    > It seems you have misunderstood the goal of
    > professional programming.


    Oh crap!

    ;^(...




    > It is usually not about producing the fastest
    > code possible - it is all about finishing the project in the schedule,
    > with code having acceptable correctness, speed, size, ease-of-usage,
    > etc., for the end user, plus acceptable in-house maintainability.


    I will definitely keep those wise words in mind. Thanks Paavo for all your
    valuable input, I really do appreciate it!


    :^)
     
    James, Nov 12, 2009
    #11
  12. "James Kanze" <> wrote in message
    news:...
    > On Nov 11, 11:12 pm, Paavo Helde <> wrote:
    >> "James" <> wrote innews:hde75o$rt5$:

    >
    > [...]
    >> > #define DEPTH 27

    >
    >> Why 27? (instead of 26)

    >
    > For the final '\0' when the string is less than 4 characters.
    > He could save a little memory by making the first dimension 26.
    > If he implemented it as a true trie, he would probably save a
    > lot more, since a lot of the sub-tries would probably be empty;
    > a dynamically constructed trie would definitely be a legitimate
    > answer to the problem, and would probably be rather fast. In
    > which case, something like:
    >
    > struct TrieNode
    > {
    > void* action;
    > TrieNode* next[16];
    > };
    >
    > would do the trick.


    I am seriously thinking about using the following tree algorithm:

    http://groups.google.com/group/comp.programming/browse_frm/thread/b6394b5bb7c59ac1
     
    Chris M. Thomasson, Nov 12, 2009
    #12
  13. James

    James Guest

    "Chris M. Thomasson" <> wrote in message
    news:hdj9pt$hqi$...


    DAMN!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


    My cousin is going to kill me. He let me borrow his laptop so I can complete
    this damn college course. It's loaded with some of his programming examples
    and code sketches. I think I will just use a different newsreader. BTW, any
    suggestions on a decent news reader? I don't want to send news out on the
    wrong Outlook account again.
     
    James, Nov 12, 2009
    #13
  14. "James" <> wrote in message news:hdj9lo$hmq$...
    [...]

    > AFAICT, I do think it can be multi-threaded ___IF___ all the commands are
    > inserted ___BEFORE___ any threads are created. In fact, well, I hope I
    > don't make a massive retarded fool out of my self here, but I think you
    > could add commands to the map while threads are reading it by using an
    > atomic swap. The data-structure is static such that it does not grow or
    > shrink, and the memory cells are the size of a function pointer. So, if
    > the function pointer is the size of a void* pointer, and I have atomic
    > swap that can operate on pointers, then an update to the map could look
    > like this:
    >
    >
    > fp_command cmap::add(char const* name, fp_command cmd)
    > {
    > fp_command& cell = prv_find(name);
    >
    > return atomic_swap(&cell, cmd);
    > }
    >
    >
    > This would overwrite commands that already exist and return the previous
    > value. If this is unacceptable, you could might be able to use atomic
    > compare and swap:
    >
    > bool cmap::add_cas(char const* name, fp_command cmp, fp_command cmd)
    > {
    > fp_command& cell = prv_find(name);
    >
    > return atomic_cas(&cell, cmp, cmd);
    > }
    >
    >
    >
    > I have no idea if it will actually work the way I think it should work or
    > not! I am most likely missing something important.
    >
    > YIKES!


    James, you should focus on your class instead of multi-threading. However,
    to my amazement, memory visibility aside, you're idea would actually work!

    Wow!
     
    Chris M. Thomasson, Nov 12, 2009
    #14
  15. "James" <> wrote in message news:hdja33$i70$...
    > "Chris M. Thomasson" <> wrote in message
    > news:hdj9pt$hqi$...
    >
    >
    > DAMN!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    >
    >
    > My cousin is going to kill me. He let me borrow his laptop so I can
    > complete this damn college course. It's loaded with some of his
    > programming examples and code sketches. I think I will just use a
    > different newsreader. BTW, any suggestions on a decent news reader? I
    > don't want to send news out on the wrong Outlook account again.



    I want my damn laptop back!!! I coming over to you're house to thrash you.


    LOL, just kidding James.

    ;^D



    You can go ahead and delete my Outlook account if you want. BTW, I use the
    PAN newsreader on Linux and it works fairly well.
     
    Chris M. Thomasson, Nov 12, 2009
    #15
  16. "James" <> wrote in message news:hdeeq7$5ui$...
    [...]
    > OOPS! The last two posts were using the wrong account. Sorry Chris! I am
    > Chris' cousin and am learning C++ at the local community college.


    You better get an A+!

    ;^)
     
    Chris M. Thomasson, Nov 12, 2009
    #16
  17. James

    James Guest

    "James Kanze" <> wrote in message
    news:...
    > On Nov 12, 5:01 pm, "James" <> wrote:
    >> "Paavo Helde" <> wrote in message

    [...]
    >
    >> typedef unsigned int uint32_type;

    >
    >> typedef char assert_s
    >> [
    >> sizeof(uint32_type) == sizeof(char) * 4
    >> && sizeof(uint32_type) * CHAR_BIT == 32
    >> ? 1 : -1
    >> ];

    >
    >> typedef char cmd_buffer[4];

    >
    >> union cmd
    >> {
    >> cmd_buffer buffer;
    >> uint32_type whole;
    >> };

    >
    >> then decoding 4 chars at once like:

    >
    >> union cmd c;

    >
    >> char const* const string = "command";

    >
    >> c.whole = *((*uint32_type)string);

    >
    >> ?

    >
    > More likely, he was thinking of something like:
    >
    > uint_fast32_t i = ((c[0] & 0xFF) << 24)
    > | ((c[1] & 0xFF) << 16)
    > | ((c[2] & 0xFF) << 8)
    > | ((c[3] & 0xFF) );
    >
    > This is guaranteed to work on any machine.


    Yeah. I don't quite know why I used a union.




    >> > (This should work for EBCDIC as well as far as both your
    >> > source code and the initial data for initializing the map
    >> > are in the same codepage.) Thus the mapping is simplified to
    >> > integer-to-value. One could hold these in a sorted array and
    >> > perform a binary search (or even a linear search in an
    >> > unsorted array if the number of entries is small, placing
    >> > the more frequent commands in the beginning).

    >
    >> Please correct me if I am way off base here but I think the
    >> complexity for the simple and naive algorithm I posted should
    >> be O(8) complexity for insert and lookup operations.

    >
    > There is no such thing as O(8). The complexity is O(1).


    Yikes! I was errounsly counting each diminsion as a seperate access:

    return m_table.m_fp
    [g_translator[name[0]]]
    [g_translator[name[1]]]
    [g_translator[name[2]]]
    [g_translator[name[3]]];

    Humm... Okay. I count 4 separate accesses into the g_translator table. I
    also count a single access into the m_table.m_fp table. Why would that not
    be O(5)? I guess I need to break out of that line of thinking.


    Thanks James.




    > But
    > the constant overhead is high due to the lack of locality. It's
    > a relatively poor space/speed trade-off, given that on modern
    > machines, excess space results in a noticeable slow-down.
    > Better solutions exist.
    >
    >> This tells me that storing a large amount of items in the
    >> array will have no impact on the complexity whatsoever.

    >
    > Array access is O(1), regardless of the size of the array. But
    > that's purely a theoretical figure. In practice, if the array
    > is very sparsely populated, the program will run slower,
    > compared to other algorithms.
    >
    >> I cannot exactly claim that for a binary search whose
    >> complexity parameters can be effected by N.

    >
    > Binary search is O(lg n). For small n (less than a couple of
    > hundred), the difference is not significant. (In the tests I've
    > done, binary search on a sorted vector or std::map beats hash
    > coding up to somewhere between 150 and 200 entries. Even though
    > hash coding is O(1) and the binary search or std::map are O(ln
    > n).


    have you tries using std::map as buckets in a static hash map?


    struct my_map
    {
    std::map<...> m_buckets[DEPTH];
    };

    ?




    > Big O complexity is only significant when the number of
    > elements becomes large. (How large depends on the difference in
    > the big O factor. The difference between constant and quadratic
    > can become significant very quickly. The difference between
    > constant an logrithmic does't become significant until you
    > really do have a lot of entries.


    I am going to populate the table with very many of entries (e.g., hundreds
    of thousands).




    >> > Below are just some stylistic comments...

    >
    >> [...]

    >
    >> >> int main()
    >> >> {
    >> >> {

    >
    >> > Why extra braces?

    >
    >> This is probably going to sound retarded, but I personally
    >> like to encapsulate everything in main within an extra scope
    >> in order to ensure that all objects are destructed before I
    >> actually return from main.

    >
    > That's interesting. Generally (in actual applications---not
    > necessarily is homework type projects), main will only be a
    > couple of function calls, maybe one to check the command line
    > arguments, and one to do the actual work; in the case of a Unix
    > like filter, you might put the loop over the filenames in main,
    > but that's about it.
    >
    > Rather than extra braces, I'd just put the processing itself in
    > a separate function. (But this is perhaps more a production
    > code consideration. If you don't have command line options or
    > arguments, configuration files, etc., it's probably overkill.)


    Do you think it's a bad habit of mine to add the extra braces? A separate
    function would most certainly work very well.




    >> Imagine I wanted to print a final message that occurs after
    >> all messages from the object dtors are printed:

    >
    > You can't, because you don't know the order the destructors of
    > static objects will be called:).


    Touché!

    :^)




    > [...]
    >> >> static cmap<void, char const*> cm_normal(command_unknown);
    >> >> static cmap<void, char const*> cm_reverse(command_unknown);

    >
    >> > Why static?

    >
    >> They blow the stack on a Windows platform.

    >
    > You can change the default stack size at link time, if that's
    > all that's bothering you.
    >
    >> BTW, I don't really see a need for dynamic memory allocation
    >> here.

    >
    > For large tables, it doesn't hurt anything. It costs the same
    > to allocate int[5] and int[5000000]. In the first case, that
    > cost has to be amortised over accessing 5 elements, which means
    > it may be significant. In the second, it's amortised over
    > accessing 5000000 elements, which means that it's practically
    > nothing.


    I clearly need to change my line of thinking wrt this issue. When I polish
    this up I will post the code again so you call all see if I have learned
    anything...

    ;^)




    >> [...]

    >
    >> >> BTW, can you think of a faster way to perform the lookup
    >> >> process???

    >
    >> > Given that the number of used entries in the map is 5 as in
    >> > your example, it is quite certain that anything which uses a
    >> > compact map (of size about 20 bytes) will be faster than
    >> > your solution (using more than 500000 bytes) (of course
    >> > assuming there is any kind of memory caching present in the
    >> > system).

    >
    >> I fully intend to insert many command strings with random
    >> names assigned to random command functions. I have to turn
    >> this in on Monday. IMVVVHO, the sparse 256-ary array based
    >> tree should work fine for the large data-set.

    >
    > You mean a 4*27 array, don't you. 4*256 probably won't fit into
    > memory.


    I made a stupid mistake! I was thinking of the translator array which is 256
    elements wide.


    Humm... I do still think of the 4*27 array as being similar to a 4-level
    27-are trie. Am I off my rocker James?!

    ;^/




    > If you're looking for the simplest solution, then linear search
    > is fine. If the number of entries because too large, you can
    > either sort the array and use binary search, or switch to a trie
    > or a hash table. (I'd probably use a trie, but I've a certain
    > experience with this sort of code, and it probably wouldn't take
    > me more than an hour to implement it. If you've never done
    > anything similar before, it could easily take a day or two, once
    > you've fully understood the algorithm.)


    I created this tree with this homework problem in mind:

    http://groups.google.com/group/comp.programming/browse_frm/thread/b6394b5bb7c59ac1

    Do you think I should use it instead of the 4D array?




    >> > But as this is homework, the correctness of the solution
    >> > should be what matters, and the speed should not be an issue
    >> > at all (as long as the demo completes before the professor
    >> > gets bored ;-).

    >>
    >> LOL!

    >
    >> The only reason why I am doing this is so I can get some extra
    >> credit. I am interested in getting a good grade, and plan on
    >> becoming at least a fairly decent programmer. Perhaps I can
    >> even make some $$$ in the future.

    >
    > If making money is your goal, the commercial side generally pays
    > a lot better:). (Bill Gate's original Basic used a linear
    > search in the name table. It was, in fact, recommended to use
    > the most frequently used variables at the beginning of the
    > program, so they would be at the front of the table.)


    :^)






    BTW, thank you for all of your expert advise James! I have been lurking
    around this news groups for a while and you're input is invaluable, IMVHO of
    course.

    ;^)
     
    James, Nov 12, 2009
    #17
  18. James

    James Guest

    "James" <> wrote in message news:hdjcg3$kq9$...
    > "James Kanze" <> wrote in message

    [...]
    >>> Please correct me if I am way off base here but I think the
    >>> complexity for the simple and naive algorithm I posted should
    >>> be O(8) complexity for insert and lookup operations.

    >>
    >> There is no such thing as O(8). The complexity is O(1).

    >
    > Yikes! I was errounsly counting each diminsion as a seperate access:
    >
    > return m_table.m_fp
    > [g_translator[name[0]]]
    > [g_translator[name[1]]]
    > [g_translator[name[2]]]
    > [g_translator[name[3]]];
    >
    > Humm... Okay. I count 4 separate accesses into the g_translator table.



    That could be rewritten as:


    size_t d1 = g_translator[name[0]];
    size_t d2 = g_translator[name[1]];
    size_t d3 = g_translator[name[2]];
    size_t d4 = g_translator[name[3]];


    return m_table.m_fp[d1][d2][d3][d4];





    Actually, there are 4 accesses to the name array:



    char c1 = name[0];
    char c2 = name[1];
    char c3 = name[2];
    char c4 = name[3];

    size_t d1 = g_translator[c1];
    size_t d2 = g_translator[c2];
    size_t d3 = g_translator[c3];
    size_t d4 = g_translator[c4];

    return m_table.m_fp[d1][d2][d3][d4];




    Can I legitimately claim that the code above is O(1)?



    Please help me clear my ignorance!

    ;^(




    [...]
     
    James, Nov 12, 2009
    #18
  19. "Chris M. Thomasson" <> wrote in message
    news:M%9Lm.24129$...
    > "James" <> wrote in message news:hdj9lo$hmq$...

    [...]
    >> I have no idea if it will actually work the way I think it should work or
    >> not! I am most likely missing something important.
    >>
    >> YIKES!

    >
    > James, you should focus on your class instead of multi-threading. However,
    > to my amazement, memory visibility aside, you're idea would actually work!
    >
    > Wow!


    Just to clarify, you will need to use acquire/release semantics if there is
    any data that is dependant on the insertion. For instance, if thread `A'
    created datum `D' and inserted a command to the map that gave other threads
    access to `D', then you would need to ensure that D is rendered into a fully
    visible and coherent state before it becomes visible to threads other than
    `A'.
     
    Chris M. Thomasson, Nov 12, 2009
    #19
  20. "James" <> wrote in message news:hdjcg3$kq9$...
    > "James Kanze" <> wrote in message
    > news:...

    [...]
    >> Rather than extra braces, I'd just put the processing itself in
    >> a separate function. (But this is perhaps more a production
    >> code consideration. If you don't have command line options or
    >> arguments, configuration files, etc., it's probably overkill.)

    >
    > Do you think it's a bad habit of mine to add the extra braces? A separate
    > function would most certainly work very well.


    I sometimes do the exact same thing. So, no I don't think it's a bad
    habit...

    lol

    [...]
     
    Chris M. Thomasson, Nov 12, 2009
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Chris S.

    Efficient String Lookup?

    Chris S., Oct 16, 2004, in forum: Python
    Replies:
    21
    Views:
    4,287
    Win Carus
    Oct 18, 2004
  2. David Pratt

    Efficient lookup in list of dictionaries

    David Pratt, Dec 5, 2005, in forum: Python
    Replies:
    2
    Views:
    311
    bruno at modulix
    Dec 5, 2005
  3. Timasmith
    Replies:
    1
    Views:
    2,036
    Daniel Pitts
    Nov 6, 2006
  4. Replies:
    1
    Views:
    315
    Dietmar Kuehl
    Mar 3, 2006
  5. www
    Replies:
    51
    Views:
    1,519
Loading...

Share This Page