More efficient strcmp

Discussion in 'C Programming' started by pembed2012, Mar 13, 2012.

  1. pembed2012

    pembed2012 Guest

    Dear friends

    I have created a more efficient strcmp function. However it only works on
    big endian architectures. I would like someone with same to run some
    timing tests to see how much is the speed up.

    Thank you

    int strcmp_fast(char* s,char* t){
    uls=(unsigned long*)s;
    ult=(unsigned long*)t;
    if(*uls>*ult) return(1);
    if(*uls<*ult) return(-1);
    else return(strcmp(s,t));
    }
    pembed2012, Mar 13, 2012
    #1
    1. Advertising

  2. pembed2012

    Kleuske Guest

    On Tue, 13 Mar 2012 21:59:41 +0000, pembed2012 saw fit to publish the
    following:

    > Dear friends
    >
    > I have created a more efficient strcmp function. However it only works
    > on big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.
    >
    > Thank you
    >
    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    What if the strings are longer than an unsigned long and differ only in the
    last bit?

    Just asking...

    --
    Women are more easily and more deeply terrified ... generating more
    sheer horror than the male of the species.
    -- Spock, "Wolf in the Fold", stardate 3615.4
    Kleuske, Mar 13, 2012
    #2
    1. Advertising

  3. pembed2012

    James Kuyper Guest

    On 03/13/2012 05:59 PM, pembed2012 wrote:
    > Dear friends
    >
    > I have created a more efficient strcmp function. However it only works on
    > big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.
    >
    > Thank you
    >
    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    char strings[0] = "a\0a\0b";

    Compare the results of strcmp(strings, strings+2) and
    strcmp_fast(strings, strings+2).

    Fast is nice; defined behavior is better; correct behavior is even better.
    James Kuyper, Mar 13, 2012
    #3
  4. pembed2012

    Stefan Ram Guest

    pembed2012 <> writes:
    >I have created a more efficient strcmp function.


    What did you compare its efficiency with?
    How did you measure that it is more efficient?
    Stefan Ram, Mar 13, 2012
    #4
  5. pembed2012

    BartC Guest

    "Kleuske" <> wrote in message
    news:jjogv1$iqp$...
    > On Tue, 13 Mar 2012 21:59:41 +0000, pembed2012 saw fit to publish the
    > following:


    >> I have created a more efficient strcmp function. However it only works
    >> on big endian architectures. I would like someone with same to run some
    >> timing tests to see how much is the speed up.


    >> int strcmp_fast(char* s,char* t){
    >> uls=(unsigned long*)s;
    >> ult=(unsigned long*)t;
    >> if(*uls>*ult) return(1);
    >> if(*uls<*ult) return(-1);
    >> else return(strcmp(s,t));
    >> }

    >
    > What if the strings are longer than an unsigned long and differ only in
    > the
    > last bit?


    Or shorter.

    --
    Bartc
    BartC, Mar 13, 2012
    #5
  6. On 13-Mar-12 16:59, pembed2012 wrote:
    > I have created a more efficient strcmp function. However it only works on
    > big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.
    >
    > Thank you
    >
    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    You can make lots of things faster if you aren't worried about the code
    crashing or giving incorrect answers when called with valid inputs.

    However, that is a rather dubious definition of "more efficient".

    Note that any reasonable strcmp() implementation _does_ use the
    multi-byte comparison trick; however, unlike your code, they also ensure
    that the pointers are aligned, strings are not compared past their ends,
    etc. so that the results are _correct_, and that does slow things down a
    bit. Ditto for memcmp().

    S

    --
    Stephen Sprunk "God does not play dice." --Albert Einstein
    CCIE #3723 "God is an inveterate gambler, and He throws the
    K5SSS dice at every possible opportunity." --Stephen Hawking
    Stephen Sprunk, Mar 13, 2012
    #6
  7. pembed2012

    James Kuyper Guest

    On 03/13/2012 06:16 PM, Kleuske wrote:
    > On Tue, 13 Mar 2012 21:59:41 +0000, pembed2012 saw fit to publish the
    > following:
    >
    >> Dear friends
    >>
    >> I have created a more efficient strcmp function. However it only works
    >> on big endian architectures. I would like someone with same to run some
    >> timing tests to see how much is the speed up.
    >>
    >> Thank you
    >>
    >> int strcmp_fast(char* s,char* t){
    >> uls=(unsigned long*)s;
    >> ult=(unsigned long*)t;
    >> if(*uls>*ult) return(1);
    >> if(*uls<*ult) return(-1);
    >> else return(strcmp(s,t));
    >> }

    >
    > What if the strings are longer than an unsigned long and differ only in the
    > last bit?


    Then it ends up calling strcmp(s,t), and ends up actually being slower.
    That's not one of the several serious problems with this code:

    1. Either s or t might not have an alignment suitable for conversion to
    unsigned long*. Behavior: undefined.
    2. *uls or *ult each access sizeof(unsigned long) bytes; if the strings
    pointed at are shorter than that, those expressions might attempt to
    access memory that the program doesn't have permission to read.
    Behavior: undefined.
    3. The comparison compares the entire sizeof(unsigned long) bytes, even
    if that includes bytes past the end of the strings being compared.
    Behavior: incorrect
    4. In the unlikely case that unsigned long has padding bits, they won't
    participate in the comparison. Behavior: incorrect.
    James Kuyper, Mar 13, 2012
    #7
  8. pembed2012 <> writes:

    > I have created a more efficient strcmp function. However it only works on
    > big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.


    Without having done tests, aren't you jumping the gun by saying it's more
    efficient?

    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    The trouble is it doesn't work, regardless of the machine's byte
    ordering. There are two big issues. First, a pointer to a string won't
    always be correctly aligned for a larger access, and, second:

    char test[5] = {0, 0, 1};
    strcmp_fast(test, test+1);

    By the way, the Campaign to Remove All Spaces from Source would no
    approve of your half-hearted effort -- four whole spaces have survived
    the delete key!

    --
    Ben.
    Ben Bacarisse, Mar 13, 2012
    #8
  9. pembed2012

    jacob navia Guest

    Le 13/03/12 23:19, James Kuyper a écrit :
    >
    > char strings[0] = "a\0a\0b";
    >


    ?????

    An array of zero chars?
    And you assign a char * to it?

    Can't parse that, sorry.

    Maybe you wanted to write:

    char *strings = "a\0a\0b";
    jacob navia, Mar 13, 2012
    #9
  10. pembed2012

    Kaz Kylheku Guest

    On 2012-03-13, pembed2012 <> wrote:
    > Dear friends
    >
    > I have created a more efficient strcmp function.


    Gee, thanks a lot.

    Now, you wouldn't happen to have a more efficient paper towel for wiping coffee
    spit from a monitor? :)
    Kaz Kylheku, Mar 13, 2012
    #10
  11. pembed2012

    Kaz Kylheku Guest

    On 2012-03-13, Kleuske <> wrote:
    > On Tue, 13 Mar 2012 21:59:41 +0000, pembed2012 saw fit to publish the
    >> int strcmp_fast(char* s,char* t){
    >> uls=(unsigned long*)s;
    >> ult=(unsigned long*)t;
    >> if(*uls>*ult) return(1);
    >> if(*uls<*ult) return(-1);
    >> else return(strcmp(s,t));
    >> }

    >
    > What if the strings are longer than an unsigned long and differ only in the
    > last bit?


    More importantly, what if they are shorter?

    What if the compiler doesn't like the aliasing?

    Also, this fails on x86 and elsewhere, because strings, if treated as numbers this way, are
    big-endian.
    Kaz Kylheku, Mar 13, 2012
    #11
  12. pembed2012

    BartC Guest

    "pembed2012" <> wrote in message
    news:jjog0d$rcc$...

    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    That won't work well for various reasons already explained.

    Try inventing a fast strcmp_eq() function instead; usually I only care about
    equality, not whether one string is more or less than other.

    That also means that endianness doesn't come into it so much. And if the
    caller happens to have the string lengths already available, have a version
    that makes use of that (which can be really fast when the lengths are
    different).

    However you'll have a job beating a good built-in strcmp(), unless you do
    already have the lengths, to give you an edge.

    --
    Bartc
    BartC, Mar 13, 2012
    #12
  13. pembed2012

    Bl0ckeduser Guest

    pembed2012 wrote:
    > Dear friends
    >
    > I have created a more efficient strcmp function. However it only works on
    > big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.
    >
    > Thank you
    >
    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    This slightly modified version also works on a little-endian machine,
    but I'm not sure if it's quite as fast as yours.

    int strcmp_recursive(char* s,char* t){
    if(!*s && *s==*t) return(0);
    if(*s>*t) return(1);
    if(*s<*t) return(-1);
    return(strcmp_recursive(s+1,t+1));
    }
    Bl0ckeduser, Mar 14, 2012
    #13
  14. pembed2012

    James Kuyper Guest

    On 03/14/2012 02:37 AM, Gareth Owen wrote:
    > James Kuyper <> writes:

    ....
    >> char strings[0] = "a\0a\0b";

    >
    > Typo:
    >
    > char strings[] = "a\0a\0b"; // mutable
    > const char *strings = "a\0a\0b"; // immutable


    Aiee! Where'd that first 0 come from? I don't remember typing it.
    I meant my example to be mutable, since, for some reason, strcmp_fast()
    takes char* arguments.
    --
    James Kuyper
    James Kuyper, Mar 14, 2012
    #14
  15. pembed2012

    Andre Guest

    On 13.03.2012 22:59, pembed2012 wrote:
    > Dear friends
    >
    > I have created a more efficient strcmp function. However it only works on
    > big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.
    >
    > Thank you
    >
    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }


    A string (or at least a substring) may start at an odd address, and some
    hardware architectures will fail if you try to cast this to a long...
    Andre, Mar 14, 2012
    #15
  16. Kaz Kylheku <> wrote:

    > On 2012-03-13, pembed2012 <> wrote:
    > > Dear friends
    > >
    > > I have created a more efficient strcmp function.

    >
    > Gee, thanks a lot.
    >
    > Now, you wouldn't happen to have a more efficient paper towel for wiping coffee
    > spit from a monitor? :)


    if your monitor is spitting coffee may i suggest perhaps your monitor is
    actually a faulty coffee maker? in case it's not then microfiber rags work a
    treat for that, ask me how i know ;-)

    watching a dope get his arse handed to him by the pros here is one of the
    best things about this newsgroup. high in content and amusement all at once.
    Fritz Wuehler, Mar 14, 2012
    #16
  17. pembed2012

    Kleuske Guest

    On Tue, 13 Mar 2012 22:25:31 +0000, BartC saw fit to publish the
    following:

    > "Kleuske" <> wrote in message
    > news:jjogv1$iqp$...
    >> On Tue, 13 Mar 2012 21:59:41 +0000, pembed2012 saw fit to publish the
    >> following:

    >
    >>> I have created a more efficient strcmp function. However it only works
    >>> on big endian architectures. I would like someone with same to run
    >>> some timing tests to see how much is the speed up.

    >
    >>> int strcmp_fast(char* s,char* t){
    >>> uls=(unsigned long*)s;
    >>> ult=(unsigned long*)t;
    >>> if(*uls>*ult) return(1);
    >>> if(*uls<*ult) return(-1);
    >>> else return(strcmp(s,t));
    >>> }

    >>
    >> What if the strings are longer than an unsigned long and differ only in
    >> the
    >> last bit?

    >
    > Or shorter.


    Ouch!

    .... I should have seen that ...

    --
    Murder is contrary to the laws of man and God.
    -- M-5 Computer, "The Ultimate Computer", stardate 4731.3
    Kleuske, Mar 14, 2012
    #17
  18. pembed2012

    Tim Rentsch Guest

    Bl0ckeduser <> writes:

    > pembed2012 wrote:
    >> Dear friends
    >>
    >> I have created a more efficient strcmp function. However it only
    >> works on big endian architectures. I would like someone with same to
    >> run some timing tests to see how much is the speed up.
    >>
    >> Thank you
    >>
    >> int strcmp_fast(char* s,char* t){
    >> uls=(unsigned long*)s;
    >> ult=(unsigned long*)t;
    >> if(*uls>*ult) return(1);
    >> if(*uls<*ult) return(-1);
    >> else return(strcmp(s,t));
    >> }

    >
    > This slightly modified version also works on a little-endian machine,
    > but I'm not sure if it's quite as fast as yours.
    >
    > int strcmp_recursive(char* s,char* t){
    > if(!*s && *s==*t) return(0);
    > if(*s>*t) return(1);
    > if(*s<*t) return(-1);
    > return(strcmp_recursive(s+1,t+1));
    > }


    You have made an important semantic change (not counting the
    changes that allow termination and provide for returning 0
    if the strings are equal). Do you see what it is? Hint:
    this version of strcmp does not match the specification for
    strcmp() in the standard library (even ignoring the 'const'
    modifiers in strcmp()'s prototype).

    By the way, 'return' statements don't need parentheses
    around the return expression. The statements

    return 0;
    return strcmp_recursive( s+1, t+1 );

    will work just fine.
    Tim Rentsch, Mar 14, 2012
    #18
  19. pembed2012

    Bl0ckeduser Guest

    Tim Rentsch wrote:
    > You have made an important semantic change (not counting the
    > changes that allow termination and provide for returning 0
    > if the strings are equal). Do you see what it is? Hint:
    > this version of strcmp does not match the specification for
    > strcmp() in the standard library (even ignoring the 'const'
    > modifiers in strcmp()'s prototype).


    Hmm, I have to do the comparisons using the unsigned char type (or
    perhaps another unsigned type, like the OP), right ?

    > By the way, 'return' statements don't need parentheses
    > around the return expression. The statements
    >
    > return 0;
    > return strcmp_recursive( s+1, t+1 );
    >
    > will work just fine.


    By the way, thanks for reviewing my code :). As a hobbyist, it's always
    a thrill to talk with experts.
    Bl0ckeduser, Mar 14, 2012
    #19
  20. pembed2012

    vaysagekv Guest

    On 14/03/12 3:29 AM, pembed2012 wrote:
    > Dear friends
    >
    > I have created a more efficient strcmp function. However it only works on
    > big endian architectures. I would like someone with same to run some
    > timing tests to see how much is the speed up.
    >
    > Thank you
    >
    > int strcmp_fast(char* s,char* t){
    > uls=(unsigned long*)s;
    > ult=(unsigned long*)t;
    > if(*uls>*ult) return(1);
    > if(*uls<*ult) return(-1);
    > else return(strcmp(s,t));
    > }

    Do you mean
    int strcmp_fast(char* s,char* t){
    unsigned long* uls=(unsigned long*)s;
    unsigned long* ult=(unsigned long*)t;
    if(*uls>*ult) return(1);
    if(*uls<*ult) return(-1);
    else return(strcmp(s,t));
    }
    vaysagekv, Mar 15, 2012
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Shane Peck

    strcmp problem

    Shane Peck, Sep 19, 2003, in forum: C++
    Replies:
    6
    Views:
    2,339
    Kevin Goodsell
    Sep 22, 2003
  2. muser

    strcmp

    muser, Oct 3, 2003, in forum: C++
    Replies:
    6
    Views:
    1,093
    Frank Schmitt
    Oct 9, 2003
  3. Andrej Hocevar

    please help with strcmp()

    Andrej Hocevar, Jul 19, 2003, in forum: C Programming
    Replies:
    3
    Views:
    319
    Gordon Burditt
    Jul 19, 2003
  4. Michael
    Replies:
    4
    Views:
    396
    Matt Hammond
    Jun 26, 2006
  5. Robert Klemme

    With a Ruby Yell: more, more more!

    Robert Klemme, Sep 28, 2005, in forum: Ruby
    Replies:
    5
    Views:
    202
    Jeff Wood
    Sep 29, 2005
Loading...

Share This Page