how to do a "REAL" string compare?

Discussion in 'C++' started by Nobody, Mar 12, 2005.

  1. Nobody

    Nobody Guest

    Heres the deal... I have an application where I have a list (as in a Windows
    list control, but thats not important) displayed to the user. I sort this
    list based on the list controls sort function (again, its not important that
    its Windows) which ends up calling a compare function in my code:

    int CompareFunc(char* str1, char* str2)
    {
    }

    this function returns -1, 0 or 1 which gets passed on to the internal quick
    sort algorithm. No problem, it all works fine.

    Now I have a user in which this list displays "multi-part" items. You can
    guess where this is headed :), the list ends up like this:

    Item (1/100)
    Item (11/100)
    Item (2/100)

    Now while that is a "correct" string sort, its kind of lame. I could force
    the user to zero-pad or zero-pad myself, but both seem kind of hokey as I am
    either putting requirements on the user or changing his item text. I'd much
    rather end up with:

    Item (1/100)
    Item (2/100)
    ..
    ..
    ..
    Item (11/100)

    As it should. Now keep in mind that this could end up in dozens of formats,
    brackets, parents, dashes, asterisks, etc or any endless supply of cutesy
    characters a user might enter. Even the forward slash may not be the part
    separator and there may be stuff after the part #s.

    I've seen some applications do this in the past, but never saw the source
    for them. How can this be sorted properly without requiring the user to
    enter it in a very specific format? I could never handle every possible
    format in my code. There must be some kind of cool generic way to do this.
    Nobody, Mar 12, 2005
    #1
    1. Advertising

  2. Nobody wrote:

    > Heres the deal... I have an application where I have a list (as in a Windows
    > list control, but thats not important) displayed to the user. I sort this
    > list based on the list controls sort function (again, its not important that
    > its Windows) which ends up calling a compare function in my code:
    >
    > int CompareFunc(char* str1, char* str2)
    > {
    > }
    >
    > this function returns -1, 0 or 1 which gets passed on to the internal quick
    > sort algorithm. No problem, it all works fine.
    >
    > Now I have a user in which this list displays "multi-part" items. You can
    > guess where this is headed :), the list ends up like this:
    >
    > Item (1/100)
    > Item (11/100)
    > Item (2/100)
    >
    > Now while that is a "correct" string sort, its kind of lame. I could force
    > the user to zero-pad or zero-pad myself, but both seem kind of hokey as I am
    > either putting requirements on the user or changing his item text. I'd much
    > rather end up with:
    >
    > Item (1/100)
    > Item (2/100)
    > .
    > .
    > .
    > Item (11/100)
    >
    > As it should. Now keep in mind that this could end up in dozens of formats,
    > brackets, parents, dashes, asterisks, etc or any endless supply of cutesy
    > characters a user might enter. Even the forward slash may not be the part
    > separator and there may be stuff after the part #s.
    >
    > I've seen some applications do this in the past, but never saw the source
    > for them. How can this be sorted properly without requiring the user to
    > enter it in a very specific format? I could never handle every possible
    > format in my code. There must be some kind of cool generic way to do this.


    Prefixing single digits with 0 is a fix (I assume this is what you mean
    by "zero-pad").

    Also prefixing with space does the same job:


    #include <string>
    #include <algorithm>
    #include <iostream>
    #include <list>

    int main()
    {
    using namespace std;

    list<string> somelist;

    somelist.push_back("Item ( 2/100");
    somelist.push_back("Item ( 1/100)");
    somelist.push_back("Item (11/100)");

    somelist.sort();

    for(list<string>::const_iterator p= somelist.begin();
    p!=somelist.end(); ++p)
    cout<<*p<<"\n";
    }


    C:\c>temp
    Item ( 1/100)
    Item ( 2/100
    Item (11/100)

    C:\c>


    In summary just make sure all strings share the same length.


    I am interested in a cleaner approach myself too.



    --
    Ioannis Vranos

    http://www23.brinkster.com/noicys
    Ioannis Vranos, Mar 12, 2005
    #2
    1. Advertising

  3. Nobody

    Nobody Guest

    "Ioannis Vranos" <> wrote in message
    news:1110659832.315731@athnrd02...
    > Nobody wrote:
    >
    >> Heres the deal... I have an application where I have a list (as in a
    >> Windows list control, but thats not important) displayed to the user. I
    >> sort this list based on the list controls sort function (again, its not
    >> important that its Windows) which ends up calling a compare function in
    >> my code:
    >>
    >> int CompareFunc(char* str1, char* str2)
    >> {
    >> }
    >>
    >> this function returns -1, 0 or 1 which gets passed on to the internal
    >> quick sort algorithm. No problem, it all works fine.
    >>
    >> Now I have a user in which this list displays "multi-part" items. You can
    >> guess where this is headed :), the list ends up like this:
    >>
    >> Item (1/100)
    >> Item (11/100)
    >> Item (2/100)
    >>
    >> Now while that is a "correct" string sort, its kind of lame. I could
    >> force the user to zero-pad or zero-pad myself, but both seem kind of
    >> hokey as I am either putting requirements on the user or changing his
    >> item text. I'd much rather end up with:
    >>
    >> Item (1/100)
    >> Item (2/100)
    >> .
    >> .
    >> .
    >> Item (11/100)
    >>
    >> As it should. Now keep in mind that this could end up in dozens of
    >> formats, brackets, parents, dashes, asterisks, etc or any endless supply
    >> of cutesy characters a user might enter. Even the forward slash may not
    >> be the part separator and there may be stuff after the part #s.
    >>
    >> I've seen some applications do this in the past, but never saw the source
    >> for them. How can this be sorted properly without requiring the user to
    >> enter it in a very specific format? I could never handle every possible
    >> format in my code. There must be some kind of cool generic way to do
    >> this.

    >
    > Prefixing single digits with 0 is a fix (I assume this is what you mean by
    > "zero-pad").
    >
    > Also prefixing with space does the same job:
    >
    >
    > #include <string>
    > #include <algorithm>
    > #include <iostream>
    > #include <list>
    >
    > int main()
    > {
    > using namespace std;
    >
    > list<string> somelist;
    >
    > somelist.push_back("Item ( 2/100");
    > somelist.push_back("Item ( 1/100)");
    > somelist.push_back("Item (11/100)");
    >
    > somelist.sort();
    >
    > for(list<string>::const_iterator p= somelist.begin();
    > p!=somelist.end(); ++p)
    > cout<<*p<<"\n";
    > }
    >
    >
    > C:\c>temp
    > Item ( 1/100)
    > Item ( 2/100
    > Item (11/100)
    >
    > C:\c>
    >
    >
    > In summary just make sure all strings share the same length.
    >
    >
    > I am interested in a cleaner approach myself too.


    Well, as I said, I don't want to change the text the user entered. Space
    padding is just as bad as zero padding :). Just finding the part number in
    the string could be tough since you can have:

    Item #1 of 100
    Item 1/100
    Item (1/100)
    Item 1-100
    Item 1 / 100
    Item 1/ 100
    Item *1/100*
    Item <1/100>
    Item [1/100]

    etc.

    You could never handle every case... and what if someone wants to be cute
    and do something like:

    Item [1/100] -= by Fred =-

    or

    Item Part # 1-100 [1/100]
    Item Part # 1-100 [2/100]

    etc.

    Some users would follow format requirements, but I guess a huge majority
    wouldn't. And a lot of cutesy users would rather put the cutesy crap on the
    description then have it properly sorted.
    Nobody, Mar 12, 2005
    #3
  4. Nobody wrote:

    > Now I have a user in which this list displays "multi-part" items.
    > ... the list ends up like this:
    >
    > Item (1/100)
    > Item (11/100)
    > Item (2/100)
    >
    > Now while that is a "correct" string sort, its kind of lame. I could
    > force the user to zero-pad or zero-pad myself, but both seem kind of
    > hokey as I am either putting requirements on the user or changing his
    > item text. I'd much rather end up with:
    >
    > Item (1/100)
    > Item (2/100)
    > .
    > .
    > .
    > Item (11/100)
    >
    > As it should.


    > source for them. How can this be sorted properly without requiring
    > the user to enter it in a very specific format? I could never handle
    > every possible format in my code. There must be some kind of cool
    > generic way to do this.


    There are infinitely many criteria for sorting strings. There's no way to answer
    your question without getting more of an idea of the range of possible inputs,
    what they represent, and why you think various algorithms are lame.

    One thing you might do is count any non-alphanumeric character as a separator,
    and then lexicographically sort the resulting sequences with indivual components
    ordered numrically. OTOH, maybe this, too, is lame.

    Jonathan
    Jonathan Turkanis, Mar 12, 2005
    #4
  5. "Nobody" <> wrote in message
    news:gOHYd.18095$KK5.2916@fed1read03...
    > Heres the deal... I have an application where I have a list (as in a
    > Windows list control, but thats not important) displayed to the user. I
    > sort this list based on the list controls sort function (again, its not
    > important that its Windows) which ends up calling a compare function in my
    > code:
    >
    > int CompareFunc(char* str1, char* str2)
    > {
    > }
    >
    > this function returns -1, 0 or 1 which gets passed on to the internal
    > quick sort algorithm. No problem, it all works fine.

    ....
    > I'd much rather end up with:
    >
    > Item (1/100)
    > Item (2/100)
    > .
    > Item (11/100)

    ....
    > I've seen some applications do this in the past, but never saw the source
    > for them. How can this be sorted properly without requiring the user to
    > enter it in a very specific format? I could never handle every possible
    > format in my code. There must be some kind of cool generic way to do this.


    What about something like:

    int CompareFunc(char const* s1, char const* s2)
    {
    for(;;) {
    unsigned char c1 = *s1++;
    unsigned char c2 = *s2++;
    if( isdigit(c1) && isdigit(c2) ) {
    unsigned long v1 = strtoul(s1-1,&s1,10);
    unsigned long v2 = strtoul(s2-1,&s2,10);
    if(v1!=v2) return (v1<v2) ? -1 : +1;
    continue;
    }
    else if(c1!=c2) {
    ...compare single chars as usual...
    }
    else if( !c1 )
    return 0; // reached end of strings
    }
    }



    I hope this helps,
    Ivan
    Ivan Vecerina, Mar 12, 2005
    #5
  6. Nobody

    Nobody Guest


    > What about something like:
    >
    > int CompareFunc(char const* s1, char const* s2)
    > {
    > for(;;) {
    > unsigned char c1 = *s1++;
    > unsigned char c2 = *s2++;
    > if( isdigit(c1) && isdigit(c2) ) {
    > unsigned long v1 = strtoul(s1-1,&s1,10);
    > unsigned long v2 = strtoul(s2-1,&s2,10);
    > if(v1!=v2) return (v1<v2) ? -1 : +1;
    > continue;
    > }
    > else if(c1!=c2) {
    > ...compare single chars as usual...
    > }
    > else if( !c1 )
    > return 0; // reached end of strings
    > }
    > }
    >


    This would only work if *any* number was assumed to be the part #. If the
    description contained any other number, this would fail.
    Nobody, Mar 12, 2005
    #6
  7. Nobody

    Nobody Guest

    "Jonathan Turkanis" <> wrote in message
    news:...
    > Nobody wrote:
    >
    >> Now I have a user in which this list displays "multi-part" items.
    >> ... the list ends up like this:
    >>
    >> Item (1/100)
    >> Item (11/100)
    >> Item (2/100)
    >>
    >> Now while that is a "correct" string sort, its kind of lame. I could
    >> force the user to zero-pad or zero-pad myself, but both seem kind of
    >> hokey as I am either putting requirements on the user or changing his
    >> item text. I'd much rather end up with:
    >>
    >> Item (1/100)
    >> Item (2/100)
    >> .
    >> .
    >> .
    >> Item (11/100)
    >>
    >> As it should.

    >
    >> source for them. How can this be sorted properly without requiring
    >> the user to enter it in a very specific format? I could never handle
    >> every possible format in my code. There must be some kind of cool
    >> generic way to do this.

    >
    > There are infinitely many criteria for sorting strings. There's no way to
    > answer
    > your question without getting more of an idea of the range of possible
    > inputs,
    > what they represent, and why you think various algorithms are lame.
    >
    > One thing you might do is count any non-alphanumeric character as a
    > separator,
    > and then lexicographically sort the resulting sequences with indivual
    > components
    > ordered numrically. OTOH, maybe this, too, is lame.
    >
    > Jonathan
    >
    >


    LOL, nah, all I said was zero-padding or space-padding is lame because we
    dont want to change the original string or force the user to do anything
    special (like zero pad the numbers themselves). The description text
    represents a file or multiple files that a user uploads. Since these files
    can get large, they are often split up into multiple parts.

    I guess we could code this for the few most common forms and then add 'em as
    we see 'em.
    Nobody, Mar 12, 2005
    #7
  8. Nobody wrote:

    > LOL, nah, all I said was zero-padding or space-padding is lame because we
    > dont want to change the original string or force the user to do anything
    > special (like zero pad the numbers themselves). The description text
    > represents a file or multiple files that a user uploads. Since these files
    > can get large, they are often split up into multiple parts.
    >
    > I guess we could code this for the few most common forms and then add 'em as
    > we see 'em.



    At first I observed this:

    #include <string>
    #include <iostream>
    #include <list>

    int main()
    {
    using namespace std;

    list<string> somelist;

    somelist.push_back("Item ( 2/100)");
    somelist.push_back("Item ( 1/100)");
    somelist.push_back("Item (11/100)");


    for(list<string>::const_iterator p= somelist.begin();
    p!=somelist.end(); ++p)
    {
    int sum=0;

    for(string::size_type i=0; i<p->size(); ++i)
    sum+= p->operator[](i);

    cout<<sum<<"\n";
    }
    }


    C:\c>temp
    786
    785
    802

    C:\c>


    The question is how does it scale.


    Anyway let's try to break this into steps. At first, I think such a
    problem would arise to strings whose prefix substring is the same.


    So the first step would be to check any string up to the first digit (if
    existent), and then check the rest strings if they begin in the same way.

    Regular expressions can help write minimal code for this, at first you
    would check if there is a match for the regular expression "\\d+"
    (digit) and then find where in the string a first occurrence of it
    exists, and then use the substring up to that digit to check if some
    other of the rest strings begin the same way (let's suppose we have the
    string ""Item ( 2/100)", we then use the regular expression "^Item (.*"
    with ^ denoting the beginning of the string - and $ the end -).


    Then if we found a match, we would check the rest digits.



    --
    Ioannis Vranos

    http://www23.brinkster.com/noicys
    Ioannis Vranos, Mar 12, 2005
    #8
  9. Nobody

    Guest

    What about if you looked for the last two numbers in the string (or
    just the second to last really) and sorted on that?

    The code would be similar to Ivan's, but going backwards in the string.
    (I'm gonna assume you can write the actual code, so I won't.)
    , Mar 13, 2005
    #9
  10. "Nobody" <> wrote in message
    news:RCJYd.18970$KK5.12712@fed1read03...
    >>
    >> int CompareFunc(char const* s1, char const* s2)
    >> {
    >> for(;;) {
    >> unsigned char c1 = *s1++;
    >> unsigned char c2 = *s2++;
    >> if( isdigit(c1) && isdigit(c2) ) {
    >> unsigned long v1 = strtoul(s1-1,&s1,10);
    >> unsigned long v2 = strtoul(s2-1,&s2,10);
    >> if(v1!=v2) return (v1<v2) ? -1 : +1;
    >> continue;
    >> }
    >> else if(c1!=c2) {
    >> ...compare single chars as usual...
    >> }
    >> else if( !c1 )
    >> return 0; // reached end of strings
    >> }
    >> }
    >>

    >
    > This would only work if *any* number was assumed to be the part #. If the
    > description contained any other number, this would fail.


    It depends what is considered as success.
    Did I miss something, or is this a new requirement that wasn't in
    your initial spec? I don't know, and from what I read in your posts,
    I wouldn't be able to say what is a part number and what isn't!

    If you want some numbers to be sorted by value, and others
    lexicographically,
    you obviously need to give the comparison function some knowledge about your
    field's formatting.
    Just add a criterion to conditionally enable the value-based comparison.

    For example, if you only want the second-last number in the string
    to be specially treated as a (part) number, you could pre-scan each
    string (backwards) to identify the start of that number - and do the
    value-based comparison only when you reach that point.

    Ivan
    --
    http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
    Ivan Vecerina, Mar 13, 2005
    #10
  11. Nobody

    Chris Theis Guest

    "Nobody" <> schrieb im Newsbeitrag
    news:RCJYd.18970$KK5.12712@fed1read03...
    >
    > > What about something like:
    > >
    > > int CompareFunc(char const* s1, char const* s2)
    > > {
    > > for(;;) {
    > > unsigned char c1 = *s1++;
    > > unsigned char c2 = *s2++;
    > > if( isdigit(c1) && isdigit(c2) ) {
    > > unsigned long v1 = strtoul(s1-1,&s1,10);
    > > unsigned long v2 = strtoul(s2-1,&s2,10);
    > > if(v1!=v2) return (v1<v2) ? -1 : +1;
    > > continue;
    > > }
    > > else if(c1!=c2) {
    > > ...compare single chars as usual...
    > > }
    > > else if( !c1 )
    > > return 0; // reached end of strings
    > > }
    > > }
    > >

    >
    > This would only work if *any* number was assumed to be the part #. If the
    > description contained any other number, this would fail.
    >


    As you can easily see there is simply no way to create a sort that handles
    all kinds of situations without *any* knowledge. This is impossible by the
    nature of sorting, which implicitly requires at least some information ;-)
    What you could do for example, is to resort to an associative container with
    keys (like a map). The keys are filled with the minimum information that you
    try to extract from the values and sorting is based on the keys. This way
    you do not impose too many restrictive requirements on your clients but,
    naturally, the success depends very much on you "information gathering
    scheme".

    Cheers
    Chris
    Chris Theis, Mar 13, 2005
    #11
  12. "Nobody" <> wrote in message
    news:gOHYd.18095$KK5.2916@fed1read03...
    > Heres the deal... I have an application where I have a list (as in a

    Windows
    > list control, but thats not important) displayed to the user. I sort this
    > list based on the list controls sort function (again, its not important

    that
    > its Windows) which ends up calling a compare function in my code:
    >
    > int CompareFunc(char* str1, char* str2)
    > {
    > }
    >
    > this function returns -1, 0 or 1 which gets passed on to the internal

    quick
    > sort algorithm. No problem, it all works fine.


    It works fine, but I would expect it to work for 'char*' types, apparently
    they are C strings as you say.

    > Now I have a user in which this list displays "multi-part" items. You can
    > guess where this is headed :), the list ends up like this:
    >
    > Item (1/100)
    > Item (11/100)
    > Item (2/100)


    I think the problem here is treating the data as C strings. These are not
    strings but string representations of complex objects. I don't think you
    should compare them as strings; because as you note, it doesn't work for
    this type of data.

    > Now while that is a "correct" string sort, its kind of lame. I could force
    > the user to zero-pad or zero-pad myself, but both seem kind of hokey as I

    am
    > either putting requirements on the user or changing his item text. I'd

    much
    > rather end up with:
    >
    > Item (1/100)
    > Item (2/100)
    > .
    > .
    > .
    > Item (11/100)
    >
    > As it should. Now keep in mind that this could end up in dozens of

    formats,
    > brackets, parents, dashes, asterisks, etc or any endless supply of cutesy
    > characters a user might enter. Even the forward slash may not be the part
    > separator and there may be stuff after the part #s.


    It means that the data will be parsed differently in each case. (Boost's
    spirit or some other parser might help.)

    > I've seen some applications do this in the past, but never saw the source
    > for them. How can this be sorted properly without requiring the user to
    > enter it in a very specific format?


    The data can be sorted properly by first parsing the input, then creating
    the objects, and finaly comparing the objects.

    Ali
    =?utf-8?Q?Ali_=C3=87ehreli?=, Mar 15, 2005
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. senthil
    Replies:
    5
    Views:
    1,363
    senthil
    Jan 24, 2004
  2. Curious Trigger
    Replies:
    2
    Views:
    1,795
    Curious Trigger
    Sep 9, 2006
  3. Replies:
    10
    Views:
    776
  4. Replies:
    10
    Views:
    753
    Roland Pibinger
    Jan 24, 2007
  5. m
    Replies:
    2
    Views:
    398
    Shawn Milochik
    Aug 6, 2008
Loading...

Share This Page