Re: Why is the return type of count_if() "signed" rather than "unsigned"?

Discussion in 'C++' started by xmllmx, Jun 22, 2010.

  1. xmllmx

    xmllmx Guest

    On Jun 22, 4:04 pm, "Bo Persson" <> wrote:
    > xmllmx wrote:
    > > As we know, count_if() will never return a negative number. So, it
    > > seems evident that the return type of count_if() should be "unsigned
    > > integral type" rather than "signed integral type".

    >
    > > However, to my surprise, the C++ standard defines the return
    > > type as a "signed integer type", which causes a lot of conceptual
    > > confusion and annoying compiler warnings such as "signed/unsigned
    > > mismatch".

    >
    > > What's the rationale for the C++ standard committee to do so ?

    >
    > > Thanks in advance!

    >
    > The distance between two iterators is signed, because in the general
    > case it could be negative. Here it can not, but count_if still uses
    > the difference_type, to be consistent with other algorithms using
    > iterators.
    >
    > Bo Persson
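
A minimal sketch of the point about iterator distances, assuming a std::vector<int> and two positions inside it. The names signed_offset and offset_type are hypothetical and exist only for this illustration. Subtracting random access iterators yields the container's difference_type, and the result is negative whenever the second position precedes the first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias for this sketch: the signed type that iterator
// subtraction yields for std::vector<int>.
typedef std::vector<int>::difference_type offset_type;

// Signed distance from position `from` to position `to` in the same vector.
inline offset_type signed_offset(const std::vector<int>& v,
                                 std::size_t from, std::size_t to)
{
    assert(from <= v.size() && to <= v.size());
    std::vector<int>::const_iterator a = v.begin() + static_cast<offset_type>(from);
    std::vector<int>::const_iterator b = v.begin() + static_cast<offset_type>(to);
    return b - a;  // negative when `to` precedes `from`
}
```

An unsigned result type could not represent the second case, which is why the difference between two iterators needs a signed type.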


    Thank you for your quick response.

    Could you give me a convincing example to illustrate that it is
    necessary to do so?

    Below is excerpted from the early MSDN. Hope this helps.

    ======================================
    count_if
    template<class InIt, class Pred, class Dist>
    size_t count_if(InIt first, InIt last,
    Pred pr);
    The template function sets a count n to zero. It then executes ++n for
    each N in the range [0, last - first) for which the predicate
    pr(*(first + N)) is true. It evaluates the predicate exactly last -
    first times.

    In this implementation, if a translator does not support partial
    specialization of templates, the return type is size_t instead of
    iterator_traits<InIt>::distance_type.
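
For comparison with the excerpt above, here is a small sketch of how the standard std::count_if behaves today. The names is_even, all_even, count_type and count_type_is_signed are hypothetical helpers for this illustration only. The return type is the iterator's difference_type, which is signed, so comparing it directly with an unsigned size() is what typically triggers the "signed/unsigned mismatch" warning.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <limits>
#include <vector>

// Hypothetical predicate used only for this sketch.
inline bool is_even(int x) { return x % 2 == 0; }

// The type std::count_if returns for vector<int>::const_iterator.
typedef std::iterator_traits<std::vector<int>::const_iterator>::difference_type
    count_type;

// True when every element of v is even.
inline bool all_even(const std::vector<int>& v)
{
    count_type n = std::count_if(v.begin(), v.end(), is_even);
    assert(n >= 0);  // a count is never negative
    // Writing `n == v.size()` would mix signed and unsigned operands.
    // The explicit cast is safe here because n is non-negative.
    return static_cast<std::vector<int>::size_type>(n) == v.size();
}

// Reports whether the count type is signed on this implementation.
inline bool count_type_is_signed()
{
    return std::numeric_limits<count_type>::is_signed;
}
```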
    xmllmx, Jun 22, 2010
    #1

  2. xmllmx

    Daniel Pitts Guest

    Re: Why is the return type of count_if() "signed" rather than "unsigned"?

    On 6/22/2010 4:05 PM, Paavo Helde wrote:
    > Daniel Pitts<> wrote in
    > news:Xs7Un.104230$:
    >
    >> On 6/22/2010 6:30 AM, Bo Persson wrote:
    >>> xmllmx wrote:
    >>>> On Jun 22, 7:32 pm, "Bo Persson"<> wrote:
    >>>>> xmllmx wrote:
    >>>>>> On Jun 22, 4:04 pm, "Bo Persson"<> wrote:
    >>>>>>> xmllmx wrote:
    >>>>>>>> As we know, count_if() will never return a negative number. So,
    >>>>>>>> it seems evident that the return type of count_if() should be
    >>>>>>>> "unsigned integral type" rather than "signed integral type".
    >>>>>
    >>>>>>>> However, to my surprise, the C++ standard defines the
    >>>>>>>> return type as a "signed integer type", which causes a lot of
    >>>>>>>> conceptual confusion and annoying compiler warnings such as
    >>>>>>>> "signed/unsigned mismatch".
    >>>>>
    >>>>>>>> What's the rationale for the C++ standard committee to do so ?
    >>>>>
    >>>>>>>> Thanks in advance!
    >>>>>
    >>>>>>> The distance between two iterators is signed, because in the
    >>>>>>> general case it could be negative. Here it can not, but count_if
    >>>>>>> still uses the difference_type, to be consistent with other
    >>>>>>> algorithms using iterators.
    >>>>>
    >>>>>>> Bo Persson
    >>>>>
    >>>>>> Thank you for your quick response.
    >>>>>
    >>>>>> Could you give me a convincing example to illustrate that it is
    >>>>>> necessary to do so?
    >>>>>
    >>>>> You can look at std::distance instead, which computes the distance
    >>>>> between two iterators. If you have random access iterators, like
    >>>>> pointers or std::vector::iterator, this can result in a negative
    >>>>> value. Therefore the difference_type is signed.
    >>>>>
    >>>>>
    >>>>
    >>>> Thank you very much.
    >>>>
    >>>> I want to know WHY rather than WHAT. Much to my surprise, WHY does
    >>>> not the C++ standard define the return type as "unsigned integral
    >>>> type"?
    >>>>
    >>>> distance() may return negative number, it is imaginable.
    >>>>
    >>>> However, I cannot imagine count() returns a negative number. So, it
    >>>> is rather counterintuitive that its return type is "signed".
    >>>
    >>> Ok. :)
    >>>
    >>> The reason is that some of the functions taking a pair of iterators
    >>> can return signed values, so to be consistent all of them do. I
    >>> believe it is that simple.

    >> That is not a good reason IMO.
    >>
    >> "Some guns can backfire, so to be consistent all of them do."
    >>
    >> Consistency is one consideration for API design, but should not be the
    >> overriding reason. Correctness should always come first. Ease of use
    >> is very important. Consistency is only important if it is sensible.

    >
    > We are back to the signed-unsigned fight. Correctness always comes first,
    > right.
    >
    > Here, correctness would potentially be jeopardized only if one has an
    > array of 1-byte objects that is larger than half of the addressable
    > space (in flat addressing mode). This is a very special case, most
    > probably encountered in some OS kernel code, and it is hard to
    > imagine why kernel code would want to count all of the address space
    > bytes with count_if(). So I guess consistency here outweighs the
    > concerns about potential incorrectness.

    Actually, you could also have a container whose iterators iterate over a
    non-memory resource (such as a socket stream or file). Just because you
    can't load everything into memory at once doesn't mean you can't iterate
    over it, or have a container abstraction around it. So it isn't such a
    special case after all.
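
A minimal sketch of counting over a stream rather than over memory, assuming the data arrives through a std::istream. The names is_newline, count_newlines, count_newlines_in and stream_count are hypothetical. The stream is read once, front to back, and the count type comes from iterator_traits just as it does for containers. For std::istreambuf_iterator<char> that type is the stream's offset type (normally std::streamoff), which on common implementations can be wider than a pointer difference.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical predicate for this sketch.
inline bool is_newline(char c) { return c == '\n'; }

// The count type std::count_if uses for stream buffer iterators.
typedef std::iterator_traits<std::istreambuf_iterator<char> >::difference_type
    stream_count;

// Counts newline characters by consuming the stream until end of input.
inline stream_count count_newlines(std::istream& in)
{
    std::istreambuf_iterator<char> first(in), last;  // `last` is the end marker
    return std::count_if(first, last, is_newline);
}

// Convenience wrapper that treats a string as a stream, for experimentation.
inline stream_count count_newlines_in(const std::string& text)
{
    std::istringstream in(text);
    stream_count n = count_newlines(in);
    assert(n >= 0);
    return n;
}
```

The same count_newlines function works unchanged with a std::ifstream, since it only relies on the std::istream interface.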
    >
    > As for the compiler warnings when using the result with size_t and
    > friends, that is the fault of size_t for being unsigned. This is an old
    > war^B^B^B discussion, and I only know of people who have run from the
    > unsigned camp to the signed camp (me included). YMMV, of course.

    While I comprehend your argument, and I was almost fooled by it briefly,
    I think that it is incorrect for never-negative values to be stored in a
    signed container when an unsigned container of appropriate size is
    available.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Jun 23, 2010
    #2

  3. xmllmx

    xmllmx Guest


    Many thanks to all C++ fans in this topic.

    Let me stress the key question: "What's the fundamental reason to
    define it as "signed"?"

    Hope someone can give us a convincing explanation. Code illustration
    will be highly appreciated. Thanks.
    xmllmx, Jun 23, 2010
    #3
  4. Re: Why is the return type of count_if() "signed" rather than "unsigned"?

    * xmllmx, on 23.06.2010 03:10:
    >
    > Let me stress the key question: "What's the fundamental reason to
    > define it as "signed"?"
    >
    > Hope someone can give us a convincing explanation. Code illustration
    > will be highly appreciated. Thanks.


    I can give you a 20-20 hindsight rationale: iterator_traits was designed to
    be minimal in order to reduce complexity, and so it includes the most
    widely usable 'difference_type' but not a limited-usage 'size_type'.

    That is however not the full story.

    Originally, before the STL was adopted into the C++ standard library, count_if
    was designed a little differently. Instead of the modern C++ version ...


    template <class InputIterator, class Predicate>
    typename iterator_traits<InputIterator>::difference_type
    count_if(InputIterator first, InputIterator last, Predicate pred);


    ... you had[1] ...


    template <class InputIterator, class Predicate, class Size>
    void
    count_if(InputIterator first, InputIterator last,
    Predicate pred,
    Size& n);


    This was a bit awkward because client code had to declare a variable to store the
    result in. But it avoided using iterator traits. A traits class requires partial
    specialization in order to deal with pointers in general (or else it would have
    to be fully specialized for every pointer type!), and partial specialization
    wasn't there before standardization.
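
The older interface can be sketched as follows. This is an illustration under assumptions, not the SGI source: the namespace sketch and the helpers is_negative and count_negatives are hypothetical, and the loop simply adds to the caller's counter without resetting it. The point it shows is that the caller picks the counter type, so the algorithm works with plain pointers and needs no traits class at all.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace, avoids clashing with std::count_if

// Pre-standard style: the result is delivered through the reference n,
// whose type the caller chooses. The count is added to n.
template <class InputIterator, class Predicate, class Size>
void count_if(InputIterator first, InputIterator last, Predicate pred, Size& n)
{
    for (; first != last; ++first)
        if (pred(*first))
            ++n;
}

}  // namespace sketch

// Hypothetical predicate for this sketch.
inline bool is_negative(int x) { return x < 0; }

// Counts negative values in [first, last) using an unsigned counter.
inline std::size_t count_negatives(const int* first, const int* last)
{
    assert(first <= last);
    std::size_t n = 0;  // the caller must declare and initialize the counter
    sketch::count_if(first, last, is_negative, n);
    return n;
}
```

The cost is visible in count_negatives: a variable has to be declared and zeroed before every call, which is the awkwardness described above.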

    And given the 1998 change-over to a result-producing function using a traits
    class, I suspect that the reason that I gave at the top applied.

    Note that since a count is likely to be used in arithmetic and to be compared to
    computed values, a signed type is likely to yield the fewest signed/unsigned
    problems, though that may depend on the coding conventions. To mostly avoid
    signed/unsigned problems (they can't be completely avoided, but one can reduce
    their incidence significantly), see my blog posting at <url:
    http://alfps.wordpress.com/2010/05/10/how-to-avoid-disastrous-integer-wrap-around/>.
    With that approach you'll find the count_if design, and the general
    iterator_traits design, to be very very sensible and practical.
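
A small sketch of the kind of arithmetic problem meant here, assuming a count that happens to be zero. The function names are hypothetical. The same expression, count minus one, gives -1 in the signed type that count_if returns, but wraps around to a huge value once the count has been converted to an unsigned type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate for this sketch.
inline bool is_zero(int x) { return x == 0; }

// Number of zeros minus one, computed in the signed difference_type.
// With no zeros in v the result is -1.
inline std::ptrdiff_t zeros_minus_one_signed(const std::vector<int>& v)
{
    return std::count_if(v.begin(), v.end(), is_zero) - 1;
}

// The same expression after converting the count to an unsigned type.
// With no zeros in v, 0 - 1 wraps to the largest std::size_t value.
inline std::size_t zeros_minus_one_unsigned(const std::vector<int>& v)
{
    std::size_t n =
        static_cast<std::size_t>(std::count_if(v.begin(), v.end(), is_zero));
    return n - 1;
}
```

A test such as `result < 0` works as expected on the signed version and can never be true on the unsigned one.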


    Cheers & hth.,

    - Alf


    Notes:
    [1] See e.g. <url: http://www.sgi.com/tech/stl/count_if.html>, but note that
    that site documents a particular implementation of the STL and not (necessarily)
    the C++ standard library.

    --
    blog at <url: http://alfps.wordpress.com>
    Alf P. Steinbach /Usenet, Jun 23, 2010
    #4
  5. xmllmx

    xmllmx Guest


    Tons of thanks are due to Alf!

    Your explanation is convincing and profound!

    I'm clear now. Thank you again!
    xmllmx, Jun 23, 2010
    #5
  6. xmllmx

    Bo Persson Guest

    Re: Why is the return type of count_if() "signed" rather than "unsigned"?

    Daniel Pitts wrote:
    > On 6/22/2010 4:05 PM, Paavo Helde wrote:
    >>
    >> Here, correctness would potentially be jeopardized only if one
    >> has an array of 1-byte objects that is larger than half of the
    >> addressable space (in flat addressing mode). This is a very special
    >> case, most probably encountered in some OS kernel code, and it
    >> is hard to imagine why kernel code would want to count all
    >> of the address space bytes with count_if(). So I guess
    >> consistency here outweighs the concerns about potential
    >> incorrectness.

    > Actually, you could also have a container whose iterators iterate
    > over a non-memory resource (such as a socket stream or file). Just
    > because you can't load everything into memory at once doesn't mean
    > you can't iterate over it, or have a container abstraction around
    > it. So it isn't such a special case after all.


    On a 32-bit system an unsigned value lets you count the bytes in a 4
    GB file, instead of a 2 GB file. What if you need to count the bytes
    of a 5 GB file?

    One extra bit only gives you twice the range, which is not much of a
    difference - otherwise we could have had 33-bit computers. So it is a
    special case!
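
The arithmetic behind this can be sketched without depending on the platform's word size. The helper fits_in_bits is hypothetical and only checks whether a byte count is representable in a given number of value bits (31 for a 32-bit signed type, 32 for a 32-bit unsigned type).

```cpp
#include <cassert>

// Hypothetical helper: true when n is representable using `bits` value bits.
// bits must be below 64, because shifting a 64-bit value by 64 is undefined.
inline bool fits_in_bits(unsigned long long n, unsigned bits)
{
    assert(bits < 64);
    return (n >> bits) == 0;
}

// Byte count of a file of the given size in GB (binary gigabytes assumed).
inline unsigned long long gigabytes(unsigned long long gb)
{
    return gb << 30;
}
```

With these, a 3 GB count fits in 32 value bits but not in 31, while a 5 GB count fits in neither, so the unsigned type only moves the limit from just under 2 GB to just under 4 GB.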

    >> As for the compiler warnings when using the result with size_t
    >> and friends, that is the fault of size_t for being unsigned. This
    >> is an old war^B^B^B discussion, and I only know of people who have
    >> run from the unsigned camp to the signed camp (me included). YMMV,
    >> of course.

    > While I comprehend your argument, and I was almost fooled by it
    > briefly, I think that it is incorrect for never-negative values to
    > be stored in a signed container when an unsigned container of
    > appropriate size is available.


    This is where it gets religious. I'll pass.


    Bo Persson
    Bo Persson, Jun 23, 2010
    #6
  7. xmllmx

    gwowen Guest

    > Tons of thanks are due Alf!
    >
    > Your explanation is convincing and profound!
    >
    > I'm clear now. Thank you again!


    Remember, you can always do:

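    #include <algorithm>
    #include <cstddef>

    namespace xmllmx {
    // Unsigned-returning wrapper around std::count_if. The count is never
    // negative, so the cast only changes the type (assuming it fits in size_t).
    template <class InputIterator, class Predicate>
    inline std::size_t
    ucount_if(InputIterator first, InputIterator last, Predicate p)
    {
        return static_cast<std::size_t>(std::count_if(first, last, p));
    }
    }
    using xmllmx::ucount_if;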

    (ObGuruQuestion: Could name lookup get fouled up by this?)
    If so, you could always use a macro...
    gwowen, Jun 23, 2010
    #7
