Time complexity of size() for std::set


Lionel B

Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

Thanks,
 

peter koch

Hi,

Anyone know if the Standard has anything to say about the time complexity
of size() for std::set?

I need to access a set's size (/not/ to know if it is empty!) heavily
during an algorithm and was thus wondering whether I'd be better off
tracking the size "manually" or whether that'd be pointless.

That would be pointless. size() is O(1).

/Peter
 

Kai-Uwe Bux

peter said:
That would be pointless. size() is O(1).

Nit: the standard says size() "should" have constant time. Implementations may
take license to do worse. I know that some do this for std::list<> as part of
a trade-off with other operations. However, that reason would not apply
to std::set<> as far as I can see.


Best

Kai-Uwe Bux
 

Sylvester Hesp

peter koch said:
That would be pointless. size() is O(1).

/Peter

Actually, size() is preferred to be O(1) by the standard, but it's not
guaranteed. So an implementation could very well choose to make it O(n)
(see 23.1/5: "[size()] should have constant complexity"; it says "should",
not "shall").

- Sylvester
 

Lionel B

Nit: the standard says size() "should" have constant time. Implementations may
take license to do worse. I know that some do this for std::list<> as part of
a trade-off with other operations.

I was aware of that, hence my reluctance to use size() for std::set.
However, this reason would not apply to std::set<> as far as I can see.

Ok, I guess the only option is to try it and see...

Thanks,
 

Pete Becker

Sylvester said:
peter koch said:
That would be pointless. size() is O(1).

/Peter

Actually, size() is preferred to be O(1) by the standard, but it's not
guaranteed. So an implementation could very well choose to make it O(n)
(see 23.1/5: "[size()] should have constant complexity"; it says "should",
not "shall").

That's correct. On the other hand, for associative containers, an
implementation that didn't implement size() as O(1) would be doing so
simply out of perversity. The underlying issue is list::splice, where
you can insert a range of arbitrary size from some other list. The
implementation can either count the elements being inserted, making
splice O(n), or it can make size() O(n). With an associative container,
inserting more than one element is inherently O(n), since each element
has to be put into the right place. So there's no reason to make size()
slow.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 

Howard Hinnant

"Lionel B said:
I was aware of that, hence my reluctance to use size() for std::set.


Ok, I guess the only option is to try it and see...

Lionel,

I agree that this is your only option. I also believe this represents a
serious defect in the C++ standard. I am asking your permission to use
this thread as the basis for a defect report against the current working
draft. Please see:

http://home.twcny.rr.com/hinnant/cpp_extensions/issues_preview/lwg-active.html#632

for an unofficial preview. Please feel free to contact me, publicly or
privately. Your options are:

1. Don't contact me at all. I will interpret this as refusing
permission to use your words and name in the defect report, and I will
remove them prior to submitting this document to the committee.

2. Tell me to remove your name/words from the issue and I will do so at
once.

3. Give me permission, but rewrite some or all of the text of the
issue. What is said here is completely up to you. You may propose a
resolution or not. Note that the committee may not accept your proposal
though.

4. Give me permission to go with what I have unchanged.

(#3 is my actual preference :))

Respectfully,
Howard Hinnant
Library Working Group Chair
C++ Standards Committee
(the contact info in the header of this post, and at the top of the linked
webpage, is correct)
 

peter koch

That would be pointless. size() is O(1).

/Peter

I prefer to answer my own post rather than reply to all the other
knowledgeable people who have corrected me. As I see it,
"should" is a very strong suggestion that requires quite a lot to
disregard. For std::list that reason exists in splice, which in special
situations can be made significantly faster. This "excuse" does not
exist for std::set (and std::map), which in my opinion means that you
will not be able to find a quality standard library where size() is not
O(1).
I agree that this might be considered a defect as suggested by Howard
Hinnant.

/Peter
 

P.J. Plauger

I prefer to answer my own post rather than reply to all the other
knowledgeable people who have corrected me. As I see it,
"should" is a very strong suggestion that requires quite a lot to
disregard. For std::list that reason exists in splice, which in special
situations can be made significantly faster. This "excuse" does not
exist for std::set (and std::map), which in my opinion means that you
will not be able to find a quality standard library where size() is not
O(1).

I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain. Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.

I remember being told years ago that the purpose of OOP was to give
the member functions exclusive access to the stored state so that
nothing bad can ever happen to that state. If we're not going to
preserve that property, why bother with encapsulation?
I agree that this might be considered a defect as suggested by Howard
Hinnant.

Yup.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Kai-Uwe Bux

P.J. Plauger said:
I'd go a step farther. The one case in list::splice that goes faster
if size is O(N) is one that *should not* go faster. It lets you splice
a segment from another list whose head follows its tail -- thus snagging
the head node (in all implementations I know) and destroying the
integrity of both lists in the bargain.

Huh? That was a little too condensed for me; I got lost at the "whose head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?
Better to walk the candidate
splice looking for a head node and counting the elements to be moved
between lists. That way size() can remain O(1) and the list takes the
time necessary to preserve its integrity.

I remember being told years ago that the purpose of OOP was to give
the member functions exclusive access to the stored state so that
nothing bad can ever happen to that state. If we're not going to
preserve that property, why bother with encapsulation?

[snip]


Best

Kai-Uwe Bux
 

P.J. Plauger

P.J. Plauger wrote:
.....

Huh? That was a little too condensed for me; I got lost at the "whose head
follows its tail" clause. Could you give an example in code that
demonstrates the problem?

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Lionel B

Lionel,

I agree that this is your only option. I also believe this represents a
serious defect in the C++ standard. I am asking your permission to use
this thread as the basis for a defect report against the current working
draft. Please see:

http://home.twcny.rr.com/hinnant/cpp_extensions/issues_preview/lwg-active.html#632

for an unofficial preview. Please feel free to contact me, publicly or
privately. Your options are:

As far as I am concerned this conversation is already in the public domain
and you may do with it as you please - short, obviously, of misrepresenting
my views.

In particular, note that I wasn't actually criticising the standard, merely
seeking clarification. I have followed the debate on time complexity of
container size() functions - in this and previous threads - with interest,
but really don't have much to contribute to it.

If I have any recommendation to the C++ Standards Committee it is that
implementations must (not "should"!) document clearly[1], where known, the
time complexity of *all* container access operations.

[1] In my case (gcc 4.1.1) I can't swear that the time complexity of size()
for std::set is not documented... but if it is, it's certainly well hidden
away.

[...]

Best regards,
 

Kai-Uwe Bux

P.J. Plauger said:
std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.

Thanks a lot. I see the problem now.

However, following your suggestion, I thought about "all the time you've saved
not checking for this problem". And I think it's a considerable amount. I
looked at my code and found that the merge-sort implementation for list makes
heavy use of this version of splice and would be severely slowed down if it
were implemented to be linear time. I would even contend that for some
applications a list is the container of choice precisely _because_ you save
time by not checking for this problem.

I am not yet convinced that this would be better. Maybe sorting is the only
reasonable application for this splice, and that is taken care of by a
member function. However, it seems that making splice O(n) and size O(1)
would reduce the advantages of std::list<> over the other sequence containers
to just the validity of iterators.

I think I would prefer to have an implementation that puts range checks into
debug versions and leaves them out with -DNDEBUG. This way, I can check my
programs for correctness and still get optimum performance once I have
convinced myself that the code is sound.


Best

Kai-Uwe Bux
 

P.J. Plauger

Thanks a lot. I see the problem now.

However, following your suggestion, I thought about "all the time you've saved
not checking for this problem". And I think it's a considerable amount. I
looked at my code and found that the merge-sort implementation for list makes
heavy use of this version of splice and would be severely slowed down if it
were implemented to be linear time. I would even contend that for some
applications a list is the container of choice precisely _because_ you save
time by not checking for this problem.

Gee, my version of list merge-sort splices the *entire* contents
of c2 into c1 each time. And that overload of splice does *not*
have to do the check I describe. You can splice:

-- all of another list into this one

-- one element of this list somewhere else

-- one element of another list into this one

-- a subrange of this list somewhere else

-- a subrange of another list into this one

It's only the last case, where the subrange is neither empty
nor the full list, that you must walk it to count up elements
and check for a head node.
I am not yet convinced that this would be better. Maybe sorting is the only
reasonable application for this splice, and that is taken care of by a
member function. However, it seems that making splice O(n) and size O(1)
would reduce the advantages of std::list<> over the other sequence containers
to just the validity of iterators.

I'm not talking about *always* making splice O(N) -- just this
one case. (And the C++ Standard allows that case already,
oddly enough.) Most cases are and should remain O(1).
I think I would prefer to have an implementation that puts range checks into
debug versions and leaves them out with -DNDEBUG. This way, I can check my
programs for correctness and still get optimum performance once I have
convinced myself that the code is sound.

If only the failure mode weren't so awful.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Howard Hinnant

Kai-Uwe Bux said:
However, following your suggestion, I thought about "all the time you've saved
not checking for this problem". And I think it's a considerable amount. I
looked at my code and found that the merge-sort implementation for list makes
heavy use of this version of splice and would be severely slowed down if it
were implemented to be linear time.

<nod> I've implemented list::sort using merge-sort and "splice some from
other" as well. And I also preferred an O(1) list::size. And I also
knew that I could not afford an O(N) splice-some-from-other in the
middle of my merge sort.

I solved it by implementing a new splice-some-from-other signature that
is guaranteed to be O(1) even when size() is also O(1):

void splice(iterator position, list& x, iterator first,
iterator last, size_type n);

Precondition: distance(first, last) == n
Complexity: O(1)

In the merge sort, if you know the size of the list (in O(1) time), then
you also know the size of the half of the list that you're wanting to
splice into an auxiliary list. So you can pass that information (n)
into the splice function and it then does not need to compute it. A
debug build can easily verify the precondition.

After coding merge sort this way I started looking around for other
use-cases of splice-some-from-other where distance(first, last) is *not*
known a priori, or easily computed without changing the complexity of
the overall algorithm. I didn't find any. I'm not claiming such
applications don't exist. But I am claiming that they appear rare.

During this search I also looked for use-cases which assumed an O(1)
list::size. I found lots.

http://home.twcny.rr.com/hinnant/cpp_extensions/On_list_size.html

-Howard
 

Jorgen Grahn

std::list<wchar_t> c1;
c1.push_back(L'a');
c1.push_back(L'b');
c1.push_back(L'c');

std::list<wchar_t> c2(c1);
std::list<wchar_t>::iterator it = c2.begin();
c1.splice(c1.begin(), c2, ++it, c2.begin());

This is an ill-formed interval in c2, but if you just go ahead
and splice it without checking you'll add {b, c, HEAD2, a} from
c2 to the front of {a b c HEAD1} in c1. Among other bad things,
the next call to c2.begin() will probably point somewhere inside
c1.

But think of all the time you've saved not checking for this
problem.

Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?

What is different here compared to std::copy and all other functions that
take two iterators and expect [first, last) to be valid? Am I missing
something?

/Jorgen
 

P.J. Plauger

Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?

Where did that come from? I said nothing like that.
What is different here compared to std::copy and all other functions that
take two iterators and expect [first, last) to be valid? Am I missing
something?

Yes and no. One philosophy of C++, inherited from C, is that you should
be given enough rope to hang yourself, and speedily at that. Another,
inherited from OOP, is that a class has the right and obligation to
protect itself from misuse.

You can indeed call copy with a bad pair of iterators and write all over
memory. Extra iterator debugging logic can catch some such calls, but not
all. And even if an implementation wanted to do this sort of checking all
the time, it can't. Time complexity requirements in the C++ Standard often
forbid it. So the C++ Standard partly institutionalizes the wild 'n wooly
approach to programming.

Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 

Alf P. Steinbach

* P.J. Plauger:
Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

Ah, well. Essentially there are then, in practice, two kinds of
implementations of std::list, namely fast size, and fast (unsafe) splice.

But with a given standard library implementation there's just one.

Wouldn't it be better to have both, accessible to the user, who could
then make the decision to use one or the other or both, with an
implementation-defined typedef of std::list for default? Was this
considered? And if it was, considering that the two could share most of
their implementations, why was it rejected?
 

Alf P. Steinbach

* Alf P. Steinbach:
* P.J. Plauger:

Ah, well. Essentially there are then, in practice, two kinds of
implementations of std::list, namely fast size, and fast (unsafe) splice.

But with a given standard library implementation there's just one.

Wouldn't it be better to have both, accessible to the user, who could
then make the decision to use one or the other or both, with an
implementation-defined typedef of std::list for default?

Ouch, I wrote that without recalling that there were no template typedefs at
the time (or still, for that matter), but one could have used e.g.
template< typename A >
struct list { typedef fast_size_list<A> type; };

(with all the template parameters of std::list of course), or, for the
same convenience as we have now with std::list, exact same usage,

template< typename A >
 

Kai-Uwe Bux

P.J. Plauger said:
Sorry, I don't get this. Do you propose that we should avoid much of the
standard library?

Where did that come from? I said nothing like that.
What is different here compared to std::copy and all other functions that
take two iterators and expect [first, last) to be valid? Am I missing
something?

Yes and no. One philosophy of C++, inherited from C, is that you should
be given enough rope to hang yourself, and speedily at that. Another,
inherited from OOP, is that a class has the right and obligation to
protect itself from misuse.

The second philosophy may have been documented in design papers for C++; if
so, however, it does not show in the standard. The standard defines C++ as
a set of mechanisms that one can use to implement all sorts of mischief.
C++ in no way enforces or even promotes OOP; designing an OO-approved
class with guarded invariants is not even simpler in C++ than designing an
OO-nightmare.

To propose an amendment of the standard based on a philosophical inclination
that so far has not shown in the standard seems a little bit of a stretch.

You can indeed call copy with a bad pair of iterators and write all over
memory. Extra iterator debugging logic can catch some such calls, but not
all. And even if an implementation wanted to do this sort of checking all
the time, it can't. Time complexity requirements in the C++ Standard often
forbid it.

Often? With regard to range checking I wonder: if every iterator carries a
pointer to its container one can check the validity of a range in linear
time (and constant time for random access iterators). Any algorithm that
takes a range allows for linear time on non-random access iterators anyway.

Could you give an example where a checking implementation is ruled out by
the complexity requirements of the standard?

So the C++ Standard partly institutionalizes the wild 'n wooly
approach to programming.

True! Now, opinions may vary as to whether that is a good thing or not.

By and large, I get by with putting a safety layer between my code and the
language: e.g., a pointer_to<T> template that wraps raw pointers and allows
for checking 0-dereferencing and tracing allocations and deallocations, or
a safe<int> type that checks for arithmetic overflows. I am not sure
whether I want the standard to change in this regard. After all, if you
want safety, you can have it. It just comes at a price.

Template class list has just one member function, which splices in part
of another list, that *requires* linear time to do its work safely.
Some of us have seen to it that the C++ Standard *permits* this member
function to take that time if the implementer so chooses. It so happens
that this is the only member function across all containers that *can*
go faster by taking advantage of another latitude in the C++ Standard --
to have list::size() take linear time instead of constant. Put simply,
slow size() is there to permit fast partial splice (which is still not
guaranteed).

So here is at least one case in the C++ Standard where:

a) people don't like the prospect that calling size() for a container
might take linear time

b) we could remove that requirement from the C++ Standard if all
implementations did their counting at partial-splice time

c) while we're counting the nodes moved during a partial splice, we
can even enforce the integrity goals of OOP

That would be the only place in the standard, then. Maybe you are onto
something in setting a precedent.

What we give up is the *permission* that implementers currently
have to let people tie granny knots in lists, to be discovered at
a later date, so that one admittedly rare form of splice can do
its damage really fast. I consider this case at least philosophically
different from calling copy with bad arguments. YMMV.

I am by and large happy with the undefined behavior solution that the
current standard employs, although I would love to have a debug-version of
the standard library that defines all those unsound ranges and out of
bounds accesses to violate some assert().

What bothers me though, is the latitude within the complexity requirements.
I'd rather have the standard declare unambiguously that size() is constant
time. That would at least be predictable. (Similarly, I think the standard
could require sort() to be O(n log(n)) and nth_element() to be linear.)


Best

Kai-Uwe Bux
 
