A solution for a fast std::list::size() *and* a fast std::list::splice()

J

Juha Nieminen

If I'm not completely mistaken, the only reason why std::list::size()
may be (and usually is) a linear-time operation is because they want
std::list::splice() to be a constant-time operation, and if you execute
the latter, the size of the resulting lists cannot be known without
explicitly counting the sizes of the new lists.

I was thinking: What if size() was an O(n) operation only *after*
a splice() operation has been performed (and only the first time size()
is called after that), but O(1) otherwise?

In other words, keep a size variable in the list class and update
it as necessary, and additionally keep a flag which tells if this
size variable is valid. As long as the flag tells that it's valid,
list::size() can return the value of the variable. Only if the flag
says it's invalid, then the O(n) counting will be performed, updating
the variable, after which the flag can be set to valid.

A splice() operation would set the flag to invalid in both lists.

This way splice() will always be O(1), and size() will be O(n) only
once after a splice(), but O(1) otherwise. You will get speed benefits
in all cases except if you alternatively call splice() and size()
repeatedly (in which case you just get the same behavior as in most
current list implementations, so it's not like the result would be
worse).
 
T

tony_in_da_uk

In other words, keep a size variable in the list class and update
it as necessary, and additionally keep a flag which tells if this
size variable is valid. As long as the flag tells that it's valid,
list::size() can return the value of the variable. Only if the flag
says it's invalid, then the O(n) counting will be performed, updating
the variable, after which the flag can be set to valid.

A splice() operation would set the flag to invalid in both lists.

All sensible and in my opinion useful, but there are people who're
already using lists who wouldn't want them to slow down, or grow their
data member size - even a little bit - to maintain this extra state.
You can easily provide this in a wrapper class around the STL, and
publish it if you think others would want it. Maybe even submit it to
boost....

Tony
 
J

Jerry Coffin

If I'm not completely mistaken, the only reason why std::list::size()
may be (and usually is) a linear-time operation is because they want
std::list::splice() to be a constant-time operation, and if you execute
the latter, the size of the resulting lists cannot be known without
explicitly counting the sizes of the new lists.

I was thinking: What if size() was an O(n) operation only *after*
a splice() operation has been performed (and only the first time size()
is called after that), but O(1) otherwise?

This is a perfectly legitimate way of implementing std::list. It's not
required, of course, but you can do it if you want to. Then again, you
can't count on it in general.
 
J

Juha Nieminen

All sensible and in my opinion useful, but there are people who're
already using lists who wouldn't want them to slow down

This solution would only speed it up. The alternative is to use
the current typical solution, which is an O(n) size() function.
or grow their data member size

The size of the list elements would not change at all. Only the
size of the std::list class would grow by one flag and one size
variable. Hardly catastrophical (especially taking into account
how much it speeds up size()).
You can easily provide this in a wrapper class around the STL

Can I? I would have to reimplement most of the list member functions.
Not a trivial task.
 
J

Juha Nieminen

Jerry said:
This is a perfectly legitimate way of implementing std::list.

If it's a perfectly working solution, why aren't they using it
to implement std::list? It sounds quite an obvious solution to me.
 
G

Guest

If it's a perfectly working solution, why aren't they using it
to implement std::list? It sounds quite an obvious solution to me.

First figure out who have not implemented list in that way (there are
many vendors of C++ standard libraries) and then ask them. My guess
would be that the value of optimisations not guaranteed in the standard
is debatable, the users could never depend on them.
 
M

Michael DOUBEZ

Juha Nieminen a écrit :
If it's a perfectly working solution, why aren't they using it
to implement std::list? It sounds quite an obvious solution to me.

Perhaps because nobody needs it and if you need it, it is
straightforward to add it.

From a design point of view, you rarely need to know the size of a list.

Michael
 
B

Bo Persson

Juha Nieminen wrote:
:: Jerry Coffin wrote:
::: This is a perfectly legitimate way of implementing std::list.
::
:: If it's a perfectly working solution, why aren't they using it
:: to implement std::list? It sounds quite an obvious solution to me.

But it still makes the size() call O(n). The fact that it sometimes
(often) is O(1) doesn't help me writing a function that uses
list::size().

How would I know if someone had called splice recently? So I would
still have to assume O(n) !



Bo Persson
 
P

P.J. Plauger

----- Original Message -----
From: "Juha Nieminen" <[email protected]>
Newsgroups: comp.lang.c++
Sent: Tuesday, October 09, 2007 20:50
Subject: A solution for a fast std::list::size() *and* a fast
std::list::splice()
If I'm not completely mistaken, the only reason why std::list::size()
may be (and usually is) a linear-time operation

It's a constant time operation in our implementation, and always
has been.
is
because they want
std::list::splice() to be a constant-time operation, and if you execute
the latter, the size of the resulting lists cannot be known without
explicitly counting the sizes of the new lists.

There are three forms of splice:

-- splice a single element from this or another list

-- splice the entire contents of another list

-- splice partial contents of another list

The resultant size of the donee list is well known, as is that of any
donating list, in all but the last case. The C++ Standard has
always required that all but the last case be constant time, while
the last one can be linear time.

Moreover, in all but the last case it's a trivial matter to preserve
the integrity of both donor and donee. In the last case the only
way to avoid pirating the head node of another list is to walk
the donated sublist looking for that head node. It's a trivial
matter to count up the nodes while you're doing this very useful
checking. Some of us think this checking is absolutely essential.
I was thinking: What if size() was an O(n) operation only *after*
a splice() operation has been performed (and only the first time size()
is called after that), but O(1) otherwise?

In other words, keep a size variable in the list class and update
it as necessary, and additionally keep a flag which tells if this
size variable is valid. As long as the flag tells that it's valid,
list::size() can return the value of the variable. Only if the flag
says it's invalid, then the O(n) counting will be performed, updating
the variable, after which the flag can be set to valid.

A splice() operation would set the flag to invalid in both lists.

This way splice() will always be O(1), and size() will be O(n) only
once after a splice(), but O(1) otherwise. You will get speed benefits
in all cases except if you alternatively call splice() and size()
repeatedly (in which case you just get the same behavior as in most
current list implementations, so it's not like the result would be
worse).

This discussion comes up about once a year, and this solution is
offered each time. I consider it mooted by the need to check a
partial splice anyway, which takes linear time. Other disagree,
so YMMV.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
J

Jerry Coffin

If it's a perfectly working solution, why aren't they using it
to implement std::list? It sounds quite an obvious solution to me.

Some of them are, at least when it's practical. It seems like this comes
up around every 10 months or so, and as I recall the last time it did
either P.J. Plaugar or Pete Becker (who was still working for Dinkumware
around then) pointed out that under most circumstances, Dinkumware's
implementation does execute in constant time. In fact, this post will
probably cross with another that says essentially the same thing again.

You still can't depend on it though, and I don't know how to make
list::size() execute in cosntant time without forcing list::splice
require linear time in at least some cases. Given the frequency with
which I've seen good uses for std::list::size() (i.e. essentially never)
I'm not sure that would be a good tradeoff.

Then again, I have to admit that I tend to think the presence of
std::list does more harm than good, since its presence seems to give
soem people the idea that they should use it, and over time I've become
convinced that suitable uses for linked lists are about as common as
magnetic monopoles (i.e. they should theoretically exist, but the few
times people think they might have seen one in reality, confirmation
seems impossible).
 
K

Kai-Uwe Bux

Jerry said:
Jerry Coffin wrote:
[snip]
Then again, I have to admit that I tend to think the presence of
std::list does more harm than good, since its presence seems to give
soem people the idea that they should use it, and over time I've become
convinced that suitable uses for linked lists are about as common as
magnetic monopoles (i.e. they should theoretically exist, but the few
times people think they might have seen one in reality, confirmation
seems impossible).

What about this one:

class command_bag {
public:

typedef std::tr1::function< void(void) > command;

private:

typedef std::list< command > command_list;

command_list the_list;

public:

typedef command_list::iterator removal_ticket;

template < typename Command >
removal_ticket insert ( Command c ) {
the_list.push_front( command( c ) );
return ( removal_ticket( the_list.begin() ) );
}

void remove ( removal_ticket t ) {
the_list.erase( t );
}

void execute ( void ) const {
for ( command_list::const_iterator iter = the_list.begin();
iter != the_list.end(); ++iter ) {
(*iter)();
}
}

};

A command_bag allows you to register an action. You get a ticket which you
can use to unregister the action. This can be useful if you have some
observers that want to be notified of certain events but may lose interest
at some point in the future.

The guarantees of std::list about invalidation of iterators are _exactly_
what you want in this case.


Best

Kai-Uwe Bux
 
J

Juha Nieminen

Jerry said:
Given the frequency with
which I've seen good uses for std::list::size() (i.e. essentially never)
I'm not sure that would be a good tradeoff.

In my proposed solution splice() would still be constant-time. Only
size() would be faster (except immediately after a splice()). I see
no "tradeoff" here. It's pure speedup with no negative side-effects.

Granted, in the vast majority of cases where lists are useful knowing
its size is usually unneeded (because a list is the kind of data
structure where its size is not really that important because more often
than not you are only interested in either the first/last item, or all
the items at the same time, and even if you are interested in one single
item somewhere in the list, you are going to have to traverse it anyways).

However, the problem with list::size() is that most people who don't
know it can be linear time may well assume it's constant-time because
it's constant-time in all the other STL data containers too, and thus
they will carelessly use list::size() without giving it a second
thought, thus resulting in needlessly inefficient programs. If
list::size() can be made faster without affecting negatively anything
else, I don't see any reason why it couldn't be done.
Then again, I have to admit that I tend to think the presence of
std::list does more harm than good, since its presence seems to give
soem people the idea that they should use it, and over time I've become
convinced that suitable uses for linked lists are about as common as
magnetic monopoles (i.e. they should theoretically exist, but the few
times people think they might have seen one in reality, confirmation
seems impossible).

I have used std::list for useful purposes many times. Linked lists
have certain useful properties when you know how to use them.

It would be a real pain to have to re-implement linked lists every
time I need one.
 
J

Juha Nieminen

P.J. Plauger said:
-- splice partial contents of another list

The resultant size of the donee list is well known, as is that of any
donating list, in all but the last case. The C++ Standard has
always required that all but the last case be constant time, while
the last one can be linear time.

I can imagine that any algorithm which needs list splicing assumes
that the splicing is constant-time. If splice() cannot be assumed to be
constant-time, then I suppose it becomes a completely useless function
for any purpose you may want to need that operation. Any algorithm
needing list splicing would have to implement a list of their own.

So the question is: If splice() cannot be assumed to be constant-time,
why support it at all? Better just remove it completely.

Compare this to eg. std::vector not having a push_front() function:
Since it cannot be done faster than linear time, it's not supported at all.
 
J

Jerry Coffin

In my proposed solution splice() would still be constant-time. Only
size() would be faster (except immediately after a splice()). I see
no "tradeoff" here. It's pure speedup with no negative side-effects.

I apparently wasn't clear, for which I apologize. What I meant was that
1) at least some implementations already do basically what you've
advocated. 2) the standard could specify that size() was always constant
complexity -- but only at the expense of making splice() linear in at
least a few cases.

That's the tradeoff: between size() and splice(), you get one with
constant complexity and the other with linear complexity. It's up to you
to choose which will be which. As you (and others) have noted, even when
size() is only required to have linear complexity, it can usually be
constant anyway. In the other direction, things aren't so rosy: if you
require that size() has linear complexity, then splicing a partial list
is unavoidably linear.

So, the tradeoff is between size() always being constant at the expense
of splice() sometimes being linear, or vice versa. As I said, given the
frequency with which size() is typically used, I think it would be a
poor tradeoff to make splice linear in exchange for assuring that size()
was always constant.

[ ... ]
However, the problem with list::size() is that most people who don't
know it can be linear time may well assume it's constant-time because
it's constant-time in all the other STL data containers too, and thus
they will carelessly use list::size() without giving it a second
thought, thus resulting in needlessly inefficient programs. If
list::size() can be made faster without affecting negatively anything
else, I don't see any reason why it couldn't be done.

Right -- which comed backk to my original comment (since confirmed by
P.J. Plaugar) that this is already done, in at least one widely-used
implementation.
I have used std::list for useful purposes many times. Linked lists
have certain useful properties when you know how to use them.

It would be a real pain to have to re-implement linked lists every
time I need one.

While I'll admit I may have slightly over-stated the case, I'm still not
sure. No insult intended, but this case seems to be like most I've seen:
though the statement is made that it's been useful for them, on
explanation is offered as to when, why or under what circumstances. I
see that Kai-Uwe Bux has offered an example elsethread, and I'll take a
look at it when I get a chance. It's true that (for one example) a list
assures that iterators remain valid, even when most other containers
would not do so. Nonetheless, I've yet to encounter a situation where
they worked out as the preferred container for the job.
 
J

Jerry Coffin

[ I had mentioned the rarity of examples of good uses for linked lists ]
What about this one:

[ code elided .... ]
A command_bag allows you to register an action. You get a ticket which you
can use to unregister the action. This can be useful if you have some
observers that want to be notified of certain events but may lose interest
at some point in the future.

The guarantees of std::list about invalidation of iterators are _exactly_
what you want in this case.

While I'll admit that in theory this is true, I also have to say that
when I've encountered similar situtations, it really didn't work out.
What you're doing is essentially equivalent to handing out a pointer to
the object's internal data, which we all know is generally a poor idea.

When I've encountered situations roughly similar to this, it ended up
being much cleaner to use a set. Instead of giving the client a unique
key in exchange for their function, when they register a function, you
return nothing. When they want to de-register the function, they just
tell you the function they want to de-register, and you look it up and
remove it from the set.

Theoretically, this should be marginally less efficient -- inserting and
removing items from a set is logarithmic instead of constant. Likewise,
traversing a set is theoretically marginally slower as well. In reality,
the difference between logarithmic and constant is usually quite small.
In addition, registering and unregistering functions doesn't happen
extremely often anyway. The difference in run-time speed wasn't even
dependably measureable, but the difference in the cleanliness of the
interface substantial and almost immediately obvious.

In the end, the validity of iterators matters primarily when you "hand
out" iterators to the object's data. I would posit that this is really
no different from handing out pointers to the object's data, which we've
all known for years is usually a bad idea (in fact, I believe that's the
theme of an item in one of Scott Meyers's books).
 
G

Guest

In my proposed solution splice() would still be constant-time. Only
size() would be faster (except immediately after a splice()). I see
no "tradeoff" here. It's pure speedup with no negative side-effects.

The problem is that when you specify that size() have constant
complexity except after a call to splice() you just defined size() to
have linear complexity. Remember that it is the worst case that is
given, and with your solution that would be after a call to splice(). So
what you suggest is that we keep size() as it is and make the last case
of splice() (splicing a partial sequence from another list) constant.
Only problem is that the only real expert on library implementation who
have said anything claims that this is a bad thing to do.
 
P

P.J. Plauger

I can imagine that any algorithm which needs list splicing assumes
that the splicing is constant-time.

You can imagine it. Indeed, several other people have imagined just
that over the past several years. One or two of the brighter ones even
tried to contrive use cases where a partial splice absolutely positively
had to be constant time or the algorithm would be too slow. Every
one of those cases I saw could easily be transmogrified to a case
where the spliced sublist could be made into an entire list, and hence
would splice in constant time.

An obvious case is list::sort, which does a Towers of Hanoi merge
sort by building sublists. The easiest way to implement this is to
minimize the number of temporary lists and splice sublists. But the
common implementation, based on the original H-P STL, uses
more sublists and does complete sublist splices. Hence, the sort
is fastern 'n blazes.
If splice() cannot be assumed to
be
constant-time, then I suppose it becomes a completely useless function
for any purpose you may want to need that operation.

What a crass overstatement. Moreover, in my extensive experience
algorithms with terrible theoretical time complexity almost always
have satisfactory real-world performance because a) data sets
seldom get large enough to matter and b) computers are blindingly
fast for most uses to begin with. In any case, others will find uses
for a linear splice even if you can't.
Any algorithm
needing list splicing would have to implement a list of their own.

Or get one of the implementations that aren't afraid to damage lists
occasionally in the interest of getting fast results all the time.
So the question is: If splice() cannot be assumed to be constant-time,
why support it at all? Better just remove it completely.

See above.
Compare this to eg. std::vector not having a push_front() function:
Since it cannot be done faster than linear time, it's not supported at
all.

Well, that would be one solution -- simply remove the partial splice option.
Note, however, that single-element splice and full-list splice are still
reliably constant-time operations. Or would you have us ditch these too
because people might *fear* they're not going to be fast enough?

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
K

Kai-Uwe Bux

Jerry said:
[ I had mentioned the rarity of examples of good uses for linked lists ]
What about this one:

[ code elided .... ]
A command_bag allows you to register an action. You get a ticket which
you can use to unregister the action. This can be useful if you have some
observers that want to be notified of certain events but may lose
interest at some point in the future.

The guarantees of std::list about invalidation of iterators are _exactly_
what you want in this case.

While I'll admit that in theory this is true, I also have to say that
when I've encountered similar situtations, it really didn't work out.
What you're doing is essentially equivalent to handing out a pointer to
the object's internal data, which we all know is generally a poor idea.

When I've encountered situations roughly similar to this, it ended up
being much cleaner to use a set. Instead of giving the client a unique
key in exchange for their function, when they register a function, you
return nothing. When they want to de-register the function, they just
tell you the function they want to de-register, and you look it up and
remove it from the set.

Theoretically, this should be marginally less efficient -- inserting and
removing items from a set is logarithmic instead of constant. Likewise,
traversing a set is theoretically marginally slower as well. In reality,
the difference between logarithmic and constant is usually quite small.
In addition, registering and unregistering functions doesn't happen
extremely often anyway. The difference in run-time speed wasn't even
dependably measureable, but the difference in the cleanliness of the
interface substantial and almost immediately obvious.

In the end, the validity of iterators matters primarily when you "hand
out" iterators to the object's data. I would posit that this is really
no different from handing out pointers to the object's data, which we've
all known for years is usually a bad idea (in fact, I believe that's the
theme of an item in one of Scott Meyers's books).

a) You got distracted by a minor issue that arose from code simplification
for the sake of exposition. In the real version, the return_ticket is not a
raw iterator but a class that contains that iterator (in fact, a shared
pointer to that iterator). The return_ticket will _not allow_ any access to
the registered object at all. It can _only_ be used to remove the object
from the bag (and in fact, it checks whether the object had been removed
before or whether the object is registered with a different command_bag and
triggers assertions for contract violations accordingly). Thus, it is not
at all comparable to handing out raw pointers.

Here is the actual version (rc_ptr is a reference counted pointer):

struct command_bag {
public:

typedef function< void(void) > command;

private:

typedef std::list< command > command_list;

command_list the_list;

static
command_list::iterator null ( void ) {
static command_list dummy;
return ( dummy.end() );
}

public:

class removal_ticket {

friend class command_bag;

rc_ptr< command_list::iterator > iter_ptr;

command_list::iterator list_end;

public:

removal_ticket ( command_list::iterator iter_a = null(),
command_list::iterator iter_b = null() )
: iter_ptr ( iter_a )
, list_end ( iter_b )
{}

};

template < typename Command >
removal_ticket insert ( Command c ) {
the_list.push_front( command( c ) );
return ( removal_ticket( the_list.begin(), the_list.end() ) );
}

void remove ( removal_ticket t ) {
ASSERT( *(t.iter_ptr) != null() );
ASSERT( t.list_end == the_list.end() );
the_list.erase( *(t.iter_ptr) );
*(t.iter_ptr) = null();
}

void execute ( void ) const {
for ( command_list::const_iterator iter = the_list.begin();
iter != the_list.end(); ++iter ) {
(*iter)();
}
}

};


b) The drawback of the approach using std::set<> is not so much that it is
marginally less efficient. The major drawback is that it imposes additional
conceptual requirements on the objects you can register: they must be
comparable with one another. This is generally not feasible for objects of
type tr1::function< void( whatever ) >.


Best

Kai-Uwe Bux
 
J

Juha Nieminen

P.J. Plauger said:
You can imagine it. Indeed, several other people have imagined just
that over the past several years. One or two of the brighter ones even
tried to contrive use cases where a partial splice absolutely positively
had to be constant time or the algorithm would be too slow. Every
one of those cases I saw could easily be transmogrified to a case
where the spliced sublist could be made into an entire list, and hence
would splice in constant time.

Several years ago, while working at the university here, I had to
implement a labeled transition system (which is basically a directed
graph with possibly invisible transitions) reduction which preserves
branching bisimilarity formalism (and from that code it was easy to also
make a fast strong bisimilarity reduction program as well).
The graphs involved could be enormous (millions of states and
transitions), so speed was quite an important element.

It was many years ago so I don't remember the details of the algorithm
anymore, but I do remember that some kind of list splicing was necessary
(no matter how much I thought I could not come up with a solution which
would avoid using linked lists, and believe me I tried; the algorithm
just has to be done with linked lists, I don't think there's a way
around it). I must admit, however, that I'm not absolutely certain if
any of the list splicing required was of the kind that you mentioned can
be easily done in constant-time even with a constant-time size(), but
it's very possible.

Anyways, I just remember that as a good example where enormous amounts
of data have to be handled very fast, using linked list is just
practically the only viable solution, and where constant-time splicing
is critical. When I implemented the algorithm I assumed that
list::splice() is indeed constant-time. Perhaps I shouldn't have.
 
J

Jerry Coffin

[ ... ]
a) You got distracted by a minor issue that arose from code simplification
for the sake of exposition.

Well, I'll admit about all I can really look at is what's actually in a
post...
In the real version, the return_ticket is not a
raw iterator but a class that contains that iterator (in fact, a shared
pointer to that iterator). The return_ticket will _not allow_ any access to
the registered object at all. It can _only_ be used to remove the object
from the bag (and in fact, it checks whether the object had been removed
before or whether the object is registered with a different command_bag and
triggers assertions for contract violations accordingly). Thus, it is not
at all comparable to handing out raw pointers.

Okay, so with enough extra work you can (to at least some degree) work
around the most obvious shortcomings of using iterators in this way, but
I'm left wondering whether (and if so how) there's a real advantage. So
far, I haven't seen one...

[ ... ]
b) The drawback of the approach using std::set<> is not so much that it is
marginally less efficient. The major drawback is that it imposes additional
conceptual requirements on the objects you can register: they must be
comparable with one another. This is generally not feasible for objects of
type tr1::function< void( whatever ) >.

I'll admit that when I did it, I was using addresses of functions (this
was a while ago, long before TR1 was written). Offhand, I'm hard put to
see the problem though: instantiations of tr1::function should be
objects with addresses, and std::less is required to support comparison
of addresses, even in cases where the built-in < operator wouldn't (e.g.
separate objects).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,014
Latest member
BiancaFix3

Latest Threads

Top