[ANN] Drake: Distributed Rake

  • Thread starter quixoticsycophant

Trans

They *should* be the same, but if we're discussing legacy rakefiles
where people have implicitly relied on their being different...

I agree that there's really no 'right' thing to do, though - either
you've specified your depgraph properly or you haven't.

*should*? How is one correct and the other not? They are just
different behaviors. I.e., rake is the same as drake -j1.

The problem is that drake -j2 or more can royally screw a Rakefile not
written for it. Thus the "fix" is to remain backward compatible but
add a syntactical distinction for j-ready tasks. Then there is no
problem.

T.
 
Jim Weirich

James said:
In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

task :a => [:x, :y, :z]

With single-threaded Rake, _x_,_y_,_z_ will be invoked *in that order*
before _a_ is invoked.

Just to clarify: In standard rake x, y and z will be invoked by task a
in that order. However, that doesn't provide any guarantees that they
will be executed in that order.

For example, consider the following additional dependencies:

task :x => :z

Then the code for z will be executed before task x.

The moral of the story is that depending upon ordering of dependencies
to determine the ordering of execution is a bug in standard rake too.
(it's just more likely that drake will make this kind of bug
manifest).
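
The effect of `task :x => :z` can be reproduced with a toy model. The sketch below is plain Ruby, not Rake itself: the `prereqs` table and the `invoke` helper are assumptions made for illustration, and they only mimic depth-first, run-once prerequisite handling.

```ruby
# Toy model only; this is not Rake's implementation.
# Each key is a task, each value is its declared prerequisite list.
prereqs = { a: [:x, :y, :z], x: [:z], y: [], z: [] }

# Invoke a task: invoke its prerequisites in list order, then run it once.
# Returns the list of tasks in the order their bodies would execute.
def invoke(prereqs, name, executed = [])
  return executed if executed.include?(name)
  prereqs.fetch(name).each { |dep| invoke(prereqs, dep, executed) }
  executed << name
end

execution_order = invoke(prereqs, :a)
p execution_order  # => [:z, :x, :y, :a]
```

Although `:a` lists `:z` last, `:z` executes first because `:x` also depends on it.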

BTW, good job James.
 
Jim Weirich

unknown said:
If 'task' became 'multitask', Rake would run all your tasks at once --
all at the same time. That's probably not what you want :)

[...] It's a math problem.
'multitask' stomps it all to pieces, having the power to declare 2 + 5
= 8 if it so chooses.

I'm not quite sure what you are saying here, but if you are trying to
imply that multitask does not honor dependencies in ordering, you are
incorrect. If there is dependency declared, then a task won't run until
all of its dependencies have finished.

That being said, there is a known bug in multitask where failures in
dependencies are not properly transmitted to all dependent tasks. But I
don't think you were referring to that.

-- Jim Weirich
 
Jim Weirich

David said:
Actually, no, I assumed that 'multitask' only ran that specific task in
parallel.

Actually, multitask will run all of the task's dependencies in parallel,
not the task itself.
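
A rough sketch of that behavior, using plain threads rather than Rake: `run_multitask` is a hypothetical helper, and the only claim illustrated is that prerequisites run concurrently while the task body waits for all of them.

```ruby
# Toy sketch, not Rake's multitask code. Each prerequisite runs in its own
# thread; the block (the task body) runs only after every thread finishes.
def run_multitask(prereq_actions)
  threads = prereq_actions.map { |action| Thread.new { action.call } }
  results = threads.map(&:value)  # Thread#value joins, then returns the result
  yield results
end

finished = Queue.new
prereq_actions = [:x, :y, :z].map do |name|
  lambda do
    sleep(rand * 0.05)  # finish in an unpredictable order
    finished << name
    name
  end
end

summary = run_multitask(prereq_actions) do |results|
  done = []
  done << finished.pop until finished.empty?
  { results: results, done_before_body: done.sort }
end

p summary
```

The order in which the three prerequisites finish varies from run to run, but all three are always complete by the time the body executes.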

-- Jim Weirich
 
Jim Weirich

Thomas said:
Run in parallel.

That way all old scripts work fine, and as we get smart and make our
tasks j-safe we can add the extra "j-array".

I thought someone else might notice it and elaborate but there is
another potential benefit of this notation. Eg.

task :a => [[:x, :y, :z], [:m, :n], :r]

Where :x, :y, :z can be run in parallel, as can :m and :n, but the
groups must run one before the other.

As stated before, assuming execution order among dependencies is a bug
even in standard rake. If groups of tasks need to be ordered in time,
declare a dependency. Anything else is just wrong.

-- Jim Weirich
 
Trans

James said:
In a given Rakefile, it is possible (even likely) that the dependency
tree has not been properly defined. Consider

task :a => [:x, :y, :z]

With single-threaded Rake, _x_,_y_,_z_ will be invoked *in that order*
before _a_ is invoked.

Just to clarify: In standard rake x, y and z will be invoked by task a
in that order. However, that doesn't provide any guarantees that they
will be executed in that order.

For example, consider the following additional dependencies:

task :x => :z

Then the code for z will be executed before task x.

The moral of the story is that depending upon ordering of dependencies
to determine the ordering of execution is a bug in standard rake too.
(it's just more likely that drake will make this kind of bug
manifest).

Ah, so by design you consider it a bug. You could have fixed that from
day one by randomizing the order of the prerequisites. Now you have a
situation where many Rakefiles depend on that bug. So, why not turn
lemons into lemonade, and make this bug a feature?

T.
 
Jim Weirich

Thomas said:
Ah, so by design you consider it a bug. You could have fixed that from
day one by randomizing the order of the prerequisites. Now you have a
situation where many Rakefiles depend on that bug. So, why not turn
lemons into lemonade, and make this bug a feature?

I'm not sure what you are advocating here:

(1) Guarantee that rake will invoke the prerequisites in the defined
order? ... we already do that (for standard non-tasking rake).

(2) Guarantee that rake will execute the prerequisites in the defined
order? ... Can't do that, prerequisite constraints elsewhere may
constrain the execution to be a different order.

(3) Declare that I don't mind if you make unwarranted assumptions about
execution order? ... Well, as long as you don't file bug reports, I'm ok
with that.

More clarification on Rake terminology:

To execute a task means to execute any code blocks attached to the task
(i.e. the do/end part of a task).

To invoke a task means to make sure all the prerequisites for the task
have been invoked and then execute the task if it has not yet been
executed. A task invocation will not execute the task if it has already
been executed.

In standard rake, the order of dependencies only specifies the
invocation order, not the execution order. You never were able to
directly control execution order of tasks via the order of the
dependency list.

In moving to drake, what you lose is direct control over invocation
order. You never had direct control of execution order.
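
The invoke/execute distinction can be made concrete with a small model. `ToyTask` below is a hypothetical class written for this illustration; it follows the definitions above but is not how Rake is implemented.

```ruby
# Hypothetical stand-in for a task; not Rake::Task.
class ToyTask
  attr_reader :name, :prereqs

  def initialize(name, prereqs = [])
    @name = name
    @prereqs = prereqs
    @executed = false
  end

  # Execute: run the task's own body (here it only records itself).
  def execute(log)
    log << "execute #{name}"
  end

  # Invoke: make sure prerequisites are invoked, then execute at most once.
  def invoke(log)
    log << "invoke #{name}"
    return if @executed
    prereqs.each { |task| task.invoke(log) }
    @executed = true
    execute(log)
  end
end

z = ToyTask.new(:z)
x = ToyTask.new(:x, [z])   # task :x => :z
y = ToyTask.new(:y)
a = ToyTask.new(:a, [x, y, z])

log = []
a.invoke(log)
puts log
```

In the log, `:a` invokes `x`, `y`, `z` in list order, yet the bodies execute as `z`, `x`, `y`, `a`. The second invocation of `z` does nothing because it has already executed.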

-- Jim Weirich
 
Trans

I'm not sure what you are advocating here:

(1) Guarantee that rake will invoke the prerequisites in the defined
order? ... we already do that (for standard non-tasking rake).

(2) Guarantee that rake will execute the prerequisites in the defined
order? ... Can't do that, prerequisite constraints elsewhere may
constrain the execution to be a different order.

(3) Declare that I don't mind if you make unwarranted assumptions about
execution order? ... Well, as long as you don't file bug reports, I'm ok
with that.

More clarification on Rake terminology:

To execute a task means to execute any code blocks attached to the task
(i.e. the do/end part of a task).

To invoke a task means to make sure all the prerequisites for the task
have been invoked and then execute the task if it has not yet been
executed. A task invocation will not execute the task if it has already
been executed.

In standard rake, the order of dependencies only specifies the
invocation order, not the execution order. You never were able to
directly control execution order of tasks via the order of the
dependency list.

And yet we can use the execution order in practice:

F = []
G = []

task :f do
  F.replace([1, 2, 3])
end

task :g do
  # G's contents depend on whether :f has already run.
  if F.empty?
    G.replace([4, 5, 6])
  else
    G.replace(F)
  end
end

desc "use f and g not defined by f"
task :g1 => [:g, :f] do
  p G, F
end

desc "use f and g defined by f"
task :g2 => [:f, :g] do
  p G, F
end

I understand that the formal design did not intend for this. But
implementation allows it.

Is it worth potentially breaking Rakefiles to prevent this sort of
thing (like drake -j2 or more does)? I'm not so sure. While one might
consider this Rakefile "bad design" because it doesn't fit the
original formal notion, it nonetheless does what one would expect it
to do. I think I'd rather have that, than the potential for ambiguous
behavior.

T.
 
Eric Hodel

Is it worth potentially breaking Rakefiles to prevent this sort of
thing (like drake -j2 or more does)? I'm not so sure. While one might
consider this Rakefile "bad design" because it doesn't fit the
original formal notion, it nonetheless does what one would expect it
to do. I think I'd rather have that, than the potential for ambiguous
behavior.


They aren't potentially broken, they are broken. If it happens to
work, you've just gotten lucky.

I've helped rework the Rubinius rakefiles twice and I can assure you
it's perfectly possible to have broken rakefiles without -j. We
were able to use drake with only one change due to having working
rakefiles beforehand.

Furthermore, this is a feature that is not enabled by default. I
don't see where this is an issue.
 
Anton Ivanov

Eric said:
We were able to use drake with only one change due to having working
rakefiles beforehand.

I'm anxious to try it, but as I wrote the gem seems not to work. How
did you do it?
 
?

.

I'm anxious to try it, but as I wrote the gem seems not to work.  How
did you do it?

Did you see my response above? What is the output when you run drake
on this

task :default do
puts $LOAD_PATH
end

For some reason rubygems isn't manipulating your $LOAD_PATH correctly,
or something is overriding it.
 
Trans

They aren't potentially broken, they are broken. If it happens to
work, you've just gotten lucky.

There's no such thing as luck in computer programming.

T.
 
?

.

James said:
If 'task' became 'multitask', Rake would run all your tasks at once --
all at the same time.  That's probably not what you want :)
[...] It's a math problem.
'multitask' stomps it all to pieces, having the power to declare 2 + 5
= 8 if it so chooses.

I'm not quite sure what you are saying here, but if you are trying to
imply that multitask does not honor dependencies in ordering, you are
incorrect.  If there is dependency declared, then a task won't run until
all of its dependencies have finished.

That being said, there is a known bug in multitask where failures in
dependencies are not properly transmitted to all dependent tasks.  But I
don't think you were referring to that.

I did not mean to imply there was something wrong with standard Rake.
It was being argued that Drake should retain the 'multitask' feature
for backwards compatibility. But from the point of view of Drake,
'multitask' is a mistake, the antithesis of everything Drake tries to
achieve. At a given point during execution, what is parallelizable is
a math problem with only one answer. But multitask comes prancing in
and announces: "I'm going to parallelize you, you, and you, just
because it's my birthday and I'm in a good mood." And then Drake is
like, "WTF are you doing? Not only is that the wrong answer to the
math problem, you're also trashing the node-locking algorithms."
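
To illustrate the "math problem": given only the declared dependencies, the set of tasks that may run concurrently at each step is fully determined. The sketch below is a simplification and an assumption on my part (it groups tasks into waves, whereas a real scheduler could start a task the moment its prerequisites finish); the graph and the name `parallel_waves` are made up for the example.

```ruby
# Hypothetical dependency graph: task => prerequisites.
graph = {
  app:   [:lib, :docs],
  lib:   [:parse, :lex],
  docs:  [],
  parse: [],
  lex:   []
}

# Group tasks into waves. Every task in a wave has all of its prerequisites
# in earlier waves, so tasks within one wave could run at the same time.
def parallel_waves(graph)
  done = []
  waves = []
  until done.size == graph.size
    ready = graph.keys.select do |task|
      !done.include?(task) && (graph[task] - done).empty?
    end
    raise ArgumentError, "dependency cycle" if ready.empty?
    waves << ready
    done.concat(ready)
  end
  waves
end

waves = parallel_waves(graph)
p waves  # => [[:docs, :parse, :lex], [:lib], [:app]]
```

Nothing in the result depends on how the prerequisite lists happen to be ordered, only on which edges are declared.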

If 'task' became 'multitask', Rake would run all your tasks at once --
all at the same time. That's probably not what you want :)

Now that I read my own words here, I see it is misleading. I should
have said: If 'task' became 'multitask', Rake would run all your tasks
at once, one per thread all at the same time, but each thread would
still block until prereqs are filled. Which *could* be what you want
on a small project, but even a small project can have enough tasks to
bog it down.

Looking back at the beginning of this thread, I did not mean to imply
Jim moved to github just for me. He was moving anyway. Due to my
asking for a branch commit in the SVN repository, he may have moved a
little sooner. Which was convenient for me. Thanks.

JL
 
Jim Weirich

Thomas said:
*should*? How is one correct and the other not?

Because one assumes a dependency that is not explicitly declared. Rake
only guarantees execution ordering in the face of explicit dependencies.

-- Jim Weirich
 
Jim Weirich

Jos said:
I'm misremembering. SIGINT seems to work okay, it's SIGTERM that leaves
orphaned children (with ppid 1) around with rake, presumably because it
doesn't catch that signal. Same with drake (0.8.1.11.0.1)

What would be a good way to fix this?

-- Jim Weirich
 
?

.

While one might consider this Rakefile "bad design" because it
doesn't fit the original formal notion, it nonetheless does what one
would expect it to do. I think I'd rather have that, than the
potential for ambiguous behavior.

Underspecified dependencies + parallel execution == ambiguous behavior

There's no such thing as luck in computer programming.

Yes, there is.
 
David Masover

If by "non-thread-safe libraries" you mean a library whose Rakefile is
not j-safe, then you would just run it without -j.

Which is, by the way, one of the most irritating things about Makefiles.

While in the simple case, a Makefile author might not know about -j, and write
a safe Makefile anyway (because that's really simpler, after all), it's
really troublesome that there's no standard way to tell whether something's
j-safe or not.

Seems like the best I can do is run something with -j, and if it seems to
work, well, hope for the best.

That's one reason I like multitask -- it forces the programmer to be
explicitly thinking about threading.

If it is a library
inside a larger project, you have at least two options:

(a) Run single-threaded rake in a subprocess for that library.

(b) Use the Rake module directly, as the unit tests do. The
no-invoke-inside-invoke rule applies per TaskManager, so you could
create a new TaskManager and do whatever you wish with it.

I was talking about an even simpler problem:

Let's pretend, for a moment, that we're talking to an HTTP library that's not
thread-safe. Our Rakefile, for whatever reason, needs to download stuff and
then work with it. So we can't use this HTTP library directly -- we need to
wrap synchronization around it.

But, there's still an advantage to running the actual meat of the tasks in
parallel -- maybe we're doing some complex hpricot parsing, and we're
connecting to a potentially-slow server. Ideally, we want to download as fast
as we can, but once it's downloaded, we want to start crunching in worker
threads.

So there's still a benefit to Drake/Multitask, but there's the added
complexity of having to wrap that non-thread-safe library.
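
Roughly what that wrapping might look like, as a sketch only: `fake_get` stands in for the hypothetical non-thread-safe HTTP call and `crunch` for the parsing step, so every name here is an assumption rather than a real library API.

```ruby
# fake_get pretends to be a call into a non-thread-safe HTTP library.
# It records how many callers were inside it at the same moment.
active = 0
max_active = 0
fake_get = lambda do |url|
  active += 1
  max_active = [max_active, active].max
  sleep 0.01                  # simulate network latency
  active -= 1
  "<html>#{url}</html>"
end

http_lock = Mutex.new

# Only one thread at a time may enter the library.
download = ->(url) { http_lock.synchronize { fake_get.call(url) } }

# Parsing happens outside the lock, so worker threads can overlap here.
crunch = ->(body) { body.upcase }

urls = %w[a b c d]
results = urls.map { |u| Thread.new { crunch.call(download.call(u)) } }.map(&:value)

p results
p max_active  # => 1, the library never saw two callers at once
```

The downloads are serialized by the mutex, but each thread is free to start crunching as soon as its own download returns.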

Contrived example, I know.

That's an advantage to single-threaded Rake, by the way -- by default, you
don't need to think about any of this. (-j1 isn't an excuse, unless the
Rakefile can force it, because then it's up to the user to figure out what
j-level to use. That should be transparent.)
 
Trans

Underspecified dependencies + parallel execution == ambiguous behavior

They are only unspecified according to an interpretation of how things
ought to be. In the current implementation Rake is executing in a
predictable order. One can use it, and people have. Maybe not formally
ideal but the functionality is there. But that's not what really
concerns me. The issue I was looking at was:

drake -j2 + Rakefile = ambiguous behavior

So I was suggesting that it would perhaps be better to accept rake's
current implementation behavior; this ambiguity would then not arise;
and instead provide another notation to indicate parallel execution.
My particular idea might not be the best one, I was just looking for a
possible solution that could be useful in itself and address this
issue. Another possibility is just placing a statement at the
beginning of a Rakefile that could be used to indicate that the
rakefile is in fact "j-able".

I thought it prudent to address this b/c, personally, I'd like to see -j
end up in Rake itself. But perhaps it is better to just move forward
and expect people to fix all their old Rakefiles (and let's just hope
nothing really ugly happens when they haven't).

Yes, there is.

And his name is _why? ;) Well, I suppose if we want to take chances,
then there is.

T.
 
David Masover

There is a mathematical reality we cannot avoid, from which
special-case syntax and backwards-compatibility acrobatics cannot save
us. The problem is in our thinking. We didn't specify what depends
on what. We thought we did, but it turns out we were fooling
ourselves all along.

That's not always the problem. Given that Rake itself doesn't guarantee any
kind of ordering, we have to assume that dependencies are specified
correctly, or close to it.

But we're not writing Erlang, which means spec-ing dependencies correctly
isn't enough.

Trans suggested that this

task :a => [:x, :y, :z]

should be translated into this

task :a => :z
task :z => :y
task :y => :x

while this

task :a => [[:x, :y, :z]]

is translated into this

task :a => :x
task :a => :y
task :a => :z
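
The translation described above can be written out mechanically. This is only a sketch of the proposal as summarized here, not anything Rake or Drake provides: `expand` is a hypothetical helper, and treating a mix of groups and bare symbols as "groups ordered left to right" is my reading of the earlier `[[:x, :y, :z], [:m, :n], :r]` example.

```ruby
# Expand the proposed notation into explicit [task, prerequisite] edges.
# A bare symbol is a group of one; a nested array is a group whose members
# may run in parallel. Groups are ordered left to right.
def expand(target, prereqs)
  groups = prereqs.map { |entry| Array(entry) }
  edges = []
  groups.each_cons(2) do |earlier, later|
    later.each { |task| earlier.each { |dep| edges << [task, dep] } }
  end
  groups.last.each { |dep| edges << [target, dep] } unless groups.empty?
  edges
end

p expand(:a, [:x, :y, :z])    # => [[:y, :x], [:z, :y], [:a, :z]]
p expand(:a, [[:x, :y, :z]])  # => [[:a, :x], [:a, :y], [:a, :z]]
```

The first call yields the chain (`:a => :z`, `:z => :y`, `:y => :x`), the second yields three independent edges from `:a`.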

OK, but there are a million ways in which a programmer can
insufficiently define dependencies. This will not come close to
saving us.

No, but it does take us back to the behavior of Rake, or of Drake -j1. If you
really want to provide bug-for-bug compatibility, dig into the Rake code and
figure out what the ordering should be.

There is already a historical precedent with Makefiles. A new syntax
could have been added to Makefiles, but none was. The Makefiles had
bugs, but instead of timidly skirting around the problems while
praising the gods of backwards compatibility, people faced them
head-on, solving them one at a time.

Some did, yes.

And some let their Makefiles remain, with the existing syntax and bugs, and
left it to their users to figure out whether they could be parallelized or not.

I'm sorry, but if you're already asking me to manually run a rake task, you
don't get to also ask me to read the source code of your Rakefile and figure
out whether or not it will work with -j2. Nor should I have to use trial and
error, potentially with very subtle bugs, to figure out what's happened.



And it's worth mentioning again: We're not writing Erlang, we're writing Ruby.

That means shared memory. It means locking issues. And it means thread-unsafe
libraries.

It means that a Rakefile could very well crash if run with -j2.

Understand, I don't mean it will be run in the wrong order, or that the
dependencies are wrong. The dependencies may well be perfect, and it will run
exactly as designed to.

Except that at some point, two separate tasks will simultaneously do something
a library won't like, and that library will deadlock. Or segfault. Or worse,
give corrupt data.

Which means that the Rakefile author is responsible, then, for fixing the
deficiencies in the library. Or they have to contact the library author, and
attempt to get the library fixed. Making every single Ruby library
thread-safe is a laudable goal, but also not going to happen.



You could solve a lot of that, I suppose, by forking instead -- but that
introduces its own problems.
 
