Why doesn't threading.join() return a value?

Roy Smith

I have a function I want to run in a thread and return a value. It
seems like the most obvious way to do this is to have my target
function return the value, the Thread object stash that someplace, and
return it as the return value for join().

Yes, I know there's other ways for a thread to return values (pass the
target a queue, for example), but making the return value of the
target function available would have been the most convenient. I'm
curious why threading wasn't implemented this way.
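
For comparison, the queue-based workaround mentioned above might look like the following minimal sketch. The names `compute` and `worker` are hypothetical, made up for illustration; only `threading.Thread` and `queue.Queue` come from the standard library.

```python
import queue
import threading


def compute(x, y):
    # Hypothetical stand-in for the real work done in the thread.
    return x + y


def worker(results, x, y):
    # join() will not hand back a return value, so the thread puts its
    # result on a queue that the caller also holds.
    results.put(compute(x, y))


results = queue.Queue()
t = threading.Thread(target=worker, args=(results, 2, 3))
t.start()
t.join()                 # join() itself returns None
value = results.get()    # the result travels through the queue instead
print(value)
```

This works, but the target has to know about the queue, which is the inconvenience the question is about.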
 
Steven D'Aprano

Roy said:
I have a function I want to run in a thread and return a value. It
seems like the most obvious way to do this is to have my target
function return the value, the Thread object stash that someplace, and
return it as the return value for join().

Yes, I know there's other ways for a thread to return values (pass the
target a queue, for example), but making the return value of the
target function available would have been the most convenient. I'm
curious why threading wasn't implemented this way.

Because then the code launching the thread would have to block, waiting
until the thread is completed, so it will have a result to return.
 
Seebs

Steven D'Aprano said:
Because then the code launching the thread would have to block, waiting
until the thread is completed, so it will have a result to return.

Isn't "waiting until the thread is completed" sort of the point of join()?

-s
 
Adam Skutt

Roy Smith said:
I have a function I want to run in a thread and return a value. It
seems like the most obvious way to do this is to have my target
function return the value, the Thread object stash that someplace, and
return it as the return value for join().

Yes, I know there's other ways for a thread to return values (pass the
target a queue, for example), but making the return value of the
target function available would have been the most convenient. I'm
curious why threading wasn't implemented this way.

I assume it is because the underlying operating system APIs do not
support it. Windows and POSIX threads only support returning an
integer when a thread exits, similar to the exit code of a process.
More importantly, there's no way to tell whether the exit code of a
thread was set by user code or by the system. Even worse, some of
those integer values are reserved by some operating systems. If your
thread died via an exception, it still has an error code set by the
operating system. How would you distinguish those codes from
your own?

Adam
 
Alain Ketterlin

Adam Skutt said:
I assume it is because the underlying operating system APIs do not
support it. Windows and POSIX threads only support returning an
integer when a thread exits, similar to the exit code of a process.

Sorry, you're wrong, at least for POSIX threads:

void pthread_exit(void *value_ptr);
int pthread_join(pthread_t thread, void **value_ptr);

pthread_exit can pass anything, and that value will be retrieved with
pthread_join. Threads of a process share their address space, there is
no reason to restrict their return value to an int.

Adam Skutt said:
More importantly, there's no way to tell whether the exit code of a
thread was set by user code or by the system. Even worse, some of
those integer values are reserved by some operating systems.

I'm not sure what you are talking about here. Maybe you confuse threads
with processes?

Re. the original question: since you can define your own Thread
subclass, with whatever attribute you want, I guess there was no need to
use join() to communicate the result. The Thread's run() can store its
result in an attribute, and the "client" can get it from the same
attribute after a successful call to join().
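
A minimal sketch of that approach, assuming a made-up subclass (`SquareThread` and its `result` attribute are hypothetical names, not part of the threading module):

```python
import threading


class SquareThread(threading.Thread):
    # Hypothetical subclass: computes n * n and keeps it in an attribute.
    def __init__(self, n):
        super().__init__()
        self.n = n
        self.result = None    # filled in by run()

    def run(self):
        self.result = self.n * self.n


t = SquareThread(7)
t.start()
t.join()            # once join() returns, run() has finished
print(t.result)     # safe to read the attribute now
```

The attribute is only meaningful after join() has returned without a timeout; before that, run() may not have stored anything yet.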

-- Alain.
 
Adam Skutt

Alain Ketterlin said:
Sorry, you're wrong, at least for POSIX threads:

void pthread_exit(void *value_ptr);
int pthread_join(pthread_t thread, void **value_ptr);

pthread_exit can pass anything, and that value will be retrieved with
pthread_join.

No, it can only pass a void*, which isn't much better than passing an
int. Passing a void* is not equivalent to passing anything, not even
in C. Moreover, specific values are still reserved, like
PTHREAD_CANCELED. Yes, it was strictly inappropriate for me to say
both return solely integers, but my error doesn't meaningfully alter my
description of the situation. The interface provided by the
underlying APIs is not especially usable for arbitrary data transfer.
Doubly so when we're discussing something like Python's threading
module.

Alain Ketterlin said:
I'm not sure what you are talking about here. Maybe you confuse threads
with processes?

Windows threads have exit codes, just like processes. At least one
code is reserved and cannot be used by the programmer.

Adam
 
Chris Torek

Alain Ketterlin said:
Sorry, you're wrong, at least for POSIX threads:

void pthread_exit(void *value_ptr);
int pthread_join(pthread_t thread, void **value_ptr);

pthread_exit can pass anything, and that value will be retrieved with
pthread_join.

Adam Skutt said:
No, it can only pass a void*, which isn't much better than passing an
int.

It is far better than passing an int, although it leaves you with
an annoying storage-management issue, and sidesteps any reasonable
attempts at type-checking (both of which are of course "par for
the course" in C). For instance:

    struct some_big_value {
        ... lots of stuff ...
    };
    struct some_big_value storage_management_problem[SIZE];
    ...
    void *func(void *initial_args) {
        ...
    #ifdef ONE_WAY_TO_DO_IT
        pthread_exit(&storage_management_problem[index]);
        /* NOTREACHED */
    #else /* the other way */
        return &storage_management_problem[index];
    #endif
    }
    ...
        int error;
        pthread_t threadinfo;
        pthread_attr_t attr;
        ...
        pthread_attr_init(&attr);
        /* set attributes if desired */
        error = pthread_create(&threadinfo, &attr, func, &args_to_func);
        if (error) {
            ... handle error ...
        } else {
            ...
            void *rv;
            /* pthread_join takes the pthread_t by value, not its address */
            error = pthread_join(threadinfo, &rv);
            if (error) {
                ... handle error ...
            } else if (rv == PTHREAD_CANCELED) {
                ... the thread was canceled ...
            } else {
                struct some_big_value *ret = rv;
                ... work with ret->field ...
            }
        }

(Or, do dynamic allocation, and have a struct with a distinguishing
ID followed by a union of multiple possible values, or a flexible
array member, or whatever. This means you can pass any arbitrary
data structure back, provided you can manage the storage somehow.)

Adam Skutt said:
Passing a void* is not equivalent to passing anything, not even
in C. Moreover, specific values are still reserved, like
PTHREAD_CANCELED.

Some manual pages are clearer about this than others. Here is one
that I think is not bad:

    The symbolic constant PTHREAD_CANCELED expands to a constant
    expression of type (void *), whose value matches no pointer to
    an object in memory nor the value NULL.

So, provided you use pthread_exit() "correctly" (always pass either
NULL or the address of some actual object in memory), the special
reserved value is different from all of "your" values.

(POSIX threads are certainly klunky, but not all *that* badly designed
given the constraints.)

For that matter, you can use the following to get what the OP asked
for. (Change all the instance variables to __-prefixed versions
if you want them to be Mostly Private.)

    import threading

    class ValThread(threading.Thread):
        "like threading.Thread, but the target function's return val is captured"
        def __init__(self, group=None, target=None, name=None,
                     args=(), kwargs=None):
            # The base class gets no target; run() below calls it instead.
            # (The Python 2 "verbose" argument is dropped: current
            # Python 3 no longer accepts it.)
            super().__init__(group=group, name=name)
            self.value = None
            self.target = target
            self.args = args
            self.kwargs = {} if kwargs is None else kwargs

        def run(self):
            "run the thread"
            if self.target:
                self.value = self.target(*self.args, **self.kwargs)

        def join(self, timeout=None):
            "join, then return value set by target function"
            super().join(timeout)
            return self.value
 
Steven D'Aprano

Seebs said:
On 2011-09-02, Steven D'Aprano wrote: [...]
Because then the code launching the thread would have to block, waiting
until the thread is completed, so it will have a result to return.

Isn't "waiting until the thread is completed" sort of the point of join()?

Doh!

I mean, well done, you have passed my little test!

<wink>
 
Adam Skutt

Chris Torek said:
It is far better than passing an int, although it leaves you with
an annoying storage-management issue, and sidesteps any reasonable
attempts at type-checking (both of which are of course "par for
the course" in C).

And when written out, makes it sound distinctly worse than passing an
int :p. And let's not kid ourselves, unless you're a C programmer, it
is distinctly worse than passing an int. Heck, your example (snipped)
goes out of your way to unnecessarily leverage the functionality
provided by pthreads.

Chris Torek said:
Some manual pages are clearer about this than others. Here is one
that I think is not bad:

    The symbolic constant PTHREAD_CANCELED expands to a constant
    expression of type (void *), whose value matches no pointer to
    an object in memory nor the value NULL.

So, provided you use pthread_exit() "correctly" (always pass either
NULL or the address of some actual object in memory), the special
reserved value is different from all of "your" values.

Unfortunately, I'm not sure all implementations behave that way. Not
that cancellation is really worth bothering with anyway, but it's a
pretty nasty corner case.

Adam
 
Roy Smith

Adam Skutt said:
I assume it is because the underlying operating system APIs do not
support it. Windows and POSIX threads only support returning an
integer when a thread exits, similar to the exit code of a process.

But the whole point of higher level languages is to hide the warts of
the lower-level APIs they are built on top of. Just because a POSIX
thread can only return an int (actually, a void *) doesn't mean that
level of detail needed to be exposed at the Python threading library
level.

Adam Skutt said:
More importantly, there's no way to tell whether the exit code of a
thread was set by user code or by the system. Even worse, some of
those integer values are reserved by some operating systems. If your
thread died via an exception, it still has an error code set by the
operating system. How would you distinguish those codes from
your own?

I think you're talking about processes, not threads, but in any case,
it's a non-sequitur. Thread.join() currently returns None, so there's
no chance for confusion.
 
Chris Torek

Roy Smith said:
Thread.join() currently returns None, so there's
no chance for [return value] confusion.

Well, still some actually. If you use my example code (posted
elsethread), you need to know:

- that there was a target function (my default return
value if there is none is None); and
- that the joined thread really did finish (if you pass
a timeout value, rather than None, and the join times
out, the return value is again None).

Of course, if your target function always exists and never returns
None, *then* there's no chance for confusion. :)
 
Alain Ketterlin

Adam Skutt said:
No, it can only pass a void*, which isn't much better than passing an
int.

We'll have to disagree. A void* simply can point to anything you want.
Since thread stacks disappear at end of thread, only dynamically
allocated memory can be used to store the result. That's why you get a
pointer. There is no restriction on that pointer provided it doesn't
point to memory that has been deallocated.

Adam Skutt said:
Passing a void* is not equivalent to passing anything, not even in C.
Moreover, specific values are still reserved, like PTHREAD_CANCELED.

Thread cancellation is program logic (pthread_cancel), it doesn't mean
your thread crashed, it means your program decided to cancel the thread.
If you still care about the return value after having called
pthread_cancel(),

Adam Skutt said:
Yes, it was strictly inappropriate for me to say both return solely
integers, but my error doesn't meaningfully alter my description of the
situation. The interface provided by the underlying APIs is not
especially usable for arbitrary data transfer.

Again, I may misunderstand your wording, but there is no "data transfer"
at all, since memory is shared between threads.

Adam Skutt said:
Doubly so when we're discussing something like Python's threading
module.

The OP was clearly discussing the case where a thread has a result, and
how to get it back. POSIX threads let you do that. There are of course
tons of other ways to do the same thing. Win32 will force you to use
some other way.

Adam Skutt said:
Windows threads have exit codes, just like processes. At least one
code is reserved and cannot be used by the programmer.

Is that STILL_ACTIVE that we are talking about? That's an artefact of
the design of GetExitCodeThread, which will return either the thread
exit code or its own error code. The python lib could easily hide this,
and use run()'s return value to store the (python) result somewhere.

-- Alain.
 
Alain Ketterlin

Alain Ketterlin said:
Thread cancellation is program logic (pthread_cancel), it doesn't mean
your thread crashed, it means your program decided to cancel the thread.
If you still care about the return value after having called
pthread_cancel(),

Sorry, forgot to end this sentence... What I mean is:

If you still care about the return value after having called
pthread_cancel(), your program logic is unnecessarily complex, and
you should find some other way to handle this case.

-- Alain.
 
Carl Banks

Adam Skutt said:
I assume it is because the underlying operating system APIs do not
support it.

Nope. This could easily be implemented by storing the return value in the Thread object.
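
As a rough illustration of how little that takes, here is a sketch that stores the value on the Thread object without subclassing. The helper `start_with_result` and the `result` attribute are hypothetical names invented for this example.

```python
import threading


def start_with_result(target, *args, **kwargs):
    # Hypothetical helper: run target in a thread and store its return
    # value as an attribute on the Thread object itself.
    def runner():
        thread.result = target(*args, **kwargs)

    thread = threading.Thread(target=runner)
    thread.result = None     # placeholder until runner() finishes
    thread.start()
    return thread


t = start_with_result(pow, 2, 10)
t.join()
print(t.result)
```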

It's not done that way probably because no one thought of doing it.


Carl Banks
 
Carl Banks

Adam Skutt said:
No, it can only pass a void*, which isn't much better than passing an
int. Passing a void* is not equivalent to passing anything, not even
in C. Moreover, specific values are still reserved, like
PTHREAD_CANCELED. Yes, it was strictly inappropriate for me to say
both return solely integers, but my error doesn't meaningfully alter my
description of the situation. The interface provided by the
underlying APIs is not especially usable for arbitrary data transfer.

I'm sorry, but your claim is flat out wrong. It's very common in C programming to use a void* to give a programmer the ability to pass arbitrary data through some third-party code.

The Python API itself uses void* in this way in several different places. For instance, take a look at the Capsule API (http://docs.python.org/c-api/capsule.html). You'll notice it uses a void* to let a user pass in opaque data. Another case is when declaring properties in C: it's common to define a single get or set function, and only vary some piece of data for the different properties. The API provides a void* so that the extension writer can pass arbitrary data to the get and set functions.


Carl Banks
 
Roy Smith

Chris Torek said:
For that matter, you can use the following to get what the OP asked
for. (Change all the instance variables to __-prefixed versions
if you want them to be Mostly Private.)

    import threading

    class ValThread(threading.Thread):
        "like threading.Thread, but the target function's return val is captured"
        def __init__(self, group=None, target=None, name=None,
                     args=(), kwargs=None):
            # The base class gets no target; run() below calls it instead.
            # (The Python 2 "verbose" argument is dropped: current
            # Python 3 no longer accepts it.)
            super().__init__(group=group, name=name)
            self.value = None
            self.target = target
            self.args = args
            self.kwargs = {} if kwargs is None else kwargs

        def run(self):
            "run the thread"
            if self.target:
                self.value = self.target(*self.args, **self.kwargs)

        def join(self, timeout=None):
            "join, then return value set by target function"
            super().join(timeout)
            return self.value

Yeah, that's pretty much what I had in mind. I'm inclined to write up a
PEP proposing that this become the standard behavior of
threading.Thread. It seems useful, and I can't see any way it would
break any existing code.
 
