Default scope of variables

  • Thread starter Steven D'Aprano

Steven D'Aprano

Recently, there was a thread where people discussed variable
declarations, with a couple people stating that they wished that Python
required you to declare local variables, instead of globals.

I'm sure they have their (foolish, pathetic) *wink* reasons for this, but
I thought I'd explain why I think Python makes the right decision to
require declarations for globals but not locals.

I was reading some Javascript today, and wading through masses of code
that looked something like this:

function update_results() {
    var n1 = foo;
    var n2 = bar;
    var m1 = foobar;
    var m2 = foobarbaz;
    var fe = n1 * m1;
    var fi = n2 * m2;
    var fo = n1 * m2;
    var fum = n2 * m1;
    ...
}

and so on for about a page and a half. Looking at that page in my editor,
with that solid column of bold "var" keywords, it struck me just how
redundant that enormous column of "var"s was. Of course they were
variables, what else would they be?

Larry Wall, the creator of Perl, is fond of discussing Huffman coding as
it relates to programming syntax. Common things should be short, rare
things can be longer. Wall is not concerned about saving a few bytes in
your source files, but about programmer effort: typing effort, reading
effort, mental processing effort. Which do you prefer?

total = subtotal + extra

set total to subtotal plus extra

Even though the second is only 8 more characters, I bet you prefer the
first version.

With respect to the Huffman coding of declarations, Javascript gets it
backwards. Locals ought to be more common, but they require more typing.
Locals are safer, better, more desirable than globals, and so it should
be easier to use locals than globals, not the other way around. Having to
declare "give me the safe kind of variable", while getting the harmful[1]
kind of variable for free, strikes me as arse-backwards. Lazy, naive or
careless coders get globals[2] by default or accident. That's bad.

Not just theoretically bad. Here's a real-world case where a single
missed "var" led to something much, much worse than a crash: code that
kept going, but doing the wrong thing.

http://blog.safeshepherd.com/23/how-one-missing-var-ruined-our-launch/


The two situations:

1) Accidentally scope an intended local as global;
2) Accidentally scope an intended global as local;

are not symmetrical. In the first case, you get multiple invocations of
your function overwriting each other's data. Confusion reigns, but the
function calls will likely continue, pumping out garbage results instead
of crashing. The result is typically fail-unsafe rather than fail-safe.
[Aside: fail-safe does not mean "safe from failing", but "fails in a safe
manner".]

In the second case, any failure is far more likely to result in the
function call failing hard with an exception (fail-safe) rather than
churning out bad results, since each call of the function gets its own
set of locals rather than using those from some other call.

So in Javascript, it's easy to get unsafe globals by accident; in Python,
it's hard to get unsafe globals by accident. In my opinion, Python gets
it right.
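
A minimal Python sketch of that asymmetry. The names here (counter, bump_without_declaration, bump_with_declaration) are made up for illustration and are not from any real code base. Forgetting the "global" declaration makes the name local, so the call fails at once instead of quietly sharing state between calls:

```python
counter = 0  # module-level (global) name; hypothetical example state


def bump_without_declaration():
    # No `global` statement: the assignment below makes `counter` local to
    # this call, so reading it before it is bound raises UnboundLocalError.
    counter = counter + 1
    return counter


def bump_with_declaration():
    global counter  # explicit opt-in to rebinding the module-level name
    counter = counter + 1
    return counter


try:
    bump_without_declaration()
    outcome = "ran, silently using shared state"
except UnboundLocalError:
    outcome = "failed hard with UnboundLocalError"

print(outcome)
print(bump_with_declaration(), counter)
```

The accidental local is caught on the first call; the shared state is only touched where the code says "global" out loud.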




[1] As in, "Global variables considered harmful", one of the classic
papers of computer science:
http://c2.com/cgi/wiki?GlobalVariablesConsideredHarmful

[2] Actually, Javascript gives you something a little closer to Python's
"nonlocal" by default: each enclosing function is searched for a matching
variable, terminating at the global scope.
 

Chris Angelico

With respect to the Huffman coding of declarations, Javascript gets it
backwards. Locals ought to be more common, but they require more typing.
Locals are safer, better, more desirable than globals, and so it should
be easier to use locals than globals, not the other way around. Having to
declare "give me the safe kind of variable", while getting the harmful[1]
kind of variable for free, strikes me as arse-backwards. Lazy, naive or
careless coders get globals[2] by default or accident. That's bad.

I agree that Javascript has it wrong, but not quite for the reason you
say. The advantage of declaring locals is a flexibility: you can have
multiple unique variables with the same name in the same function.
Python lets you do that across but not within functions.
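
As a rough illustration of the "across but not within functions" point (the function names are invented for this sketch): inside one Python function a name refers to a single local variable, so a nested loop that reuses the name rebinds it, while another function gets its own unrelated variable.

```python
def nested_loops_share_one_name():
    seen_after_inner = []
    for i in range(3):
        for i in range(10, 12):      # rebinds the same local `i`
            pass
        seen_after_inner.append(i)   # the outer value is gone by now
    return seen_after_inner


def other_function():
    i = "unrelated"                  # a separate local in a separate function
    return i


print(nested_loops_share_one_name())  # [11, 11, 11]
print(other_function())               # unrelated
```

The outer loop still runs three times, because the range iterator keeps its own position; only the name is clobbered.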

But Javascript/ECMAScript/whatever doesn't give you that. A var
declaration makes it function-local, no matter where the declaration
is. That's pointless. C++, on the other hand, lets you do this:

void somefunc() {
    for (int i=0;i<10;++i) {
        // do something with outer i
        for (int i=0;i<4;++i) {
            // do something with inner i
        }
        // outer i is visible again
    }
    // neither i is visible
}

Also, C++ overlays the "this is local" declaration with the "this
contains this type" declaration, which neither Python nor Javascript
bothers with; that makes the declaration feel less redundant.

Granted, this flexibility is mostly of value when writing huge
functions with complex nesting, but it is something that I find
convenient.

In terms of Huffman coding, every C++ variable must be declared,
somewhere. It's not a matter of declaring globals or declaring locals
- you just declare variables. If you declare them at file scope,
they're globals; if at function scope, they're locals. There's really
no difference. Everything's visible at its own level and those further
in, and not those further out.

I do see the convenience of the Python system, and I do like it; but
someone needs to speak up for the foolish and pathetic :)

ChrisA
 

Joshua Landau

With respect to the Huffman coding of declarations, Javascript gets it
backwards. Locals ought to be more common, but they require more typing.
Locals are safer, better, more desirable than globals, and so it should
be easier to use locals than globals, not the other way around. Having to
declare "give me the safe kind of variable", while getting the harmful[1]
kind of variable for free, strikes me as arse-backwards. Lazy, naive or
careless coders get globals[2] by default or accident. That's bad.

I agree that Javascript has it wrong, but not quite for the reason you
say. The advantage of declaring locals is a flexibility: you can have
multiple unique variables with the same name in the same function.
Python lets you do that across but not within functions.

But Javascript/ECMAScript/whatever doesn't give you that. A var
declaration makes it function-local, no matter where the declaration
is.

Coffeescript, which compiles to Javascript, "fixes" the problem Steven
brought up by automatically declaring variables so that you don't have
to. But what do you think this does?:

a = 1
func = ->
  a = 2
  b = 2

The "a" in "func" is global, the "b" is local. And Coffeescript
*doesn't let* you shadow even if you explicitly want to. There just
isn't syntax for it.

That said, I'm not too convinced. Personally, the proper way to do
what you are talking about is creating a new closure. Like:

for i in range(100):
    with new_scope():
        for i in range(100):
            func(i)
    func(i) # Using i from original loop

But it's not like Python'll ever support that.
 

Chris Angelico

That said, I'm not too convinced. Personally, the proper way to do
what you are talking about is creating a new closure. Like:

for i in range(100):
    with new_scope():
        for i in range(100):
            func(i)
    func(i) # Using i from original loop

But it's not like Python'll ever support that.

def foo():
    for i in range(3):
        print("outer",i)
        def inner():
            for i in range(4):
                print("inner",i)
        inner()
        print("outer",i)

That works, but you then have to declare all your nonlocals, and it
hardly reads well.
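
To make the "declare all your nonlocals" cost concrete, here is a small sketch with invented names (tally, add): as soon as the inner function has to rebind a variable of the enclosing function, rather than just read it, it needs a nonlocal statement.

```python
def tally(values):
    total = 0

    def add(value):
        nonlocal total   # without this line, `total += value` would raise
        total += value   # UnboundLocalError, since assignment implies local

    for value in values:
        add(value)
    return total


print(tally([1, 2, 3]))  # 6
```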

ChrisA
 

Joshua Landau

def foo():
    for i in range(3):
        print("outer",i)
        def inner():
            for i in range(4):
                print("inner",i)
        inner()
        print("outer",i)

That works, but you then have to declare all your nonlocals, and it
hardly reads well.

Unfortunately that's what people, I included, end up doing. Stuff like:

def paranoia(...):
    def safe_recursive(...):
        safe_recursive(...)
    return safe_recursive
safe_recursive = paranoia()

is blimmin ugly. Then you're only really left with

class safe_recursive:
    def __call__(self, ...):
        self(...)

which only solves it for recursive functions.
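
One reading of what the closure version buys you, sketched with a made-up factorial example (the names are assumptions, not from the post): the recursive call looks up the function's name in the enclosing scope, so rebinding the outer name later does not break the recursion.

```python
def make_factorial():
    def factorial(n):
        # `factorial` here resolves to the enclosing function's local,
        # not to whatever the caller later binds the outer name to.
        return 1 if n <= 1 else n * factorial(n - 1)
    return factorial


factorial = make_factorial()
keep = factorial
factorial = None      # rebind the outer name

print(keep(5))        # 120, the recursion still finds itself
```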

I guess this means I actually agree with your sentiment, just not the specifics.
 

Steven D'Aprano

With respect to the Huffman coding of declarations, Javascript gets it
backwards. Locals ought to be more common, but they require more
typing. Locals are safer, better, more desirable than globals, and so
it should be easier to use locals than globals, not the other way
around. Having to declare "give me the safe kind of variable", while
getting the harmful[1] kind of variable for free, strikes me as
arse-backwards. Lazy, naive or careless coders get globals[2] by
default or accident. That's bad.

I agree that Javascript has it wrong, but not quite for the reason you
say. The advantage of declaring locals is a flexibility: you can have
multiple unique variables with the same name in the same function.

Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind. Until then, I'm pretty sure you can trivially
avoid name clashes with globals that you wish to avoid clashing with.

Accidental shadowing can be a problem, but I've never heard of anyone
saying that they were *forced* to shadow a global they needed access to.
Just pick a different name.

Python lets you do that across but not within functions.

But Javascript/ECMAScript/whatever doesn't give you that. A var
declaration makes it function-local, no matter where the declaration is.
That's pointless. C++, on the other hand, lets you do this:

void somefunc() {
    for (int i=0;i<10;++i) {
        // do something with outer i
        for (int i=0;i<4;++i) {
            // do something with inner i
        }
        // outer i is visible again
    }
    // neither i is visible
}

That's truly horrible. If the cost of this "flexibility" is that I'll
have to read, and debug, other people's code with this sort of thing, I'm
happy to be less flexible. For what possible reason other than "because I
can" would you want to use the same loop variable name in two nested
loops?

I'm not suggesting that C++ should prohibit it. But just because a
language allows something doesn't make it a *feature*. I can write this
in Python:

a = a = a = a = a = 1

and it works, but the ability to do so is hardly a feature. It's just a
side effect of how Python works.

I believe that the function is the right level for scope changes, not too
large, not too small. I'm not even sure I like the fact that generator
expressions (in Python 2 & 3) and list comprehensions (in Python 3)
introduce their own scope.
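
A quick sketch of that behaviour on Python 3 (variable names invented for the example): the loop variable of a comprehension or generator expression lives in its own scope and leaves a same-named outer variable alone.

```python
x = "outer"

squares = [x * x for x in range(4)]               # this x is the comprehension's
evens = list(x for x in range(6) if x % 2 == 0)   # likewise for the generator

print(squares, evens)
print(x)  # still "outer" on Python 3
```

On Python 2 the list comprehension would have leaked its loop variable and rebound x; the generator expression would not.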



[...]
Granted, this flexibility is mostly of value when writing huge functions
with complex nesting, but it is something that I find convenient.

"This feature is mostly of value when poking myself, and any developers
who have to maintain my code after I'm gone, in the eye with a red-hot
poker."




[1] Based on empirical evidence that Python supports names with length at
least up to one million characters long, and assuming that each character
can be an ASCII letter, digit or underscore.
 

Steven D'Aprano

That said, I'm not too convinced. Personally, the proper way to do what
you are talking about is creating a new closure. Like:

for i in range(100):
    with new_scope():
        for i in range(100):
            func(i)
    func(i) # Using i from original loop

But it's not like Python'll ever support that.

Probably not, but Python does support this:


for i in range(100):
    for j in range(100):
        func(j)
    func(i) # Using i from original loop


which solves the problem of inner i overwriting outer i nicely.
 

Chris Angelico

Accidental shadowing can be a problem, but I've never heard of anyone
saying that they were *forced* to shadow a global they needed access to.
Just pick a different name.

Here's one example of shadowing that comes from a C++ project at work.
I have a class that represents a database transaction (constructing it
begins a transaction, it has methods for doing queries, and its
destructor rolls back). There's also a class for a subtransaction (same
thing, but it uses savepoints within the transaction). So to bracket a
piece of code in a subtransaction, I want to declare a new
subtransaction object with the same name as the outer transaction
object, and then dispose of it and "reveal" the original. There will
always be an object called "trans", and it will always be the
appropriate transaction to do queries on, but it'll change what it is.

ChrisA
 

Peter Otten

Steven said:
Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind. Until then, I'm pretty sure you can trivially
avoid name clashes with globals that you wish to avoid clashing with.
[1] Based on empirical evidence that Python supports names with length at
least up to one million characters long, and assuming that each character
can be an ASCII letter, digit or underscore.

That would be 63**10**6. Or 53*63**999999 if I were to nitpick...
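
The nitpick (53 choices for the first character, 63 for each later one) can be checked by brute force for very short names. This is only a sketch: count_names is a helper invented here, and it leans on str.isidentifier(), which also returns True for keywords such as "if".

```python
import string
from itertools import product

FIRST = string.ascii_letters + "_"   # 53 legal first characters
REST = FIRST + string.digits         # 63 legal later characters


def count_names(length):
    """Count ASCII identifiers of exactly `length` characters by brute force."""
    return sum("".join(chars).isidentifier()
               for chars in product(REST, repeat=length))


print(count_names(1), 53)
print(count_names(2), 53 * 63)
```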
 

Dave Angel

Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind.
[1] Based on empirical evidence that Python supports names with length at
least up to one million characters long, and assuming that each character
can be an ASCII letter, digit or underscore.

Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

I probably have it wrong, but I think that looks like:

859,122,207,646,748,720,415,212,786,780,258,721,683,540,870,960,267,706,738,947,655,539,422,295,787,680,882,091,181,482,626,114,653,152,637,456,091,641,990,601,474,111,018,521,295,858,424,750,289,461,372,414,431,396,326,232,796,267,104,001

variables. (The number has 180 digits)
 

Ian Kelly

That's truly horrible. If the cost of this "flexibility" is that I'll
have to read, and debug, other people's code with this sort of thing, I'm
happy to be less flexible. For what possible reason other than "because I
can" would you want to use the same loop variable name in two nested
loops?

It's interesting to note that while Java and C# also allow reuse of
local variable names, they do not allow local variables declared in
inner scopes to shadow variables declared in enclosing scopes, as in
the example above. But the following would be perfectly legal:

void somefunc() {
    for (int i = 0; i < a.size; ++i) {
        // do something with a
    }
    for (int i = 0; i < b.size; ++i) {
        // do something with b
    }
}

And the two i's are treated as completely separate variables here, as
arguably they should be since they're used for two distinct purposes.
 

Lele Gaifax

Dave Angel said:
Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

Uhm, if we are talking about Py2, then you should not count all the
combinations starting with a digit, while under Py3 the number explodes,
as this is valid code:
1

:)

back to easily-enumerable issues,
ciao, lele.
 

Steven D'Aprano

Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind.
[1] Based on empirical evidence that Python supports names with length
at least up to one million characters long, and assuming that each
character can be an ASCII letter, digit or underscore.
Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

I probably have it wrong, but I think that looks like:

859,122,207,646,748,720,415,212,786,780,258,721,683,540,870,960,267,706,738,947,655,539,422,295,787,680,882,091,181,482,626,114,653,152,637,456,091,641,990,601,474,111,018,521,295,858,424,750,289,461,372,414,431,396,326,232,796,267,104,001

variables. (The number has 180 digits)


I think that's more than 63,000,000 :)


Thanks Dave and Peter for the correction.
 

Steven D'Aprano

Here's one example of shadowing that comes from a C++ project at work. I
have a class that represents a database transaction (constructing it
begins a transaction, it has methods for doing queries, and its
destructor rolls back).

When the object finally gets garbage collected, doesn't that mean the
last transaction will be rolled back?
There's also a class for a subtransaction (same
thing, but it uses savepoints within the transaction). So to bracket a
piece of code in a subtransaction, I want to declare a new
subtransaction object with the same name as the outer transaction
object, and then dispose of it and "reveal" the original. There will
always be an object called "trans", and it will always be the
appropriate transaction to do queries on, but it'll change what it is.

You don't need to introduce such scoping rules for variables for that use-
case. We have namespaces (classes) for that sort of thing :)

Python 3.3's ChainMap is probably a better solution, but here's another
way to get the same sort of behaviour:


def function():
    class Namespace:
        # Set a class attribute.
        trans = Transaction()
    ns = Namespace()
    do_stuff_with(ns.trans)
    # Enter a subtransaction.
    ns.trans = Subtransaction()
    do_stuff_with(ns.trans)
    del ns.trans
    do_stuff_with(ns.trans)


Yes, it looks weird to see ns.trans used immediately after deleting it,
but that's because it is a weird (or at least unusual) use-case.
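
For comparison, a rough sketch of how the ChainMap route might look. Plain strings stand in for the Transaction and Subtransaction objects, which are hypothetical here, and passing a mapping to new_child() needs Python 3.4 or later.

```python
from collections import ChainMap

# Strings stand in for real transaction objects (hypothetical).
names = ChainMap({"trans": "outer transaction"})
print(names["trans"])

# Enter a subtransaction: push a new mapping in front of the old one.
names = names.new_child({"trans": "subtransaction"})
print(names["trans"])

# Leave it: drop the front mapping, revealing the outer binding again.
names = names.parents
print(names["trans"])
```

The lookup always goes through names["trans"], and which object that finds depends on how many mappings are currently stacked.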
 

Rotwang

Sorry to be OT, but this is sending my pedantry glands haywire:

Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind.
[1] Based on empirical evidence that Python supports names with length at
least up to one million characters long, and assuming that each character
can be an ASCII letter, digit or underscore.

Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

I probably have it wrong, but I think that looks like:

859,122,[etc.]


variables. (The number has 180 digits)

That's 63**100. Note that 10**1000000 has 1000001 digits, and is
somewhat smaller than 63**1000000.
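
Both digit counts are easy to check without printing anything enormous. A small sketch, using logarithms for the big one (the variable names are made up here):

```python
import math

# 63**100 is small enough to convert to a string directly.
digits_63_100 = len(str(63 ** 100))

# For 63**1000000, count digits as floor(log10(n)) + 1 instead.
digits_63_million = math.floor(10**6 * math.log10(63)) + 1

print(digits_63_100)      # 180
print(digits_63_million)  # 1799341
```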

Anyway, none of the calculations that has been given takes into account
the fact that names can be /less/ than one million characters long. The
actual number of non-empty strings of length at most 1000000 characters,
that consist only of ascii letters, digits or underscores, and that
don't start with a digit, is

sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62
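
The identity is the usual geometric-series closed form. A sketch that checks it for small exponents, where the sum is cheap to evaluate (by_sum and closed_form are names invented for this check):

```python
def by_sum(n):
    # Names of length 1..n: 53 first characters, then 63 choices per extra one.
    return sum(53 * 63**i for i in range(n))


def closed_form(n):
    # 63**n - 1 is always divisible by 62, so the floor division is exact.
    return 53 * (63**n - 1) // 62


checks = [by_sum(n) == closed_form(n) for n in range(1, 50)]
print(all(checks))
```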


It's perhaps worth mentioning that some non-ascii characters are allowed
in identifiers in Python 3, though I don't know which ones.
 

Chris Angelico

When the object finally gets garbage collected, doesn't that mean the
last transaction will be rolled back?

Oh. Uhm... ahh... it would have helped to mention that it also has a
commit() method! But yes, that's correct; if the object expires (this
is C++, so it's guaranteed to call the destructor at that close brace
- none of the Python vagueness about when __del__ is called) without
commit() being called, then the transaction will be rolled back. And
since this is PostgreSQL we use, the same applies if the process is
SIGKILLed or the power fails. If commit() doesn't happen, neither does
the transaction. (There are a few actions the program can take that
are deliberately non-transactional - log entries of various sorts,
mainly - but everything else is guarded in this way.)
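
The closest deterministic equivalent in Python is probably a context manager rather than __del__. A minimal sketch, assuming a made-up Transaction class that just records events in a list (this is not the C++ class described above, and it talks to no real database):

```python
class Transaction:
    """Hypothetical stand-in: rolls back on exit unless commit() was called."""

    def __init__(self, log):
        self.log = log
        self.committed = False

    def __enter__(self):
        self.log.append("begin")
        return self

    def commit(self):
        self.committed = True
        self.log.append("commit")

    def __exit__(self, exc_type, exc, tb):
        if not self.committed:
            self.log.append("rollback")
        return False  # never swallow exceptions


events = []
with Transaction(events) as trans:
    pass                # no commit() before the block ends
with Transaction(events) as trans:
    trans.commit()

print(events)  # ['begin', 'rollback', 'begin', 'commit']
```

__exit__ runs when the with block ends, whether normally or through an exception, which is the guarantee the close brace gives in C++.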

ChrisA
 

Peter Otten

Rotwang said:
Sorry to be OT, but this is sending my pedantry glands haywire:

We are mostly pedants, too -- so this is well-deserved...
Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind.
[1] Based on empirical evidence that Python supports names with length at
least up to one million characters long, and assuming that each
character can be an ASCII letter, digit or underscore.

Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

I probably have it wrong, but I think that looks like:

859,122,[etc.]


variables. (The number has 180 digits)

That's 63**100. Note that 10**1000000 has 1000001 digits, and is
somewhat smaller than 63**1000000.

Anyway, none of the calculations that has been given takes into account
the fact that names can be /less/ than one million characters long.

I think we have a winner ;)
 

Joshua Landau

53*(63**1000000 - 1)//62

Or about 10**10**6.255 (so about 1.80M digits long).


For the unicode side (Python 3, in other words) and reusing your math
(ya better hope it's right!), you are talking:

97812*((97812+2020)**1000000 - 1)/(97812+2020-1)

Or about 10**10**6.699

Which has about 5.00M digits.


Good luck running out.
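
Those estimates can be reproduced with logarithms, without building the numbers themselves. A sketch that drops the negligible "- 1" in each formula and takes the character counts 97812 and 2020 from the post as given, without re-deriving them:

```python
import math

N = 10**6

# log10 of 53*(63**N - 1)//62, ignoring the "- 1"
ascii_log = math.log10(53 / 62) + N * math.log10(63)

# log10 of 97812*((97812+2020)**N - 1)/(97812+2020-1), same approximation
start, extra = 97812, 2020
unicode_log = (math.log10(start / (start + extra - 1))
               + N * math.log10(start + extra))

# Exponent-of-exponent, then digit count in millions
print(round(math.log10(ascii_log), 3), round(ascii_log / 1e6, 2))      # 6.255 1.8
print(round(math.log10(unicode_log), 3), round(unicode_log / 1e6, 2))  # 6.699 5.0
```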
 

Steven D'Aprano

Anyway, none of the calculations that has been given takes into account
the fact that names can be /less/ than one million characters long.


Not in *my* code they don't!!!

*wink*

The
actual number of non-empty strings of length at most 1000000 characters,
that consist only of ascii letters, digits or underscores, and that
don't start with a digit, is

sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62


I take my hat off to you sir, or possibly madam. That is truly an inspired
piece of pedantry.

It's perhaps worth mentioning that some non-ascii characters are allowed
in identifiers in Python 3, though I don't know which ones.

PEP 3131 describes the rules:

http://www.python.org/dev/peps/pep-3131/

For example:

py> import unicodedata as ud
py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
...     print(c, ud.name(c), c.isidentifier(), ud.category(c))
...
é LATIN SMALL LETTER E WITH ACUTE True Ll
æ LATIN SMALL LETTER AE True Ll
¥ YEN SIGN False Sc
µ MICRO SIGN True Ll
¿ INVERTED QUESTION MARK False Po
μ GREEK SMALL LETTER MU True Ll
Ж CYRILLIC CAPITAL LETTER ZHE True Lu
ᚃ OGHAM LETTER FEARN True Lo
‰ PER MILLE SIGN False Po
⇄ RIGHTWARDS ARROW OVER LEFTWARDS ARROW False So
∞ INFINITY False Sm
 
