Default scope of variables

Steven D'Aprano · Jul 3, 2013

Recently, there was a thread where people discussed variable
declarations, with a couple of people stating that they wished that Python
required you to declare local variables, instead of globals.

I'm sure they have their (foolish, pathetic) *wink* reasons for this, but
I thought I'd explain why I think Python makes the right decision to
require declarations for globals but not locals.

I was reading some Javascript today, and wading through masses of code
that looked something like this:

function update_results() {
    var n1 = foo;
    var n2 = bar;
    var m1 = foobar;
    var m2 = foobarbaz;
    var fe = n1 * m1;
    var fi = n2 * m2;
    var fo = n1 * m2;
    var fum = n2 * m1;
    ...
}

and so on for about a page and a half. Looking at that page in my editor,
with that solid column of bold "var" keywords, it struck me just how
redundant that enormous column of "var"s was. Of course they were
variables, what else would they be?

Larry Wall, the creator of Perl, is fond of discussing Huffman coding as
it relates to programming syntax. Common things should be short, rare
things can be longer. Wall is not concerned about saving a few bytes in
your source files, but about programmer effort: typing effort, reading
effort, mental processing effort. Which do you prefer?

total = subtotal + extra

set total to subtotal plus extra

Even though the second is only 8 more characters, I bet you prefer the
first version.

With respect to the Huffman coding of declarations, Javascript gets it
backwards. Locals ought to be more common, but they require more typing.
Locals are safer, better, more desirable than globals, and so it should
be easier to use locals than globals, not the other way around. Having to
declare "give me the safe kind of variable", while getting the harmful[1]
kind of variable for free, strikes me as arse-backwards. Lazy, naive or
careless coders get globals[2] by default or accident. That's bad.

Not just theoretically bad. Here's a real-world case where a single
missed "var" led to something much, much worse than a crash: code that
kept going but did the wrong thing.

http://blog.safeshepherd.com/23/how-one-missing-var-ruined-our-launch/

The two situations:

1) Accidentally scope an intended local as global;
2) Accidentally scope an intended global as local;

are not symmetrical. In the first case, you get multiple invocations of
your function overwriting each other's data. Confusion reigns, but the
function calls will likely continue, pumping out garbage results instead
of crashing. The result is typically fail-unsafe rather than fail-safe.
[Aside: fail-safe does not mean "safe from failing", but "fails in
a safe manner".]

In the second case, any failure is far more likely to result in the
function call failing hard with an exception (fail-safe) rather than
churning out bad results, since each call of the function gets its own
set of locals rather than using those from some other call.
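To make the asymmetry concrete, here is a minimal Python 3 sketch; the function names are hypothetical and chosen only for illustration. `factorial_global` opts into a shared global with an explicit `global` statement, so nested calls overwrite each other's `result` and the function quietly returns a wrong answer (fail-unsafe). `bump` intends to update a global but forgets to declare it, so the assignment makes `counter` local and the call fails immediately with `UnboundLocalError` (fail-safe).

```python
# Case 1: an intended local scoped as a global. Python only does this if
# you explicitly ask for it with a 'global' statement.
def factorial_local(n):
    result = n                         # local by default: private to this call
    if n > 1:
        sub = factorial_local(n - 1)
        result = result * sub
    return result


def factorial_global(n):
    global result                      # every call now shares one 'result'
    result = n
    if n > 1:
        sub = factorial_global(n - 1)  # the nested call overwrites 'result'
        result = result * sub
    return result


# Case 2: an intended global scoped as a local. The assignment below makes
# 'counter' local to bump(), so reading it first raises UnboundLocalError.
counter = 0


def bump():
    counter = counter + 1
    return counter
```

With these definitions, `factorial_local(5)` returns 120 while `factorial_global(5)` keeps running and returns 1, and `bump()` stops at once with `UnboundLocalError`, leaving `counter` untouched.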

So in Javascript, it's easy to get unsafe globals by accident; in Python,
it's hard to get unsafe globals by accident. In my opinion, Python gets
it right.

[1] As in, "Global variables considered harmful", one of the classic
papers of computer science:
http://c2.com/cgi/wiki?GlobalVariablesConsideredHarmful

[2] Actually, Javascript gives you something a little closer to Python's
"nonlocal" by default: each enclosing function is searched for a matching
variable, terminating at the global scope.

Chris Angelico · Jul 4, 2013

> With respect to the Huffman coding of declarations, Javascript gets it
> backwards. Locals ought to be more common, but they require more typing.
> Locals are safer, better, more desirable than globals, and so it should
> be easier to use locals than globals, not the other way around. Having to
> declare "give me the safe kind of variable", while getting the harmful[1]
> kind of variable for free, strikes me as arse-backwards. Lazy, naive or
> careless coders get globals[2] by default or accident. That's bad.

I agree that Javascript has it wrong, but not quite for the reason you
say. The advantage of declaring locals is a flexibility: you can have
multiple unique variables with the same name in the same function.
Python lets you do that across but not within functions.

But Javascript/ECMAScript/whatever doesn't give you that. A var
declaration makes it function-local, no matter where the declaration
is. That's pointless. C++, on the other hand, lets you do this:

void somefunc() {
    for (int i=0;i<10;++i) {
        // do something with outer i
        for (int i=0;i<4;++i) {
            // do something with inner i
        }
        // outer i is visible again
    }
    // neither i is visible
}

Also, C++ overlays the "this is local" declaration with the "this
contains this type" declaration, which neither Python nor Javascript
bothers with; that makes the declaration feel less redundant.

Granted, this flexibility is mostly of value when writing huge
functions with complex nesting, but it is something that I find
convenient.

In terms of Huffman coding, every C++ variable must be declared,
somewhere. It's not a matter of declaring globals or declaring locals;
you just declare variables. If you declare them at file scope,
they're globals; if at function scope, they're locals. There's really
no difference. Everything's visible at its own level and those further
in, and not those further out.

I do see the convenience of the Python system, and I do like it; but
someone needs to speak up for the foolish and pathetic.

ChrisA

Joshua Landau · Jul 4, 2013

>> With respect to the Huffman coding of declarations, Javascript gets it
>> backwards. Locals ought to be more common, but they require more typing.
>> Locals are safer, better, more desirable than globals, and so it should
>> be easier to use locals than globals, not the other way around. Having to
>> declare "give me the safe kind of variable", while getting the harmful[1]
>> kind of variable for free, strikes me as arse-backwards. Lazy, naive or
>> careless coders get globals[2] by default or accident. That's bad.

> I agree that Javascript has it wrong, but not quite for the reason you
> say. The advantage of declaring locals is a flexibility: you can have
> multiple unique variables with the same name in the same function.
> Python lets you do that across but not within functions.

> But Javascript/ECMAScript/whatever doesn't give you that. A var
> declaration makes it function-local, no matter where the declaration
> is.

Coffeescript, which compiles to Javascript, "fixes" the problem Steven
brought up by automatically declaring variables so that you don't have
to. But what do you think this does?

a = 1
func = ->
  a = 2
  b = 2

The "a" in "func" is global, the "b" is local. And Coffeescript
*doesn't let* you shadow even if you explicitly want to. There just
isn't syntax for it.

That said, I'm not too convinced. Personally, the proper way to do
what you are talking about is creating a new closure. Like:

for i in range(100):
    with new_scope():
        for i in range(100):
            func(i)
    func(i) # Using i from original loop

But it's not like Python'll ever support that.
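A quick check of why an ordinary context manager can't fill that role: in this minimal sketch `new_scope` is a hypothetical do-nothing context manager built with `contextlib.contextmanager`. A `with` block does not introduce a new namespace, so the inner loop rebinds the same local `i` that the outer loop uses.

```python
from contextlib import contextmanager


@contextmanager
def new_scope():
    # Hypothetical helper named after the example above. It does nothing:
    # a with-block cannot give the names assigned inside it their own scope.
    yield


def last_outer_values():
    seen = []
    for i in range(3):
        with new_scope():
            for i in range(5):   # rebinds the *same* local 'i'
                pass
        seen.append(i)           # the inner loop's last value, not the outer one
    return seen
```

Here `last_outer_values()` returns `[4, 4, 4]` rather than `[0, 1, 2]`.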

Chris Angelico · Jul 4, 2013

> That said, I'm not too convinced. Personally, the proper way to do
> what you are talking about is creating a new closure. Like:

> for i in range(100):
>     with new_scope():
>         for i in range(100):
>             func(i)
>     func(i) # Using i from original loop

> But it's not like Python'll ever support that.

def foo():
    for i in range(3):
        print("outer",i)
        def inner():
            for i in range(4):
                print("inner",i)
        inner()
        print("outer",i)

That works, but you then have to declare all your nonlocals, and it
hardly reads well.
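The nonlocal cost looks like this in practice. In this minimal sketch with hypothetical names, the inner helper can read `total` from the enclosing function freely, but rebinding it needs a `nonlocal` declaration; without that line, `total += n` would make `total` local to `add` and raise `UnboundLocalError`.

```python
def running_total(values):
    total = 0

    def add(n):
        nonlocal total   # required to rebind the enclosing function's 'total'
        total += n

    for v in values:
        add(v)
    return total
```

For example, `running_total([1, 2, 3])` returns 6.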

ChrisA

Joshua Landau · Jul 4, 2013

> def foo():
>     for i in range(3):
>         print("outer",i)
>         def inner():
>             for i in range(4):
>                 print("inner",i)
>         inner()
>         print("outer",i)

> That works, but you then have to declare all your nonlocals, and it
> hardly reads well.

Unfortunately that's what people, myself included, end up doing. Stuff like:

def paranoia(...):
    def safe_recursive(...):
        safe_recursive(...)
    return safe_recursive
safe_recursive = paranoia()

is blimmin ugly. Then you're only really left with

class safe_recursive:
    def __call__(self, ...):
        self(...)

which only solves it for recursive functions.
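The point of the `paranoia` wrapper, presumably, is that the recursive call is resolved through the enclosing function's scope rather than the module's globals, so rebinding the global name cannot break the recursion. A minimal runnable sketch of that difference, with a hypothetical factorial standing in for `safe_recursive`:

```python
def make_factorial():
    # The recursive call finds 'factorial' in this enclosing scope (a closure),
    # so later changes to any module-level name don't affect it.
    def factorial(n):
        return 1 if n <= 1 else n * factorial(n - 1)
    return factorial


def naive_factorial(n):
    # The recursive call looks up the global name each time it runs.
    return 1 if n <= 1 else n * naive_factorial(n - 1)


safe = make_factorial()
naive = naive_factorial
naive_factorial = None   # someone rebinds the global name...
```

After the rebinding, `safe(5)` still returns 120, while `naive(5)` fails with `TypeError` because its recursive call now finds `None`.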

I guess this means I actually agree with your sentiment, just not the specifics.

Steven D'Aprano · Jul 4, 2013

>> With respect to the Huffman coding of declarations, Javascript gets it
>> backwards. Locals ought to be more common, but they require more
>> typing. Locals are safer, better, more desirable than globals, and so
>> it should be easier to use locals than globals, not the other way
>> around. Having to declare "give me the safe kind of variable", while
>> getting the harmful[1] kind of variable for free, strikes me as
>> arse-backwards. Lazy, naive or careless coders get globals[2] by
>> default or accident. That's bad.

> I agree that Javascript has it wrong, but not quite for the reason you
> say. The advantage of declaring locals is a flexibility: you can have
> multiple unique variables with the same name in the same function.

Well, if I ever have more than 63,000,000 variables[1] in a function,
I'll keep that in mind. Until then, I'm pretty sure you can trivially
avoid name clashes with globals that you wish to avoid clashing with.

Accidental shadowing can be a problem, but I've never heard of anyone
saying that they were *forced* to shadow a global they needed access to.
Just pick a different name.

> Python lets you do that across but not within functions.

> But Javascript/ECMAScript/whatever doesn't give you that. A var
> declaration makes it function-local, no matter where the declaration is.
> That's pointless. C++, on the other hand, lets you do this:

> void somefunc() {
>     for (int i=0;i<10;++i) {
>         // do something with outer i
>         for (int i=0;i<4;++i) {
>             // do something with inner i
>         }
>         // outer i is visible again
>     }
>     // neither i is visible
> }

That's truly horrible. If the cost of this "flexibility" is that I'll
have to read, and debug, other people's code with this sort of thing, I'm
happy to be less flexible. For what possible reason other than "because I
can" would you want to use the same loop variable name in two nested
loops?

I'm not suggesting that C++ should prohibit it. But just because a
language allows something doesn't make it a *feature*. I can write this
in Python:

a = a = a = a = a = 1

and it works, but the ability to do so is hardly a feature. It's just a
side effect of how Python works.

I believe that the function is the right level for scope changes, not too
large, not too small. I'm not even sure I like the fact that generator
expressions (in Python 2 & 3) and list comprehensions (in Python 3)
introduce their own scope.
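A minimal Python 3 sketch of that difference (the function names are hypothetical): the loop variable of a list comprehension lives in the comprehension's own scope, while a plain `for` loop rebinds the function's local.

```python
def comprehension_scope():
    x = "outer"
    squares = [x * x for x in range(4)]   # this 'x' is private to the comprehension
    return x, squares


def loop_scope():
    x = "outer"
    for x in range(4):                     # rebinds the function's own 'x'
        pass
    return x
```

In Python 3, `comprehension_scope()` returns `("outer", [0, 1, 4, 9])` and `loop_scope()` returns 3; under Python 2 the list comprehension would also have left `x` bound to 3.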

[...]

> Granted, this flexibility is mostly of value when writing huge functions
> with complex nesting, but it is something that I find convenient.

"This feature is mostly of value when poking myself, and any developers
who have to maintain my code after I'm gone, in the eye with a red-hot
poker."

[1] Based on empirical evidence that Python supports names with length at
least up to one million characters long, and assuming that each character
can be an ASCII letter, digit or underscore.

Steven D'Aprano · Jul 4, 2013

> That said, I'm not too convinced. Personally, the proper way to do what
> you are talking about is creating a new closure. Like:

> for i in range(100):
>     with new_scope():
>         for i in range(100):
>             func(i)
>     func(i) # Using i from original loop

> But it's not like Python'll ever support that.

Probably not, but Python does support this:

for i in range(100):
    for j in range(100):
        func(j)
    func(i) # Using i from original loop

which solves the problem of inner i overwriting outer i nicely.

Chris Angelico · Jul 4, 2013

> Accidental shadowing can be a problem, but I've never heard of anyone
> saying that they were *forced* to shadow a global they needed access to.
> Just pick a different name.

Here's one example of shadowing that comes from a C++ project at work.
I have a class that represents a database transaction (constructing it
begins a transaction, it has methods for doing queries, and its
destructor rolls back). There's also a class for a subtransaction (same
thing, but it uses savepoints within the transaction). So to bracket a
piece of code in a subtransaction, I want to declare a new
subtransaction object with the same name as the outer transaction
object, and then dispose of it and "reveal" the original. There will
always be an object called "trans", and it will always be the
appropriate transaction to do queries on, but it'll change what it is.

ChrisA

Peter Otten · Jul 4, 2013

Steven said:
> Well, if I ever have more than 63,000,000 variables[1] in a function,
> I'll keep that in mind. Until then, I'm pretty sure you can trivially
> avoid name clashes with globals that you wish to avoid clashing with.

> [1] Based on empirical evidence that Python supports names with length at
> least up to one million characters long, and assuming that each character
> can be an ASCII letter, digit or underscore.

That would be 63**10**6. Or 53*63**999999 if I were to nitpick...

Dave Angel · Jul 4, 2013

> Well, if I ever have more than 63,000,000 variables[1] in a function,
> I'll keep that in mind.

> [1] Based on empirical evidence that Python supports names with length at
> least up to one million characters long, and assuming that each character
> can be an ASCII letter, digit or underscore.

Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

I probably have it wrong, but I think that looks like:

859,122,207,646,748,720,415,212,786,780,258,721,683,540,870,960,267,706,738,947,655,539,422,295,787,680,882,091,181,482,626,114,653,152,637,456,091,641,990,601,474,111,018,521,295,858,424,750,289,461,372,414,431,396,326,232,796,267,104,001

variables. (The number has 180 digits)

Ian Kelly · Jul 4, 2013

> That's truly horrible. If the cost of this "flexibility" is that I'll
> have to read, and debug, other people's code with this sort of thing, I'm
> happy to be less flexible. For what possible reason other than "because I
> can" would you want to use the same loop variable name in two nested
> loops?

It's interesting to note that while Java and C# also allow reuse of
local variable names, they do not allow local variables declared in
inner scopes to shadow variables declared in enclosing scopes, as in
the example above. But the following would be perfectly legal:

void somefunc() {
    for (int i = 0; i < a.size; ++i) {
        // do something with a
    }
    for (int i = 0; i < b.size; ++i) {
        // do something with b
    }
}

And the two i's are treated as completely separate variables here, as
arguably they should be since they're used for two distinct purposes.

Lele Gaifax · Jul 4, 2013

Dave Angel said:
> Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

Uhm, if we are talking about Py2, then you should not count all the
combinations starting with a digit, while under Py3 the number explodes,
as this is valid code:
1

back to easily-enumerable issues,
ciao, lele.

Wayne Werner · Jul 4, 2013

> [1] Based on empirical evidence that Python supports names with length at
> least up to one million characters long, and assuming that each character
> can be an ASCII letter, digit or underscore.

The specification *does* state unlimited length:

http://docs.python.org/release/2.5.2/ref/identifiers.html

Though practicality beats purity.

-W

Steven D'Aprano · Jul 4, 2013

>> Well, if I ever have more than 63,000,000 variables[1] in a function,
>> I'll keep that in mind.

>> [1] Based on empirical evidence that Python supports names with length
>> at least up to one million characters long, and assuming that each
>> character can be an ASCII letter, digit or underscore.

> Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

> I probably have it wrong, but I think that looks like:

> 859,122,207,646,748,720,415,212,786,780,258,721,683,540,870,960,267,706,738,947,655,539,422,295,787,680,882,091,181,482,626,114,653,152,637,456,091,641,990,601,474,111,018,521,295,858,424,750,289,461,372,414,431,396,326,232,796,267,104,001

> variables. (The number has 180 digits)

I think that's more than 63,000,000

Thanks Dave and Peter for the correction.

Steven D'Aprano · Jul 4, 2013

> Here's one example of shadowing that comes from a C++ project at work. I
> have a class that represents a database transaction (constructing it
> begins a transaction, it has methods for doing queries, and its
> destructor rolls back).

When the object finally gets garbage collected, doesn't that mean the
last transaction will be rolled back?

> There's also a class for a subtransaction (same
> thing, but it uses savepoints within the transaction). So to bracket a
> piece of code in a subtransaction, I want to declare a new
> subtransaction object with the same name as the outer transaction
> object, and then dispose of it and "reveal" the original. There will
> always be an object called "trans", and it will always be the
> appropriate transaction to do queries on, but it'll change what it is.

You don't need to introduce such scoping rules for variables for that
use-case. We have namespaces (classes) for that sort of thing.

Python 3.3's ChainMap is probably a better solution, but here's another
way to get the same sort of behaviour:

def function():
    class Namespace:
        # Set a class attribute.
        trans = Transaction()
    ns = Namespace()
    do_stuff_with(ns.trans)
    # Enter a subtransaction.
    ns.trans = Subtransaction()
    do_stuff_with(ns.trans)
    del ns.trans
    do_stuff_with(ns.trans)

Yes, it looks weird to see ns.trans used immediately after deleting it,
but that's because it is a weird (or at least unusual) use-case.
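For comparison, a minimal sketch of the `ChainMap` approach, using only `new_child()` and the `parents` property of `collections.ChainMap` (both available since Python 3.3). Plain strings stand in for the hypothetical `Transaction` and `Subtransaction` objects.

```python
from collections import ChainMap

# One mapping per nesting level; lookups search the innermost mapping first.
scope = ChainMap({"trans": "outer transaction"})

# Enter a subtransaction: push a new innermost mapping that shadows "trans".
scope = scope.new_child()
scope["trans"] = "subtransaction"
inner = scope["trans"]

# Leave it: 'parents' drops the innermost mapping, revealing the original.
scope = scope.parents
outer = scope["trans"]
```

Here `inner` is `"subtransaction"` and `outer` is `"outer transaction"`, much like `del ns.trans` revealing the class attribute in the `Namespace` version.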

Rotwang · Jul 4, 2013

Sorry to be OT, but this is sending my pedantry glands haywire:

>> Well, if I ever have more than 63,000,000 variables[1] in a function,
>> I'll keep that in mind.

>> [1] Based on empirical evidence that Python supports names with length at
>> least up to one million characters long, and assuming that each character
>> can be an ASCII letter, digit or underscore.

> Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

> I probably have it wrong, but I think that looks like:

> 859,122,[etc.]

> variables. (The number has 180 digits)

That's 63**100. Note that 10**1000000 has 1000001 digits, and is
somewhat smaller than 63**1000000.

Anyway, none of the calculations that have been given takes into account
the fact that names can be /less/ than one million characters long. The
actual number of non-empty strings of length at most 1000000 characters,
that consist only of ascii letters, digits or underscores, and that
don't start with a digit, is

sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62
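As a sanity check of that closed form, here is a short sketch that compares the explicit sum with the geometric-series formula for small maximum lengths; the function names are hypothetical. For the full limit of 1000000 characters the exact integer is enormous, so the sketch only estimates its digit count with logarithms instead of converting it to a string.

```python
import math


def count_by_sum(max_len, first=53, rest=63):
    # Names of length i + 1: one leading letter or underscore (53 choices),
    # followed by i characters from letters, digits and underscore (63 choices).
    return sum(first * rest**i for i in range(max_len))


def count_closed_form(max_len, first=53, rest=63):
    # Geometric series: first * (rest**max_len - 1) / (rest - 1), exactly.
    return first * (rest**max_len - 1) // (rest - 1)


agree = all(count_by_sum(n) == count_closed_form(n) for n in range(1, 40))

# Approximate number of decimal digits in count_closed_form(10**6).
approx_digits = math.floor(10**6 * math.log10(63) + math.log10(53 / 62)) + 1
```

`agree` comes out `True` (for instance, both give 3392 for names of at most two characters), and `approx_digits` is roughly 1.8 million.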

It's perhaps worth mentioning that some non-ascii characters are allowed
in identifiers in Python 3, though I don't know which ones.

Chris Angelico · Jul 4, 2013

> When the object finally gets garbage collected, doesn't that mean the
> last transaction will be rolled back?

Oh. Uhm... ahh... it would have helped to mention that it also has a
commit() method! But yes, that's correct; if the object expires (this
is C++, so it's guaranteed to call the destructor at that close brace,
with none of the Python vagueness about when __del__ is called) without
commit() being called, then the transaction will be rolled back. And
since this is PostgreSQL we use, the same applies if the process is
SIGKILLed or the power fails. If commit() doesn't happen, neither does
the transaction. (There are a few actions the program can take that
are deliberately non-transactional - log entries of various sorts,
mainly - but everything else is guarded in this way.)

ChrisA

Peter Otten · Jul 4, 2013

Rotwang said:
> Sorry to be OT, but this is sending my pedantry glands haywire:

We are mostly pedants, too -- so this is well-deserved...

>>> Well, if I ever have more than 63,000,000 variables[1] in a function,
>>> I'll keep that in mind.

>>> [1] Based on empirical evidence that Python supports names with length
>>> at least up to one million characters long, and assuming that each
>>> character can be an ASCII letter, digit or underscore.

>> Well, the number wouldn't be 63,000,000. Rather it'd be 63**1000000

>> I probably have it wrong, but I think that looks like:

>> 859,122,[etc.]

>> variables. (The number has 180 digits)

> That's 63**100. Note that 10**1000000 has 1000001 digits, and is
> somewhat smaller than 63**1000000.

> Anyway, none of the calculations that have been given takes into account
> the fact that names can be /less/ than one million characters long.

I think we have a winner

Joshua Landau · Jul 4, 2013

> 53*(63**1000000 - 1)//62

Or about 10**10**6.255 (so about 1.80M digits long).

For the unicode side (Python 3, in other words) and reusing your math
(ya better hope it's right!), you are talking:

97812*((97812+2020)**1000000 - 1)//(97812+2020-1)

Or about 10**10**6.699

Which has about 5.00M digits.

Good luck running out.

Steven D'Aprano · Jul 4, 2013

> Anyway, none of the calculations that have been given takes into account
> the fact that names can be /less/ than one million characters long.

Not in *my* code they don't!!!

*wink*

> The actual number of non-empty strings of length at most 1000000
> characters, that consist only of ascii letters, digits or underscores,
> and that don't start with a digit, is

> sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62

I take my hat off to you sir, or possibly madam. That is truly an inspired
piece of pedantry.

> It's perhaps worth mentioning that some non-ascii characters are allowed
> in identifiers in Python 3, though I don't know which ones.

PEP 3131 describes the rules:

http://www.python.org/dev/peps/pep-3131/

For example:

py> import unicodedata as ud
py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
...     print(c, ud.name(c), c.isidentifier(), ud.category(c))
...
é LATIN SMALL LETTER E WITH ACUTE True Ll
æ LATIN SMALL LETTER AE True Ll
¥ YEN SIGN False Sc
µ MICRO SIGN True Ll
¿ INVERTED QUESTION MARK False Po
μ GREEK SMALL LETTER MU True Ll
Ж CYRILLIC CAPITAL LETTER ZHE True Lu
ᚃ OGHAM LETTER FEARN True Lo
‰ PER MILLE SIGN False Po
⇄ RIGHTWARDS ARROW OVER LEFTWARDS ARROW False So
∞ INFINITY False Sm
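Going a step further, here is a rough sketch (the function name is hypothetical) that counts how many code points `str.isidentifier()` accepts as a one-character name, and how many it accepts after a leading ASCII letter. It only reflects what `str.isidentifier()` reports for single characters, and the totals depend on the Unicode database bundled with your Python version, so no particular numbers are claimed here.

```python
import sys


def identifier_char_counts():
    # Scan every code point and test it with str.isidentifier().
    start = cont = 0
    for cp in range(sys.maxunicode + 1):
        c = chr(cp)
        if c.isidentifier():          # allowed as a one-character name
            start += 1
        if ("a" + c).isidentifier():  # allowed after a leading letter
            cont += 1
    return start, cont
```

Calling `identifier_char_counts()` returns a `(start, continue)` pair; the second count is always the larger, since every character allowed at the start of a name is also allowed after it, and digits are allowed only after it.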

