Deep vs. shallow copy?

Alex van der Spek · Mar 12, 2014

I think I understand the difference between deep vs. shallow copies but
I was bitten by this:

with open(os.path.join('path', 'foo.txt', 'rb') as txt:
    reader = csv.reader(txt)
    data = [row.append(year) for row in reader]

This does not work although the append does complete. The below works:

with open(os.path.join('path', 'foo.txt', 'rb') as txt:
    reader = csv.reader(txt)
    data = [row + [year] for row in reader]

However in this context I am baffled. If someone can explain what is
going on here, I would be most grateful.

Alex van der Spek

Skip Montanaro · Mar 12, 2014

with open(os.path.join('path', 'foo.txt', 'rb') as txt:
    reader = csv.reader(txt)
    data = [row.append(year) for row in reader]

Forget deep v. shallow copies. What is the value of the variable year?
And why would you expect list.append to return anything?

Skip

Zachary Ware · Mar 12, 2014

Deep/shallow copying doesn't really come into this. row.append()
mutates the list (row), it doesn't return a new list. Like most
in-place/mutating methods in Python, it returns None instead of self
to show that mutation was done, so your listcomp fills `data` with
Nones; there is no copying done at all. The second example works as
you expected because `row + [year]` results in a new list, which the
listcomp is happy to append to `data`--which does mean that `row` is
copied.
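A minimal sketch of the difference, using made-up rows in place of the csv.reader output and an assumed value for `year` (the original post never shows it):

```python
# Hypothetical sample data standing in for the rows from csv.reader.
rows = [["a", "1"], ["b", "2"]]
year = 2014  # assumed value for illustration

# list.append mutates each row in place and returns None,
# so the listcomp collects one None per row.
appended = [row.append(year) for row in rows]

# row + [year] builds a brand-new list and leaves the original row alone.
fresh = [["a", "1"], ["b", "2"]]
concatenated = [row + [year] for row in fresh]

print(appended)      # the list of Nones
print(rows)          # the rows were still mutated as a side effect
print(concatenated)  # new lists with the year on the end
print(fresh)         # untouched originals
```

So the append "does complete", as the original post noticed; it is only the listcomp's result that is useless.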

To avoid the copy that the second listcomp is doing (which really
shouldn't be necessary anyway, unless your rows are astronomically
huge), you have a couple of options. First, you can expand your
listcomp and use append:

with open(os.path.join('path', 'foo.txt'), 'rb') as txt:  # with your typo fixed
    reader = csv.reader(txt)
    data = []
    for row in reader:
        row.append(year)
        data.append(row)

To me, that's pretty readable and pretty clear about what it's doing.
Then there's this option, which I don't recommend:

import operator
with open(os.path.join('path', 'foo.txt'), 'rb') as txt:
    reader = csv.reader(txt)
    data = [operator.iadd(row, [year]) for row in reader]

This works because operator.iadd is basically shorthand for
row.__iadd__([year]), which does return self (otherwise, the
assignment part of `row += [year]` couldn't work). But, it's not as
clear about what's happening, and only saves a whole two lines (maybe
3 if you already have operator imported).
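A quick way to convince yourself that operator.iadd hands back the very same list, sketched with a stand-in row and an assumed `year`:

```python
import operator

row = ["a", "1"]  # hypothetical row
year = 2014       # assumed value for illustration

# For a list, operator.iadd(row, [year]) behaves like `row += [year]`
# and returns the (mutated) list itself.
result = operator.iadd(row, [year])
same_object = result is row

# Plain + still produces a separate list.
copied = row + ["extra"]
is_separate = copied is not row

print(same_object, is_separate, row)
```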

Hope this helps,

Alex van der Spek · Mar 12, 2014

Thank you, that helped immensely!

Having been taught programming in Algol60 Python still defeats me at times!
Particularly since Algol60 wasn't very long lived and what came
thereafter (FORTRAN) much worse.

I get it now, the below is equivalent!

I am perfectly happy with the one copy of the list row + [year].
Just wanted to learn something here and I have!

Python 2.6.5 (r265:79063, Feb 27 2014, 19:44:14)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> a = [1,2,3]
>>> b = 'val'
>>> a.append(b)
>>> a
[1, 2, 3, 'val']
>>> c = a.append(b)
>>> print c
None


Steven D'Aprano · Mar 12, 2014

Having been taught programming in Algol60 Python still defeats me at
times! Particularly since Algol60 wasn't very long lived and what came
thereafter (FORTRAN) much worse.

Fortran came first. Fortran was the first high-level language which
allowed the programmer to write things that looked rather like the sorts
of mathematical expressions they were used to.

There were a few higher-level assembly languages that came before
Fortran, such as SpeedCoding, Fortran's predecessor, but Fortran was the
first truly high-level programming language, and even in 1957 it came
with an optimizing compiler.

I'm not really familiar with Algol, but I do know Pascal, and you should
think of the append method to be like a Pascal procedure. Because Python
doesn't have true procedures, it follows the convention that returning
the special object None signals the intention to return nothing at all.
Hence your example below:

Rustom Mody · Mar 13, 2014

I'm not really familiar with Algol, but I do know Pascal, and you should
think of the append method to be like a Pascal procedure. Because Python
doesn't have true procedures, it follows the convention that returning
the special object None signals the intention to return nothing at all.
Hence your example below:

Yes... Algol I don't know. But to the extent that it corresponds to
Pascal the following may help.

Pascal is remarkably clean and cleaner than what came before -- Lisp
-- and after -- C, all its derivatives, python... almost everything.

First off there are expressions and statements, and no messing with
the distinction. C started laissez-faire. No void, just default
functions intended to be used as procedures to int. This was
problematic to enough people that void was introduced soon enough.

But it's still messy:
An assignment like x = 3 is both an expression and (like) a statement.
It is mostly used as a statement by appending a semicolon.
It can also be used as an expression as in y = x = 3.

Try explaining this to a first year class and you'd know what a mess it is.
For the most part if you think you've got it you probably haven't.

Python is in some respects worse than C, in some better.

It's better in that assignment statements are not re-purposeable as
expressions. It's worse in that procedures are simulated by None
returning functions -- this means syntax errors which C would give
when using a void function as an expression, you won't get in python.
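To make that last point concrete, here is a small sketch (the variable names are mine, purely for illustration). The misuse compiles without complaint and only fails when it actually runs:

```python
items = [3, 1, 2]

# list.sort() is a "procedure" in spirit: it returns None.
# Using it as a value is not a syntax error, so compile() succeeds.
source = "total = len(items.sort())"
code = compile(source, "<example>", "exec")

try:
    exec(code, {"items": items})
    outcome = "no error"
except TypeError:
    # len(None) is only rejected at run time.
    outcome = "TypeError at run time"

print(outcome)
print(items)  # the sort itself still happened as a side effect
```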

On the whole, whether your language supports it or not, it's best to
think of the world as separated into actions and values.

Call the action-world the 'imperative' world.
Call the value-world the 'functional' world ('declarative' would be
better but 'functional' is too entrenched).

[Following table meant to be read with fixed (courier) font]

| | Imperative | Functional |
| Language entity | Statement | Expression |
| Denote (and think with) | Action | Value |
| Abstracted into | Procedure | Function |
| Atoms are | Assignment | Constant/Variable |
| Data Structures | Mutable | Immutable |
| Loop primitive | Recursion | Iteration |
| World is | In time | Timeless (Platonic) |

Generally mixing these two worlds is necessary for any non-trivial
programming but is commonly problematic.

Your error was to use a functional form -- list comprehension -- and
embed an imperative construct -- row.append -- into that. This is
always a gaffe.

Legal mixing is done thus
Expression -> Statement by putting the expression on the rhs of an assignment
Statement -> Expression by putting the statement(s) into a function
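As a rough sketch of those two directions in Python (`with_year` is a hypothetical helper, not something from the original code):

```python
def with_year(row, year):
    """Statement -> Expression: the statements live inside a function
    that returns a value and leaves its argument untouched."""
    result = list(row)   # statement
    result.append(year)  # statement
    return result

rows = [["a", "1"], ["b", "2"]]  # stand-in for the csv rows

# The call is an ordinary expression, so it sits happily in a listcomp.
data = [with_year(row, 2014) for row in rows]

# Expression -> Statement: the expression goes on the rhs of an assignment.
first = data[0]

print(data)
print(rows)  # unchanged
```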

While most programmers think this unproblematic, Backus' widely-cited
Turing Award lecture is devoted to showing why this is a problem
http://www.thocp.net/biographies/papers/backus_turingaward_lecture.pdf

For an old Algol/Fortran programmer it may be a worthwhile read considering
Backus invented Fortran

Ian · Mar 13, 2014

Call the action-world the 'imperative' world.
Call the value-world the 'functional' world ('declarative' would be
better but 'functional' is too entrenched).

[Following table meant to be read with fixed (courier) font]

| | Imperative | Functional |
| Language entity | Statement | Expression |
| Denote (and think with) | Action | Value |
| Abstracted into | Procedure | Function |
| Atoms are | Assignment | Constant/Variable |
| Data Structures | Mutable | Immutable |
| Loop primitive | Recursion | Iteration |
| World is | In time | Timeless (Platonic) |

Small typo I think in that the looping Primitives are switched about?

Regards

Ian

Roy Smith · Mar 13, 2014

Steven D'Aprano said:
Because Python doesn't have true procedures

What do you mean by "true procedure"? Are you just talking about
subroutines that don't return any value, i.e. fortran's SUBROUTINE vs.
FUNCTION?

Marko Rauhamaa · Mar 13, 2014

Roy Smith said:
What do you mean by "true procedure"? Are you just talking about
subroutines that don't return any value, i.e. fortran's SUBROUTINE vs.
FUNCTION?

Ah, the "no true procedure" argument:

- No true procedure returns a value.

- That's false. Python's procedures return None.

- They are not true procedures.

Marko

Rustom Mody · Mar 13, 2014

Small typo I think in that the looping Primitives are switched about?

Heh! I was hesitating to put that line at all: For one thing its a
hackneyed truth in the non-FP community. For another, in practical
Haskell, use of frank recursion is regarded as a sign of programming
immaturity:

http://www.willamette.edu/~fruehr/haskell/evolution.html

So I guess I ended up typing it in the wrong order!

Steven D'Aprano · Mar 13, 2014

What do you mean by "true procedure"? Are you just talking about
subroutines that don't return any value, i.e. fortran's SUBROUTINE vs.
FUNCTION?

Yes. If somebody wants to argue that the word "procedure" can mean
something else in other contexts, I won't argue, I'll just point out that
in the context of my post I had just mentioned Pascal procedures, which
don't return a value.

Steven D'Aprano · Mar 13, 2014

Ah, the "no true procedure" argument:

- No true procedure returns a value.

- That's false. Python's procedures return None.

Are you trolling again?

I'm sure that you know quite well that Python doesn't have a procedure
type. It uses a single keyword, def, for creating both functions and
functions-that-return-None.

We should all agree that functions-that-return-None are used for the same
purpose as procedures, but they are still functions, and they have a
return result, namely None. If you don't believe me, believe Python:

py> def func():
...     return 42
...
py> def proc():
...     pass
...
py> type(func)
<class 'function'>
py> type(proc)
<class 'function'>
py> repr(proc())
'None'

In languages with procedures, that last line would be an error (either at
compile-time, or run-time) since a procedure wouldn't return anything to
use as argument to repr. But I'm sure that you know that.

- They are not true procedures.

Correct. They are functions that return None, rather than a subroutine
that doesn't have any return value at all. But I'm sure you know that.

Chris Angelico · Mar 13, 2014

Are you trolling again?

I'm sure that you know quite well that Python doesn't have a procedure
type. It uses a single keyword, def, for creating both functions and
functions-that-return-None.

I'm going to troll for a moment and give you a function that has no
return value.

def procedure():
    raise Exception

But seriously, this is something that some functions do when they need
to distinguish between returning something and not returning anything.
Look at a dictionary's subscripting (which is effectively a function
call):

>>> x={1:2}
>>> x[1]
2
>>> x[3]
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    x[3]
KeyError: 3

It can't return None to indicate "there was no such key in the
dictionary", so it raises instead. There's only one way for a Python
function to not have a return value: it has to not return.
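A small sketch of why None can't double as the "no such key" signal: None is a perfectly good stored value. The `_MISSING` sentinel below is a hypothetical name, just one common way around the ambiguity.

```python
x = {1: 2, 5: None}

# Subscripting a missing key raises instead of returning anything.
try:
    x[3]
    missing_signal = "returned"
except KeyError:
    missing_signal = "raised KeyError"

# dict.get returns None by default, which is ambiguous here:
stored_none = x.get(5)  # key exists, value is None
absent = x.get(3)       # key does not exist

# A private sentinel object distinguishes the two cases.
_MISSING = object()
is_absent = x.get(3, _MISSING) is _MISSING
is_present = x.get(5, _MISSING) is not _MISSING

print(missing_signal, stored_none, absent, is_absent, is_present)
```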

ChrisA

Ian Kelly · Mar 14, 2014

I'm going to troll for a moment and give you a function that has no
return value.

def procedure():
    raise Exception

2 0 LOAD_GLOBAL 0 (Exception)
3 RAISE_VARARGS 1
6 LOAD_CONST 0 (None)
9 RETURN_VALUE

...     """Returns the return value of procedure()."""
...     return procedure.__code__.co_consts[0]
...
None

Look, there it is!

Chris Angelico · Mar 14, 2014

2 0 LOAD_GLOBAL 0 (Exception)
3 RAISE_VARARGS 1
6 LOAD_CONST 0 (None)
9 RETURN_VALUE

That's a return value in the same way that exec() has a return value
[1]. If somehow the raise fails, it'll return None.

...     """Returns the return value of procedure()."""
...     return procedure.__code__.co_consts[0]
...
None

Look, there it is!

Succeeds by coincidence. From what I can see, *every* CPython function
has const slot 0 dedicated to None. At least, I haven't been able to
do otherwise.

return x*2+1

2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (2)
6 BINARY_MULTIPLY
7 LOAD_CONST 2 (1)
10 BINARY_ADD
11 RETURN_VALUE

(None, 2, 1)

Your return value retriever would say it returns None still, but it doesn't.

Trollbridge: you have to pay a troll to cross.

ChrisA

[1] I'm not talking about Python's 'exec' statement, but about the
Unix exec() API, eg execlpe() - see http://linux.die.net/man/3/exec

Steven D'Aprano · Mar 14, 2014

I'm going to troll for a moment and give you a function that has no
return value.

Heh, you're not trolling. You're just trying to be pedantic. But not
pedantic enough...

def procedure():
    raise Exception

This does have a return result, and it is None. It's just that the
function never reaches the return, it exits early via an exception.

py> from dis import dis
py> dis(procedure)
2 0 LOAD_GLOBAL 0 (Exception)
3 RAISE_VARARGS 1
6 LOAD_CONST 0 (None)
9 RETURN_VALUE

That *may* be able to be optimized away by a smarter compiler, or perhaps
it can't be. There may be some technical reason why code objects have to
end with a return no matter what:

py> dis(compile("func()", "", "exec"))
1 0 LOAD_NAME 0 (func)
3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE

Or maybe it's just an optimization that nobody has bothered with since
the benefit is so trivial. But either way, all callables (sub-routines)
in Python are functions, i.e. in principle they could, or should, return
a result, even if in practice some of them don't. There is no callable
type which lacks the ability to return a result.

Naturally they may not actually return a result if they never exit:

def this_is_a_function():
    while 1:
        pass
    return "You'll never see this!"

or if they exit via an exception:

def also_a_function():
    if 1:
        raise ValueError
    return "You'll never see this either!"

or if they just kill the running Python environment stone dead:

def still_a_function():
    import os
    os._exit(1)
    return "Have I beaten this dead horse enough?"

But seriously, this is something that some functions do when they need
to distinguish between returning something and not returning anything.
Look at a dictionary's subscripting (which is effectively a function
call):

>>> x={1:2}
>>> x[1]
2
>>> x[3]
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    x[3]
KeyError: 3

Yes, I'm aware that Python functions can raise exceptions

It can't return None to indicate "there was no such key in the
dictionary", so it raises instead. There's only one way for a Python
function to not have a return value: it has to not return.

Exactly my point. Functions return a value, and None is a value;
procedures return, but not with a value. Python has the former, but not
the latter. Instead, we make do with the convention that something which
is *intended* to be used as a procedure should return None to signal
that, and the caller should just ignore the return result.

Steven D'Aprano · Mar 14, 2014

Trollbridge: you have to pay a troll to cross.

Heh

But seriously, there is a distinction to be made between returning from a
sub-routine, and returning from a sub-routine with a return result. There
may be alternative methods of exiting the sub-routine, e.g. a GOTO or
COMEFROM that jumps outside of the function. Exceptions are a form of
safe, limited GOTO: they can only jump out of a function, not into the
middle of an arbitrary chunk of code, and they clean up the call stack
when they jump. But these alternative methods are not what people
consider *returning* from a sub-routine.

Unless they're trolling
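For anyone who wants to watch the "safe, limited GOTO" clean up after itself, here is a rough sketch (the function names are invented for the example). The exception jumps out of two nested calls, and each frame's finally block runs on the way out, while the code after the jump never does:

```python
events = []

def inner():
    try:
        raise ValueError("jump out")
    finally:
        events.append("inner cleanup")
    events.append("inner returned")  # never reached

def outer():
    try:
        inner()
        events.append("after inner")  # never reached
    finally:
        events.append("outer cleanup")

try:
    outer()
except ValueError:
    events.append("caught at top level")

print(events)
```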

Rustom Mody · Mar 14, 2014

I believe that you, Marko (and I) are saying exactly the same thing:

Wear language-lawyer hat:
Python has no procedures -- just functions which may return None

Wear vanilla programmer hat:
The concept (Pascal) procedure is simulated by function-returning-None

Steven D'Aprano · Mar 14, 2014

I believe that you, Marko (and I) are saying exactly the same thing:

I believe that you and I are saying practically the same thing.

Wear language-lawyer hat:
Python has no procedures -- just functions which may return None

Almost. Functions (or methods) can return None as a regular value, e.g.
the re.match and re.search functions return a MatchObject if there is a
match, and None if there is not. Here, the fact the function returns None
is nothing special -- it could have returned 0, or -1, or "Surprise!" if
the function author had wanted; the important thing is that you use it as
a function. Normally you call the function for its return result, even
if that result happens to be None.
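A short sketch of None as an ordinary, meaningful return value (the pattern and strings are made up for the example):

```python
import re

pattern = re.compile(r"(\d{4})")

hit = pattern.search("Mar 12, 2014")     # a match object
miss = pattern.search("no digits here")  # None: a real answer, "no match"

# The caller uses the result either way; None is just one possible value.
year = hit.group(1) if hit is not None else None

print(year, miss)
```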

On the other hand, Python also has functions/methods which you call for
their side-effects, not for their return result. In Pascal, Fortran and
C, for example, the language provides a special type of subroutine that
can only be called for its side-effects. It is common to call these
subroutines "procedures", and Python does not have them. We only have the
convention that if a function is intended to be called for its
side-effects (a procedure), it should return None.
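The built-in sorting tools show the convention side by side. A minimal sketch:

```python
values = [3, 1, 2]

# sorted() is called for its result: it returns a new list.
ordered_copy = sorted(values)
snapshot = list(values)  # values is still in its original order here

# list.sort() is called for its side effect: it reorders in place
# and returns None, so there is nothing useful to assign.
sort_result = values.sort()

print(ordered_copy, snapshot, sort_result, values)
```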

Wear vanilla programmer hat:
The concept (Pascal) procedure is simulated by function-returning-None

Yes, agreed on this one.

Mark Lawrence · Mar 14, 2014

Trollbridge: you have to pay a troll to cross.

And you mustn't be afraid of the Black Knight?
