Python and PEP8 - Recommendations on breaking up long lines?

V

Victor Hooi

Hi,

I'm running pep8 across my code, and getting warnings about my long lines (> 80 characters).

I'm wonder what's the recommended way to handle the below cases, and fit under 80 characters.

First example - multiple context handlers:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:

and in my case, with indents, the 80-character marks is just before the ending "as output".

What's the standard recognised way to split this across multiple lines, so that I'm under 80 characters?

I can't just split after the "as input," as that isn't valid syntax, and there's no convenient parentheses for me to split over.

Is there a standard Pythonic way?

Second example - long error messages:

self.logger.error('Unable to open input or output file - %s. Please check you have sufficient permissions and the file and parent directory exist.' % e)

I can use triple quotes:

self.logger.error(
"""Unable to open input or output file - %s. Please check you
have sufficient permissions and the file and parent directory
exist.""" % e)

However, that will introduce newlines in the message, which I don't want.

I can use backslashes:

self.logger.error(
'Unable to open input or output file - %s. Please check you\
have sufficient permissions and the file and parent directory\
exist.' % e)

which won't introduce newlines.

Or I can put them all as separate strings, and trust Python to glue them together:

self.logger.error(
'Unable to open input or output file - %s. Please check you'
'have sufficient permissions and the file and parent directory'
'exist.' % e)

Which way is the recommended Pythonic way?

Third example - long comments:

""" NB - We can't use Psycopg2's parametised statements here, as
that automatically wraps everything in single quotes.
So s3://my_bucket/my_file.csv.gz would become s3://'my_bucket'/'my_file.csv.gz'.
Hence, we use Python's normal string formating - this could
potentially exposes us to SQL injection attacks via the config.yaml
file.
I'm not aware of any easy ways around this currently though - I'm
open to suggestions though.
See
http://stackoverflow.com/questions/...-with-sql-query-parameter-causes-syntax-error
for further information. """

In this case, I'm guessing a using triple quotes (""") is a better idea with multi-line comments, right?

However, I've noticed that I can't seem to put in line-breaks inside the comment without triggering a warning. For example, trying to put in another empty line in between lines 6 and 7 above causes a warning.

Also, how would I split up the long URLs? Breaking it up makes it annoying to use the URL. Thoughts?

Cheers,
Victor
 
V

Victor Hooi

Hi,

Also, forgot two other examples that are causing me grief:

cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[[os.path.relpath(filename, foobar_input_folder)] for filename in filenames])

I've already broken it up using the parentheses, not sure what's the tidy way to break it up again to fit under 80? In this case, the 80-character mark is hitting me around the "for filename" towards the end.

and:

if os.path.join(root, file) not in previously_processed_files and os.path.join(root, file)[:-3] not in previously_processed_files:

In this case, the 80-character mark is actually partway through "previously processed files" (the first occurrence)...

Cheers,
Victor
 
T

Terry Reedy

On 11/27/2013 8:57 PM, Victor Hooi wrote:

[sorry if the re-wrapping mis-formats anything]
I'm running pep8

Ah yes, the module that turns PEP8 into the straightjacket it explicitly
says it is not meant to be.
across my code,

We mostly to not change existing stdlib modules to conform unless they
are being edited anyway for fixes and features.
and getting warnings about my long lines (> 80 characters).

You are free to ignore such

I'm wonder what's the recommended way to handle the below cases, and
fit under 80 characters.

First example - multiple context handlers:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab')
as output:

and in my case, with indents, the 80-character marks is just before
the ending "as output".

What's the standard recognised way to split this across multiple
lines, so that I'm under 80 characters?

I believe () does not work, so put \ after 'input,'.
I can't just split after the "as input," as that isn't valid syntax,
and there's no convenient parentheses for me to split over.

Is there a standard Pythonic way?

Second example - long error messages:

self.logger.error('Unable to open input or output file - %s. Please
check you have sufficient permissions and the file and parent
directory exist.' % e)

I can use triple quotes:

self.logger.error( """Unable to open input or output file - %s.
Please check you have sufficient permissions and the file and parent
directory exist.""" % e)

However, that will introduce newlines in the message, which I don't
want.

I can use backslashes:

self.logger.error( 'Unable to open input or output file - %s. Please
check you\ have sufficient permissions and the file and parent
directory\ exist.' % e)

which won't introduce newlines.

Or I can put them all as separate strings, and trust Python to glue
them together:

Such gluing in a documented feature and I think you can trust it to not
disappear (for at least a decade ;-).
self.logger.error( 'Unable to open input or output file - %s. Please
check you' 'have sufficient permissions and the file and parent
directory' 'exist.' % e)

Which way is the recommended Pythonic way?

Third example - long comments:

""" NB - We can't use Psycopg2's parametised statements here, as that
automatically wraps everything in single quotes. So
s3://my_bucket/my_file.csv.gz would become
s3://'my_bucket'/'my_file.csv.gz'. Hence, we use Python's normal
string formating - this could potentially exposes us to SQL injection
attacks via the config.yaml file. I'm not aware of any easy ways
around this currently though - I'm open to suggestions though. See
http://stackoverflow.com/questions/...-with-sql-query-parameter-causes-syntax-error

for further information. """

In this case, I'm guessing a using triple quotes (""") is a better
idea with multi-line comments, right?

Either are ok.
However, I've noticed that I can't seem to put in line-breaks inside
the comment without triggering a warning. For example, trying to put
in another empty line in between lines 6 and 7 above causes a
warning.

Blank lines are normal in triple-quoted strings and recommended for
docstrings longer than a line. So I am puzzled as to your problem.
Also, how would I split up the long URLs?

Don't if you can possibly avoid it.
Breaking it up makes it annoying to use the URL.
Yep.

Thoughts?

Some urls, especially long ones, have extra tracking/query junk that can
be removed while still pointing to the page.
 
N

Ned Batchelder

Hi,

I'm running pep8 across my code, and getting warnings about my long lines (> 80 characters).

I'm wonder what's the recommended way to handle the below cases, and fit under 80 characters.

My recommendations usually amount to: write more statements, each
shorter than what you have.
First example - multiple context handlers:

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:

and in my case, with indents, the 80-character marks is just before the ending "as output".

What's the standard recognised way to split this across multiple lines, so that I'm under 80 characters?

I can't just split after the "as input," as that isn't valid syntax, and there's no convenient parentheses for me to split over.

Is there a standard Pythonic way?

The important thing in a with statement is that the assigned name will
be closed (or otherwise exited) automatically. The open call is just
the expression used to assign the name. The expression there isn't
really important. This looks odd, but works the same as what you have:

input = open(self.full_path)
output = open(self.output_csv, 'ab')
with input as input, output as output:
...

(Use different names for the two parts of the "as" clauses if you like.)
Second example - long error messages:

self.logger.error('Unable to open input or output file - %s. Please check you have sufficient permissions and the file and parent directory exist.' % e)

I can use triple quotes:

self.logger.error(
"""Unable to open input or output file - %s. Please check you
have sufficient permissions and the file and parent directory
exist.""" % e)

However, that will introduce newlines in the message, which I don't want.

I can use backslashes:

self.logger.error(
'Unable to open input or output file - %s. Please check you\
have sufficient permissions and the file and parent directory\
exist.' % e)

which won't introduce newlines.

Or I can put them all as separate strings, and trust Python to glue them together:

self.logger.error(
'Unable to open input or output file - %s. Please check you'
'have sufficient permissions and the file and parent directory'
'exist.' % e)

Which way is the recommended Pythonic way?

Use the separate strings, but don't forget the spaces:

self.logger.error(
"Unable to open input or output file - %s. Please check you "
"have sufficient permissions and the file and parent directory "
"exist." % e
)
Third example - long comments:

""" NB - We can't use Psycopg2's parametised statements here, as
that automatically wraps everything in single quotes.
So s3://my_bucket/my_file.csv.gz would become s3://'my_bucket'/'my_file.csv.gz'.
Hence, we use Python's normal string formating - this could
potentially exposes us to SQL injection attacks via the config.yaml
file.
I'm not aware of any easy ways around this currently though - I'm
open to suggestions though.
See
http://stackoverflow.com/questions/...-with-sql-query-parameter-causes-syntax-error
for further information. """

In this case, I'm guessing a using triple quotes (""") is a better idea with multi-line comments, right?

As Ben pointed out, using an actual comment is best.
However, I've noticed that I can't seem to put in line-breaks inside the comment without triggering a warning. For example, trying to put in another empty line in between lines 6 and 7 above causes a warning.

I don't know what you mean about line-breaks causing warnings.

--Ned.
 
S

Steven D'Aprano

That's not syntactically a comment, and I don't think pretending
triple-quoted strings are comments is good practice. If nothing else,
you'll need a special case if you want to enclose something with
existing triple-quotes.

The CPython core devs disagree with you. Using bare strings for comments
is supported by Python, and the compiler strips them out at compile-time:


steve@runes:~$ python3.3
Python 3.3.0rc3 (default, Sep 27 2012, 18:31:58)
[GCC 4.4.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
py> import dis
py> dis.dis('x = 1; """Comment"""; y = 1')
1 0 LOAD_CONST 0 (1)
3 STORE_NAME 0 (x)
6 LOAD_CONST 0 (1)
9 STORE_NAME 1 (y)
12 LOAD_CONST 1 (None)
15 RETURN_VALUE
py>


I'm on the fence on this one. I can it's useful; but it's also
inconsistent with docstrings. And I wonder why other "dead objects"
aren't also dropped:

py> dis.dis('x = 1; {23: "a", "b": 42}; y = 1')
1 0 LOAD_CONST 0 (1)
3 STORE_NAME 0 (x)
6 BUILD_MAP 2
9 LOAD_CONST 1 ('a')
12 LOAD_CONST 2 (23)
15 STORE_MAP
16 LOAD_CONST 3 (42)
19 LOAD_CONST 4 ('b')
22 STORE_MAP
23 POP_TOP
24 LOAD_CONST 0 (1)
27 STORE_NAME 1 (y)
30 LOAD_CONST 5 (None)
33 RETURN_VALUE
 
T

Terry Reedy

Hi,

Also, forgot two other examples that are causing me grief:

cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[[os.path.relpath(filename, foobar_input_folder)] for filename in filenames])

I've already broken it up using the parentheses, not sure what's the tidy way to break it up again to fit under 80? In this case, the 80-character mark is hitting me around the "for filename" towards the end.

add another linebreak, you are still inside the parens.
and:

if os.path.join(root, file) not in previously_processed_files and os.path.join(root, file)[:-3] not in previously_processed_files:

In this case, the 80-character mark is actually partway through "previously processed files" (the first occurrence)...

You can add parens. In this case, factoring out the common subexpression
and replaced the repeated long name does the job.
p = os.path.join(root, file)
ppf = previously_processed_files
if p not in ppf and p[:-3] not in ppf:

[snip previous post]
 
N

Ned Batchelder

Hi,

Also, forgot two other examples that are causing me grief:

cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[[os.path.relpath(filename, foobar_input_folder)] for filename in filenames])

I've already broken it up using the parentheses, not sure what's the tidy way to break it up again to fit under 80? In this case, the 80-character mark is hitting me around the "for filename" towards the end.

file_values = [
(os.path.relpath(filename, foobar_input_folder),)
for filename in filenames
]
cur.executemany(
"INSERT INTO foobar_foobar_files VALUES (?)",
file_values
)
and:

if os.path.join(root, file) not in previously_processed_files and os.path.join(root, file)[:-3] not in previously_processed_files:

In this case, the 80-character mark is actually partway through "previously processed files" (the first occurrence)...

full_file = os.path.join(root, file)
full_not_in = full_file not in previously_processed_files
tail_not_in = full_file[:-3] not in previously_processed_files
if full_not_in and tail_not_in:
...

This has the advantage of naming these complex intermediate conditions,
and giving you natural places to write comments explaining them. Why
[:-3] for example?

More, shorter, statements.

--Ned.
 
N

Neil Cerutti

with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:

There are two nice and clean solutions for this.

I usually nest them.

with open(self.full_path, 'r') as input:
with open(self.ouptut_csv, 'ab') as output:

I use a non-standard (shortened) indentation for nested with
statements, to save space. I don't feel bad about this at all; the
with statement's syntax makes me do it!

Otherwise, contextlib.ExitStack is another way to separate them.

with contextlib.ExitStack() as stack:
input = stack.enter_context(open(self.full_path, 'r'))
writer = csv.writer(stack.enter_context(open(self.output_csv)))

When working with a csv file I like how it removes the output
temporary file object variable, though if you needed it for some
reason you could keep it.
 
E

Ethan Furman

The important thing in a with statement is that the assigned name will be closed (or otherwise exited) automatically.
The open call is just the expression used to assign the name. The expression there isn't really important. This looks
odd, but works the same as what you have:

input = open(self.full_path)
output = open(self.output_csv, 'ab')
with input as input, output as output:
...

(Use different names for the two parts of the "as" clauses if you like.)

Or skip the `as` clauses all together:

input = ...
output = ...
with input, output:
...

works just fine. (At least on 2.7 where I tested it. ;)
 
S

Steven D'Aprano

Hi,

I'm running pep8 across my code, and getting warnings about my long
lines (> 80 characters).

I'm wonder what's the recommended way to handle the below cases, and fit
under 80 characters.

First example - multiple context handlers:

with open(self.full_path, 'r') as input,
open(self.output_csv, 'ab') as output:


if True: # add indentation, just for the example's sake.
if True:
if True:
with (open(self.full_path, 'r') as input,
open(self.output_csv, 'ab') as output):
do_this()
do_that()


Second example - long error messages:

self.logger.error('Unable to open input or output file - %s.
Please check you have sufficient permissions and the file
and parent directory exist.' % e)

Long strings are always ugly :-(

But you can use implicit concatenation to make them a little less so.



if True: # add indentation, just for the example's sake.
if True:
if True:
self.logger.error(
'Unable to open input or output file - %s.'
' Please check you have sufficient permissions and
' the file and parent directory exist.'
% e)

Notice that my convention is to use a leading space in the strings when
doing implicit concatenation, if possible. E.g.:

("Hello"
" World!")

rather than:

("Hello "
"World!")

Third example - long comments:

""" NB - We can't use Psycopg2's parametised statements
here, as that automatically wraps everything in single
quotes. So s3://my_bucket/my_file.csv.gz would become
s3://'my_bucket'/'my_file.csv.gz'. Hence, we use Python's
normal string formating - this could potentially exposes us
to SQL injection attacks via the config.yaml file.
I'm not aware of any easy ways around this currently though
- I'm open to suggestions though.
See
http://stackoverflow.com/questions/9354392/psycopg2-cursor- execute-with-sql-query-parameter-causes-syntax-error
for further information. """

In this case, I'm guessing a using triple quotes (""") is a better idea
with multi-line comments, right?

*shrug* That's a matter of personal preference.

However, I've noticed that I can't seem to put in line-breaks inside the
comment without triggering a warning. For example, trying to put in
another empty line in between lines 6 and 7 above causes a warning.

Also, how would I split up the long URLs? Breaking it up makes it
annoying to use the URL. Thoughts?

I hate long URLs.

There's no good solution to long URLs. Just leave them as they come, and
if your style checker allows you to flag an exception to the long-line
rules, use it.
 
S

Steven D'Aprano

The important thing in a with statement is that the assigned name will
be closed (or otherwise exited) automatically. The open call is just
the expression used to assign the name. The expression there isn't
really important. This looks odd, but works the same as what you have:

input = open(self.full_path)
output = open(self.output_csv, 'ab')
with input as input, output as output:
...

(Use different names for the two parts of the "as" clauses if you like.)


That's really clever! Why didn't I think of that?
 
S

Steven D'Aprano

if True: # add indentation, just for the example's sake.
if True:
if True:
with (open(self.full_path, 'r') as input,
open(self.output_csv, 'ab') as output):
do_this()
do_that()


Bah! Apparently you can't do that.

py> with (open(self.full_path, 'r') as input,
File "<stdin>", line 1
with (open(self.full_path, 'r') as input,
^
SyntaxError: invalid syntax
 
J

Jussi Piitulainen

Victor Hooi said:
Hi,

Also, forgot two other examples that are causing me grief:

cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[[os.path.relpath(filename, foobar_input_folder)]
for filename in filenames])
I've already broken it up using the parentheses, not sure what's the
tidy way to break it up again to fit under 80? In this case, the
80-character mark is hitting me around the "for filename" towards
the end.

That's a natural break. I did it to your code above. Another is the
condition, if it's there:

cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[[os.path.relpath(filename, foobar_input_folder)]
for filename in filenames
if not filename.startswith('tmp')])

There's much freedom of indentation inside the brackets, but these
points are natural.

Put the long expression in parentheses and you are again free to break
and indent; I tend to have extra spaces inside these parentheses, but
I have no idea about any standards:

if ( os.path.join(root, file)
not in previously_processed_files and
os.path.join(root, file)[:-3]
not in previously_processed_files ):
In this case, the 80-character mark is actually partway through
"previously processed files" (the first occurrence)...

Try to find natural breaks, between phrases, and maybe highlight
operators by putting them in the beginning of a line like.

Or even this:
pre = previously_processed_files
if ( os.path.join(root, file) not in pre and
os.path.join(root, file)[:-3] not in pre ):
 
T

Tim Chase

That's really clever! Why didn't I think of that?

Because if the 2nd output fails, the input doesn't get closed as
it {w,sh}ould in a with-statement? :)

You could work around this with:

from functools import partial
in_ = partial(open, self.full_path)
out_ = partial(open, self.output_csv, 'ab')
with in_() as input, out_() as output:
do_stuff()

There's still room for programmer error if you don't have an
"output_csv" property on self and you get an AttributeError, but
that's more of a "dumb programmer error" rather than an actual
runtime exception.

-tkc
 
C

Chris Angelico

You could work around this with:

from functools import partial
in_ = partial(open, self.full_path)
out_ = partial(open, self.output_csv, 'ab')
with in_() as input, out_() as output:
do_stuff()

Yeah, I think I'd prefer a backslash continuation to that!

ChrisA
 
R

Roy Smith

Victor Hooi said:
cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[[os.path.relpath(filename, foobar_input_folder)] for
filename in filenames])

I don't often write raw SQL embedded in Python. I'm much more likely to
use some sort of ORM layer. But, if I were doing this, I would break it
up something like:



There's a few different strategies I employed there. My first thought
was a logical break of computing the list of pathnames vs. inserting
them into the database. That got me here:

paths = [[os.path.relpath(filename, foobar_input_folder)] \
for filename in filenames]
cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
paths)

I wouldn't have actually broken the first line with the backslash, but
I'm doing that to appease my news posting software.

My next step would be some simple textual changes; I'd get rid of the
overly-line variable names. In general, I don't like very short
variable names, but I'm OK with them as long as the scope is very small,
as it is here:

paths = [[os.path.relpath(fn, folder)] for fn in filenames]
cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
paths)

I'd probably factor out the double lookup of os.path.relpath. I think
this is easier to read:

relpath = os.path.relpath
paths = [[relpath(fn, folder)] for fn in filenames]
cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
paths)

If filenames was a very long list, it would also be a little bit faster
to execute, but that's such a small factor as to probably be
unmeasurable.

And, finally, I'd probably move one set of square brackets down into the
SQL statement. It really makes more sense there anyway; the bundling up
of the arguments into a sequence is more a part of the database API than
it is inherent to the data.

relpath = os.path.relpath
paths = [relpath(fn, foobar_input_folder) for fn in filenames]
cur.executemany("INSERT INTO foobar_foobar_files VALUES (?)",
[paths])
 
N

Ned Batchelder

Because if the 2nd output fails, the input doesn't get closed as
it {w,sh}ould in a with-statement? :)

D'oh, sorry. A doubly-nested with statement?

input = open(self.full_path)
with input:
output = open(self.output_csv, 'ab')
with output:
...

Of course, if you were willing to use a doubly-nested with statement,
then the original open-in-the-with form would have fit in 80 chars...
Also, I have a sneaking suspicion there's still an error case not
handled well, someone will point it out if so!

--Ned.
 
R

Roel Schroeven

Victor Hooi schreef:
with open(self.full_path, 'r') as input, open(self.output_csv, 'ab') as output:

I think I would write that as

with open(self.full_path, 'r') as input:
with open(self.output_csv, 'ab') as output:

That introduces an extra indentation level, but that doesn't really
bother me. Does this seem like a good idea?


Best regards,
Roel
 
N

Nick Mellor

Hi Victor,

I use PyCharm which is set up by default to warn when line length exceeds 120 chars, not 80. Perhaps times have changed?

I often break comprehensions at the for, in and else clauses. It's often not for line length reasons but because it's easier to follow the code that way. I have heard this is how Haskell programmers tend to use comprehensions(comprehensions are from Haskell originally):

location=random.choice([loc['pk']
for loc
in locations.whole_register()
if loc['fields']['provider_id'] == provider_id])))

The other suggestion I have is to put the with clauses in a generator function. This saves you a level or more of indentation and modularises the codeusefully. Here's an example:

def spreadsheet(csv_filename):
with open(csv_filename) as csv_file:
for csv_row in list(csv.DictReader(csv_file, delimiter='\t')):
yield csv_row

then invoked using:

for row in spreadsheet("...")
# your processing code here

Cheers,

Nick
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top