In defence of 80-char lines

  • Thread starter Steven D'Aprano

Andrew Berg

While I agree that not having a line take up hundreds of characters is a
good thing, 80 is really arbitrary in 2013 and having any self-imposed
hard limit is silly. When you put a single 4- or 5-character word on a
new line because you don't want to go over 80 (or 120 or whatever), the
code is /less/ readable. A better guideline is to make new lines as
necessary to make things more readable rather than blindly stick to some
hard limit and say it's more readable just because.

Also, IMO, 80 is far too limiting and I find 120-130 much better. Then
again, I like small font sizes and avoid lower resolution screens like
the plague.
 

Mitya Sirenef

While I agree that not having a line take up hundreds of characters is a
good thing, 80 is really arbitrary in 2013 and having any self-imposed
hard limit is silly. When you put a single 4- or 5-character word on a
new line because you don't want to go over 80 (or 120 or whatever), the
code is /less/ readable. A better guideline is to make new lines as
necessary to make things more readable rather than blindly stick to some
hard limit and say it's more readable just because.

Also, IMO, 80 is far too limiting and I find 120-130 much better. Then
again, I like small font sizes and avoid lower resolution screens like
the plague.


I have to agree. To some degree, it's a matter of taste: for me, an 80-char
limit looks extremely ugly, at least in Django; but 140+ looks even
uglier, and the longer the line, the uglier it looks. For Django code, the
optimal size is a 105-char soft limit -- by soft limit I mean that
under 105 it's always one line, 105-110 I decide on a case-by-case basis
and over 110 is always split.

So my preference is: 105 > 120-130 > 140 > 80 > 140+

The trade-off is that on one hand, the code is more readable when a
single line is a single "operation", from a cognitive standpoint, when
you're thinking about the logic of the function as a whole, or a subset
of a function if it's too long (which it shouldn't be, right?) On the
other hand, even if your monitor is wide, you probably still want to fit
in the browser window and the terminal window, and as the blog author
rightly notes, really long lines do get harder to read.

Again, I mostly work with Django, and I suspect that if I worked in regular
Python I would gravitate towards a 95-100 limit.

I find the blog author's point about fitting more text nonsensical: you
can obviously fit more text PER LINE if lines are longer! And you can
quite easily fit two 120-130 wide gvim screens on a modern monitor with
room to spare.

I'm sure eyesight acuity also figures into this: I prefer to work
without glasses -- otherwise my eyes get tired after a couple of hours;
but this means I can't see code on my second monitor. If I could, I
might have preferred having browser and terminal on one monitor and Gvim
with slightly longer width limits than I use now.

-m
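Mitya's soft-limit rule (under 105 always one line, 105-110 case by case, over 110 always split) is mechanical enough to check with a script. Below is a minimal sketch, assuming the 105/110 thresholds from the post, that a line of exactly 105 characters still counts as "keep", and that width means Python's `len()` of the line without its newline (tabs and wide characters are not accounted for). The function names are hypothetical.

```python
# Hypothetical helpers; the thresholds are the ones described in the post.
SOFT_LIMIT = 105  # at or below: always one line
HARD_LIMIT = 110  # above: always split


def classify(line: str, soft: int = SOFT_LIMIT, hard: int = HARD_LIMIT) -> str:
    """Return 'keep', 'judge' or 'split' for one source line."""
    width = len(line.rstrip("\r\n"))
    if width <= soft:
        return "keep"
    if width <= hard:
        return "judge"  # 105-110: decide case by case
    return "split"


def report(lines):
    """Yield (line_number, verdict, width) for every line that is not 'keep'."""
    for number, line in enumerate(lines, start=1):
        verdict = classify(line)
        if verdict != "keep":
            yield number, verdict, len(line.rstrip("\r\n"))
```

Passing an open file object to `report()` lists only the lines that need a decision, which matches the "case-by-case" middle band rather than a single hard cutoff.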
 

llanitedave

I also tend to prefer a maximum between 110 and 120 characters. I find continuation lines confusing, and when you use some third-party tools, such as wxPython, for example, the boilerplate code leads to some long lines.

I would hate to have to break up this line, for instance:

self.mainLabel.SetFont(wx.Font(12, wx.DEFAULT, wx.NORMAL, wx.BOLD, faceName = "FreeSans"))

Especially if it's already indented a few levels to begin with.

With most of your code in classes, you've already got most of it indented two levels right off the bat.
 

rusi

Although PEP 8 is only compulsory for the Python standard library, many
users like to stick to PEP 8 for external projects.

http://www.python.org/dev/peps/pep-0008/

With perhaps one glaring exception: many people hate, or ignore, PEP 8's
recommendation to limit lines to 80 characters. (Strictly speaking, 79
characters.)

Here is a good defence of 80 char lines:

http://wrongsideofmemphis.com/2013/03/25/80-chars-per-line-is-great/

The exchange on Hacker News linked from there makes for a nice read --
tnx.

I had a blog article http://blog.languager.org/2012/10/layout-imperative-in-functional.html
on this subject. It started from a Python discussion, though it's more
relevant to Haskell.
What does not so easily come out there is that the wide-line code
samples I posted which read ok to me were not to some readers. So I
moved it to gist, but even there some would get the horizontal scroll
bar. Reading it 'raw' seems to remove the problem -- though I can
hardly promise that for all devices. So from this POV the point that
was made was opposite to the one I was trying to make =)

The discussion that followed on haskell-cafe
http://www.haskell.org/pipermail/haskell-cafe/2012-October/104224.html
made a number of interesting points about pros and cons of long lines.
 

Steven D'Aprano

I also tend to prefer a maximum between 110 and 120 characters. I find
continuation lines confusing, and when you use some third-party tools,
such as wxPython, for example, the boilerplate code leads to some long
lines.

Excessive boilerplate, and long lines, is a code-smell that suggests
strongly that the code needs refactoring and/or simplifying.

I would hate to have to break up this line, for instance:

self.mainLabel.SetFont(wx.Font(12, wx.DEFAULT, wx.NORMAL, wx.BOLD,
faceName = "FreeSans"))


Here, let me do that for you, with two, no, three, no FOUR levels of
indentation :)


self.mainLabel.SetFont(self.label_format)


Now when you want to change the font of your labels, you can change them
all with *one* edit. Or you can add themes to your application by reading
style info from a config file. Even if you don't want to do that, you
still have the benefit of much more readable code.

Clearly you have to define the label format in the class. Here's one
simple way:

class MyClass(whatever):
    label_format = wx.Font(
        12, wx.DEFAULT, wx.NORMAL, wx.BOLD, faceName="FreeSans")

    def __init__(self, label_format=None):
        if label_format is not None:
            # Over-ride the default.
            self.label_format = label_format
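This pattern relies on ordinary attribute lookup: an instance attribute, once set, shadows the class attribute of the same name. A wx-free sketch, with a hypothetical `Font` namedtuple standing in for `wx.Font` and a hypothetical `LabelPanel` class, shows the default being shared and then overridden for a single instance:

```python
from collections import namedtuple

# Hypothetical stand-in for wx.Font so the example runs without wxPython.
Font = namedtuple("Font", "point_size family style weight face_name")


class LabelPanel:  # hypothetical class name
    # Class attribute: the default shared by every instance.
    label_format = Font(12, "default", "normal", "bold", "FreeSans")

    def __init__(self, label_format=None):
        if label_format is not None:
            # Instance attribute shadows the class-level default.
            self.label_format = label_format


plain = LabelPanel()
themed = LabelPanel(Font(14, "swiss", "italic", "normal", "DejaVu Sans"))
```

Rebinding `LabelPanel.label_format` changes the format for every instance that was not given its own, which is the "one edit" benefit.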


Even if you don't want to do that, you can still fit it into 80 char
lines:

# 80 characters, not 79. Oh well.
font = wx.Font(12, wx.DEFAULT, wx.NORMAL, wx.BOLD, faceName="FreeSans")
self.mainLabel.SetFont(font)


And finally, when all is said and done, the most important rule from PEP 8
applies: know when to break the rules.
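For the "read style info from a config file" idea, the standard library's `configparser` is one option. This is a minimal sketch under assumptions: the `[label]` section, its key names and the `load_label_style` helper are all hypothetical, and the returned dict would still have to be handed to `wx.Font` (or whatever toolkit call you use).

```python
import configparser

# Hypothetical theme file contents; in practice this would be read from disk.
THEME = """
[label]
point_size = 12
weight = bold
face_name = FreeSans
"""


def load_label_style(text: str) -> dict:
    """Parse the hypothetical [label] section into a dict of font settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["label"]
    return {
        "point_size": section.getint("point_size"),
        "weight": section.get("weight", fallback="normal"),
        "face_name": section.get("face_name", fallback=""),
    }


style = load_label_style(THEME)
```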
 

gregor

On Wed, 3 Apr 2013 21:32:33 -0700 (PDT),
llanitedave said:
I would hate to have to break up this line, for instance:

self.mainLabel.SetFont(wx.Font(12, wx.DEFAULT, wx.NORMAL, wx.BOLD,
faceName = "FreeSans"))

I think this is much more readable:

self.mainLabel.SetFont(wx.Font(12,
                               wx.DEFAULT,
                               wx.NORMAL,
                               wx.BOLD,
                               faceName = "FreeSans"))

Emacs for example does this indentation with the TAB key automatically.
 

Peter Otten

llanitedave said:
I also tend to prefer a maximum between 110 and 120 characters. I find
continuation lines confusing, and when you use some third-party tools,
such as wxPython, for example, the boilerplate code leads to some long
lines.

I would hate to have to break up this line, for instance:

self.mainLabel.SetFont(wx.Font(12, wx.DEFAULT, wx.NORMAL, wx.BOLD,
faceName = "FreeSans"))

I'm not a wx user, but I think I would prefer

labelfont = wx.Font(
    pointSize=12,
    style=wx.DEFAULT,
    family=wx.NORMAL,
    weight=wx.BOLD,
    faceName="FreeSans")
self.mainLabel.SetFont(labelfont)

even if I knew the order of the arguments and the meaning of constants like
DEFAULT and NORMAL by heart.
 

Rui Maciel

Steven said:
Although PEP 8 is only compulsory for the Python standard library, many
users like to stick to PEP 8 for external projects.

http://www.python.org/dev/peps/pep-0008/

With perhaps one glaring exception: many people hate, or ignore, PEP 8's
recommendation to limit lines to 80 characters. (Strictly speaking, 79
characters.)


Here is a good defence of 80 char lines:

http://wrongsideofmemphis.com/2013/03/25/80-chars-per-line-is-great/


The now arbitrary 80-column limit is a remnant of the limitations built into
ancient terminals.

Why not let the text editor auto-wrap the lines? They can do that now.


Rui Maciel
 

Tim Chase

I think I would prefer

labelfont = wx.Font(
    pointSize=12,
    style=wx.DEFAULT,
    family=wx.NORMAL,
    weight=wx.BOLD,
    faceName="FreeSans")
self.mainLabel.SetFont(labelfont)

+1
The only change I'd make to this suggestion would be to add a
semi-superfluous comma+newline after the last keyword argument too:

labelfont = wx.Font(
    pointSize=12,
    style=wx.DEFAULT,
    family=wx.NORMAL,
    weight=wx.BOLD,
    faceName="FreeSans",
    )

which makes diffs cleaner when you need to insert something after
faceName:

--- peter1.txt 2013-04-04 06:03:01.420762566 -0500
+++ peter2.txt 2013-04-04 06:03:34.736762582 -0500
@@ -3,4 +3,5 @@
     style=wx.DEFAULT,
     family=wx.NORMAL,
     weight=wx.BOLD,
-    faceName="FreeSans")
+    faceName="FreeSans",
+    otherValue=42)

vs.

--- tkc1.txt 2013-04-04 06:02:52.436762562 -0500
+++ tkc2.txt 2013-04-04 06:03:51.392762588 -0500
@@ -4,4 +4,5 @@
     family=wx.NORMAL,
     weight=wx.BOLD,
     faceName="FreeSans",
+    otherValue=42,
     )
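The comparison above can be reproduced with `difflib.unified_diff` from the standard library. In this sketch the `otherValue=42` argument is Tim's and the `changed_lines` helper is hypothetical. It counts how many lines each layout touches when one keyword argument is appended:

```python
import difflib

WITHOUT_TRAILING_COMMA = [
    "labelfont = wx.Font(\n",
    "    pointSize=12,\n",
    "    style=wx.DEFAULT,\n",
    "    family=wx.NORMAL,\n",
    "    weight=wx.BOLD,\n",
    '    faceName="FreeSans")\n',
]
WITH_TRAILING_COMMA = [
    "labelfont = wx.Font(\n",
    "    pointSize=12,\n",
    "    style=wx.DEFAULT,\n",
    "    family=wx.NORMAL,\n",
    "    weight=wx.BOLD,\n",
    '    faceName="FreeSans",\n',
    "    )\n",
]


def changed_lines(before, after):
    """Return the +/- lines of a unified diff, skipping the file headers."""
    diff = difflib.unified_diff(before, after, "before", "after")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


# Append one argument to each layout, the way a later edit would.
peter_after = WITHOUT_TRAILING_COMMA[:-1] + [
    '    faceName="FreeSans",\n', "    otherValue=42)\n"]
tkc_after = WITH_TRAILING_COMMA[:-1] + ["    otherValue=42,\n", "    )\n"]

peter_changes = changed_lines(WITHOUT_TRAILING_COMMA, peter_after)
tkc_changes = changed_lines(WITH_TRAILING_COMMA, tkc_after)
```

With the trailing comma, the change is a single added line, so `git blame` also keeps pointing at whoever wrote `faceName`.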

Additionally, if there are lots of keyword parameters like this, I'd
be tempted to keep them in sorted order for ease of tracking them
down (though CSS has long-standing arguments on how properties should
be ordered, so to each their own on this).

-tkc
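Tim's habit of keeping many keyword arguments in sorted order can be checked mechanically with the standard `ast` module. This is a minimal sketch, assuming plain case-sensitive string order is what you want. The `unsorted_keyword_calls` helper is hypothetical, and `**kwargs` entries are skipped. Run on Peter's layout, it flags the call because those keywords are not alphabetical:

```python
import ast


def unsorted_keyword_calls(source: str):
    """Yield (line, keyword_names) for calls whose keywords are not sorted."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # keyword.arg is None for **kwargs unpacking; skip those.
            names = [kw.arg for kw in node.keywords if kw.arg is not None]
            if names != sorted(names):
                yield node.lineno, names


EXAMPLE = '''
labelfont = wx.Font(
    pointSize=12,
    style=wx.DEFAULT,
    family=wx.NORMAL,
    weight=wx.BOLD,
    faceName="FreeSans",
)
'''
problems = list(unsorted_keyword_calls(EXAMPLE))
```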
 

Roy Smith

llanitedave said:
I would hate to have to break up this line, for instance:

self.mainLabel.SetFont(wx.Font(12, wx.DEFAULT, wx.NORMAL, wx.BOLD, faceName =
"FreeSans"))

I would write that as some variation on

self.mainLabel.SetFont(wx.Font(12,
                               wx.DEFAULT,
                               wx.NORMAL,
                               wx.BOLD,
                               faceName="FreeSans"))

This lets the reader see at a glance that all the arguments go with
wx.Font(), not with SetFont(), without having to visually parse and
match parenthesis levels.

Actually, I would probably break it up further as:

my_font = wx.Font(12,
                  wx.DEFAULT,
                  wx.NORMAL,
                  wx.BOLD,
                  faceName="FreeSans")
self.mainLabel.SetFont(my_font)

The last thing on my mind when deciding how to format this is whether I
would be able to punch it onto a single card.
 

Jason Swails

+1
The only change I'd make to this suggestion would be to add a
semi-superfluous comma+newline after the last keyword argument too:

<SNIP>

+1

I wasn't aware you could do this (superfluous trailing commas), although I
admit it hadn't occurred to me to try. I use git for virtually everything,
and I regularly parse diffstats -- this would make them much easier to
grok. (It's an incredibly helpful bug-tracking technique)

Thanks!
Jason
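Trailing commas are accepted by Python's grammar in call argument lists, function parameter lists, and list, tuple, dict and set displays, so the style costs nothing at runtime. Here is a quick sketch with a purely illustrative `describe` function. The one place a trailing comma changes meaning is after a bare expression, where it creates a one-element tuple:

```python
def describe(
    size,
    weight,
    face_name="FreeSans",  # trailing comma in a parameter list
):
    return f"{size}pt {weight} {face_name}"


label = describe(
    12,
    "bold",  # trailing comma in a call
)

sizes = [
    10,
    12,  # trailing comma in a list display
]

single = 12,  # careful: this is the tuple (12,), not the int 12
```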
 

Roy Smith

Jason Swails said:
The only time I regularly break my rule is for regular expressions (at some
point I may embrace re.X to allow me to break those up, too).

re.X is a pretty cool tool for making huge regexes readable. But, it
turns out that Python's auto-continuation and string literal
concatenation rules are enough to let you get much the same effect.
Here's a regex we use to parse haproxy log files. This would be utter
line noise all run together. This way, it's almost readable :)

pattern = re.compile(r'haproxy\[(?P<pid>\d+)]: '
                     r'(?P<client_ip>(\d{1,3}\.){3}\d{1,3}):'
                     r'(?P<client_port>\d{1,5}) '
                     r'\[(?P<accept_date>\d{2}/\w{3}/\d{4}(:\d{2}){3}\.\d{3})] '
                     r'(?P<frontend_name>\S+) '
                     r'(?P<backend_name>\S+)/'
                     r'(?P<server_name>\S+) '
                     r'(?P<Tq>(-1|\d+))/'
                     r'(?P<Tw>(-1|\d+))/'
                     r'(?P<Tc>(-1|\d+))/'
                     r'(?P<Tr>(-1|\d+))/'
                     r'(?P<Tt>\+?\d+) '
                     r'(?P<status_code>\d{3}) '
                     r'(?P<bytes_read>\d+) '
                     r'(?P<captured_request_cookie>\S+) '
                     r'(?P<captured_response_cookie>\S+) '
                     r'(?P<termination_state>[\w-]{4}) '
                     r'(?P<actconn>\d+)/'
                     r'(?P<feconn>\d+)/'
                     r'(?P<beconn>\d+)/'
                     r'(?P<srv_conn>\d+)/'
                     r'(?P<retries>\d+) '
                     r'(?P<srv_queue>\d+)/'
                     r'(?P<backend_queue>\d+) '
                     r'(\{(?P<request_id>.*?)\} )?'
                     r'(\{(?P<captured_request_headers>.*?)\} )?'
                     r'(\{(?P<captured_response_headers>.*?)\} )?'
                     r'"(?P<http_request>.+)"'
                     )

And, for those of you who go running in the other direction every time
regex is suggested as a solution, I challenge you to come up with easier
to read (or write) code for parsing a line like this (probably
hopelessly mangled by the time you read it):

2013-04-03T00:00:00+00:00 localhost haproxy[5199]: 10.159.19.244:57291
[02/Apr/2013:23:59:59.811] app-nodes next-song-nodes/web8.songza.com
0/0/3/214/219 200 593 sessionid=NWiX5KGOdvg6dSaA
sessionid=NWiX5KGOdvg6dSaA ---- 249/249/149/14/0 0/0
{4C0ABFA9-515B6DEF-933229} "POST
/api/1/station/892337/song/16024201/notify-play HTTP/1.0"
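The trick relies on two rules: adjacent string literals (raw or not) are concatenated at compile time, and the open parenthesis of `re.compile(` lets them run over several lines without backslashes. The cut-down sketch below keeps only the first few fields of the pattern above. The truncated pattern and the `SAMPLE` line are abbreviations made for illustration. It uses `groupdict()` to collect the named fields:

```python
import re

# Adjacent raw-string literals are joined into one pattern at compile time.
pattern = re.compile(r'haproxy\[(?P<pid>\d+)]: '
                     r'(?P<client_ip>(\d{1,3}\.){3}\d{1,3}):'
                     r'(?P<client_port>\d{1,5}) '
                     r'\[(?P<accept_date>\d{2}/\w{3}/\d{4}(:\d{2}){3}\.\d{3})] '
                     r'(?P<frontend_name>\S+) '
                     r'(?P<backend_name>\S+)/'
                     r'(?P<server_name>\S+) ')

# Abbreviated sample line (the remaining fields are left off).
SAMPLE = ('2013-04-03T00:00:00+00:00 localhost haproxy[5199]: '
          '10.159.19.244:57291 [02/Apr/2013:23:59:59.811] app-nodes '
          'next-song-nodes/web8.songza.com 0/0/3/214/219 200 593')

# search(), not match(): the haproxy tag is not at the start of the line.
fields = pattern.search(SAMPLE).groupdict()
```

`groupdict()` returns only the named groups, as strings, so the unnamed helper groups such as `(:\d{2})` do not clutter the result.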
 

Jason Swails

Jason Swails said:
The only time I regularly break my rule is for regular expressions (at some
point I may embrace re.X to allow me to break those up, too).

re.X is a pretty cool tool for making huge regexes readable. But, it
turns out that Python's auto-continuation and string literal
concatenation rules are enough to let you get much the same effect.
Here's a regex we use to parse haproxy log files. This would be utter
line noise all run together. This way, it's almost readable :)

pattern = re.compile(r'haproxy\[(?P<pid>\d+)]: '
                     r'(?P<client_ip>(\d{1,3}\.){3}\d{1,3}):'
                     r'(?P<client_port>\d{1,5}) '

For some reason that never occurred to me. I use this technique every
other time I want to break up a long string, but never for regexes...

Now I will. I was wary of using re.X since I sometimes use meaningful
whitespace in my regexes, and I didn't want to have to figure out how to
prevent them from being ignored... This is a much better solution.

Thanks,
Jason
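If you do go the `re.X` (`re.VERBOSE`) route, meaningful whitespace can be kept by escaping it (`\ `) or putting it in a character class (`[ ]`), since whitespace inside a class is not ignored. `\s` also works if any whitespace is acceptable. Here is a minimal sketch on the first two fields of the haproxy line. The pattern is an abbreviation of the one above, written for illustration:

```python
import re

# Under re.VERBOSE, unescaped whitespace and '#' comments are ignored,
# so the literal spaces in the log format are written as '[ ]' or '\ '.
verbose = re.compile(r"""
    haproxy\[(?P<pid>\d+)\]:[ ]            # process tag, then a literal space
    (?P<client_ip>(\d{1,3}\.){3}\d{1,3})   # dotted-quad IPv4 address
    :(?P<client_port>\d{1,5})\             # port, then an escaped space
""", re.VERBOSE)

match = verbose.search("localhost haproxy[5199]: 10.159.19.244:57291 [02/Apr")
```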
 

llanitedave

I would write that as some variation on

self.mainLabel.SetFont(wx.Font(12,
                               wx.DEFAULT,
                               wx.NORMAL,
                               wx.BOLD,
                               faceName="FreeSans"))

This lets the reader see at a glance that all the arguments go with
wx.Font(), not with SetFont(), without having to visually parse and
match parenthesis levels.

Actually, I would probably break it up further as:

my_font = wx.Font(12,
                  wx.DEFAULT,
                  wx.NORMAL,
                  wx.BOLD,
                  faceName="FreeSans")
self.mainLabel.SetFont(my_font)

The last thing on my mind when deciding how to format this is whether I
would be able to punch it onto a single card.

To each their own, definitely. For myself, I don't see the utility in adding a bunch of what appears to be superfluous horizontal white space at the expense of extra lines to scroll down. I like to limit my scrolling needs in *both* directions.
(Although I do tend to be fairly generous with blank lines to break up code "paragraphs")
 

Neil Cerutti

re.X is a pretty cool tool for making huge regexes readable.
But, it turns out that Python's auto-continuation and string
literal concatenation rules are enough to let you get much the
same effect. Here's a regex we use to parse haproxy log files.
This would be utter line noise all run together. This way, it's
almost readable :)

<SNIP>

And, for those of you who go running in the other direction every time
regex is suggested as a solution, I challenge you to come up with easier
to read (or write) code for parsing a line like this (probably
hopelessly mangled by the time you read it):

2013-04-03T00:00:00+00:00 localhost haproxy[5199]: 10.159.19.244:57291
[02/Apr/2013:23:59:59.811] app-nodes next-song-nodes/web8.songza.com
0/0/3/214/219 200 593 sessionid=NWiX5KGOdvg6dSaA
sessionid=NWiX5KGOdvg6dSaA ---- 249/249/149/14/0 0/0
{4C0ABFA9-515B6DEF-933229} "POST
/api/1/station/892337/song/16024201/notify-play HTTP/1.0"

The big win from the above seems to me the groupdict result. The
parsing is also very simple, with virtually no nesting. It's a
good application of re.

It seems easy enough to do with str methods, but would it be an
improvement?

I ran out of time before the prototype was finished, but here's a
sketch.


import re
import datetime
import pprint

s = ('2013-04-03T00:00:00+00:00 localhost haproxy[5199]: 10.159.19.244:57291'
     ' [02/Apr/2013:23:59:59.811] app-nodes next-song-nodes/web8.songza.com'
     ' 0/0/3/214/219 200 593 sessionid=NWiX5KGOdvg6dSaA'
     ' sessionid=NWiX5KGOdvg6dSaA ---- 249/249/149/14/0 0/0'
     ' {4C0ABFA9-515B6DEF-933229}'
     ' "POST /api/1/station/892337/song/16024201/notify-play HTTP/1.0"')

def get_haproxy(s):
    prefix = 'haproxy['
    if s.startswith(prefix):
        return int(s[len(prefix):s.index(']')])
    return False

def get_client_info(s):
    ip, colon, port = s.partition(':')
    if colon != ':':
        return False
    else:
        return ip, int(port)

def get_accept_date(s):
    try:
        return datetime.datetime.strptime(s, '[%d/%b/%Y:%H:%M:%S.%f]')
    except ValueError:
        return False

def get_backend(s):
    name, slash, server = s.partition('/')
    if slash != '/':
        return False
    else:
        return name, server

def get_track_info(s):
    try:
        return s.split('/')
    except TypeError:
        return False

# One (key, matcher) pair per space-separated field; a matcher returns
# False on failure, and a tuple key takes one value per name.
matchers = [
    (None, None),
    (None, 'localhost'),
    ('haproxy', get_haproxy),
    (('client_ip', 'client_port'), get_client_info),
    ('accept_date', get_accept_date),
    ('frontend_name', lambda s: s),
    (('backend_name', 'server_name'), get_backend),
    (('Tq', 'Tw', 'Tc', 'Tr', 'Tt'), get_track_info),
]
result = {}

for i, field in enumerate(s.split()):
    if i < len(matchers):  # I'm not finished writing matchers yet.
        key, matcher = matchers[i]
        if matcher is None:
            pass
        else:
            if isinstance(matcher, str):
                value = matcher == field
            else:
                value = matcher(field)
            if value is False:
                raise ValueError('Parse error {}: {} "{}"'.format(
                    key, matcher, field))
            if isinstance(key, tuple):
                result.update(zip(key, value))
            elif key is not None:
                result[key] = value
pprint.pprint(result)

The engine would need to be improved in implementation and made
more flexible once it's working and tested. I think the error
handling is a good feature and the ability to customize parsing
and return custom types is cool.
 

Mitya Sirenef

Although PEP 8 is only compulsory for the Python standard library, many
users like to stick to PEP 8 for external projects.

http://www.python.org/dev/peps/pep-0008/

With perhaps one glaring exception: many people hate, or ignore, PEP 8's
recommendation to limit lines to 80 characters. (Strictly speaking, 79
characters.)


Here is a good defence of 80 char lines:

http://wrongsideofmemphis.com/2013/03/25/80-chars-per-line-is-great/


I think one important consideration that hasn't been mentioned yet is one
of Python's principles: practicality beats purity.

I can see how someone could have a preference for 80 char width, there
are some valid reasons to prefer it. I think other reasons to prefer
(slightly) longer width outweigh them, but that's a judgement call.

However, if you work with other people's code, you will surely run into
all kinds of widths: 100, 120, 140+, etc. For someone with a rigid 80
limit, it's a real pain. I feel that somewhere around 100 must be the
reasonable middle ground: with my soft 105 limit, editing 80-limit code
feels almost like my own; in fact, anything in the 80-110 range fits
neatly into my setup without any hassle.

120 is minimal hassle: I adjust Gvim to take a bit more space, resize
browser to take a bit less space and I'm set.

140 is a bit uncomfortable, but I generally notice that even when people
code to 140 limit, nearly all of their lines are actually going to be at
about 120 limit at most, with only a few offenders, so it's trivial to
adjust to 120 limit.

If people go over 140, that conclusively proves they're smoking crack
and the code needs to be refactored anyway.


I also find the argument about 80 width used in books a little odd. I
read webpages with 100-140 widths all the time with not the slightest
problem. As far as I know, no browser in existence lets you uniformly
adjust all pages to wrap at 80 limit except for lynx/links; if it was
hard to read at wider sizes, surely there would be at least one
graphical browser that would give that option (and get all the user
share from other browsers?)

Code is rather different from regular text. I would not want my books
formatted like this:

The story had held us, round the fire, sufficiently breathless, but
except the obvious
remark that it was
gruesome, as,

on Christmas Eve in an old house, a strange tale should essentially be,
I remember no comment uttered till somebody
happened to say that
it was the only case he had met in which such a visitation
had fallen on a
child.


Nor am I (warning: understatement) particularly enthusiastic about
editing code that looks like (formatted to 72 width):


try: request = self.request_class(environ) except UnicodeDecodeError:
logger.warning('Bad Request (UnicodeDecodeError)',
exc_info=sys.exc_info(), extra={ 'status_code': 400, }) response =
http.HttpResponseBadRequest() else: response =
self.get_response(request)

response._handler_class = self.__class__

try: status_text = STATUS_CODE_TEXT[response.status_code] except
KeyError: status_text = 'UNKNOWN STATUS CODE' status = '%s %s' %
(response.status_code, status_text) response_headers = [(str(k), str(v))
for k, v in response.items()] for c in response.cookies.values():
response_headers.append((str('Set-Cookie'), str(c.output(header=''))))
start_response(force_str(status), response_headers) return response


-m
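The 72-column example above is what a hard, prose-style reflow does to code, as opposed to an editor's soft (display-only) wrapping, which leaves the file untouched. The sketch below uses the standard `textwrap` module, with a sample `greet` function chosen for illustration. It shows that refilling Python source as a paragraph discards the newlines and indentation that carry its meaning:

```python
import textwrap

SOURCE = (
    "def greet(name):\n"
    "    if name:\n"
    "        return 'hi ' + name\n"
    "    return 'hi'\n"
)

# fill() treats the text as one paragraph: newlines become spaces and the
# result is re-broken at whitespace to fit within 72 columns.
reflowed = textwrap.fill(SOURCE, width=72)


def compiles(text: str) -> bool:
    """Return True if text is syntactically valid Python."""
    try:
        compile(text, "<reflowed>", "exec")
    except SyntaxError:
        return False
    return True
```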
 

Joshua Landau

+1
The only change I'd make to this suggestion would be to add a
semi-superfluous comma+newline after the last keyword argument too:

labelfont = wx.Font(
    pointSize=12,
    style=wx.DEFAULT,
    family=wx.NORMAL,
    weight=wx.BOLD,
    faceName="FreeSans",
    )

Since we're all showing opinions, I've always preferred the typical block
indentation:

labelfont = wx.Font(
    pointSize=12,
    style=wx.DEFAULT,
    family=wx.NORMAL,
    weight=wx.BOLD,
    faceName="FreeSans",
) # Not indented here

as

A(
    B(
        C,
        D,
        E,
    )
)

reads a lot cleaner than

A(
    B(
        C,
        D,
        E
        )
    )

which makes diffs cleaner when you need to insert something after
faceName:
<DIFFS SNIP>

That is a very good point :).

Additionally, if there are lots of keyword parameters like this, I'd
be tempted to keep them in sorted order for ease of tracking them
down (though CSS has long-standing arguments on how properties should
be ordered, so to each their own on this).

Personally I'd rarely be tempted to put more than 9 or so arguments
directly into a function or class. Most of the time I can imagine unpacking
(or equiv.) would look much more readable in the circumstances that apply.
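"Unpacking" here presumably means collecting the arguments in a mapping and passing it with `**`. The sketch below uses a hypothetical `make_font` function as a stand-in for a many-argument constructor such as `wx.Font`. The settings can then be kept sorted, shared, or tweaked per use:

```python
def make_font(point_size, family, style, weight, face_name=""):
    """Hypothetical stand-in for a constructor with many parameters."""
    return (point_size, family, style, weight, face_name)


# The settings live in one mapping that can be shared, sorted, or loaded.
LABEL_FONT = {
    "face_name": "FreeSans",
    "family": "default",
    "point_size": 12,
    "style": "normal",
    "weight": "bold",
}

label_font = make_font(**LABEL_FONT)
# Override one setting without touching the shared mapping.
heading_font = make_font(**{**LABEL_FONT, "point_size": 16})
```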
 

Kushal Kumaran

Roy Smith said:
Jason Swails said:
The only time I regularly break my rule is for regular expressions (at some
point I may embrace re.X to allow me to break those up, too).

re.X is a pretty cool tool for making huge regexes readable. But, it
turns out that Python's auto-continuation and string literal
concatenation rules are enough to let you get much the same effect.
Here's a regex we use to parse haproxy log files. This would be utter
line noise all run together. This way, it's almost readable :)

<SNIP>

And, for those of you who go running in the other direction every time
regex is suggested as a solution, I challenge you to come up with easier
to read (or write) code for parsing a line like this (probably
hopelessly mangled by the time you read it):

2013-04-03T00:00:00+00:00 localhost haproxy[5199]: 10.159.19.244:57291
[02/Apr/2013:23:59:59.811] app-nodes next-song-nodes/web8.songza.com
0/0/3/214/219 200 593 sessionid=NWiX5KGOdvg6dSaA
sessionid=NWiX5KGOdvg6dSaA ---- 249/249/149/14/0 0/0
{4C0ABFA9-515B6DEF-933229} "POST
/api/1/station/892337/song/16024201/notify-play HTTP/1.0"

Is using csv.DictReader with delimiter=' ' not sufficient for this? I
did not actually read the regular expression in its entirety.
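`csv` with `delimiter=' '` does handle the one quoted field, because the default `quotechar` is `"`, so the HTTP request comes back as a single value. It only gives you the top-level split, though. Compound fields such as `haproxy[5199]:` or `0/0/3/214/219` still need their own parsing, and any unquoted field that can contain spaces (captured headers inside `{...}`, for example) would be split apart. Here is a sketch using the sample line; the field names are illustrative, loosely borrowed from the regex:

```python
import csv
import io

LINE = ('2013-04-03T00:00:00+00:00 localhost haproxy[5199]: '
        '10.159.19.244:57291 [02/Apr/2013:23:59:59.811] app-nodes '
        'next-song-nodes/web8.songza.com 0/0/3/214/219 200 593 '
        'sessionid=NWiX5KGOdvg6dSaA sessionid=NWiX5KGOdvg6dSaA ---- '
        '249/249/149/14/0 0/0 {4C0ABFA9-515B6DEF-933229} '
        '"POST /api/1/station/892337/song/16024201/notify-play HTTP/1.0"')

# Illustrative names for the 17 space-separated fields of this one line.
FIELDS = [
    "timestamp", "host", "process", "client", "accept_date",
    "frontend_name", "backend_server", "timers", "status_code",
    "bytes_read", "captured_request_cookie", "captured_response_cookie",
    "termination_state", "connections", "queues", "request_id",
    "http_request",
]

reader = csv.DictReader(io.StringIO(LINE), fieldnames=FIELDS, delimiter=" ")
row = next(reader)
```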
 

jmfauth

Although PEP 8 is only compulsory for the Python standard library, many
users like to stick to PEP 8 for external projects.

http://www.python.org/dev/peps/pep-0008/

With perhaps one glaring exception: many people hate, or ignore, PEP 8's
recommendation to limit lines to 80 characters. (Strictly speaking, 79
characters.)

Here is a good defence of 80 char lines:

http://wrongsideofmemphis.com/2013/03/25/80-chars-per-line-is-great/

-----

With "Unicode fonts", where even monospaced fonts give characters
different widths depending on the Unicode block (for obvious reasons),
talking about a "text width" in chars does not even make sense.

jmf
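The point can be made concrete: `len()` counts code points, but a terminal typically renders East Asian wide and fullwidth characters in two columns and combining marks in none. Here is a rough sketch using `unicodedata`. The `display_width` name is hypothetical, and real terminals differ on ambiguous-width characters, which this counts as one column:

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks sit on the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide / fullwidth characters take two columns
        else:
            width += 1
    return width
```

So a 79-character limit measured with `len()` can still overflow an 80-column terminal when wide characters are involved.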
 
