python vs perl lines of code

Edward Elliott

This is just anecdotal, but I still find it interesting. Take it for what
it's worth. I'm interested in hearing others' perspectives, just please
don't turn this into a pissing contest.

I'm in the process of converting some old perl programs to python. These
programs use some network code and do a lot of list/dict data processing.
The old ones work fine but are a pain to extend. After two conversions,
the python versions are noticeably shorter.

The first program does some http retrieval, sort of a poor-man's wget with
some extra features. In fact it could be written as a bash script with
wget, but the extra processing would make it very messy. Here are the
numbers on the two versions:

            Raw           -Blanks       -Comments
            lines  chars  lines  chars  lines  chars
mirror.py     167   4632    132   4597    118   4009
mirror.pl     309   5836    211   5647    184   4790

I've listed line and character counts for three forms. Raw is the source
file as-is. -Blanks is the source with blank lines removed, including
lines with just a brace. -Comments removes both blanks and comment lines.
I think -Blanks is the better measure because comments are a function of
code complexity, but either works.
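
For anyone who wants to produce comparable numbers, here is a minimal sketch of the three counts described above. It is not the script actually used for the table; the function name, the brace handling, and the rule that a comment line is any line starting with "#" (true for both perl and python) are assumptions for illustration.

```python
def count_forms(source: str, comment_prefix: str = "#") -> dict:
    """Return (lines, chars) for the Raw, -Blanks and -Comments forms.

    Assumption: a "blank" line is empty, whitespace-only, or holds just a
    brace; a "comment" line is one whose first non-blank text is the prefix.
    """
    raw = source.splitlines()
    # -Blanks: drop empty lines and lines with just a brace
    no_blanks = [ln for ln in raw if ln.strip() not in ("", "{", "}")]
    # -Comments: additionally drop whole-line comments
    no_comments = [ln for ln in no_blanks
                   if not ln.strip().startswith(comment_prefix)]

    def size(lines):
        # count one newline per line, so indentation and line ends are included
        return (len(lines), sum(len(ln) + 1 for ln in lines))

    return {
        "raw": size(raw),
        "-blanks": size(no_blanks),
        "-comments": size(no_comments),
    }


sample = "# comment\n\nx = 1\n}\n\ty = 2\n"
counts = count_forms(sample)
```

Trailing comments on code lines are left in place on purpose, since only whole comment lines are removed in the -Comments form.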

By the numbers, the python code is roughly 60% as long by line count and
80% as long by character count. That the character percentage is higher
than the line percentage doesn't surprise me, since things like list
comprehensions and explicit module calling produce lengthy but readable lines.

I should point out this wasn't a straight line-for-line conversion, but the
basic code structure is extremely similar. I did make a number of
improvements in the Python version with stricter arg checks and better
error handling, plus added a couple minor new features.

The second program is an smtp outbound filtering proxy. Same categories as
before:

               Raw           -Blanks       -Comments
               lines  chars  lines  chars  lines  chars
smtp-proxy.py    261   7788    222   7749    205   6964
smtp-proxy.pl    966  24110    660  23469    452  14869

The numbers here look much more impressive but it's not a fair comparison.
I wasn't happy with any of the cpan libraries for smtp sending at the time
so I rolled my own. That accounts for 150 raw lines of difference. Another
70 raw lines are logging functions that the python version does with the
standard library. The new version performs the same algorithms and data
manipulations as the original. I did do some major refactoring along the
way, but it wasn't the sort that greatly reduces line count by eliminating
redundancy; there is very little redundancy in either version. In any
case, these factors alone don't account for the entire difference, even if
you take 220 raw lines directly off the latter columns.
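
As a rough check of that last claim, the sketch below subtracts the 220 hand-rolled lines (150 for smtp sending plus 70 for logging) from every perl line count in the table and recomputes the ratios. Taking the full 220 off the -Blanks and -Comments columns is one reading of "the latter columns" and overstates the adjustment, since the 220 is a raw-line figure; the variable and function names are made up for illustration.

```python
# Line counts copied from the smtp-proxy table above.
py_lines = {"raw": 261, "-blanks": 222, "-comments": 205}
pl_lines = {"raw": 966, "-blanks": 660, "-comments": 452}

# 150 raw lines of hand-rolled smtp sending + 70 raw lines of logging
HANDROLLED = 220


def adjusted_ratios(py, pl, discount):
    """python/perl line ratio after removing `discount` lines from perl."""
    return {form: py[form] / (pl[form] - discount) for form in py}


ratios = adjusted_ratios(py_lines, pl_lines, HANDROLLED)
for form, ratio in ratios.items():
    print(f"{form:10s} python/perl = {ratio:.0%}")
```

Even with this generous discount the python version comes out shorter in every column: about 35% of the perl line count raw, 50% without blanks, and 88% without blanks and comments.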

The two versions were written about 5 years apart, all by me. At the time
of each, I had about 3 years' experience in the given language and would
classify my skill level in it as midway between intermediate and advanced.
IOW I'm very comfortable with the language and library reference docs (minus
a few odd corners), but generally draw the line at mucking with interpreter
internals like symbol tables.

I'd like to hear from others what their experience converting between perl
and python is (either direction). I don't have the sense that either
language is particularly better suited for my problem domain than the
other, as they both handle network io and list/dict processing very well.
What are the differences like in other domains? Do you attribute those
differences to the language, the library, the programmer, or other
factors? What are the consistent differences across space and time, if
any? I'm interested in properties of the code itself, not performance.

And just what is the question to the ultimate answer to life, the universe,
and everything anyway? ;)
 
John Bokma

Edward Elliott said:
This is just anecdotal, but I still find it interesting. Take it for
what it's worth. I'm interested in hearing others' perspectives, just
please don't turn this into a pissing contest.

Without seeing the actual code this is quite meaningless.
 
Charles DeRykus

Edward said:
Evaluating my experiences yes, relating your own no.

But why would anecdotal accounts be of interest... unless there's
an agenda :) Differing skill levels and problem scenarios would
tangle the results so much no one could ever unravel the skein or
pry out any meaningful conclusions. I'm not sure what's to be gained...
even if you're just evaluating your own experiences. And, as you
suspect, it almost certainly would devolve into a pissing contest.

This subject thread may be of great interest but I think a language
advocacy mailing list would be a better forum.
 
Ala Qumsieh

Edward said:
Evaluating my experiences yes, relating your own no.

Well, quality of code is directly related to its author. Without knowing
the author personally, or at least seeing the code, your anecdote
doesn't really mean anything.

A colleague of mine, who is efficient at programming, and pretty decent
at Perl, routinely does something like:

if ($var =~ /something and something else/) {
    $var =~ /(something) and (something else)/;
    my $match1 = $1;
    my $match2 = $2;
    ...
}

Needless to say, this adds a lot of unnecessary redundancy, which will
go towards increasing your character count. Being an avid Perl Golfer
(although not one of the best) I can almost guarantee that any python
code can be written more succinctly in Perl, although readability will
suffer. Plus, the extensibility argument is very subjective, and is
closely related to personal coding style.

Btw, do you include space chars that go toward indenting Python code
in your count? If not, you should, since they are required. Not so for Perl.

--Ala
 
Edward Elliott

Charles said:
This subject thread may be of great interest but I think a language
advocacy mailing list would be a better forum.

Fair enough, but advocacy isn't at all what I'm after. Anecdotes are fine,
after all what is data but a collection of anecdotes? :) Seriously,
anecdotes are valuable: they give you another perspective, reflect common
wisdom, and can tell you what/where/how to look for hard data. Of course
if anyone already has hard data that would be welcome too, but it's hard to
even pin down what 'hard data' means in this situation.

I'll grant you though, asking for non-value-judgement-laden anecdotes on
newsgroups may be asking too much.
 
Edward Elliott

Ala said:
Btw, do you include space chars that go toward indenting Python code
in your count? If not, you should, since they are required. Not so for
Perl.

All chars are counted on lines which are counted. The perl and python
versions use the same amount and type of indentation, which in this case is
tab characters. In any case, I wouldn't strip the whitespace out of the
perl code just because it's unnecessary for the interpreter. How people
deal with code is far more interesting than how machines do, and for us
whitespace is necessary (not strictly, but a really really good idea).
 
Mirco Wahab

Hi Edward
            Raw           -Blanks       -Comments
            lines  chars  lines  chars  lines  chars
mirror.py     167   4632    132   4597    118   4009
mirror.pl     309   5836    211   5647    184   4790

Maybe somebody changed his style
and had a lot of statements like this before:

if ( something )
{
    do_something()
}

which can be expressed in one
line:

do_something() if ( /something/ );

That gives a 1:4 line count ratio.

Or somebody used an identifier like:

sub GetTheseSamplesHereOut {
    ...
    ...
}

and later:
sub SampleExtract {
    ...
    ...
}

and saved ~40% of the characters.
You get my point? ;-)


Regards

M. Wahab
 
John Bokma

Edward Elliott said:
Evaluating my experiences yes, relating your own no.

What would the point be? Most important to me would be: am I happy with
the result? And that rarely has to do with the number of lines of actual
code or the programming language. A language is just a tool.
 
Edward Elliott

Mirco said:
Maybe somebody changed his style
and had a lot of statements like this before:
which can be expressed in one
line:
That gives a 1:4 line count ratio.

Or somebody used an identifier like:
and later:
and saved ~40% of the characters.
You get my point? ;-)

Hey I completely agree that line counts leave out a lot of information.
Measures of the code like complexity, readability, work performed, etc
hinge on many more important factors. I don't pretend that lines of code
represents any indication of inherent superiority or fitness.

But line counts do convey some information, even if it's only how many
lines a particular programmer used to convey his ideas. Real-world and
average-case data are more compelling than theoretical limits on how
compact code can be. Besides, compactness isn't the point, communication
is. Maybe line count is a good rough first-cut approximation of that.
Maybe it's not. Probably it's both, depending on the case. Talking about
the numbers can only shed light on how to interpret them, which as always
is 'very carefully'.

I'm not saying lines of code necessarily reflects anything else. All I'm
saying is, I noticed some properties of my code. I'd like to know what
objective properties others have noticed about their code. This is not
meant to be a comparison of languages or programming technique, just a
sampling of collective wisdom. That always has value, even when it's
wrong.

By the looks of it, this group is uninterested in the discussion. Which is
fine.
 
Edward Elliott

John said:
What would the point be? Most important to me would be: am I happy with
the result? And that rarely has to do with the number of lines of actual
code or the programming language. A language is just a tool.

The point is knowing how to pick the right tool for the right job.
Anecdotes aren't the answer but they can be the beginning of the question.
Besides, whatever happened to pursuing knowledge for its own sake?
 
Aahz

Edward Elliott said:
Fair enough, but advocacy isn't at all what I'm after. Anecdotes are fine,
after all what is data but a collection of anecdotes? :)

"The plural of anecdote is not data."
 
brian d foy

Edward said:
This is just anecdotal, but I still find it interesting. Take it for what
it's worth. I'm interested in hearing others' perspectives, just please
don't turn this into a pissing contest.

I'm in the process of converting some old perl programs to python. These
programs use some network code and do a lot of list/dict data processing.
The old ones work fine but are a pain to extend. After two conversions,
the python versions are noticeably shorter.

You've got some hidden assumptions in there somewhere, even if you
aren't admitting them to yourself. :)

You have to note that rewriting a program, even in the same language,
tends to make it shorter, too. These things are measures of programmer
skill, not the usefulness or merit of a particular language.

Shorter doesn't really mean anything though, and line count means even
less. The number of statements or the statement density might be
slightly more meaningful. Furthermore, you can't judge a script by just
the lines you see. Count the lines of all the libraries and support
files that come into play. Even then, that's next to meaningless unless
the two things do exactly the same thing and have exactly the same
features and capabilities.

I can write a one line (or very short) program (in any language) that
does the same thing your scripts do just by hiding the good stuff in a
library. One of my friends likes to talk about his program that
implements Tetris in one statement (because he hardwired everything
into a chip). That doesn't lead us to any greater understanding of
anything though.

 
achates

It probably says something about your coding style, particularly in
perl. I've found (anecdotally of course) that while perl is potentially
the more economical language, writing *legible* perl takes a lot more
space.
 
Adam Jones

Without any more information I would say the biggest contributor to
this dissimilarity is your experience. Having spent an additional five
years writing code you probably are better now at programming than you
were then. I am fairly confident that if you were to take another crack
at these same programs in perl you would see similar results.

One of the bigger differences might have been language changes over
time. If you had written this in python five years ago (assuming the
python rewrites are relatively current, otherwise this list gets
bigger) you would not have had generators, iterators, the logging package,
built-in sets, decorators, and a host of other changes. Some of these
features you may not have used, but for every one you did use, the python
version would have been heavier.

Other than that it all boils down to how the algorithm is implemented.
Between those three factors you can probably account for most of the
differences here. The really important question is: what has perl done in
the last five years to make writing these scripts easier?
 
John Bokma

Edward Elliott said:
The point is knowing how to pick the right tool for the right job.

True, and I don't think that the number of lines is going to be a good
guideline.

Edward Elliott said:
Anecdotes aren't the answer but they can be the beginning of the
question. Besides, whatever happened to pursuing knowledge for its own
sake?

Sure, but research without being able to peer-review the setup is a bit
difficult. If you want an anecdote: three years ago my Perl coding was
different from now, and I am sure that my Python coding (when I get
there), will be different compared to if I had learned the language 3
years ago. Mind, I am not saying that I am going to program Python the
Perl way, just that in 3 years I have learned stuff I can use in other
languages as well.
 
Michael Tobis

"The plural of anecdote is not data."

It's a pithy quote, but it isn't QOTW in my book, simply because it
isn't true in general. Talk to some paleoclimatologists.

There is no way to get uniform measures of ancient climate. What should
we do then? Should we ignore the information we have? Are the
fortuitously preserved fossils of the very deep past to be ignored just
because we can't get an unbiased sample?

In fact, the more difficult it is to get systematic data, the more
valuable the anecdote.

There is a number that represents the character ratio for equivalent
skill applied to equivalent tasks across all domains to which both
languages are applied. A single programmer's results on this matter do
in fact constitute a sample. A single sample is not a very good
estimator, but it is not devoid of skill or information either.

In the present case Edward told us that he thought he was making a fair
comparison, one whose result would appear counterintuitive to anyone who
has worked in both languages. Perlists tend to giggle and cackle every
time they save a keystroke; Pythonistas do not have this personality
quirk. If Python is nevertheless terser, it is a strong argument in
Python's favor vis-a-vis Perl.

Edward also asked if others had similar experiences. If others did, the
assemblage of their opinions would in fact constitute data. I have no
idea why people are giving him such grief over this request.

My only data point, er, anecdotal evidence, is this. To take things to
an unrealistic extreme, consider the puzzle at http://pycontest.net
(Python Golf). When I first thought about this, I assumed that Perl
would defeat Python in the character count, but as I worked at the
puzzle I came to the (to me) counterintuitive realization that it
probably would not. I'd be interested in seeing the results of an
inter-language golf contest.

Of course, such games don't tell us much about real code, but I'm
inclined to agree with Edward's impression that Python is, in practice,
terse compared to Perl. I, too, would like to hear about other
examples, because I think the plural of "anecdote" is, in fact,
"data".

mt
 
Ben Finney

Michael Tobis said:
"The plural of anecdote is not data."

It's a pithy quote, but it isn't QOTW in my book, simply because it
isn't true in general. Talk to some paleoclimatologists.

There is no way to get uniform measures of ancient climate. What
should we do then? Should we ignore the information we have? Are the
fortuitously preserved fossils of the very deep past to be ignored
just because we can't get an unbiased sample?

Those samples can be independently verified by any skilled observer at
another time. This is what distinguishes them from anecdotes, and
breaks your analogy.
 
Edward Elliott

brian said:
You have to note that rewriting a program, even in the same language,
tends to make it shorter, too. These things are measures of programmer
skill, not the usefulness or merit of a particular language.

I completely agree. But you have to start somewhere.

brian said:
Shorter doesn't really mean anything though, and line count means even
less. The number of statements or the statement density might be
slightly more meaningful. Furthermore, you can't judge a script by just
the lines you see. Count the lines of all the libraries and support
files that come into play. Even then, that's next to meaningless unless
the two things do exactly the same thing and have exactly the same
features and capabilities.

For an objective measure of which language/environment is more optimal for a
given task, your statement is completely accurate. OTOH for a
quick-and-dirty real-world comparison of line counts, and possibly a rough
approximation of complexity, the libraries don't matter if they offer
more-or-less comparable functionality. Especially if those libraries are
the standard ones most people rely on.

I'm not attaching any special significance to line counts. They're just a
data point that's easy to quantify. What if anything do they mean? How
does one measure statement density? What's the divisor in the density
ratio - lines, characters, units of work, etc? These are all interesting
questions with no easy answers.
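
To make the density question concrete, here is one possible definition, sketched for python source only: statements per non-blank line, with statements counted from the parse tree. The choice of divisor and the function name are assumptions rather than an established metric, and the perl side would need a perl parser to be comparable.

```python
import ast


def statement_density(source: str) -> float:
    """Statements per non-blank line of python source (one possible metric)."""
    tree = ast.parse(source)
    # every statement node counts, including ones nested inside blocks
    statements = sum(isinstance(node, ast.stmt) for node in ast.walk(tree))
    lines = sum(1 for ln in source.splitlines() if ln.strip())
    return statements / lines if lines else 0.0


sample = "x = 1; y = 2\nif x:\n    y = 3\n"
density = statement_density(sample)  # 4 statements over 3 lines
```

Swapping the divisor for characters or tokens gives different answers for the same code, which is part of why the question has no easy answer.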

brian said:
I can write a one line (or very short) program (in any language) that
does the same thing your scripts do just by hiding the good stuff in a
library. One of my friends likes to talk about his program that
implements Tetris in one statement (because he hardwired everything
into a chip). That doesn't lead us to any greater understanding of
anything though.

Of course. Extreme cases are just that.
 
Edward Elliott

achates said:
It probably says something about your coding style, particularly in
perl. I've found (anecdotally of course) that while perl is potentially
the more economical language, writing *legible* perl takes a lot more
space.

I'm sure it does. My perl (from 5 years ago) may be considered verbose (or
not, I don't know). I avoid shortcuts like $_, use strict mode, etc. Then
again I frequently use short forms like "statement if/unless (blah);" when
appropriate. So there's a big personal component in there.

But again, the interesting thing to me isn't what could one do, it's what
are people actually doing in the real world?
 
