scripting languages vs statically compiled ones

Ajay

hi!

Is there an authoritative source on the performance of scripting languages
such as Python vs. something like Java, C, or C++?

It's for a report, so it would be awesome if I could quote some well-known
authority on this.

thanks

cheers
 
Cameron Laird

Peter Hansen

Ajay said:
is there an authoritative source on the performance of scripting languages
such as python vs. something like java, c, c++.

its for a report, so it would be awesome if i could quote some well-known
authority on this.

Define "well-known" and "authoritative". For most definitions,
the answer would be "no" though.

-Peter
 
Alex Martelli

Cameron Laird said:

Heh -- thanks for the honor of quoting just me in this context, Cameron,
but I suspect Ajay would like something more.

I would recommend Prechelt's "An empirical comparison of C, C++, Java,
Perl, Python, Rexx, and Tcl",
<http://page.mi.fu-berlin.de/~prechelt/Biblio/jccpprt_computer2000.pdf>,
published in IEEE's Computer magazine.

Summary of empirical results:
'''
"scripting languages" (Perl, Python, Rexx, Tcl) are more productive than
"conventional languages" (C, C++, Java). In terms of run time and memory
consumption, they often turn out better than Java and not much worse
than C or C++. In general, the differences between languages tend to be
smaller than the typical differences due to different programmers within
the same language.
'''

But, get and study the whole paper -- these conclusions need to be
studied in depth. E.g.: mostly, productivity is due to almost constant
LOC/hour, with scripting languages having median length (for given
problem) of 100 vs others' around 250, confirming "function point"
theory; however, looking at it with a finer grain shows LOC/hour
(median) as high as 40 for Python, as low as 20 for Java and Rexx (Rexx
results, like C ones, doubtful due to fewer programs under study).
I.e., in empirical practice under the experiment's conditions, the
Python program is not only less than half the size of the Java one doing
the same task, but also gets written almost 5 times as fast (a factor of
2 in lines/hour times a factor of 2.5 in program size for programs doing
the same task). Reliability of the different programs also matters, and
it tends to be higher for higher level languages (though the author
prudently says no more than 'no less reliable').
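
The arithmetic behind that "almost 5 times" figure can be spelled out as a
small sketch. The numbers are the rounded medians quoted above (treating the
100-line scripting median as the Python figure and the 250-line one as the
Java figure is an assumption), and the variable names are only illustrative.

```python
# Rounded medians as quoted above (illustrative, not exact figures from the paper).
python_loc, java_loc = 100, 250          # program length for the same task
python_rate, java_rate = 40, 20          # lines written per hour

python_hours = python_loc / python_rate  # time to write the Python version
java_hours = java_loc / java_rate        # time to write the Java version

size_factor = java_loc / python_loc      # Java program is 2.5x longer
rate_factor = python_rate / java_rate    # Python lines get written 2x faster
speedup = java_hours / python_hours      # same as size_factor * rate_factor

print(size_factor, rate_factor, speedup)
```

The two factors multiply because writing time is simply length divided by
rate.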

Of course, any such comparison of technologies happens _at one point in
time_, underscoring how silly it is to think one is able to compare
performance of _languages_ rather than their _implementation_ at a given
time. In the years since the experiment, the performance of Java and
Python implementations has increased, with substantial effort having
been put into optimizing the respective virtual machines (particularly
on intel-like processors; others, such as PPC, appear to have VMs that
are less highly optimized); I believe the other languages'
implementations have not changed significantly in performance terms.

Another possibility is visiting http://shootout.alioth.debian.org/ and
downloading and trying the "shootout", though that takes a lot of critical
thinking and good all-around knowledge to fix the many issues that are
fairly pointed out, but left unsolved, in links from that page.


Alex
 
beliavsky

Ajay said:
hi!

is there an authoritative source on the performance of scripting languages
such as python vs. something like java, c, c++.

its for a report, so it would be awesome if i could quote some well-known
authority on this.

Table 25-1, "Relative Execution Time of Programming Languages", on
page 600 of the book "Code Complete, 2nd Edition" by Steve McConnell
(a well-known author) has the following statistics, based on
benchmarks described by the author in chapters 25-26. A larger number
means the language is slower.

Language       Type of Language   Execution Time Relative to C++
C++            compiled           1
Visual Basic   compiled           1
C#             compiled           1
Java           byte code          1.5
PHP            interpreted        >100
Python         interpreted        >100

I think Python is slower than C++ or Fortran for number-crunching,
based on some experience with Numeric, but the speed factor is more
often in the range of 2-10, not >100. McConnell's benchmarks are more
general. I doubt the assertion that VB is as fast as C++. Replacing a VBA
function in Excel with a C DLL can lead to a big increase in speed.

Table 4-1 of the same book shows the "ratio of high-level-language
statements to equivalent C code" (higher is better):

Language                 Level Relative to C
C                        1
Fortran 95               2
C++                      2.5
Java                     2.5
Microsoft Visual Basic   4.5
Perl                     6
Python                   6

The ratios depend heavily on the type of program being written. I'll
believe a VB/Fortran 95 ratio of 2.25 (or much higher) for Windows GUI
programming but not for a linear algebra library, where Fortran is
more powerful.

The sources listed for this table are the books

"Estimating Software Costs", by Capers Jones, McGraw-Hill (1998)

"Software Cost Estimation with Cocomo II", by Barry Boehm,
Addison-Wesley (2000)

and the paper (online at
http://page.mi.fu-berlin.de/~prechelt/Biblio/jccpprt_computer2000.pdf)

"An Empirical Comparison of Seven Programming Languages", by Lutz
Prechelt, IEEE Computer, October 2000, 23-29.
 
kosh

Language       Type of Language   Execution Time Relative to C++
C++            compiled           1
Visual Basic   compiled           1
C#             compiled           1
Java           byte code          1.5
PHP            interpreted        >100
Python         interpreted        >100

My problem with these kinds of numbers is that they illustrate the whole
problem of lies, damned lies and statistics. The numbers are probably fairly
close to accurate but also not even in the right universe. In various
benchmarks C, C++ etc. are typically faster than Python programs by a
huge amount, but remember that benchmarks are also very short overall and are
very time-intensive over those few lines of code. On a regular application
there is no time to do the kind of optimizations that you see in these
benchmarks, and if you tried, the product would be far too late.

Let's say that Python is 5x more productive than C, just for the sake of
argument. So if the project took 2 months in Python it would take 10 months
in C. Now when it comes to optimization you have about 5x the code in the C
version compared to the Python version. So it is easier to profile and fix
problems in the Python version by a large margin; more important, however, is
fixing design errors. Because the code is much shorter and simpler overall in
Python, it is easier to identify, see and fix design errors.

My experience has been that, while C, C++ etc. are faster than Python on
benchmarks, in the regular applications I work with Python is almost always
a good deal faster.
I think Python is slower than C++ or Fortran for number-crunching,
based on some experience with Numeric, but the speed factor is more
often in the range of 2-10, not >100. McConnell's benchmarks are more
general. I doubt the assertion that VB is as fast C++. Replacing a VBA
function in Excel with a C dll can lead to big increase in speed.

I agree that replacing a single function with C can give an increase in
speed, and that there are classes of things that we have libraries for which
give large increases in speed. However, the point is that all of those things
can be called from Python easily. So why write an entire program in a
lower-level language like C, C++, Java, C# etc. when you can write it in
Python in far less time? Once the Python version is done you can profile it,
fix speed problems, debug it etc. using far fewer resources. Once you have
done all of that, and if it is still needed, you can first use libraries (if
they exist) to speed up your program, then code parts in a lower-level
language now that you know exactly where the speed issues are, or use
something like psyco. Overall your programs will be far shorter, easier to
read, easier to maintain, easier to debug, and just as fast if not faster. In
my view, writing just about anything other than a kernel or a device driver
in a low-level language is premature optimization, barring other constraints.
 
Cameron Laird

On Thursday 28 October 2004 11:04 am, (e-mail address removed) wrote: .
.
.
My experience has been is that while c,c++ etc are faster then python in
regular applications I work with python is almost always a good deal faster. .
.
.
I agree that replacing a single function with c and give an increase in speed
and that there are classes of things that we have libraries for which give
large increases in speed however the point is that all of those things can be
called from python easily. So why write an entire program in a lower level
language like c,c++, java, c# etc when you can write it in python in far less
time? One the python version is done you can profile it, fix speed problems,
debug it etc using far less resources. Once you have done all of that you can
first use libraries if they exist to speed up your program if still needed,
code parts in a lower level language now that you know exactly where the
speed issues are or use something like psyco. Overall your programs will be
far shorter, easier to read, easier to maintain, easier to debug, and just as
fast if not faster. In my view just about anything other then a kernel or a
device driver is premature optimization to write in a low level language
barring other contraints.

Me, too.

I recognize the original questioner wants authoritative answers,
preferably from prestigious sources. Let me assume for a moment
we've accommodated that need.

I want to reinforce one of Kosh's observations: for real-world
applications, Python is often FASTER than C. Not at least a
hundredth as fast, or at least a tenth as fast, but the-customers-
complain-about-the-C-coded-version-but-love-the-Python-based-one:
FASTER.

Also, I probably have an obligation to remind readers that an
opposition such as "C or Python?" is false in at least one more
way: we need to think, as Kosh hints, not whether to use one or
the other, but how much of each. In commercial practice and
academe, decision-makers will push you to settle on one language.
This is a mistake they make. Be prepared for it, and understand
how to respond. Even without heroic polyglotticism, you're
realistically likely to be more successful in your development
when you're ready to combine at least a couple of distinct
languages judiciously.
 
kosh

I want to reinforce one of Kosh's observations: for real-world
applications, Python is often FASTER than C. Not at least a
hundredth as fast, or at least a tenth as fast, but the-customers-
complain-about-the-C-coded-version-but-love-the-Python-based-one:
FASTER.

Here is the method I see on making programs fast in python.

Step 1: Make sure the program works (Who cares how slow it is if it does not
get the job done)

Step 2: Go through the program and see where you can simplify it. I know many
find it amazing, but overall the shorter and simpler your Python code is, the
faster it tends to run. However, the point of this step is not speed; that is
just a side benefit. The point is that your code is now easier to read, fix
etc.

Step 3: If, while writing and cleaning up your code, you find you are
duplicating features available in the standard Python install, use the
built-in ones instead. They get a lot more testing, are far less likely to
contain bugs, it is code you don't have to maintain etc. This will make your
code even simpler to read and understand and also shorter, and will likely
get rid of more bugs.

Step 4: Profile your code. Now that your code is nice and clean it is far
easier to understand. This also makes the profiler information far more
useful since you can more easily see what the problem is. At this point think
carefully about the design and algorithm choice. Fix those problems first and
then fix up the speed issues by using better designs in python. Make sure
while you are fixing up speed issues that the code is still short and simple.

Step 5: If you are on a system that psyco works on, use it; sometimes it
helps a lot and it is trivial to add.

Step 6: Look for outside libraries to use, like numarray, PIL etc. It is
better to reuse code than to write more, especially if you stay with reusing
larger projects, since they get a lot more work on optimization and debugging
than you can do. Besides, reinventing the wheel takes too much time.

By this point your code should be pretty darn fast, short, easy to understand
and fairly easy to debug.

Step 7: As a last resort, if your program is still not fast enough, go ahead
and write code in a lower-level language and call it from Python. However, be
careful with this step and make sure you really need to do it. The code you
write in a lower-level language is far more likely to be buggy than anything
else up to this point, and it will also usually take a fair bit of time to
actually write something in a low-level language that is faster than what the
above steps have already gained.

At any point along this process you can stop if your code is fast enough and
for most people there is no need to go beyond step 6.
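
The profiling in step 4 can be sketched with the standard library's cProfile
and pstats modules. The functions `slow_join` and `build_report` are
hypothetical stand-ins for a real program; only the profiling calls are the
point here.

```python
import cProfile
import io
import pstats


def slow_join(words):
    # Hypothetical hot spot: repeated string concatenation in a loop.
    result = ""
    for word in words:
        result += word + " "
    return result


def build_report(n):
    # Hypothetical top-level function standing in for "the program".
    words = [str(i) for i in range(n)]
    return slow_join(words)


profiler = cProfile.Profile()
profiler.enable()
report = build_report(50_000)
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

Once the listing points at a function like `slow_join`, a simpler rewrite
(here, `" ".join(words)`) is usually the first thing to try before reaching
for steps 5 to 7.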

Remember, the point of all of this is to get good software written, not to
brag about how cool you are for writing stuff in a lower-level language while
wasting a customer's time and money. Doing as much as possible in Python and
using libraries saves a lot of time and money and gives a better result
overall. The end result of this is also that your programs tend to be a lot
faster than their low-level-language counterparts, since it takes so much
time to optimize in a low-level language. So while others still have a
program that will often be 2-10x slower despite being in a "faster
language", you are working on adding new features, making your app even
faster etc.


Another minor point for this is that each new python version often adds a lot
of very useful features for making your code simpler, faster, more memory
efficient etc. It is often worth it to take the time and convert the program
to use the features in a newer version of python. Things like sets,
generators, genexps etc can save a lot of code and memory.
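
As a rough illustration of that last point, here is a minimal sketch
comparing a list comprehension with a generator expression, plus a set used
for duplicate removal. The sizes and sample values are arbitrary, and
`sys.getsizeof` only measures the container object itself, not the integers
it refers to.

```python
import sys

n = 100_000

# A list comprehension builds every element up front ...
squares_list = [i * i for i in range(n)]
# ... while a generator expression produces them one at a time.
squares_gen = (i * i for i in range(n))

list_bytes = sys.getsizeof(squares_list)   # container only, not the ints
gen_bytes = sys.getsizeof(squares_gen)

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)

# A set gives duplicate removal and fast membership tests without extra code.
seen = set(["spam", "eggs", "spam"])

print(list_bytes, gen_bytes, total_from_list == total_from_gen, "eggs" in seen)
```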
 
Alex Martelli

kosh said:
Here is the method I see on making programs fast in python.

[[ snipped lots of well-argued explanations ]]

I also like Kent Beck's summary of basically the same ideas:

Make it work. Make it right. Make it fast.

corresponding to your points 1, 2-3, and 4-7 respectively.

And of course, as part of [1] one SHOULD have a good battery of unit
tests, too. If one doesn't, [1] isn't really complete -- how does one
_know_ one's program does work? That battery of unit tests, religiously
rerun at each refactoring, simplification, speedup, and reuse, will be
the best help throughout the 2->7 trip, giving you faster development
and more confidence at each step.

A couple of specific comments:

Step 6: Look for outside libraries to use like numarray, pil etc. It is better

Sometimes you can also reuse C-interfaced dynamic libraries that way,
via Thomas Heller's 'ctypes' extension. Since you do have the overhead
of interfacing, that will give you substantial speedup only if each call
into the C-interfaced library does a lot of work, of course.
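
A minimal sketch of what such a call looks like, assuming a POSIX-like
system where the C library can be loaded this way (ctypes has since become
part of the standard library). Calling `strlen` is only a stand-in for a
real library function; a call this cheap is exactly the case where the
interfacing overhead dominates.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system. find_library may return None, in which
# case CDLL(None) falls back to the symbols already loaded in this process.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
print(length)
```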

Step 7: As a last resort if your program is still not fast enough go ahead and
write code in a lower level language and call it from python. However be

Here, too, look first for existing, reusable (preferably with-source!)
libraries that can do most of the work you need, leaving your code
mostly as a thin glue, possibly just a bit of looping, custom logic,
etc. Pyrex can often be the best 'lower level language' to use here,
because its level isn't _that_ much lower, necessarily. You can start
with Pyrex code that looks 90% like Python, and descend gracefully if
that's needed...


An excellent post - thanks!


Alex
 
Richard Blackwood

Table 25-1, "Relative Execution Time of Programming Languages", on
page 600 of the book "Code Complete, 2nd Edition" by Steve McConnell
(a well-known author) has the following statistics, based on
benchmarks described by the author in chapters 25-26. A larger number
means the language is slower.

Language       Type of Language   Execution Time Relative to C++
C++            compiled           1

Now since when was C++ as fast as itself?

Visual Basic   compiled           1

C++ and VB are both a 1? Does this make any sense anyone?

C#             compiled           1

Depends on how you code it. Though I would say the difference is only
significant in particular cases, so I agree here.

Java           byte code          1.5

Read my comment below Python, but basically, this should say byte-code
interpreted w/ Just-In-Time compilation.

PHP            interpreted        >100
Python         interpreted        >100

Is Python not bytecode interpreted? It has no JIT with the standard
distro but otherwise, it is no less byte-code than Java. Misleading.

I think Python is slower than C++ or Fortran for number-crunching,
based on some experience with Numeric, but the speed factor is more
often in the range of 2-10, not >100. McConnell's benchmarks are more
general. I doubt the assertion that VB is as fast C++. Replacing a VBA
function in Excel with a C dll can lead to big increase in speed.

Indeed, it was my previous understanding that VB depended on runtimes.
There are a number of internet sources which attribute relative "drag"
to visual basic code execution. In other words, no need to doubt the
assertion, it is plain wrong (or so I believe from what I know).

Table 4-1 of the same book shows the "ratio of high-level-language
statements to equivalent C code" (higher is better):

Language                 Level Relative to C
C                        1
Fortran 95               2
C++                      2.5
Java                     2.5
Microsoft Visual Basic   4.5

Higher-level than Java? Well, you surely pay for this in language
features and language design sensibility.

Perl                     6
Python                   6

Not above Perl? I shall have to think about this one.
 
Andrew Dalke

Richard said:
Not above Perl? I shall have to think about this one.

When I worked in Perl I was about as productive in writing code
as with Python, so this feels about right.

The advantages for Python come when many people work together
(either on a team or pulling in sources from different places):
- it scales well, so it's better in larger projects
- I understand how to do OO programming in Python
- it's easier for the non-software developers I work with
  to use Python for their programming tasks
- there are fewer corners to worry about. Perl is
  deliberately full of pointy edges.

These are not orthogonal.

Andrew
(e-mail address removed)
 
John

Table 25-1, "Relative Execution Time of Programming Languages", on
page 600 of the book "Code Complete, 2nd Edition" by Steve McConnell
(a well-known author) has the following statistics, based on
benchmarks described by the author in chapters 25-26. A larger number
means the language is slower.

Language       Type of Language   Execution Time Relative to C++
C++            compiled           1
Visual Basic   compiled           1
C#             compiled           1
Java           byte code          1.5
PHP            interpreted        >100
Python         interpreted        >100

I think Python is slower than C++ or Fortran for number-crunching,
based on some experience with Numeric, but the speed factor is more
often in the range of 2-10, not >100. McConnell's benchmarks are more
general. I doubt the assertion that VB is as fast C++. Replacing a VBA
function in Excel with a C dll can lead to big increase in speed.

Believe it :). VB5 and VB6 can be compiled to native code and have
more or less identical performance to equivalent C code. And why
shouldn't they? The VBA implementation in VB is statically typed, well
optimized and not even as expressive as C. There is no kind of
developer productivity overhead like you see in Python :). If your VB
code is running slowly in VB6, chances are you are using variants and
other less optimizable features.

VBA in MS Office is interpreted (not even p-code) and your C dll will
improve performance a lot, but not much in VB5/VB6.

I actually benchmarked this 4 years ago. VB4 code ran 30 times slower
compared to when using a C dll. VB5 code ran at about the same speed
as C.

If I remember, there was also a benchmark in MSDN (back in 1998 or so)
which showed VB as native code running only 50% slower than C.

Vendors such as PowerBasic who sell native compiled BASIC variants
claim C or better than C performance.
 
Cameron Laird

.
.
.
Is Python not bytecode interpreted? It has no JIT with the standard
distro but otherwise, it is no less byte-code than Java. Misleading.
.
.
.
Like many propositions in this thread, this one bears repetition
and examination. Even people in authority reiterate that "Java
is compiled, while Python is interpreted." Outsiders are going
to hear this said by people who appear to be speaking truthfully.
It's at best misleading, of course, as Mr. Blackwood recognizes.
There ought to be a way to armor the innocent against it ...
 
Alex Martelli

Richard Blackwood said:
Now since when was C++ as fast as itself?

Well, this is ONE line in the table nobody's gonna disprove!-)

Higher-level than Java? Well, you surely pay for this in language
features and language design sensibility.

That's another issue, and harder to quantify. Counting average number
of statements per function point, which is what "language level" means
in this context (and the table's title is very explicit about it), it
does seem right -- of course the precision of these numbers is dubious,
it could be, say, Java=3 and VB=4, &c.
Not above Perl? I shall have to think about this one.

In statements per function point? Idiomatic Perl tends to be way more
cryptic and terse than idiomatic Python; that Python pulls back to rough
parity (and I do agree with this table that it does) is, in a sense,
surprising.

Note that this need not translate to same-productivity, although "fixed
number of LOCs/hour" is a long-standing hypothesis of the Function
Points theorists. Prechelt's work couldn't _disprove_ the hypothesis in
a statistically significant way, but if you eyeball the language medians
in the LOCs/hr graphs in his IEEE Computer article (available as a PDF
on the net), you get the strong visual impression that, while most
languages do cluster in that measure, Java tends very much to the low
end of the cluster (20 LOCs/hr) and Python to the high end (40).


Yep, THIS paper!-)


Alex
 
Torben Ægidius Mogensen

Ajay said:
is there an authoritative source on the performance of scripting languages
such as python vs. something like java, c, c++.

its for a report, so it would be awesome if i could quote some well-known
authority on this.

You could use the "The Great Computer Language Shootout" at
http://shootout.alioth.debian.org/craps.php . It may give the
impression that you can't trust the numbers it gives, but I believe
this is mainly because it is more honest than the majority of other
language comparisons. The main problem with this test is that the
benchmark programs are fairly small, so the results may not be
representative of large programs.

Torben
 
Richard Blackwood

Cameron said:
Like many propositions in this thread, this one bears repetition
and examination. Even people in authority reiterate that "Java
is compiled, while Python is interpreted."

So now Java is compiled, eh? I suppose that is more accurate, but I
prefer to say that Java is bytecode compilable whereas Python is
bytecode interpreted. So with a broad stroke, I would say I am in line
with those in "authority", though I tend to be more specific.

Outsiders are going
to hear this said by people who appear to be speaking truthfully.

Indeed, appearances can be deceiving. ;-)

It's at best misleading, of course, as Mr. Blackwood recognizes.
There ought to be a way to armor the innocent against it ...

Quite right, just tell them Java is byte-code compiled and Python is
bytecode interpreted (with the ability to be bytecode compilable). I
wonder, how does Psyco match up with Java's JIT?
 
Peter Hansen

Richard said:
So now Java is compiled eh? I suppose that is more accurate but I
prefer to say that Java is bytecode compilable whereas Python is
bytecode interpreted. So with a broad stroke, I would say I am inline
with those in "authority", though I tend to be more specific.

Not that it's directly relevant to this thread, but what value
does your greater specificity provide in the case of a CPU
that directly implements the Java "bytecode" as its native
instruction set, Richard?

Java has a much lower level bytecode than Python's, but
even using the term "bytecode compilable" versus "bytecode
interpreted" could be considered misleading. If you study
the mechanisms involved in going from source to execution
in each language, they are very very similar, to the point
that saying one is "compilable" while the other is "interpreted"
is basically inaccurate. *Both* are compiled to bytecode,
both have a runtime interpreter which executes that bytecode,
but so far only Java has true compilers that can skip the
bytecode step (or perhaps they use it as the input, I've
never bothered to learn) and generate native machine code,
which is effectively what the JIT compilers do for Java
(and what Psyco does for Python, albeit in a much more
limited fashion).
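
The "compiled to bytecode, then interpreted" path can be made visible from
within Python itself. This is a minimal sketch assuming CPython; the
one-line program and its variable names are hypothetical, and the exact
instructions printed by `dis` vary between Python versions.

```python
import dis

source = "total = price * quantity"   # hypothetical one-line program

# Step 1: the compiler turns source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")
print(type(code.co_code), len(code.co_code))

# Step 2: the interpreter loop executes that bytecode.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)
print(namespace["total"])

# Human-readable listing of the bytecode instructions (varies by version).
dis.dis(code)
```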

-Peter
 
Patrick Maupin

<a lot of good stuff about making Python faster>

In addition to your points and Alex's additions, I would add (as part
of your step 4): To make Python faster (drumroll...) CACHE
EVERYTHING!!!

The concept of caching goes by various names (e.g. memoize) depending
on the exact form it takes, but in any case it is usually _much_
easier to do simply and correctly in Python than in almost any other
language, and can sometimes produce dramatic results.
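
As a minimal sketch of the memoize form of caching, current Python offers
`functools.lru_cache` in the standard library (it did not exist when this
thread was written). The `fib` function and the `call_log` list are
illustrative only and have nothing to do with the assembler described
below.

```python
from functools import lru_cache

call_log = []  # records every time the function body actually runs


@lru_cache(maxsize=None)
def fib(n):
    # Naive recursive Fibonacci; without the cache this repeats the same
    # subproblems an exponential number of times.
    call_log.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
print(result, len(call_log))
```

With the cache in place the body runs once per distinct argument, which is
the time-for-memory trade described in this post.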

As an example, I wrote an assembler/linker combination in Python (with
a little bit of Pyrex and a tiny bit of C for the scanner). When I
started out, it was much slower than the C legacy system it replaced.
Now, it is much faster, by a factor of around 4x for a full build. It
is also much nicer, catches more errors, has a more flexible syntax,
etc.

In C, there is no way I would have been able to get the semantics
right for usefully caching the results of partial evaluation of
include files, but in Python it is a snap. (Note that I did not
assert that it is not possible to do this in C, merely that _I_ would
have been incapable of doing it in the time I would have been willing
to spend on the task.) I also cache the object files (which are
merely pickles) in preparation for doing a link, the source modules in
preparation for doing a listing output (which can be done as a
relative listing before the link, or an absolute listing after the
link), and anything else I can think of.

The Python version occupies a working set of around 75MB when it runs,
compared to about 4MB for the C version. Good programmers have always
known about time vs. memory tradeoffs, but the informed use of Python
in conjunction with the staggering amount of memory available on
today's average PC can produce truly startling results.

Regards,
Pat
 
Programmer Dude

John said:
Believe it :). VB5 and VB6 can be compiled to native code and has
more or less identical performance to equivalent C code. And why
should not it? VBA implementation in VB is statically typed, well
optimized and not even as expressive as C. There is no kind of
developer productivity overhead like you see in Python :). If your VB
code is running slower in VB6, chances are you are using variants and
other less optimizable features.

VBA in MS Office is interpreted (not even p-code) and your C dll will
improve performance a lot, but not much in VB5/VB6.

I actually benchmarked this 4 years ago. VB4 code ran 30 times slower
compared to when using a C dll. VB5 code ran at about the same speed
as C.

Concur. A while back I thought I could speed up a VB app that had
a lot of bit-twiddle operations by writing a C-based DLL for it.

Turns out it ran slower that way!

Having used the VC++ debugger to step through compiled VB code, I
can attest to it being compiled down to native. Ratio is much higher
than with, say C or C++... I'd WAG it at about 10:1 or slightly more.
 
