Python(2.5) reads an input file FASTER than pure C(Mingw)

G

Gabriel Ibanez

I did something near like that several days ago. Instead of programming in
C++ I did it with RM-Cobol. I used to know the times that cobol takes to
read the file and search for resutls, and I was surprised about the time
that Python took doing the same: really, really fast.


----- Original Message -----
From: "n00m" <[email protected]>
Newsgroups: comp.lang.python
To: <[email protected]>
Sent: Sunday, April 27, 2008 6:28 AM
Subject: Re: Python(2.5) reads an input file FASTER than pure C(Mingw)
 
N

n00m

Both codes by Dennis Lee Bieber are Ok. The 2nd one ("slurper") ,
seems , a bit faster.
I only corrected the author's typo: should be "% div" instead of "/
div".
And added this (don't know helped it or not):

if div == 1:
print lim
return

And of course:

import psyco
psyco.full()
 
L

Lie

No so simple, guys.
E.g., I can't solve (in Python) this:http://www.spoj.pl/problems/INTEST/
Keep getting TLE (time limit exceeded). Any ideas? After all, it's
weekend.

450. Enormous Input Test
Problem code: INTEST

The purpose of this problem is to verify whether the method you are
using to read input data is sufficiently fast to handle problems
branded with the enormous Input/Output warning. You are expected to be
able to process at least 2.5MB of input data per second at runtime.

Input
The input begins with two positive integers n k (n, k<=107). The next
n lines of input contain one positive integer ti, not greater than
109, each.

Output
Write a single integer to output, denoting how many integers ti are
divisible by k.

Example
Input:
7 3
1
51
966369
7
9
999996
11

Output:
4

The problem is faulty.
First, the bottleneck on reading a huge input is the harddisk speed
and the string processing, the result of the competition doesn't prove
anything but how fast the string processor and the harddisk is.
Python's string processing is not a very fast one as it creates a copy
of the string every now and then.
 
N

n00m

Lie said:
The problem is faulty.
First, the bottleneck on reading a huge input is the harddisk speed
and the string processing, the result of the competition doesn't prove
anything but how fast the string processor and the harddisk is.
Python's string processing is not a very fast one as it creates a copy
of the string every now and then.

In case you didn't pay attention:
Python and C++ were tested on the same PC.
 
N

n00m

Another PC, another OS (Linux) and another compiler C++ (g++ 4.0.0-8)

Compare 2 my latest submissions: http://www.spoj.pl/status/SBANK,zzz/

times: 1.32s and 0.60s

Submitted codes:

import sys
z=sys.stdin.readlines()
print z[5]


#include <cstdio>
#include <cstdlib>
#include <vector>
#include <string>

using namespace std;

vector<string> vs;

int main() {
while (true) {
char line[50];
if (!fgets(line,50,stdin)) break;
vs.push_back(line);
}
return 0;
}


If it proves nothing then white is black and good is evil
 
H

hdante

Another PC, another OS (Linux) and another compiler C++ (g++ 4.0.0-8)

Compare 2 my latest submissions:http://www.spoj.pl/status/SBANK,zzz/

times: 1.32s and 0.60s

Submitted codes:

import sys
z=sys.stdin.readlines()
print z[5]

#include <cstdio>
#include <cstdlib>
#include <vector>
#include <string>

using namespace std;

vector<string> vs;

int main() {
    while (true) {
        char line[50];
        if (!fgets(line,50,stdin)) break;
        vs.push_back(line);
    }
return 0;

}

If it proves nothing then white is black and good is evil

It seems that the "push_back" line takes most of the time of the
code. Remove it and execution will drop to 0.25s.

Python readline uses fread instead of fgets:
http://svn.python.org/view/python/tags/r251/Objects/fileobject.c?rev=54864&view=markup
(see the file_readlines function)

If you write a code that does an fread loop, execution will drop to
0.01s.

This C code takes 0.25s. Almost all time is spent with string
manipulation.

#include <stdio.h>
#include <string.h>

#define B 8192

char vs[100000][40];
char buffer;

int main(void) {
int count;
char *begin, *end;
int i;
i = 0;
while (1) {
count = fread(buffer, 1, B, stdin);
if (count == 0) break;
begin = buffer;
while(1) {
end = (char *)memchr(begin, '\n', buffer+B-begin);
if (end == NULL) {
memmove(buffer, begin, buffer+B-begin);
break;
}
memmove(vs, begin, end-begin);
i = (i+1)%100000;
begin = end + 1;
}
}
return 0;
}

The difference, 0.60s-0.25s = 0.35s is probably mostly python's
memory management (which seems to be much more efficient than
std::vector default).

Very interesting post. :) I had no idea about how much optimized the
builtin library was.
 
L

Lie

In case you didn't pay attention:
Python and C++ were tested on the same PC.

In case you didn't pay attention, what I'm mainly concerned about is
running the same algorithm in the same machine in the same programming
language but in different phase of the moon that would give two
significantly different timing.

Harddisk seek time is based on harddisk's fragmentation, MFT's (or
equivalent) size, what "other" program are currently doing with the
harddisk, and by harddisk age. You can't avoid fragmentation, except
if you ghosted the harddisk and reinstall the OS after every test. You
can't prevent the OS to switch to other tasks while the test is
running they might possibly requested for a huge harddisk query. And
I'm sure you can't prevent hardware degradation (although the impact
of this is probably not significant). Heck I can go on to other phases
of the moon like how full and fragmented the memory (RAM) is, how much
swap is being used.

My secondary concern, is how the string processing is done in the
language. In python, every string operation creates a new string, in
C, a string processing is just a mere address passing, for this test,
the security given by python is unnecessary since it is read only. We
know which one is the more superior in speed, but we can't do anything
about this since this is implementation detail of the language.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,119
Latest member
IrmaNorcro
Top