Enormous Input and Output Test

N

n00m

Hi, py.folk!

I need your help to understand how
http://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

I see two guys who managed to get accepted:
http://www.spoj.pl/ranks/INOUTEST/lang=PYTH

My code for this is:
===========================================
import psyco
psyco.full()

import sys

def noo(b):
b = b.split()
return str(int(b[0]) * int(b[1])) + '\n'

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
a = map(noo, a)
sys.stdout.writelines(a)

foo()
===========================================

But it gets "Time Limit Exceeded" verdict.
Any ideas?
 
C

Chris Rebert

Hi, py.folk!

I need your help to understand how
http://www.spoj.pl/problems/INOUTEST/
can be passed in Python.
def foo():
   ##sys.stdin = open('D:/1583.txt', 'rt')
   a = sys.stdin.readlines()

That line is probably a Very Bad Idea (TM) as it reads the *entire*
enormous file into memory *at once*. It would probably be much better
to iterate over the file, thus only reading one individual line at a
time. I'm betting the massive malloc()ing involved with .readlines()
is a large part of the slowness.

But it gets "Time Limit Exceeded" verdict.
Any ideas?

Cheers,
Chris
 
T

Terry Reedy

n00m said:
Hi, py.folk!

I need your help to understand how
http://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

I see two guys who managed to get accepted:
http://www.spoj.pl/ranks/INOUTEST/lang=PYTH

My code for this is:
===========================================
import psyco
psyco.full()

import sys

def noo(b):
b = b.split()
return str(int(b[0]) * int(b[1])) + '\n'

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
a = map(noo, a)
sys.stdout.writelines(a)

foo()
===========================================

But it gets "Time Limit Exceeded" verdict.
Any ideas?

Don't waste your time with problem sites that judge raw-clock time over
(and before) accuracy, thereby greatly favoring low-level languages and
hack tricks over clear high-level code.
 
N

n00m

That line is probably a Very Bad Idea (TM) as it reads the *entire*
enormous file into memory *at once*. It would probably be much better
to iterate over the file, thus only reading one individual line at a
time. I'm betting the massive malloc()ing involved with .readlines()
is a large part of the slowness.

Certainly not.
The culprit is line "a = map(noo, a)".
Without it execution time = 2.59s (I've just checked it).
 
A

alex23

I need your help to understand howhttp://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

My code for this is:
===========================================
import psyco
psyco.full()

import sys

def noo(b):
    b = b.split()
    return str(int(b[0]) * int(b[1])) + '\n'

def foo():
    ##sys.stdin = open('D:/1583.txt', 'rt')
    a = sys.stdin.readlines()
    a = a[1:int(a[0]) + 1]
    a = map(noo, a)
    sys.stdout.writelines(a)

foo()
===========================================

But it gets "Time Limit Exceeded" verdict.
Any ideas?

map() is really at its most efficient when used with built-ins, not
user defined functions. In those cases, I believe you're better off
using a for-loop or list comprehension. Also: are you sure they
support psyco? It's not part of stdlib...

I thought something simpler might work:

import sys

sys.stdin.readline()

for line in sys.stdin.readlines():
nums = map(int, line.split())
print nums[0] * nums[1]

But although this processes 5.5MB < 2 secs on my PC (which meets the
2.5MB/sec criteria outlined in INTEST), I'm getting the same result
that you are.

Do you know how big the input data set actually is?
 
N

n00m

Do you know how big the input data set actually is?

Of course, I don't know exact size of input.
It's several MBs, I guess. And mind the fact:
their testing machines are PIII (750MHz).
 
N

n00m

PS
Yes, they support psyco since long time ago
(otherwise I'd get Compilitation Error verdict).
I used Psyco there many many times.
 
A

alex23

Of course, I don't know exact size of input.
It's several MBs, I guess. And mind the fact:
their testing machines are PIII (750MHz).

Well, then, that's moved the problem from "challenging" to
"ludicrous". Being asked to conform to one constraint with no
knowledge of the others seems kind of ridiculous.

On my machine, the above code handles ~50MB in ~10sec.
 
N

n00m

On my machine, the above code handles ~50MB in ~10sec.

Means their input > 40-50MB
2.
I just see: two guys did it in Python
and I feel myself curious "how on earth?".
 
N

n00m

This time limits too:

=====================================================
import psyco
psyco.full()

import sys

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
for ai in a:
x, y = ai.split()
sys.stdout.write(str(int(x) * int(y)) + '\n')

foo()
=====================================================
 
J

John Yeung

Of course, I don't know exact size of input.
It's several MBs, I guess. And mind the fact:
their testing machines are PIII (750MHz).

You know the maximum size of the input, if you can trust the problem
definition. The maximum number of lines in the input is 10**6 + 1.
The first line is at most 7 characters, plus EOL. The subsequent
lines are at most 13 characters each, plus EOL.

John
 
N

n00m

It can be not so "simple".
There can be multiple input files,
with *total* size ~30-50-80 MB.
 
N

n00m

And this code time limits (no matter with or without Psyco):

import psyco
psyco.full()

import sys

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
sys.stdin.readline()
while 1:
try:
x, y = sys.stdin.readline().split()
sys.stdout.write(str(int(x) * int(y)) + '\n')
except:
break

foo()
 
J

John Yeung

I've given up :)

Well, that numerix user (who already had the top Python solution) just
submitted a ton of new ones to that problem, apparently trying to get
a faster time. I don't think he can squeeze much more out of that
stone, but unlike us, he's routinely under 11s. Crazy.

I wish they had an option to let you keep running your program past
the limit (at least two or three times the limit) to give you more
feedback, even if they still consider your solution unacceptable.
Especially in the tutorial section, which doesn't seem to contribute
to your score anyway.

John
 
B

Bearophile

Terry Reedy:
Don't waste your time with problem sites that judge raw-clock time over
(and before) accuracy, thereby greatly favoring low-level languages and
hack tricks over clear high-level code.

I usually don't like to solve the kind of problems shown by those
sites because those problems are too much artificial (and often too
much difficult). But sometimes I have written some solutions.

But those sites never judge "raw" running time over accuracy: in most
or all such sites the programs are tested with tons of possible
inputs, and if even one output is a bit wrong, the program is totally
refused. This is a hard rule that encourages programmers to take a
very good care of program correctness.

Some sites add a little more "noise" in the inputs, simulating a bit
more real-world inputs, while most of those online contests give clean
inputs (the input bounds are well specified in the problem statement).

From what I've seen from some of the best solutions submitted to those
sites (some sites allow people to see the source of the contest
entries), the programs usually don't (need to) use "hack tricks" as
you say (even if probably some program uses them). Using hacks is
often unsafe so people usually prefer safer ways to code, because just
a little bug may fully compromise the score of the program.

I agree that the timing scores in such sites often encourage low level
languages, like C (and sometimes C++, that's a multilevel language),
but on the other hand such languages exist, C is used in many real-
world places, so designing sites where people compete with such
languages is legit. C allows people to understand better what's going
on inside the computer, this is valuable and positive. Bashing low-
level languages is silly. Even CPython is written in C. A good
programmer has to know both higher and lower level languages.

And in real-life sometimes you need performance. This thread shows
that a normal Python program is not up to those timings for the
enormous input problem (even if there are ways to write a Python
program to solve this problem). People at Google are trying to create
a 5 times faster Python (Unladen Swallow project) because they use lot
of real-world Python code and they think Python is slow. I've found
plenty of situations where CPython code is not fast enough for my
purposes.

Bye,
bearophile
 
N

n00m

but unlike us, he's routinely under 11s. Crazy.

No wonder!
50-80%% of users from the 1st page in ranklist are
super-extra-brilliant (young students) programmers.
They are winners of numerous competitions, national
and international olympiads on informatics, etc.
Some of them are/were even true wunderkinders.
E.g. Tomek Czajka from Poland (now he lives and works,
in some university, in America)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top