Enormous Input and Output Test

n00m · Oct 3, 2009

Hi, py.folk!

I need your help to understand how
http://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

I see two guys who managed to get accepted:
http://www.spoj.pl/ranks/INOUTEST/lang=PYTH

My code for this is:
===========================================
import psyco
psyco.full()

import sys

def noo(b):
b = b.split()
return str(int(b[0]) * int(b[1])) + '\n'

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
a = map(noo, a)
sys.stdout.writelines(a)

foo()
===========================================

But it gets "Time Limit Exceeded" verdict.
Any ideas?

Chris Rebert · Oct 4, 2009

Hi, py.folk!

I need your help to understand how
http://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

def foo():
Â Â ##sys.stdin = open('D:/1583.txt', 'rt')
Â Â a = sys.stdin.readlines()

That line is probably a Very Bad Idea (TM) as it reads the *entire*
enormous file into memory *at once*. It would probably be much better
to iterate over the file, thus only reading one individual line at a
time. I'm betting the massive malloc()ing involved with .readlines()
is a large part of the slowness.

But it gets "Time Limit Exceeded" verdict.
Any ideas?

Cheers,
Chris

Terry Reedy · Oct 4, 2009

n00m said:
Hi, py.folk!

I need your help to understand how
http://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

I see two guys who managed to get accepted:
http://www.spoj.pl/ranks/INOUTEST/lang=PYTH

My code for this is:
===========================================
import psyco
psyco.full()

import sys

def noo(b):
b = b.split()
return str(int(b[0]) * int(b[1])) + '\n'

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
a = map(noo, a)
sys.stdout.writelines(a)

foo()
===========================================

But it gets "Time Limit Exceeded" verdict.
Any ideas?

Don't waste your time with problem sites that judge raw-clock time over
(and before) accuracy, thereby greatly favoring low-level languages and
hack tricks over clear high-level code.

n00m · Oct 4, 2009

That line is probably a Very Bad Idea (TM) as it reads the *entire*
enormous file into memory *at once*. It would probably be much better
to iterate over the file, thus only reading one individual line at a
time. I'm betting the massive malloc()ing involved with .readlines()
is a large part of the slowness.

Certainly not.
The culprit is line "a = map(noo, a)".
Without it execution time = 2.59s (I've just checked it).

n00m · Oct 4, 2009

PS
Time Limit for this problem = 20s

alex23 · Oct 4, 2009

I need your help to understand howhttp://www.spoj.pl/problems/INOUTEST/
can be passed in Python.

My code for this is:
===========================================
import psyco
psyco.full()

import sys

def noo(b):
b = b.split()
return str(int(b[0]) * int(b[1])) + '\n'

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
a = map(noo, a)
sys.stdout.writelines(a)

foo()
===========================================

But it gets "Time Limit Exceeded" verdict.
Any ideas?

map() is really at its most efficient when used with built-ins, not
user defined functions. In those cases, I believe you're better off
using a for-loop or list comprehension. Also: are you sure they
support psyco? It's not part of stdlib...

I thought something simpler might work:

import sys

sys.stdin.readline()

for line in sys.stdin.readlines():
nums = map(int, line.split())
print nums[0] * nums[1]

But although this processes 5.5MB < 2 secs on my PC (which meets the
2.5MB/sec criteria outlined in INTEST), I'm getting the same result
that you are.

Do you know how big the input data set actually is?

n00m · Oct 4, 2009

Do you know how big the input data set actually is?

Of course, I don't know exact size of input.
It's several MBs, I guess. And mind the fact:
their testing machines are PIII (750MHz).

n00m · Oct 4, 2009

PS
Yes, they support psyco since long time ago
(otherwise I'd get Compilitation Error verdict).
I used Psyco there many many times.

alex23 · Oct 4, 2009

Of course, I don't know exact size of input.
It's several MBs, I guess. And mind the fact:
their testing machines are PIII (750MHz).

Well, then, that's moved the problem from "challenging" to
"ludicrous". Being asked to conform to one constraint with no
knowledge of the others seems kind of ridiculous.

On my machine, the above code handles ~50MB in ~10sec.

n00m · Oct 4, 2009

On my machine, the above code handles ~50MB in ~10sec.

Means their input > 40-50MB
2.
I just see: two guys did it in Python
and I feel myself curious "how on earth?".

n00m · Oct 4, 2009

This time limits too:

=====================================================
import psyco
psyco.full()

import sys

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
a = sys.stdin.readlines()
a = a[1:int(a[0]) + 1]
for ai in a:
x, y = ai.split()
sys.stdout.write(str(int(x) * int(y)) + '\n')

foo()
=====================================================

n00m · Oct 4, 2009

And *without* Psyco the above code gets TLE verdict...

A kind of mystery

John Yeung · Oct 4, 2009

Of course, I don't know exact size of input.
It's several MBs, I guess. And mind the fact:
their testing machines are PIII (750MHz).

You know the maximum size of the input, if you can trust the problem
definition. The maximum number of lines in the input is 10**6 + 1.
The first line is at most 7 characters, plus EOL. The subsequent
lines are at most 13 characters each, plus EOL.

John

n00m · Oct 4, 2009

It can be not so "simple".
There can be multiple input files,
with *total* size ~30-50-80 MB.

n00m · Oct 4, 2009

And this code time limits (no matter with or without Psyco):

import psyco
psyco.full()

import sys

def foo():
##sys.stdin = open('D:/1583.txt', 'rt')
sys.stdin.readline()
while 1:
try:
x, y = sys.stdin.readline().split()
sys.stdout.write(str(int(x) * int(y)) + '\n')
except:
break

foo()

John Yeung · Oct 4, 2009

It can be not so "simple".
There can be multiple input files,
with *total* size ~30-50-80 MB.

According to one of the global moderators, the 20s time limit is for
each input file:

https://www.spoj.pl/forum/viewtopic.php?f=6&t=4667

John

n00m · Oct 4, 2009

I've given up

John Yeung · Oct 4, 2009

I've given up

Well, that numerix user (who already had the top Python solution) just
submitted a ton of new ones to that problem, apparently trying to get
a faster time. I don't think he can squeeze much more out of that
stone, but unlike us, he's routinely under 11s. Crazy.

I wish they had an option to let you keep running your program past
the limit (at least two or three times the limit) to give you more
feedback, even if they still consider your solution unacceptable.
Especially in the tutorial section, which doesn't seem to contribute
to your score anyway.

John

Bearophile · Oct 4, 2009

Terry Reedy:

Don't waste your time with problem sites that judge raw-clock time over
(and before) accuracy, thereby greatly favoring low-level languages and
hack tricks over clear high-level code.

I usually don't like to solve the kind of problems shown by those
sites because those problems are too much artificial (and often too
much difficult). But sometimes I have written some solutions.

But those sites never judge "raw" running time over accuracy: in most
or all such sites the programs are tested with tons of possible
inputs, and if even one output is a bit wrong, the program is totally
refused. This is a hard rule that encourages programmers to take a
very good care of program correctness.

Some sites add a little more "noise" in the inputs, simulating a bit
more real-world inputs, while most of those online contests give clean
inputs (the input bounds are well specified in the problem statement).

From what I've seen from some of the best solutions submitted to those
sites (some sites allow people to see the source of the contest
entries), the programs usually don't (need to) use "hack tricks" as
you say (even if probably some program uses them). Using hacks is
often unsafe so people usually prefer safer ways to code, because just
a little bug may fully compromise the score of the program.

I agree that the timing scores in such sites often encourage low level
languages, like C (and sometimes C++, that's a multilevel language),
but on the other hand such languages exist, C is used in many real-
world places, so designing sites where people compete with such
languages is legit. C allows people to understand better what's going
on inside the computer, this is valuable and positive. Bashing low-
level languages is silly. Even CPython is written in C. A good
programmer has to know both higher and lower level languages.

And in real-life sometimes you need performance. This thread shows
that a normal Python program is not up to those timings for the
enormous input problem (even if there are ways to write a Python
program to solve this problem). People at Google are trying to create
a 5 times faster Python (Unladen Swallow project) because they use lot
of real-world Python code and they think Python is slow. I've found
plenty of situations where CPython code is not fast enough for my
purposes.

Bye,
bearophile

n00m · Oct 4, 2009

but unlike us, he's routinely under 11s. Crazy.

No wonder!
50-80%% of users from the 1st page in ranklist are
super-extra-brilliant (young students) programmers.
They are winners of numerous competitions, national
and international olympiads on informatics, etc.
Some of them are/were even true wunderkinders.
E.g. Tomek Czajka from Poland (now he lives and works,
in some university, in America)

Testing for the presence of input from stdin.	7	Jan 23, 2006
SOAPpy, client, execute method that has 'complex' input parameter (with list)?	0	Oct 5, 2005
The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
Implementations of Multi-Methods and Tail Call Elimination (Or how I stopped worrying and learned to	1	Aug 29, 2004
ANN main-4.4.0	0	Nov 25, 2010
[ANN] main-4.0.0 (for avdi)	0	Oct 21, 2009
Real-Time Fluid Dynamics for Games...	2	Feb 20, 2005
[ANN] main-2.8.3	2	Oct 23, 2008

Enormous Input and Output Test

n00m

Chris Rebert

Terry Reedy

n00m

n00m

alex23

n00m

n00m

alex23

n00m

n00m

n00m

John Yeung

n00m

n00m

John Yeung

n00m

John Yeung

Bearophile

n00m

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads