very slow IO (STDIN.gets and puts) on Linux, ruby 1.8.2_pre3

MiG

Why is Ruby 2x slower in IO than php or bash?


data.dat is an 80 MB file with 5,000,000 lines. I use Linux with 2GB RAM (tested
on another PC with similar results).

--------------------

test.php:
#!/usr/bin/php
<? while (fgets(STDIN)); ?>

$ time ./test.php < data.dat
./test.php < data.dat 5,59s user 0,19s system 88% cpu 6,516 total

--------------------

test.rb:
#!/usr/bin/ruby
while gets
end

$ time ./test.rb < data.dat
./test.rb < data.dat 11,51s user 0,31s system 86% cpu 13,598 total
 
Florian Gross

MiG said:
Why is Ruby 2x slower in IO than php or bash?

Perhaps also try io.read() -- I think it will be faster.
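
As a rough illustration of that suggestion, the sketch below slurps the whole stream with a single read call instead of calling gets once per line. The helper name count_lines_with_read is made up for this example, and whether it is actually faster on your data is something to measure, not assume.

```ruby
require 'stringio'

# Hypothetical helper (not from the thread): read the entire stream in
# one call, then count newline characters in memory.
def count_lines_with_read(io)
  data = io.read        # one big String instead of one String per line
  data.count("\n")      # a final line without "\n" is not counted
end

# Assumed usage, mirroring the original test:
#   puts count_lines_with_read(STDIN)     # ruby test.rb < data.dat
```

Note that this holds the whole file in memory at once, which is fine for 80 MB on a 2GB machine but does not scale to arbitrarily large inputs.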
 
Ben Giddings

MiG said:
Why is Ruby 2x slower in IO than php or bash?

English is so much worse than Japanese! When I try to count to one
million in English it takes me 3.42 days, but when I try it in Japanese,
it only takes me 3.12 days!

Obviously, that means English is the worse language. Why does English
suck so bad?!?

-----

In other words: your benchmark is really dumb. That isn't practical
code, and trying to draw any conclusions from it is silly. For Ruby to
be considered fast, how much time should it take to read and discard a
line of text 5 kagillion times? Btw, I found a way to optimize your code:

deleteme.rb
#!/usr/bin/ruby
exit(0)

ben% time ruby deleteme.rb
ruby deleteme.rb 0.00s user 0.00s system 102% cpu 0.006 total

I'm still working on getting it to run in less than 0.004 total.

Ben
 
Florian Frank

MiG said:
test.rb:
#!/usr/bin/ruby
while gets
end

$ time ./test.rb < data.dat
./test.rb < data.dat 11,51s user 0,31s system 86% cpu 13,598 total

Well, Ruby assigns each line to $_ if you use gets that way, so
Ruby has to construct an object for every line. Perhaps PHP doesn't do that?
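
To make the $_ behaviour concrete, here is a small sketch using StringIO as a stand-in for STDIN. The two helper names are invented for the example. It only shows that gets stores the line in $_ while each_line does not; both still build a String per line, so it says nothing by itself about how much time the assignment costs.

```ruby
require 'stringio'

# Hypothetical helper: call gets once and report what ended up in $_.
def dollar_underscore_after_gets(text)
  io = StringIO.new(text)
  io.gets               # reads one line and also assigns it to $_
  $_
end

# Hypothetical helper: iterate with each_line and report $_ afterwards.
def dollar_underscore_after_each_line(text)
  $_ = nil
  StringIO.new(text).each_line { |line| line }   # yields lines, leaves $_ alone
  $_
end

p dollar_underscore_after_gets("first\nsecond\n")
p dollar_underscore_after_each_line("first\nsecond\n")
```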
 
MiG

1. I have NOTHING against Ruby, it is my favorite language
2. Is it wrong to ask?
3. My dumb benchmark: I used real data. If you have 2GB of free RAM and
use an 80MB file, is it wrong? It's the same if you have 1MB RAM and use
a smaller file. I used the real data I have, that's all. It behaves the
same way with smaller files.
4. Thank you for excellent humour.

MiG
 
gabriele renzi

MiG said:
So the solution is maybe to use getc and parse lines on my own...

Or maybe use one of the standard methods for iterating over lines, such as

open('file').each do |x| stuff(x) end

This would not set $_ (I don't think that slows things down much, but
who knows).
Once you have stuff() in place, you can re-check whether there is a difference.
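
One way to do that re-check is sketched below. It assumes a trivial stuff() that just returns the line length, and it times both loops over an in-memory StringIO so the comparison does not depend on the disk. The method names and the sample data are made up for the example; swap in your real file and your real stuff().

```ruby
require 'benchmark'
require 'stringio'

# Hypothetical stand-in for the real per-line work.
def stuff(line)
  line.length
end

# Loop with gets, as in the original test.rb.
def total_with_gets(io)
  total = 0
  while (line = io.gets)
    total += stuff(line)
  end
  total
end

# Loop with each_line, as suggested above.
def total_with_each_line(io)
  total = 0
  io.each_line { |line| total += stuff(line) }
  total
end

sample = "some line of text\n" * 100_000
t_gets = Benchmark.realtime { total_with_gets(StringIO.new(sample)) }
t_each = Benchmark.realtime { total_with_each_line(StringIO.new(sample)) }
printf("gets: %.3fs  each_line: %.3fs\n", t_gets, t_each)
```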
 
Navindra Umanee

MiG said:
So the solution is maybe to use getc and parse lines on my own...

Maybe you're missing the point.

The two programs aren't doing the same amount of work; your benchmarks
aren't equivalent. If you change the PHP benchmark slightly, you'll
likely see PHP is just as slow as Ruby.

[navindra@dot /tmp]$ time php -r 'while (fgets(STDIN));' < FILE
8.421u 2.334s 0:26.53 40.5% 0+0k 0+0io 2pf+0w
[navindra@dot /tmp]$ time ruby -e 'while gets;end' < FILE
11.676u 2.586s 0:39.44 36.1% 0+0k 0+0io 11pf+0w
[navindra@dot /tmp]$ time php -r 'while ($blah=fgets(STDIN));' < FILE
10.680u 2.372s 0:37.83 34.4% 0+0k 0+0io 10pf+0w

Cheers,
Navin.
 
Tom Willis

Navindra Umanee said:
If you change the PHP benchmark slightly, you'll
likely see PHP is just as slow as Ruby.

Here are my results on a 14.5 MB file; Ruby wins.

twillis:~$ time ruby -e 'while gets;end'< HL7Audit.csv

real 0m1.481s
user 0m0.924s
sys 0m0.095s
twillis:~$ time php -r 'while($blah=fgets(STDIN));'< HL7Audit.csv

real 0m2.327s
user 0m1.001s
sys 0m0.083s
 
Ben Giddings

MiG said:
1. I have NOTHING against Ruby, it is my favorite language
2. Is it wrong to ask?
3. My dumb benchmark: I used real data. If you have 2GB of free RAM and
use an 80MB file, is it wrong? It's the same if you have 1MB RAM and use
a smaller file. I used the real data I have, that's all. It behaves the
same way with smaller files.
4. Thank you for excellent humour.

I'm glad you see the humour. I was a little harsh, but I was having a
bad day, sorry.

Really, the benchmark isn't meaningful. You need to do something
with the data you're reading. It doesn't matter if it's an 80MB file or
a 10 byte file. If you're simply reading the data and discarding it,
you aren't doing anything. For the measurement to be meaningful, you
actually need to *do something*.

Would you expect these two applications to take the same amount of time:

#!/usr/bin/env ruby

1000.times do
  # do nothing
end

------

#!/usr/bin/env ruby

1000.times do
  num = Math.sin(rand(1.0))
  if num < 0.0
    num += 1.0
  else
    num -= 1.0
  end
end


Both programs are essentially equivalent. Neither actually *does*
anything. If the second one ran slower, could you really draw any
conclusions about the speed of Ruby's math operations?

In fact, it may be that Ruby's IO is slower than other languages. If
Ruby were even close to the speed of C I'd be stunned. Ruby has to
construct an object with every line it reads. C just stuffs things
blindly into an array. The problem is that your sample doesn't test
Ruby's IO capabilities. In the end, your sample code does absolutely
nothing.

If you want to benchmark Ruby's IO, try doing something like writing a
program to concatenate a number of files, or even just to copy a file.
Open one file for writing, and then open a file for reading, read
something from the input file, write to the output file.
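
A minimal sketch of that kind of test, assuming a line-by-line copy is representative enough of your real workload. The method name copy_by_line and the output file name in the usage comment are made up for the example; data.dat is the file from the original post.

```ruby
require 'benchmark'

# Hypothetical helper: copy src_path to dst_path one line at a time,
# so both the read and the write side of Ruby's IO are exercised.
# Returns the number of lines copied.
def copy_by_line(src_path, dst_path)
  lines = 0
  File.open(dst_path, 'w') do |out|
    File.foreach(src_path) do |line|
      out.write(line)
      lines += 1
    end
  end
  lines
end

# Assumed usage:
#   elapsed = Benchmark.realtime { copy_by_line('data.dat', 'copy.dat') }
#   printf("copied in %.2fs\n", elapsed)
```

Writing the equivalent loop in PHP and timing both against the same input would give a comparison where each program does real work.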

In any case, until the slowness of Ruby's IO proves to be a problem in
actual use, why do you care how it fares on a benchmark?

Ben
 