Confirm my Performance Test Against Java?

Discussion in 'Ruby' started by Ben Christensen, Aug 19, 2009.

  1. I'm evaluating Ruby for use in a variety of systems that are planned by
    default to be Java.

    I've started down a path of doing various performance tests to see what
    kind of impact will occur by using Ruby and in my first test the numbers
    are very poor - so poor that I have to question if I'm doing something
    wrong.

    I've tried it on both Linux and Mac OS X and get similar performance
    numbers on each. The absolute numbers differ with the hardware, but the
    ratio between the results is about the same.

    Please take a look at my blog post on my test results and view the
    source code and let me know if I'm doing something completely wrong with
    the Ruby code or execution - or if these are accurate numbers.

    http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-performance/

    NOTE: This is not an attempt to start a flame war. This is a legitimate
    effort to take a good look at Ruby and let the numbers speak for
    themselves in making decisions for what types of applications I can
    choose to use Ruby for without sacrificing the performance of a mature
    platform such as Java.

    Thank you.

    Ben
    --
    Posted via http://www.ruby-forum.com/.
    Ben Christensen, Aug 19, 2009
    #1

  2. Ben Christensen

    pharrington Guest

    On Aug 19, 9:31 am, Ben Christensen <> wrote:
    > I'm evaluating Ruby for use in a variety of systems that are planned by
    > default to be Java.
    >
    > I've started down a path of doing various performance tests to see what
    > kind of impact will occur by using Ruby and in my first test the numbers
    > are very poor - so poor that I have to question if I'm doing something
    > wrong.
    >
    > I've tried it on both Linux and Mac OSX and get similar performance
    > numbers on each - differences being hardware, but the ratio between the
    > results about the same.
    >
    > Please take a look at my blog post on my test results and view the
    > source code and let me know if I'm doing something completely wrong with
    > the Ruby code or execution - or if these are accurate numbers.
    >
    > http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-per...
    >
    > NOTE: This is not an attempt to start a flame war. This is a legitimate
    > effort to take a good look at Ruby and let the numbers speak for
    > themselves in making decisions for what types of applications I can
    > choose to use Ruby for without sacrificing the performance of a mature
    > platform such as Java.
    >
    > Thank you.
    >
    > Ben
    > --
    > Posted via http://www.ruby-forum.com/.


    Well.... without having put a ton of thought into this... yes, Ruby
    (*especially* 1.8 MRI) is slow. No one's going to argue that the Ruby
    interpreter is one of the quicker kids around. If performance is the
    #1 priority of whatever you'll be developing, Ruby doesn't fit your
    needs, and no one will tell you it does. That's what Java (for the
    most part) and C are still hanging around for.

    What sort of software needs to be developed here?

    Ask yourself: is it critical that my code always performs as fast as
    possible? Or is the greater concern speed of development and project
    maintainability?

    Also, as to the benchmark... can you post your /tmp/file_test.txt?
    Posting some benchmarky code isn't very useful if no one can replicate
    your results. Reading the whole file into memory may be faster than
    reading it line by line (but obviously the wrong thing to do if the
    file's enormous, which... 8 secs to read??? I'd better be moved to
    tears by the size of it). And I'm not entirely sure what it is you're
    trying to benchmark here. Vague benchmarks are fairly useless, as the
    code you're timing is never going to be anywhere close to the actual
    code you'll write. Are you trying to just compare file reading times?
    Benchmark that, and only that. Is there something specific string
    manipulation-wise you want to measure? Then... measure that. Until
    your code starts getting at least halfway specific, just doing a
    line-by-line Java-Ruby conversion doesn't tell you anything, as the code
    that happens is neither the most "elegant" *nor* fastest Ruby can do.
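
    To illustrate the "benchmark that, and only that" advice, here is a
    minimal sketch that times the file read and the tokenizing as two
    separate steps using Ruby's standard Benchmark module. The temporary
    file and its contents are assumptions made up for the example; they
    are not the original /tmp/file_test.txt.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical input: a small throwaway file so the sketch is self-contained.
sample = Tempfile.new('file_test')
sample.write("Kingston 256MB SDRAM Memory Module\n" * 1000)
sample.close

# Step 1: time only the file read.
text = nil
read_time = Benchmark.realtime { text = File.read(sample.path) }

# Step 2: time only the string splitting, on data already in memory.
tokens = nil
split_time = Benchmark.realtime { tokens = text.split }

puts "read:   #{(read_time * 1000).round(3)} ms"
puts "split:  #{(split_time * 1000).round(3)} ms"
puts "tokens: #{tokens.size}"
```

    Keeping the two measurements apart shows whether the time goes into
    I/O or into string handling, which a single combined timing hides.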
    pharrington, Aug 19, 2009
    #2

  3. Ben Christensen

    Guest

    On Wed, Aug 19, 2009 at 9:31 AM, Ben Christensen
    <> wrote:
    > I'm evaluating Ruby for use in a variety of systems that are planned by
    > default to be Java.
    >
    > I've started down a path of doing various performance tests to see what
    > kind of impact will occur by using Ruby and in my first test the numbers
    > are very poor - so poor that I have to question if I'm doing something
    > wrong.


    Is this test case in any way representative of the tasks you will
    actually be performing?

    Test file 1:

    > uname -a

    Linux linux116.ctc.com 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 12:03:43
    EST 2008 i686 i686 i386 GNU/Linux

    > java -version

    java version "1.6.0_0"
    IcedTea6 1.3.1 (6b12-Fedora-EPEL-5) Runtime Environment (build 1.6.0_0-b12)
    OpenJDK Server VM (build 1.6.0_0-b12, mixed mode)

    > java FileReadParse

    Starting to read file...
    The number of tokens is: 1954
    It took 16 ms

    > ruby -v file_read_parse.rb

    ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-linux]
    Starting to read file ...
    The number of tokens is: 1954
    It took 4.951 ms

    Test file 2:

    > java FileReadParse

    Starting to read file...
    The number of tokens is: 479623
    It took 337 ms

    > ruby file_read_parse.rb

    Starting to read file ...
    The number of tokens is: 479623
    It took 2526.455 ms

    > ruby file_read_parse-2.rb

    Starting to read file ...
    It took 588.065 ms
    The number of tokens is: 479623

    > cat file_read_parse-2.rb

    puts "Starting to read file ..."
    start = Time.now

    tokens = File.new("/tmp/file_test.txt").read.scan(/[^\s]+/)
    count = tokens.size

    stop = Time.now
    puts "It took #{(stop - start) * 1000} ms"
    puts "The number of tokens is: #{count}"
    , Aug 19, 2009
    #3
  4. wrote:
    > On Wed, Aug 19, 2009 at 9:31 AM, Ben Christensen
    > <> wrote:
    >> I'm evaluating Ruby for use in a variety of systems that are planned by
    >> default to be Java.
    >>
    >> I've started down a path of doing various performance tests to see what
    >> kind of impact will occur by using Ruby and in my first test the numbers
    >> are very poor - so poor that I have to question if I'm doing something
    >> wrong.

    >
    > Is this test case in any way representative of the tasks you will
    > actually be performing?


    If it is, then you should just do
    $ time wc approach.txt
    6836 78325 484114 approach.txt

    real 0m0.041s
    user 0m0.046s
    sys 0m0.015s
    Reid Thompson, Aug 19, 2009
    #4
  5. Ben Christensen

    Mike Sassak Guest

    [Note: parts of this message were removed to make it a legal post.]

    On Wed, Aug 19, 2009 at 9:31 AM, Ben Christensen
    <>wrote:

    > I'm evaluating Ruby for use in a variety of systems that are planned by
    > default to be Java.
    >
    > I've started down a path of doing various performance tests to see what
    > kind of impact will occur by using Ruby and in my first test the numbers
    > are very poor - so poor that I have to question if I'm doing something
    > wrong.
    >
    > I've tried it on both Linux and Mac OSX and get similar performance
    > numbers on each - differences being hardware, but the ratio between the
    > results about the same.
    >
    > Please take a look at my blog post on my test results and view the
    > source code and let me know if I'm doing something completely wrong with
    > the Ruby code or execution - or if these are accurate numbers.
    >
    >
    > http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-performance/
    >
    > NOTE: This is not an attempt to start a flame war. This is a legitimate
    > effort to take a good look at Ruby and let the numbers speak for
    > themselves in making decisions for what types of applications I can
    > choose to use Ruby for without sacrificing the performance of a mature
    > platform such as Java.
    >


    Hi Ben,

    The point everyone keeps bringing up--whether this benchmark is indicative
    of what you will actually be doing with Ruby, and whether it is "fast
    enough"--is worth considering for any project, but the fact remains that for
    many things, Java is going to execute faster than Ruby. You can certainly
    optimize Ruby code (and yes, writing Ruby extensions in C is actually pretty
    easy), but that's not why many of us love Ruby. We love it because it allows
    you to turn FileReadParse.java into this: http://gist.github.com/170466.
    Now, in the spirit of good fun:

    $ ruby file_read_parse_2.rb file_read_parse_2.rb
    Starting to read file ...
    The number of tokens is: 39.
    It took 0.189 ms

    $ ruby file_read_parse_2.rb FileReadParse.java
    Starting to read file ...
    The number of tokens is: 159.
    It took 0.215 ms

    See? :)

    Good luck with Ruby, and don't be afraid to ask more questions!
    Mike


    > Thank you.
    >
    > Ben
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >
    Mike Sassak, Aug 19, 2009
    #5
  6. Ben Christensen

    Mike Sassak Guest

    [Note: parts of this message were removed to make it a legal post.]

    Argh! That gist should be http://gist.github.com/170476. Sigh...

    Mike Sassak, Aug 19, 2009
    #6
  7. Mike Sassak wrote:
    > Argh! That gist should be http://gist.github.com/170476. Sigh...


    And you can even, with another ounce of ruby-love, rewrite that as:

    num = 0
    ARGF.each do |l|
    num += l.split.length
    end

    Then it also works with stdin or multiple filenames on the cmdline.

    I'll leave it to others to #inject... ;)

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
    Joel VanderWerf, Aug 19, 2009
    #7
  8. Ben Christensen

    Guest

    On Wed, Aug 19, 2009 at 11:07 AM, Reid Thompson<> wrote:
    > wrote:
    >>
    >> On Wed, Aug 19, 2009 at 9:31 AM, Ben Christensen
    >> <> wrote:
    >>>
    >>> I'm evaluating Ruby for use in a variety of systems that are planned by
    >>> default to be Java.
    >>>
    >>> I've started down a path of doing various performance tests to see what
    >>> kind of impact will occur by using Ruby and in my first test the numbers
    >>> are very poor - so poor that I have to question if I'm doing something
    >>> wrong.

    >>
    >> Is this test case in any way representative of the tasks you will
    >> actually be performing?

    >
    > If it is, then you should just do
    > $ time wc approach.txt
    > 6836 78325 484114 approach.txt


    :)

    I got a little crazy; first the numbers (slower hardware this time):

    > uname -a

    Linux eXist 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 00:28:35 UTC
    2009 i686 GNU/Linux

    > java -version

    java version "1.6.0_0"
    OpenJDK Runtime Environment (IcedTea6 1.4.1) (6b14-1.4.1-0ubuntu11)
    OpenJDK Client VM (build 14.0-b08, mixed mode, sharing)

    > java FileReadParse

    Starting to read file...
    The number of tokens is: 479623
    It took 596 ms

    > /opt/matzruby/trunk/bin/ruby -v -rubygems file_read_parse.rb

    ruby 1.9.2dev (2009-08-14 trunk 24539) [i686-linux]
    Starting to read file ...
    The number of tokens is: 479623
    It took 1751.92544 ms

    > /opt/matzruby/trunk/bin/ruby -v -rubygems file_read_parse-3.rb

    ruby 1.9.2dev (2009-08-14 trunk 24539) [i686-linux]
    ffi_c.so: warning: method redefined; discarding old inspect
    struct.rb:26: warning: method redefined; discarding old offset
    variadic.rb:15: warning: method redefined; discarding old call
    library.rb:78: warning: method redefined; discarding old fopen
    library.rb:78: warning: method redefined; discarding old fgetc
    Starting to read file ...
    It took 4565.077896 ms
    The number of tokens is: 479623

    > jruby -v -rubygems file_read_parse.rb

    jruby 1.3.0 (ruby 1.8.6p287) (2009-06-03 5dc2e22) (OpenJDK Client VM
    1.6.0_0) [i386-java]
    Starting to read file ...
    The number of tokens is: 479623
    It took 2316.0 ms

    > jruby -v -rubygems file_read_parse-3.rb

    jruby 1.3.0 (ruby 1.8.6p287) (2009-06-03 5dc2e22) (OpenJDK Client VM
    1.6.0_0) [i386-java]
    Starting to read file ...
    It took 3117.0 ms
    The number of tokens is: 479623

    And the code:

    > cat file_read_parse-3.rb

    require 'ffi'

    module LibC
      extend FFI::Library

      # FILE *fopen(const char *path, const char *mode);
      attach_function :fopen, [ :string, :string ], :pointer

      # int fgetc(FILE *stream);
      attach_function :fgetc, [ :pointer ], :int
    end

    puts "Starting to read file ..."
    start = Time.now

    file = LibC.fopen("/tmp/file_test.txt", "r")
    count = 0; in_word = false
    while (c = LibC.fgetc(file)) != -1
      if 32 < c and c < 127
        unless in_word
          count += 1
          in_word = true
        end
      else
        in_word = false
      end
    end

    stop = Time.now
    puts "It took #{(stop - start) * 1000} ms"
    puts "The number of tokens is: #{count}"
    , Aug 19, 2009
    #8
  9. Ben Christensen

    Mike Sassak Guest

    [Note: parts of this message were removed to make it a legal post.]

    On Wed, Aug 19, 2009 at 1:26 PM, Joel VanderWerf <>wrote:

    > Mike Sassak wrote:
    >
    >> Argh! That gist should be http://gist.github.com/170476. Sigh...
    >>

    >
    > And you can even, with another ounce of ruby-love, rewrite that as:
    >
    > num = 0
    > ARGF.each do |l|
    > num += l.split.length
    > end
    >
    > Then it also works with stdin or multiple filenames on the cmdline.
    >
    > I'll leave it to others to #inject... ;)
    >


    Ha! I wrote it with inject initially, but then thought, "Nah... I don't want
    to blow *too* many minds." :)
    Mike Sassak, Aug 19, 2009
    #9
  10. Thanks everyone for your responses.

    Yes, this test is representative of the kind of data processing that
    some of my current applications do and that some future ones will
    need.

    The file I'm using is 49MB in size unzipped - too large for me to upload
    right now as I'm on a mobile cell network.

    To provide context on the file, it contains data such as this:

    Western Digital Caviar Special Edition Hard Drive - 80GB - 7200rpm -
    Ultra ATA - IDE/EIDE - Internal
    Kingston 256MB SDRAM Memory Module - 256MB (1 x 256MB) - 133MHz PC133 -
    SDRAM - 144-pin
    512Mo (1 x 512Mo) - 133MHz PC133 - SDRAM - 168 broches

    Its stats are:

    wc /tmp/file_test.txt
    1778983 7764115 51084191 /tmp/file_test.txt

    This is not a test of "file reading". The test is related to the
    performance of iterating over large lists of data and performing
    processing on them - such as indexing for searching, cleansing,
    normalizing etc.
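
    As a rough illustration of that kind of workload, the sketch below
    streams a file line by line, normalizes each token, and builds a simple
    token-to-line-number index. The two sample lines are shortened from the
    data shown above; the temporary file, the normalization rule (downcase,
    keep letters and digits), and the index layout are assumptions for the
    example, not the code from the blog post.

```ruby
require 'tempfile'

# Hypothetical sample data, shortened from the product lines quoted above.
sample = Tempfile.new('file_test')
sample.puts "Kingston 256MB SDRAM Memory Module - 256MB (1 x 256MB) - 133MHz PC133"
sample.puts "512Mo (1 x 512Mo) - 133MHz PC133 - SDRAM - 168 broches"
sample.close

# Map each normalized token to the line numbers on which it appears.
index = Hash.new { |hash, key| hash[key] = [] }

# File.foreach reads one line at a time, so the whole file is never in memory.
File.foreach(sample.path).with_index(1) do |line, lineno|
  line.downcase.scan(/[a-z0-9]+/).uniq.each do |token|
    index[token] << lineno
  end
end

puts index['pc133'].inspect     # lines containing "PC133"
puts index['kingston'].inspect  # lines containing "Kingston"
```

    Streaming keeps memory use flat for a 49MB input, at the cost of
    doing the per-line work in interpreted Ruby.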

    This is a very small representation of the level of complexity and size
    of data I would in reality be dealing with.

    It seems however that the answer is that this is not what Ruby is well
    suited for. Am I correct in that determination?

    I will however be continuing my ongoing tests with SOAP/REST webservices
    and more CRUD focused webapps, where I expect to see Ruby shine.
    --
    Posted via http://www.ruby-forum.com/.
    Ben Christensen, Aug 19, 2009
    #10
  11. pharrington, in your response you stated:

    "as the code that happens is neither the most "elegant" *nor* fastest
    Ruby can do."

    Can you please provide me a re-write of the Ruby code I used that is
    elegant and fast so I can learn from you?

    I consider myself quite advanced in Java (14 years of experience there)
    but obviously do not have experience in Ruby for performance tuning and
    optimization.

    I would appreciate your demonstration of how to perform the task I have
    attempted in Ruby using an appropriate "Ruby" approach that achieves the
    highest performance possible and the "elegance" spoken of.

    Thank you.

    Ben

    --
    Posted via http://www.ruby-forum.com/.
    Ben Christensen, Aug 19, 2009
    #11
  12. Ben Christensen

    Josh Cheek Guest

    [Note: parts of this message were removed to make it a legal post.]

    A version of Mike Sassak's gist

    start = Time.now
    printf "Starting to read file ...\nThe number of tokens is: %d.\nIt took %.2f ms\n",
           File.open(ARGV[0]) { |f| f.inject(0) { |a, l| a + l.split.length } },
           (Time.now - start) * 1000

    I won't call it elegant, that seems subjective to me, but I do appreciate
    brevity.


    On Wed, Aug 19, 2009 at 12:52 PM, Ben Christensen <
    > wrote:


    > pharrington, in your response you stated:
    >
    > "as the code that happens is neither the most "elegant" *nor* fastest
    > Ruby can do."
    >
    > Can you please provide me a re-write of the Ruby code I used that is
    > elegant and fast so I can learn from you?
    >
    > I consider myself quite advanced in Java (14 years of experience there)
    > but obviously do not have experience in Ruby for performance tuning and
    > optimization.
    >
    > I would appreciate your demonstration of how to perform the task I have
    > attempted in Ruby using an appropriate "Ruby" approach that achieves the
    > highest performance possible and the "elegance" spoke of.
    >
    > Thank you.
    >
    > Ben
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >
    Josh Cheek, Aug 19, 2009
    #12
  13. On Thu, 20 Aug 2009, Ben Christensen wrote:

    >
    > This is not a test of "file reading". The test is related to the
    > performance of iterating over large lists of data and performing
    > processing on them - such as indexing for searching, cleansing,
    > normalizing etc.
    >
    > This is a very small representation of the level of complexity and size
    > of data I would in reality be dealing with.
    >
    > It seems however that the answer is that this is not what Ruby is well
    > suited for. Am I correct in that determination?
    >


    Ben -- I've been working with Java since '96 (and taught Java for Sun for
    a while), so I think I can understand where you may be coming from. At
    this point, I prefer to write Ruby -- it's much more readable and lots
    less *crufty* than Java, but Java still pays the bills.

    I do have the following questions and/or things to consider --

    1. How *often* are you going to be processing these files? If they are
    batch style jobs, then does absolute speed matter over maintainability?

    2. Are there any reasons to not keep the data in a database and then
    perform queries, etc.?



    If you're wanting to do things such as indexing and so forth, Ruby's
    string handling far outshines, imho, Java's. Ruby's "collections" and
    enumerables are far more robust as well. As a result, I can spend 5
    minutes writing something that would take me 30 or even 60 minutes in
    Java. Yes, ruby may not be faster in execution time -- of course, as the
    results show, it depends on how you write it (in one instance it was
    faster than java), but even if a run takes, say, 1 second longer, it'd
    have to run 1500 times before the total of java's development and runtime
    caught up with ruby's. And that's not including maintenance time. Then
    factor in that developer time is usually far more expensive than cpu time,
    and Ruby tends to come out in the lead.
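
    The break-even figure above can be checked with a few lines of
    arithmetic. This sketch assumes the low end of the estimate (5 minutes
    of Ruby development against 30 minutes of Java) and a run that is 1
    second slower in Ruby; all three numbers are taken from the paragraph
    above and are estimates, not measurements.

```ruby
# Assumed figures from the estimate above (not measured values).
ruby_dev_minutes      = 5
java_dev_minutes      = 30
extra_seconds_per_run = 1.0

# Development time saved by writing the Ruby version, in seconds.
saved_dev_seconds = (java_dev_minutes - ruby_dev_minutes) * 60

# Number of runs before the slower runtime uses up that saving.
break_even_runs = saved_dev_seconds / extra_seconds_per_run

puts "Break-even after #{break_even_runs.to_i} runs"
```

    With the 60-minute Java estimate instead, the same calculation gives
    3300 runs.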

    What would be a far more fair assessment would be to factor in the amount
    of time it takes to write a test, as well as the number of lines of code,
    since size of code tends to increase complexity and also maintenance
    costs. Then run the two and see which is better.

    If you're processing these files in realtime to extract data, etc., then
    perhaps you'd be better loading them into a database. However, if they're
    batched, as I expect, by simply comparing "speed of execution" you're
    looking at only one facet of the problem.

    Matt
    Matthew K. Williams, Aug 19, 2009
    #13
  14. On 19.08.2009 15:31, Ben Christensen wrote:

    > http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-performance/


    1.9* is significantly better. I did not try JRuby yet.

    robert@fussel /cygdrive/c/Temp/frp
    $ /cygdrive/c/Programme/Java/jdk1.6.0_14/bin/javac FileReadParse.java

    robert@fussel /cygdrive/c/Temp/frp
    $ java -cp . FileReadParse
    Starting to read file...
    The number of tokens is: 1122
    It took 16 ms

    robert@fussel /cygdrive/c/Temp/frp
    $ allruby file_read_parse.rb
    ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
    Starting to read file ...
    The number of tokens is: 1122
    It took 3.0 ms
    ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
    Starting to read file ...
    The number of tokens is: 1122
    It took 2.0 ms

    robert@fussel /cygdrive/c/Temp/frp
    $ wc file_test.txt
    190 1114 7579 file_test.txt

    robert@fussel /cygdrive/c/Temp/frp
    $


    ====================================================================


    robert@fussel /cygdrive/c/Temp/frp
    $ !w
    wc file_test.txt x
    95000 557000 3789500 file_test.txt
    68970 404382 2751177 x
    163970 961382 6540677 total

    robert@fussel /cygdrive/c/Temp/frp
    $ java -cp . FileReadParse
    Starting to read file...
    The number of tokens is: 561000
    It took 359 ms

    robert@fussel /cygdrive/c/Temp/frp
    $ !a
    allruby file_read_parse.rb
    ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
    Starting to read file ...
    The number of tokens is: 561000
    It took 1395.0 ms
    ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
    Starting to read file ...
    The number of tokens is: 561000
    It took 872.0 ms

    robert@fussel /cygdrive/c/Temp/frp

    robert@fussel /cygdrive/c/Temp/frp
    $ /cygdrive/c/Programme/Java/jdk1.6.0_14/bin/java -server -cp .
    FileReadParse
    Starting to read file...
    The number of tokens is: 561000
    It took 515 ms

    robert@fussel /cygdrive/c/Temp/frp
    $

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 19, 2009
    #14
  15. Ben Christensen

    Josh Cheek Guest

    [Note: parts of this message were removed to make it a legal post.]

    My previous version would probably be better like this:

    start = Time.now
    puts "Starting to read file ..."
    puts "The number of tokens is: %d." %
           File.open(ARGV[0]) { |f| f.inject(0) { |a, l| a + l.split.length } },
         "It took #{(Time.now - start) * 1000} ms"

    That way if the file is enormous, it prints the "starting to read file ..."
    immediately.


    Josh Cheek, Aug 19, 2009
    #15
  16. Robert Klemme wrote:
    > On 19.08.2009 15:31, Ben Christensen wrote:
    >
    >> http://benjchristensen.com/2009/08/18/initial-impressions-on-ruby-performance/
    >>

    >
    > 1.9* is significantly better. I did not try JRuby yet.
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ /cygdrive/c/Programme/Java/jdk1.6.0_14/bin/javac FileReadParse.java
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ java -cp . FileReadParse
    > Starting to read file...
    > The number of tokens is: 1122
    > It took 16 ms
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ allruby file_read_parse.rb
    > ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
    > Starting to read file ...
    > The number of tokens is: 1122
    > It took 3.0 ms
    > ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
    > Starting to read file ...
    > The number of tokens is: 1122
    > It took 2.0 ms
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ wc file_test.txt
    > 190 1114 7579 file_test.txt
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $
    >
    >
    > ====================================================================
    >
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ !w
    > wc file_test.txt x
    > 95000 557000 3789500 file_test.txt
    > 68970 404382 2751177 x
    > 163970 961382 6540677 total
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ java -cp . FileReadParse
    > Starting to read file...
    > The number of tokens is: 561000
    > It took 359 ms
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ !a
    > allruby file_read_parse.rb
    > ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
    > Starting to read file ...
    > The number of tokens is: 561000
    > It took 1395.0 ms
    > ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-cygwin]
    > Starting to read file ...
    > The number of tokens is: 561000
    > It took 872.0 ms
    >
    > robert@fussel /cygdrive/c/Temp/frp
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $ /cygdrive/c/Programme/Java/jdk1.6.0_14/bin/java -server -cp .
    > FileReadParse
    > Starting to read file...
    > The number of tokens is: 561000
    > It took 515 ms
    >
    > robert@fussel /cygdrive/c/Temp/frp
    > $
    >
    > Cheers
    >
    > robert
    >

    $ java FileReadParse
    Starting to read file...
    The number of tokens is: 284717
    It took 333 ms
    rthompso@raker>~

    $ ruby wcinline.rb uscities.txt
    Starting to read file ...
    284717
    It took 211.72 ms
    rthompso@raker>~

    $ time wc uscities.txt
    141989 284717 7449038 uscities.txt

    real 0m0.333s
    user 0m0.307s
    sys 0m0.006s

    $ java -version
    java version "1.6.0_15"
    Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
    Java HotSpot(TM) Server VM (build 14.1-b02, mixed mode)

    $ ruby -v
    ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-linux]

    Not sure how Gentoo handles the java, but all other exes on the box are compiled
    CFLAGS="-march=prescott -O2 -g -pipe" with splitdebug enabled
    dual core
    Linux raker 2.6.30-gentoo-r4 #2 SMP PREEMPT Wed Aug 5 11:51:00 EDT 2009 i686
    Intel(R) Core(TM)2 CPU 6320 @ 1.86GHz GenuineIntel GNU/Linux


    wcinline.rb quickly hacked from
    http://en.literateprograms.org/Special:Downloadcode/Word_count_(C)
    and
    http://github.com/remogatto/ffi-inl...c0778e12218d3ffa83e3f823acaf/examples/ex_1.rb

    $ cat wcinline.rb
    require 'ffi-inliner'

    module MyLib
      extend Inliner
      inline '#include <stdio.h>
      #include <ctype.h>

      int n;

      /* Count whitespace-separated words in the named file. */
      void wc(const char *fname)
      {
        int ch;
        int chars=0;
        int words=0;
        int lines=0;
        int sp=1;
        FILE *fp;

        /* 055 is octal for the "-" character, which selects stdin. */
        if(fname[0]!=055) fp=fopen(fname, "r");
        else fp=stdin;
        if(!fp) return;  /* was "return -1", invalid in a void function */

        while((ch=getc(fp))!=EOF) {
          if(isspace(ch)) sp=1;
          else if(sp) {
            ++words;
            sp=0;
          }
        }

        if(fname[0]!=055) fclose(fp);

        printf("% 8d\n", words);
      }'
    end

    class Foo
      include MyLib
    end

    # get the start time
    start = Time.now

    puts "Starting to read file ..."

    Foo.new.wc(ARGV[0])

    puts "It took " + ((Time.now-start)*1000).to_s + " ms"
    Reid Thompson, Aug 19, 2009
    #16
  17. On Wed, Aug 19, 2009 at 8:31 AM, Ben
    Christensen<> wrote:
    > I've started down a path of doing various performance tests to see what
    > kind of impact will occur by using Ruby and in my first test the numbers
    > are very poor - so poor that I have to question if I'm doing something
    > wrong.


    1.8.6 is pretty slow, compared to other impls. Ruby 1.9 and JRuby will
    perform better, as shown by a few folks. JRuby on a Java 6 JVM with
    --fast and --server should perform very well.

    I'm also pretty confident that I can get JRuby within a few times Java
    performance for non-numeric CPU-intensive tasks. Just not sure when it
    will be a priority to make it happen.

    - Charlie
    Charles Oliver Nutter, Aug 19, 2009
    #17
  18. Ben Christensen

    Guest

    On Wed, Aug 19, 2009 at 5:05 PM, Charles Oliver
    Nutter<> wrote:
    > On Wed, Aug 19, 2009 at 8:31 AM, Ben
    > Christensen<> wrote:
    >> I've started down a path of doing various performance tests to see what
    >> kind of impact will occur by using Ruby and in my first test the numbers
    >> are very poor - so poor that I have to question if I'm doing something
    >> wrong.

    >
    > 1.8.6 is pretty slow, compared to other impls. Ruby 1.9 and JRuby will
    > perform better, as shown by a few folks. JRuby on a Java 6 JVM with
    > --fast and --server should perform very well.


    And, of course JRuby adds other possibilities:

    $ java FileReadParse
    Starting to read file...
    The number of tokens is: 234937
    It took 2098 ms

    $ java FileReadParse
    Starting to read file...
    The number of tokens is: 234937
    It took 788 ms

    $ ruby -v file_read_parse.rb
    ruby 1.8.2 (2004-12-25) [powerpc-darwin8.0]
    Starting to read file ...
    The number of tokens is: 234937
    It took 2666.646 ms

    $ jruby -v file_read_parse.rb
    jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
    Client VM 1.5.0_16) [ppc-java]
    Starting to read file ...
    The number of tokens is: 234937
    It took 3120.0 ms

    $ jruby --fast --server -v file_read_parse.rb
    jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
    Client VM 1.5.0_16) [ppc-java]
    Starting to read file ...
    The number of tokens is: 234937
    It took 2809.0 ms

    $ jruby -v file_read_parse-2.rb
    jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
    Client VM 1.5.0_16) [ppc-java]
    Starting to read file...
    The number of tokens is: 234937
    It took 593 ms

    $ java FileReadParse
    Starting to read file...
    The number of tokens is: 234937
    It took 588 ms

    $ jruby -v file_read_parse-2.rb
    jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM)
    Client VM 1.5.0_16) [ppc-java]
    Starting to read file...
    The number of tokens is: 234937
    It took 595 ms

    $ cat file_read_parse-2.rb
    require 'java'
    java_import 'FileReadParse'

    FileReadParse.new.do_stuff

    :)
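
    The three "java FileReadParse" runs above range from 2098 ms down to
    588 ms, so a single timing can be misleading. A minimal sketch,
    assuming plain Ruby and Time.now as in the original script, of
    repeating a measurement and keeping the best run. time_ms and best_of
    are hypothetical helper names, and the workload in the block is a
    stand-in rather than the real file parse.

```ruby
# time_ms and best_of are hypothetical helper names used for illustration.

# Run the given block once and return the elapsed wall-clock time in ms.
def time_ms
  start = Time.now
  yield
  (Time.now - start) * 1000.0
end

# Run the block `runs` times and return the fastest timing in ms.
def best_of(runs, &work)
  timings = Array.new(runs) { time_ms(&work) }
  timings.min
end

# Stand-in workload (an assumption): split a synthetic string into tokens.
elapsed = best_of(5) { ("word " * 10_000).split.each { |t| t } }
puts "Best of 5 runs: #{elapsed.round(3)} ms"
```

    Reporting the minimum is one common choice; the median would be an
    equally reasonable one if outliers in either direction matter.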
    , Aug 20, 2009
    #18
  19. @Mike

    Thank you for providing the Gist link to a file.
    (http://gist.github.com/170476)

    However, the changes don't improve the performance once I take into
    account what was removed, which I had in there on purpose. Take note of
    item #2 below.

    1) Object structure

    The modified code removed all of the class/object structure, which I
    purposefully had in there to simulate this being an object within a
    larger project.

    That being said, converting the lines of code we're discussing into a
    script has no bearing on the performance question - I am purposefully
    writing the code in an OO style with classes as opposed to scripts.

    I was also purposefully keeping the Java and Ruby versions as similar to
    each other as possible, so that the performance comparison involves as
    little difference in approach as possible.

    2) Counting versus Using the Tokens

    In the modified code, it is now just counting the tokens:

    num += l.split.length

    Obviously that is faster than what I had in the original code. Again
    however, I'm doing this on purpose.

    Counting the number of tokens in and of itself is not all that I was
    doing in the original code or in the Java version. To simulate more
    closely what actually occurs in a functional system I am:

    - assigning the array of tokens to a variable
    - iterating the tokens to do something with each of them

    In this case I'm just assigning each token to another variable and then
    performing the count.

    In a real world use I'd perform some function on the text, put it
    somewhere, whatever.

    This change accounts for the difference in time from "7965.289 ms" to
    "4821.399 ms" when I run the original code and the modified code.

    So yes, the modified code is "faster", but it's not doing the same thing
    as the original and therefore not a valid comparison.
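
    The two variants being compared can be sketched side by side. This is
    an illustration under stated assumptions, not the code from the blog
    post or the gist: the input is synthetic in-memory data instead of the
    real file, and count_only and count_and_use are hypothetical method
    names.

```ruby
require 'benchmark'

# Synthetic input standing in for the test file (an assumption; the
# original benchmark read a real file from disk).
LINES = Array.new(50_000) { "alpha beta gamma delta epsilon" }

# Variant from the modified code: only count the tokens on each line.
def count_only(lines)
  num = 0
  lines.each { |l| num += l.split.length }
  num
end

# Variant closer to the original: keep the token array and touch each token.
def count_and_use(lines)
  num = 0
  lines.each do |l|
    tokens = l.split
    tokens.each do |t|
      _token = t      # placeholder for real per-token work
      num += 1
    end
  end
  num
end

Benchmark.bm(14) do |bm|
  bm.report("count only")    { count_only(LINES) }
  bm.report("count and use") { count_and_use(LINES) }
end
```

    Both methods return the same token count, so any difference in the
    reported times comes from the extra per-token iteration.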


    What I gather from looking at your changes, therefore, is that there
    really isn't anything different for me to do in the code - that I am in
    fact using the proper API calls and techniques and there is no special
    trick I'm missing.

    For example, in Java there are 2 ways of doing this:

    a) String.split - which uses REGEX and is much slower as it's intended
    for pattern matching, not simple tokenization
    b) StringTokenizer - intended for tokenization on a delimiter instead of
    REGEX and much faster

    Therefore, I'm using option (b) in Java. I was curious if I was
    mistakenly using a slower technique of Ruby when in fact there was a
    faster alternative.
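
    On the Ruby side, the nearest counterpart to that distinction seems to
    be String#split with no argument (whitespace splitting without a
    Regexp) versus a Regexp-based split or scan. The sketch below only
    shows how the three calls differ in what they return, using a made-up
    sample line; which one is faster would still need to be timed on the
    interpreter in question.

```ruby
# Made-up sample line with leading, trailing and repeated whitespace.
line = "  one two\tthree  four "

# No argument: awk-style whitespace splitting, no Regexp involved.
plain = line.split

# Regexp delimiter: leading whitespace yields an empty first element.
by_regexp = line.split(/\s+/)

# Regexp scan for runs of non-whitespace characters.
scanned = line.scan(/\S+/)

p plain
p by_regexp
p scanned
```

    Note that the Regexp split is not a drop-in replacement: a line with
    leading whitespace produces an extra empty token, which would change
    the token count in a test like this one.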
    Ben Christensen, Aug 20, 2009
    #19
  20. @Matthew K. Williams

    -- 1. How *often* are you going to be processing these files? If they are
    -- batch style jobs, then does absolute speed matter over maintainability?

    The particular application I'm looking at in the future has a virtually
    continuous feed of incoming data from multiple concurrent sources.

    Thus I'm looking at what language the processing code would be in. My
    default go to is Java - but I want to consider Ruby and not blindly just
    use what I'm accustomed to before establishing what will likely be in
    existence for the next 3-5 years.

    In an existing system doing similar data processing, it is indeed a
    batch process - but one that we would prefer didn't exist - so the
    prospect of potentially doubling its run time isn't appealing, as it's
    already a thorn in the side of operations that gets alleviated by
    throwing hardware at it.

    In another system we horizontally cluster and shard data processing as
    much as possible to parallelize the effort - and do as much as we can to
    optimize performance. For example, daily jobs are required, but the
    volume of data grew to the point where the old system was taking days to
    process a single job - hence the new system, which now handles a job in
    4-6 hours. We're looking at other ways of reducing that further, but so
    far their cost exceeds the business value.


    -- 2. Are there any reasons to not keep the data in a database and then
    -- perform queries, etc.?

    SQL is far slower at handling this type of processing in most cases with
    large volumes of data where the incremental inefficiencies of things
    like REGEX and SQL really add up over 10s of millions of executions.

    I have recently dealt with a large database (100+ GB) where, to achieve
    the necessary performance thresholds, we finally had to resort to
    writing UDFs for MySQL in C that could process the data efficiently
    without needing to pull the data out of the database, process it in Java,
    then re-insert it, which would create huge IO burdens. It was an order of
    magnitude or two faster using this approach rather than straight SQL
    and/or pulling the data out to process externally.

    This is a rare thing - this project was the first time I've ever had to
    do that, due to the unique needs of the project.

    Generally however I have Java in asynchronous processes doing the data
    processing and manipulation.

    The analysis of Ruby performance doing these types of jobs was intended
    to find what cost the adoption of Ruby would incur.

    It appears that Ruby is not well suited to data processing type
    applications from what I've seen and heard so far.

    In another simple test I did where I was iterating over a large amount
    of data, I was shocked at how poorly the Ruby implementation did. It
    seems the looping itself was a very inefficient action in the Ruby
    interpreter.
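
    One way to check whether the looping itself dominates is to time bare
    loops with no I/O or string work. The sketch below assumes an
    arbitrary iteration count and uses hypothetical method names; it only
    shows how such a comparison could be set up, and the numbers will
    differ between Ruby implementations.

```ruby
require 'benchmark'

N = 1_000_000   # arbitrary iteration count chosen for illustration

# Explicit while loop with a manually incremented counter.
def sum_with_while(n)
  total = 0
  i = 0
  while i < n
    total += i
    i += 1
  end
  total
end

# Integer#times with a block.
def sum_with_times(n)
  total = 0
  n.times { |i| total += i }
  total
end

# Range#each with a block.
def sum_with_each(n)
  total = 0
  (0...n).each { |i| total += i }
  total
end

Benchmark.bm(8) do |bm|
  bm.report("while") { sum_with_while(N) }
  bm.report("times") { sum_with_times(N) }
  bm.report("each")  { sum_with_each(N) }
end
```

    All three methods compute the same sum, so the reported times reflect
    only the cost of the loop construct and the block call.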

    Hopefully this helps provide some context to my questions about Ruby
    with regard to batch processing of data.
    Ben Christensen, Aug 20, 2009
    #20