handling of regexp objects that aren't referenced by variables, arrays, tables or objects

Discussion in 'Ruby' started by ThomasW, Sep 27, 2009.

  1. ThomasW

    ThomasW Guest

    Hi,

    First of all, I should say that I'm relatively inexperienced with Ruby
    and also new to regular expressions, which is causing me some problems:

    I'm parsing text files and am using a lot of regexps for this.
    Initially I was doing something like this:

    file.each_line { |line|
      if line =~ /^pattern[a]*/
        process_pattern_a(line)
      elsif line =~ /pat+e(rn)? b\s*$/
        process_pattern_b(line)
      # some more elsifs
      end
    }

    But this was really, really slow. My suspicion is that the regexp
    objects are recreated and thrown away for every iteration. Storing
    all patterns in a table and referencing them like

    file.each_line { |line|
      if line =~ $line_patterns["pattern a"]
        process_pattern_a(line)
      elsif line =~ $line_patterns["pattern b"]
        process_pattern_b(line)
      # some more elsifs
      end
    }

    made things tremendously faster, but I'm not really keen on storing
    every regular expression that occurs somewhere in my program in this
    table or in a variable. This splits up code that I would like to keep
    in one place and can create variable clutter.[*]

    Is it the case that such "anonymous" objects like regexps (maybe also
    strings?) are re-created whenever the code snippet they are defined in
    is executed? If so, is there a convenient way of preventing this? Is
    this only the case for regexps or also for strings and other objects?
    (And why would that be the case at all? I can't make any sense of it.)
    I would like to learn how to write Ruby code that is reasonably
    efficient in this regard, because the impact on execution time in the
    described situation was so immense. (I'm currently using Ruby 1.9.1.)

    Thanks!
    Thomas W.


    [*] I could perhaps also store the regexps and the functions to be
    executed in a table, with the regexps as keys and the functions as
    values, and iterate through it until a matching regexp key is found,
    then execute the function stored as its value. But this is only
    possible in situations similar to the one described.
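A minimal sketch of the table-driven idea from the footnote, assuming the handlers can be written as lambdas. LINE_HANDLERS, dispatch and the lambda bodies are hypothetical stand-ins for process_pattern_a / process_pattern_b; the sketch relies on Hash keeping insertion order, which Ruby guarantees from 1.9 on.

```ruby
# Hypothetical handlers standing in for process_pattern_a / process_pattern_b.
# Each regexp literal here is built once, when the constant is defined.
LINE_HANDLERS = {
  /^pattern[a]*/     => lambda { |line| "A: #{line.chomp}" },
  /pat+e(rn)? b\s*$/ => lambda { |line| "B: #{line.chomp}" }
}.freeze

# Try the patterns in insertion order and run the handler of the first
# one that matches; return nil when nothing matches.
def dispatch(line)
  LINE_HANDLERS.each do |pattern, handler|
    return handler.call(line) if pattern =~ line
  end
  nil
end

"patterna\npatte b\nsomething else\n".each_line do |line|
  p dispatch(line)
end
```

The trade-off is the one mentioned in the footnote: this only fits when every branch has the shape "match, then call one handler".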
     
    ThomasW, Sep 27, 2009
    #1

  2. Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects


    > Is it the case that such "anonymous" objects like regexps (maybe also
    > strings?) are re-created whenever the code snippet they are defined in
    > is executed? If so, is there a convenient way of preventing this? Is
    > this only the case for regexps or also for strings and other objects?
    > (Why is it the case at all - I can't make any sense of it?) I would
    > like to learn how I can write Ruby code that is reasonably efficient
    > in this regard because the impact on execution time in the described
    > situation was so immense. (I'm currently using Ruby 1.9.1.)


    Yes, indeed a new object is created every time an anonymous object is
    created. The only core object I know of for which this is not true is
    the symbol, which is basically an immutable string. There may be others
    I'm not aware of, though. I suppose your code shows that there just
    might be a need for the symbol equivalent of a regexp.
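One way to check this for strings and symbols is to compare object identity with equal?. The method names below are made up for the illustration; the last case assumes Ruby 2.1 or later (the thread uses 1.9.1), where "abc".freeze is compiled to a single shared frozen object.

```ruby
# equal? compares object identity, so it shows whether evaluating the
# same literal twice yields one object or two.
def string_literal
  "abc"          # a fresh String on every call (unless frozen string literals are enabled)
end

def symbol_literal
  :abc           # symbols are unique: always the same object
end

def frozen_string
  "abc".freeze   # Ruby 2.1+: one shared frozen String for this literal
end

puts "string: #{string_literal.equal?(string_literal)}"
puts "symbol: #{symbol_literal.equal?(symbol_literal)}"
puts "frozen: #{frozen_string.equal?(frozen_string)}"
```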
     
    Ehsanul Hoque, Sep 27, 2009
    #2

  3. Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects

    Is this OK? It still uses variables, though :(

    file.each_line { |line|
      if line =~ (a ||= $line_patterns["pattern a"])
        process_pattern_a(line)
      elsif line =~ (b ||= $line_patterns["pattern b"])
        process_pattern_b(line)
      # some more elsifs
      end
    }
    --
    Posted via http://www.ruby-forum.com/.
     
    Thairuby ->a, b {a + b}, Sep 27, 2009
    #3
  4. Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects

    On 9/27/09, Ehsanul Hoque <> wrote:
    >> Is it the case that such "anonymous" objects like regexps (maybe also
    >> strings?) are re-created whenever the code snippet they are defined in
    >> is executed? If so, is there a convenient way of preventing this? Is
    >> this only the case for regexps or also for strings and other objects?
    >> (Why is it the case at all - I can't make any sense of it?) I would
    >> like to learn how I can write Ruby code that is reasonably efficient
    >> in this regard because the impact on execution time in the described
    >> situation was so immense. (I'm currently using Ruby 1.9.1.)

    >
    > Yes, indeed a new object is created every time an anonymous object is
    > created. The only core object I know of for which this is not true is the
    > symbol, which is basically an immutable string. There may be others I'm not
    > aware of though. I suppose your code shows that there just might be a need
    > for the symbol equivalent of a regexp.


    Actually, I believe that regexp literals are created only once, even if
    they're executed multiple times. The exception is when you use #{}
    within a regexp: that forces Ruby not only to create a new object each
    time the regexp literal is executed, but also to recompile the regexp
    each time, and that is really slow. You can bypass this behavior with
    the o regexp option, but that only works correctly if the value of the
    inclusion (what's inside #{}) is guaranteed to be the same on each
    execution.
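This can be checked with equal? (object identity). The method names are illustrative; on a standard MRI this should print true, false, true: the plain literal is a single object, the interpolated one is rebuilt on each call, and /o keeps the result of the first call.

```ruby
def plain_regexp
  /foo/              # no #{}: built once, when the code is compiled
end

def interpolated_regexp(suffix)
  /foo#{suffix}/     # #{}: a new Regexp is built (and compiled) on every call
end

def interpolated_once(suffix)
  /foo#{suffix}/o    # /o: built on the first call only, then reused
end

puts plain_regexp.equal?(plain_regexp)
puts interpolated_regexp("bar").equal?(interpolated_regexp("bar"))
puts interpolated_once("bar").equal?(interpolated_once("bar"))
```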

    Thomas, are you using #{} within your regexps? If so, try sticking an
    o on the end of each one; that will probably solve your performance
    problem. For instance:
      x =~ /foo#{bar}/o
    instead of
      x =~ /foo#{bar}/
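The caveat about /o is easy to trip over when the interpolated value can change. A small sketch, where contains_word_o? and count_lines_with are hypothetical helpers: the /o version keeps whatever pattern it built on its first call, while building the Regexp once before the loop keeps the speedup without that risk.

```ruby
# With /o the pattern is interpolated on the first evaluation only, so a
# later, different `word` is silently ignored.
def contains_word_o?(text, word)
  (text =~ /\b#{Regexp.escape(word)}\b/o) ? true : false
end

# Alternative when the interpolated value can change: build the Regexp
# once per call, before the loop, and reuse it for every line.
def count_lines_with(lines, word)
  pattern = /\b#{Regexp.escape(word)}\b/
  lines.count { |line| line =~ pattern }
end

p contains_word_o?("hello world", "hello")   # first call fixes the pattern to /\bhello\b/
p contains_word_o?("goodbye world", "world") # still looks for "hello", so false
p count_lines_with(["a cat", "a dog", "cat!"], "cat")
```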
     
    Caleb Clausen, Sep 27, 2009
    #4
  5. ThomasW

    ThomasW Guest

    On 27 Sep., 20:04, Caleb Clausen <> wrote:
    > On 9/27/09, Ehsanul Hoque <> wrote:
    >
    > > Yes, indeed a new object is created every time an anonymous object is
    > > created. The only core object I know of for which this is not true is the
    > > symbol, which is basically an immutable string.


    I think that's not quite what I meant. Of course, if I define the
    same regular expression twice at different places, there would be two
    regexp objects.


    >
    > Actually, I believe that regexp literals are created only once even if
    > they're executed multiple times. The exception to this would be when
    > you use #{} within a regexp... that forces ruby to not only create a
    > new object each time the regexp literal is executed, it has to
    > recompile the regexp each time.... and that is really slow. You can
    > bypass this behavior by using the o regexp option, but that only works
    > right if the value of the inclusion (what's inside #{}) is guaranteed
    > to be the same on each execution.
    >



    Thanks so much! Your suspicion was right: I am indeed using #{} in
    some of the regular expressions, and the o option does fix the issue.
    Your explanation of why the expressions would otherwise be recompiled
    in every iteration makes perfect sense to me now.
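To measure the effect on your own data, something along these lines with the standard Benchmark module could help. The input and the method names are synthetic; absolute timings will vary by machine and Ruby version, so only the relative gap between the two rows is of interest.

```ruby
require 'benchmark'

# Synthetic input: half of the lines start with "pattern a".
LINES = Array.new(20_000) { |i| i.even? ? "pattern a #{i}" : "other #{i}" }

def count_interpolated(lines, prefix)
  lines.count { |line| line =~ /^#{prefix} a/ }    # Regexp rebuilt for every line
end

def count_once(lines, prefix)
  lines.count { |line| line =~ /^#{prefix} a/o }   # Regexp built on first use only
end

Benchmark.bm(14) do |bm|
  bm.report("interpolated:") { count_interpolated(LINES, "pattern") }
  bm.report("with /o:")      { count_once(LINES, "pattern") }
end
```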

    Now my code is already a bit shorter :)!

    Thomas W.
     
    ThomasW, Sep 27, 2009
    #5
  6. Gary Wright

    Gary Wright Guest

    Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects

    On Sep 27, 2009, at 11:50 AM, ThomasW wrote:
    > I'm parsing text files and am using a lot of regexps for this.
    > Initially I was doing something like this:
    >
    > file.each_line { |line|
    > if line =~ /^pattern[a]*/
    > process_pattern_a(line)
    > elsif line =~ /pat+e(rn)? b\s*$/
    > process_pattern_b(line)
    > # some more elsifs
    > end
    > }


    This example is perfect for Ruby's case statement:

    file.each_line { |line|
      case line
      when /^pattern[a]*/o
        process_pattern_a(line)
      when /pat+e(rn)? b\s*$/o
        process_pattern_b(line)
      # more when clauses
      else
        # handle no match
      end
    }
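A self-contained version of the case example, assuming handlers that just return a tagged string (process_pattern_a and process_pattern_b here are stand-ins, and StringIO replaces the real file). The /o flags are left out because these literals contain no #{}, so they are built only once anyway.

```ruby
require 'stringio'

# Hypothetical stand-ins for the handlers used in the thread.
def process_pattern_a(line)
  "A: #{line.chomp}"
end

def process_pattern_b(line)
  "B: #{line.chomp}"
end

# StringIO stands in for the File object; it also responds to each_line.
file = StringIO.new("patterna\npatte b\nsomething else\n")

results = []
file.each_line do |line|
  result = case line
           when /^pattern[a]*/     then process_pattern_a(line)
           when /pat+e(rn)? b\s*$/ then process_pattern_b(line)
           else "no match: #{line.chomp}"
           end
  results << result
end

p results
```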



    Gary Wright
     
    Gary Wright, Sep 27, 2009
    #6
  7. Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects

    Thairuby ->a, b {a + b} wrote:
    > Is this ok? But it still use variable :(
    >
    > file.each_line { |line|
    > if line =~ (a ||= $line_patterns["pattern a"])
    > process_pattern_a(line)
    > elsif line =~ (b ||= $line_patterns["pattern b"])
    > process_pattern_b(line)
    > # some more elsifs
    > end
    > }


    I made a typo. It should be:

    file.each_line { |line|
      if line =~ (a ||= /^pattern[a]*/)
        process_pattern_a(line)
      elsif line =~ (b ||= /pat+e(rn)? b\s*$/)
        process_pattern_b(line)
      # some more elsifs
      end
    }

    Is there an o option for strings, too? :)
     
    Thairuby ->a, b {a + b}, Sep 27, 2009
    #7
  8. ThomasW

    ThomasW Guest

    On 27 Sep., 21:57, Gary Wright <> wrote:
    > On Sep 27, 2009, at 11:50 AM, ThomasW wrote:
    >
    > > I'm parsing text files and am using a lot of regexps for this.
    > > Initially I was doing something like this:

    >
    > > file.each_line { |line|
    > >  if line =~ /^pattern[a]*/
    > >    process_pattern_a(line)
    > >  elsif line =~ /pat+e(rn)? b\s*$/
    > >    process_pattern_b(line)
    > >  # some more elsifs
    > >  end
    > > }

    >
    > This example is perfect for Ruby's case statement:
    >
    > file.each_line { |line|
    >    case line
    >    when /^pattern[a]*/o
    >     process_pattern_a(line)
    >    when /pat+e(rn)? b\s*$/o
    >     process_pattern_b(line)
    >    # more when clauses
    >    else
    >      # handle no match
    >    end
    >
    > }
    >
    > Gary Wright


    Thanks for that tip. I wasn't aware that this also works with regexp
    matches. It's great that it does! By the way, is there anything
    substantially different about it compared to an elsif chain, apart
    from slightly less typing?

    Thomas W.
     
    ThomasW, Sep 27, 2009
    #8
  9. Gary Wright

    Gary Wright Guest

    Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects

    On Sep 27, 2009, at 4:25 PM, ThomasW wrote:
    > Thanks for that tip. I wasn't aware that this also works with regexp
    > matches. It's great that it does! By the way, is there anything
    > substantially different from an elsif chain, except for being slightly
    > less typing?


    The semantics are the same in this case, but I think the
    case statement highlights the fact that you are doing a
    sequence of matches against a single object, whereas the
    standard if/then/else is a more general construct.
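Under the hood, case calls pattern === value for each when clause, in order, and stops at the first truthy result, which is why it behaves like the elsif chain. A sketch that makes this visible, using lambdas (whose === simply calls the lambda) as hypothetical matchers that log each test:

```ruby
calls = []

# Lambdas implement === by calling themselves, so each test is recorded.
starts_with_a = lambda { |s| calls << :a; s.start_with?("a") }
starts_with_b = lambda { |s| calls << :b; s.start_with?("b") }

apple = case "apple"
        when starts_with_a then :first
        when starts_with_b then :second
        end
apple_calls = calls.dup   # only :a was tested: evaluation stopped at the first match

calls.clear
banana = case "banana"
         when starts_with_a then :first
         when starts_with_b then :second
         end              # both matchers were tried, in order

# For a Regexp, === is a boolean match test.
regexp_hit = (/pat+e(rn)? b\s*$/ === "patte b")

p [apple, apple_calls, banana, calls, regexp_hit]
```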

    Gary Wright
     
    Gary Wright, Sep 28, 2009
    #9
  10. Josh Cheek

    Josh Cheek Guest

    Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects


    On Sun, Sep 27, 2009 at 3:20 PM, Thairuby ->a, b {a + b} <> wrote:

    > Thairuby ->a, b {a + b} wrote:
    > > Is this ok? But it still use variable :(
    > >
    > > file.each_line { |line|
    > > if line =~ (a ||= $line_patterns["pattern a"])
    > > process_pattern_a(line)
    > > elsif line =~ (b ||= $line_patterns["pattern b"])
    > > process_pattern_b(line)
    > > # some more elsifs
    > > end
    > > }

    >
    > I'm wrong typing. It would be
    >
    > file.each_line { |line|
    > if line =~ (a ||= /^pattern[a]*/)
    > process_pattern_a(line)
    > elsif line =~ (b ||= /pat+e(rn)? b\s*$/)
    > process_pattern_b(line)
    > # some more elsifs
    > end
    > }
    >
    > Does it have o option for string? :)

    Unfortunately, I don't think this does anything, because a and b are
    declared within the block: while the scope is the same, the extent is
    not. a and b are no longer bound after each iteration of the loop, so
    on entering the next iteration they do not retain their previously
    assigned values.

    This can be illustrated:

    "patterna\npatte b".each_line do |line|
      p line

      puts "defined?(a) => #{defined?(a).inspect}"
      puts "defined?(b) => #{defined?(b).inspect}"

      if line =~ (a ||= /^pattern[a]*/)
      elsif line =~ (b ||= /pat+e(rn)? b\s*$/)
      else
      end

      puts "defined?(a) => #{defined?(a).inspect}"
      puts "defined?(b) => #{defined?(b).inspect}", ''
    end
    __END__

    This produces the following output:
    "patterna\n"
    defined?(a) => nil
    defined?(b) => nil
    defined?(a) => "local-variable(in-block)"
    defined?(b) => "local-variable(in-block)"

    "patte b"
    defined?(a) => nil
    defined?(b) => nil
    defined?(a) => "local-variable(in-block)"
    defined?(b) => "local-variable(in-block)"

    You can see that a and b were defined after the if statement for
    "patterna", but were no longer defined before the if statement for
    "patte b".
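The same point can be shown by counting how often the right-hand side of ||= actually runs. inside_runs, outside_runs and hoisted are illustrative names; declaring the variable before the block is what lets it keep its value between iterations.

```ruby
# A local first assigned inside the block is fresh on every call,
# so ||= finds nil each time and re-evaluates its right-hand side.
inside_runs = 0
3.times do
  pattern ||= (inside_runs += 1; /^pattern[a]*/)
end

# A local declared before the block is shared by all calls,
# so the right-hand side runs only on the first iteration.
outside_runs = 0
hoisted = nil
3.times do
  hoisted ||= (outside_runs += 1; /^pattern[a]*/)
end

p inside_runs   # 3
p outside_runs  # 1
```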
     
    Josh Cheek, Sep 28, 2009
    #10
  11. Re: handling of regexp objects that aren't referenced by variables, arrays, tables or objects

    Oh, I forgot about the scope of local variables :(
    Thank you very much for your explanation.
     
    Thairuby ->a, b {a + b}, Sep 28, 2009
    #11
  12. ThomasW

    ThomasW Guest

    On 28 Sep., 03:09, "Thairuby ->a, b {a + b}" <>
    wrote:
    > Oh, I forgot the scope of local variable :(
    > Thank you very much for your explanation.


    Thairuby, thanks anyway for your effort :).
     
    ThomasW, Sep 28, 2009
    #12
