ThomasW
Hi,
first of all I have to say I'm relatively inexperienced with Ruby and
also new to regular expressions. This causes me some problems:
I'm parsing text files and am using a lot of regexps for this.
Initially I was doing something like this:
file.each_line { |line|
  if line =~ /^pattern[a]*/
    process_pattern_a(line)
  elsif line =~ /pat+e(rn)? b\s*$/
    process_pattern_b(line)
  # some more elsifs
  end
}
But this was really, really slow. My suspicion is that the regexp
objects are recreated and thrown away for every iteration. Storing
all patterns in a table and referencing them like
file.each_line { |line|
  if line =~ $line_patterns["pattern a"]
    process_pattern_a(line)
  elsif line =~ $line_patterns["pattern b"]
    process_pattern_b(line)
  # some more elsifs
  end
}
made things tremendously faster, but I'm not really keen on storing
every regular expression that occurs somewhere in my program in this
table or as a variable. This splits up code that I would like to keep
in one place and can create variable clutter.[*]
Is it the case that such "anonymous" objects like regexps (maybe also
strings?) are re-created whenever the code snippet they are defined in
is executed? If so, is there a convenient way of preventing this? Is
this only the case for regexps or also for strings and other objects?
(Why is it the case at all - I can't make any sense of it?) I would
like to learn how I can write Ruby code that is reasonably efficient
in this regard because the impact on execution time in the described
situation was so immense. (I'm currently using Ruby 1.9.1.)
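A minimal sketch of how the suspicion could be checked, assuming that
object identity is a fair proxy for "re-created". The variable `word`
and the three array names are made up for this illustration. It
compares a plain regexp literal, a literal with `#{}` interpolation,
and an interpolated literal with the `o` ("interpolate once") flag:

```ruby
word = "pattern"

# Keep the regexp objects themselves (not only their ids), so that no
# object can be garbage collected and have its id reused while counting.
plain        = Array.new(3) { /^pattern[a]*/ }     # literal, no interpolation
interpolated = Array.new(3) { /^#{word}[a]*/ }     # rebuilt on each evaluation
once         = Array.new(3) { /^#{word}[a]*/o }    # interpolated only once

plain_count        = plain.map(&:object_id).uniq.size
interpolated_count = interpolated.map(&:object_id).uniq.size
once_count         = once.map(&:object_id).uniq.size

puts "distinct objects (plain literal): #{plain_count}"
puts "distinct objects (interpolated):  #{interpolated_count}"
puts "distinct objects (with /o flag):  #{once_count}"
```

If the plain literal reports a single object, the slowdown would have
to come from somewhere other than the literals being rebuilt, for
example from patterns that contain interpolation or from the cost of
the matching itself.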
Thanks!
Thomas W.
[*] I could perhaps also store the regexps and the functions to be
executed in a table, with the regexps as keys and the functions as
values, and iterate through it until a matching regexp key is found,
so that the function stored as its value can be executed. But
this is only possible in situations similar to the one described.
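The table idea from the footnote could be sketched roughly as follows.
The handler bodies, the constant `LINE_HANDLERS` and the method
`dispatch` are placeholders made up for this illustration; only the
two patterns and the handler names come from the snippets above. An
array of pairs is used instead of a hash so that the order of the
checks stays explicit:

```ruby
# Hypothetical stand-ins for the real process_pattern_a / process_pattern_b.
def process_pattern_a(line)
  "a: #{line.strip}"
end

def process_pattern_b(line)
  "b: #{line.strip}"
end

# Each entry pairs a regexp with the handler to call when it matches.
# Entries are tried in order, like the if/elsif chain.
LINE_HANDLERS = [
  [/^pattern[a]*/,     method(:process_pattern_a)],
  [/pat+e(rn)? b\s*$/, method(:process_pattern_b)],
]

# Calls the handler of the first matching regexp and returns its result,
# or returns nil when no pattern matches the line.
def dispatch(line)
  LINE_HANDLERS.each do |regexp, handler|
    return handler.call(line) if line =~ regexp
  end
  nil
end
```

With this in place the loop body would shrink to something like
`file.each_line { |line| dispatch(line) }`, and each regexp stays next
to the function it triggers.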