Suggestions for a parsing strategy?

Robb

Hi all,

I have input strings that can look like this:

Common, Commerc(e, ial)

I need to parse these into the three words that this represents:

Common, Commerce, Commercial.

I'm a little new to Ruby, and hence wondering what direction would be
best to go in (.scan, regexes ... something else?). For me, the
complication I'm not sure how to deal with is the two "levels" of the
comma as a separator.

Thanks,
Robb
 
David Masover

One way would be to handle the exceptions first. Replace anything that matches
the

Commerc(e, ial)

pattern with the expanded words, i.e. the literal string "Commerce, Commercial".
Then you can just do a simple split on the commas and strip the whitespace.
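A minimal Ruby sketch of this expand-then-split approach follows. It assumes every parenthesized group directly follows its root word and holds only comma-separated suffixes; the method name `expand_and_split` is hypothetical.

```ruby
# Hypothetical helper: expand "Root(a, b)" groups first, then split on commas.
def expand_and_split(input)
  # Step 1: rewrite every "Root(a, b)" as "Roota, Rootb".
  expanded = input.gsub(/(\w+)\(([^)]*)\)/) do
    root = $1
    $2.split(",").map { |suffix| root + suffix.strip }.join(", ")
  end

  # Step 2: only one level of commas is left, so a plain split works.
  expanded.split(",").map(&:strip)
end

p expand_and_split("Common, Commerc(e, ial)")
# => ["Common", "Commerce", "Commercial"]
```

The gsub block does the "exceptions first" step, so the final split never sees a comma inside parentheses.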
 
Eric I.

This code does a lot of what you describe, provided the parenthetical
only appears at the end of a word (i.e. it holds suffixes).

====

s = "Common, Commerc(e, ial), Computer, Con(ic, ehead, temporary)"

def parse_word_list(s)
  # Each match yields [root, "(suffixes)", "suffixes"]; the middle group
  # only exists to make the parenthesized part optional, so it is ignored.
  s.scan(/(\w+)(\((.*?)\))?/).map { |root, junk, suffixes|
    [root, suffixes && suffixes.split(", ")]
  }
end

list = parse_word_list(s)

# see what's produced
p list

# use it to generate all words
list.each do |root, suffix_list|
  if suffix_list
    suffix_list.each do |suffix|
      puts "#{root}#{suffix}"
    end
  else
    puts root
  end
end

====

Hope that helps,

Eric
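If a flat array of words is more convenient than printing them, the same scan can feed `flat_map` (available since Ruby 1.9.2). This is a sketch: `expand_words` is a hypothetical name, the outer group is made non-capturing with `(?:...)` so each match yields only the root and the suffix list, and `[^)]*` stands in for `.*?` inside the parentheses.

```ruby
# Hypothetical helper built on the same scan pattern as above.
def expand_words(s)
  # Each match yields [root, suffixes]; suffixes is nil when there is
  # no parenthesized group after the root.
  s.scan(/(\w+)(?:\(([^)]*)\))?/).flat_map do |root, suffixes|
    if suffixes
      suffixes.split(",").map { |suffix| root + suffix.strip }
    else
      [root]
    end
  end
end

words = expand_words("Common, Commerc(e, ial), Computer, Con(ic, ehead, temporary)")
p words
# => ["Common", "Commerce", "Commercial", "Computer", "Conic", "Conehead", "Contemporary"]
```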

 
David Masover

s.scan(/(\w+)(\((.*?)\))?/).map { |root, junk, suffixes|

This pattern looks really useful. Judging by the docs for scan, it can also
take a block.
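As an illustration, with a block `String#scan` yields each match's capture groups in turn and returns the receiver rather than an array of matches. A small sketch, reusing the pattern above with a non-capturing outer group:

```ruby
s = "Common, Commerc(e, ial)"
words = []

# With a block, scan yields the capture groups of each match in turn.
result = s.scan(/(\w+)(?:\(([^)]*)\))?/) do |root, suffixes|
  if suffixes
    suffixes.split(",").each { |suffix| words << (root + suffix.strip) }
  else
    words << root
  end
end

p words # => ["Common", "Commerce", "Commercial"]
```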

Which just leaves one question: Why isn't this an Enumerator in Ruby 1.9? I
don't think the original meaning (of producing an array) is made much more
difficult by the form

s.scan(/.../).to_a

And I suspect that it would most often be useful for things like #map, if not
used in block form outright. Making it an Enumerator would be somewhat more
efficient than building a whole array first -- and more responsive, if it's a
large string.
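For comparison, `scan` can already be wrapped in an Enumerator with `enum_for` (alias `to_enum`), and on Ruby 2.0 or later `Enumerator#lazy` lets a consumer stop after the first few matches without building the whole array. A sketch with a made-up input string:

```ruby
text = "alpha, beta, gamma, delta"

# Wrap scan in an Enumerator; scan only runs when the enumerator is iterated.
matches = text.enum_for(:scan, /\w+/)
p matches.class # => Enumerator

# Ruby 2.0+: lazily upcase and take the first two words; iteration stops early.
p matches.lazy.map(&:upcase).first(2) # => ["ALPHA", "BETA"]
```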
 
Sebastian Hungerecker

David said:
Why isn't [the return value of scan] an Enumerator in Ruby 1.9?

Or 1.8.7 for that matter. Yes, I've been asking myself this very same question
since the release of 1.9.

And I suspect that it would most often be useful for things like #map, if
not used in block form outright. Making it an Enumerator would be somewhat
more efficient than building a whole array first -- and more responsive, if
it's a large string.

Also it'd allow you to use the MatchData object inside map if you need to. The
way it is now, you'd have to do:
string.enum_for(:scan, /re/).map do
  md = Regexp.last_match
  do_something_with md
end

instead of just
string.scan(/re/).map do ... end
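A self-contained sketch of the `enum_for` plus `Regexp.last_match` approach, collecting the MatchData for each match and reading captures and offsets from it (same pattern as earlier in the thread, with a non-capturing outer group):

```ruby
s = "Common, Commerc(e, ial)"
pattern = /(\w+)(?:\(([^)]*)\))?/

# Inside the block, Regexp.last_match is the MatchData for the current match.
matches = s.enum_for(:scan, pattern).map { Regexp.last_match }

matches.each do |md|
  # md[1] is the root, md[2] the suffix list (or nil), md.begin(0) the offset.
  puts "#{md[1]} at #{md.begin(0)}: #{md[2].inspect}"
end
```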
 
ThoML

Making it an Enumerator would be somewhat more
efficient than building a whole array first -- and more responsive, if it's a
large string.

There is also the StringScanner class that can be used to return one
match at a time.
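A minimal sketch of that for the original problem, using the standard `strscan` library. `each_expanded_word` is a hypothetical name, and the code assumes the same input shape as above: comma-separated words, each optionally followed directly by a parenthesized suffix list.

```ruby
require "strscan"

# Hypothetical helper: walks the input with StringScanner and yields one
# expanded word at a time instead of building an array of all matches.
def each_expanded_word(input)
  ss = StringScanner.new(input)
  until ss.eos?
    ss.skip(/\s*,?\s*/)        # skip the separator before the next entry
    root = ss.scan(/\w+/)
    break unless root          # stop on trailing space or unexpected input
    if ss.scan(/\(([^)]*)\)/)  # optional "(suffix, suffix)" group
      ss[1].split(",").each { |suffix| yield root + suffix.strip }
    else
      yield root
    end
  end
end

each_expanded_word("Common, Commerc(e, ial)") { |word| puts word }
```

Because the scanner only advances as far as the caller asks, a `break` in the block stops processing without touching the rest of the string.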
 
