New to regexp and Ruby, need help parsing a string

Gabra Kadabra

I'm building a little test console for a Ruby project. When calling a
function, I might get something like this:

input_string ="and stuff and nice things not bad girls not greasy boys
and girlsandboys"

As you have probably guessed, I want the result in a format like
this:

smoking_table = {"and" => ["stuff", "nice things", "girlsandboys"],
"not" => ["bad girls","greasy boys"]}

So what I need is a regexp that splits a string on keywords like "and"
and "not".

Please help me.
 
7stud --

Try this:

str = 'and stuff and nice things not bad girls not greasy boys
and girlsandboys'

smoking_table = {'and'=>[], 'not'=>[]}

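# The capture group keeps the 'and ' / 'not ' delimiters in the result.
pieces = str.split(/(and |not )/)
len = pieces.length

index = 0
while index < len
  case pieces[index]
  when 'and '
    # The piece right after a keyword is the text that belongs to it.
    smoking_table['and'] << pieces[index+1].strip
    index += 2
  when 'not '
    smoking_table['not'] << pieces[index+1].strip
    index += 2
  else
    # Skip anything that is not a keyword (e.g. the leading "").
    index += 1
  end
end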

p smoking_table
 
Raul Parolari

# One possible implementation is:

smoking_table = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
  smoking_table[k.to_sym].push(v.strip)
end

=> {:and => ["stuff", "nice things", "girlsandboys"],
:not => ["bad girls", "greasy boys"]}


I hope that this works for you,

Raul
 
7stud --

Normally when you split() a string, you do something like this:

str = 'aXbXc'
pieces = str.split('X')
p pieces
-->["a", "b", "c"]

Notice that the pattern you use to split the string is not part of the
results; it's chopped out of the string and the pieces are what's left
over. However, there is a little-known feature: if your split pattern
has a group in it, which is formed by putting parentheses around part of
the pattern, then the text matched by the group is returned in the
results as well. I put parentheses around the whole split pattern to get
a result array like this:

["", "and ", "stuff ", "and ", "nice things ", "not ", "bad girls ",
"not ", "greasy boys\n", "and ", "girlsandboys"]

By including the split pattern in the results, you can see that each
piece of the string is preceded by either 'and ' or 'not '. The 'and '
or 'not ' then serves as an identifier for each piece of the string.
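
To make the capture-group behaviour concrete, here is a minimal sketch
(not 7stud's original code). It first compares split with and without a
group, then pairs each keyword with the piece that follows it using
each_slice. It assumes the input always starts with a keyword; the
variable names are only illustrative.

```ruby
# Without a group the delimiter is discarded; with a group it is kept.
plain   = 'aXbXc'.split(/X/)
grouped = 'aXbXc'.split(/(X)/)
p plain    # ["a", "b", "c"]
p grouped  # ["a", "X", "b", "X", "c"]

str = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

# Each key gets an empty array the first time it is used.
smoking_table = Hash.new { |hash, key| hash[key] = [] }

pieces = str.split(/(and |not )/)
# Assumption: the string starts with a keyword, so split returns a
# leading "" that has to be dropped before pairing.
pieces.shift if pieces.first == ""

# After the shift the array alternates keyword, text, keyword, text, ...
pieces.each_slice(2) do |keyword, text|
  smoking_table[keyword.strip] << text.to_s.strip
end

p smoking_table
```

If the input might not start with a keyword, the pairing above would be
off by one, so the explicit while/case loop is the safer choice there.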
 
Gabra Kadabra

Raul said:
str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.
 
Peter Vanderhaden

Raul,
Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.
Thanks,
PV

 
7stud --

Gabra said:
Raul said:
str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Don't be fooled by one-liners. Ruby syntax lets you chain multiple
method calls together compactly, yet the result can be very
inefficient. Whenever I see a one-liner with several chained method
calls and regexes sprinkled in for good measure, I immediately assume
there is a more efficient solution. The solution I posted is a case in
point: even though it has five times as many lines, it is 70% faster on
my system than the one-liner you find so alluring.

In addition, I find one-liners hard to decipher, and since I don't
aspire to write code that is both hard to read and inefficient, I rarely
try to cram a whole program into a single line.

Peter: use the p method instead of puts to print the hash in its
literal form.
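
For anyone who wants to check the speed claim on their own machine,
here is a rough sketch using Ruby's standard Benchmark module. It is not
the measurement 7stud ran: the iteration count n and the lambda names
are assumptions made for illustration, both versions use string keys so
the results can be compared directly, and timings will vary with the
Ruby version and hardware.

```ruby
require 'benchmark'

str = "and stuff and nice things not bad girls not greasy boys\nand girlsandboys"

# The split-based approach: walk the pieces and file each one under
# the keyword that precedes it.
split_version = lambda do |s|
  table = { 'and' => [], 'not' => [] }
  pieces = s.split(/(and |not )/)
  index = 0
  while index < pieces.length
    case pieces[index]
    when 'and ', 'not '
      table[pieces[index].strip] << pieces[index + 1].strip
      index += 2
    else
      index += 1
    end
  end
  table
end

# The scan-based approach with the first regexp from the thread.
scan_version = lambda do |s|
  table = { 'and' => [], 'not' => [] }
  s.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) { |k, v| table[k] << v.strip }
  table
end

n = 20_000  # arbitrary iteration count, chosen only for this sketch
Benchmark.bm(6) do |bm|
  bm.report('split') { n.times { split_version.call(str) } }
  bm.report('scan')  { n.times { scan_version.call(str) } }
end
```

Both lambdas return the same hash for this input, so the comparison is
at least like for like.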
 
Raul Parolari

Gabra said:
Raul said:
str.scan(/ (and|not) (.*?) (?= \band|\bnot|$) /x) do |k, v|
smoking_table[k.to_sym].push(v.strip)
end

I think Raul just convinced me that I really need to start a deep
relationship with regexp.
This is magic in one row, readable in three.

Thanks.

I totally agree with you; this is not a subject that you learn just by
trying things or even by reading the forum. Start calmly from the basics
with a good book, and soon those funny hieroglyphics will become your
friends.

By the way, the code above did not handle 'notorious bad girls' (that
is, words beginning with 'not'): the lookahead only checked for a word
boundary before the keyword, not after it. So here is a corrected
version (the '\b' before and after a word makes sure that it is indeed a
whole word):

str = "and stuff and nice things not notorious bad girls not greasy boys
and girslsandboys"

h = { :and => [], :not => [] }

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
h[k.to_sym].push(v.strip) }
end

p h # => {:and=>["stuff", "nice things", "girslsandboys"],
# :not=>["notorious bad girls", "greasy boys"]}


Peter Vanderhaden wrote
Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out ..

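By default, puts and print call to_s on the hash, which in Ruby 1.8
simply concatenates the keys and values (since Ruby 1.9, Hash#to_s
returns the same string as inspect); you can use 'p h' (or
'puts h.inspect') to see the hash's structure. If you are in irb, just
typing the name of the hash will show it to you.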

Regards
Raul
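
A small sketch of the printing options discussed here. The hash h and
the one-line-per-key format at the end are illustrative assumptions, not
something from the posts above. What plain puts prints for a hash
depends on the Ruby version, so no output is shown for it.

```ruby
h = { "and" => ["stuff", "nice things"], "not" => ["bad girls"] }

puts h           # goes through Hash#to_s (format depends on Ruby version)
p h              # goes through Hash#inspect, shows the hash structure
puts h.inspect   # prints the same text as p h

# A custom, version-independent layout: one line per keyword.
lines = h.map { |key, values| "#{key}: #{values.join(', ')}" }
puts lines
```

The last form is handy when the output is meant for people rather than
for debugging, since you control the layout yourself.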
 
Raul Parolari

When I typed the final solution, an unwanted '}' crept in after the
push call. Here is the corrected code:

str.scan(/ (and|not) (.*?) (?= (\b(and|not)\b)|$) /x) do |k, v|
  h[k.to_sym].push(v.strip)
end

Regards
Raul
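
To see what the extra '\b' buys, here is a sketch that runs both of
the regexps from this thread over the same input. The build_table
lambda and the variable names loose and strict are hypothetical helpers
introduced only for the comparison.

```ruby
str = "and stuff and nice things not notorious bad girls not greasy boys and girlsandboys"

# First version: word boundary only before the keyword in the lookahead.
loose  = / (and|not) (.*?) (?= \band|\bnot|$) /x
# Revised version: word boundary on both sides of the keyword.
strict = / (and|not) (.*?) (?= (\b(and|not)\b)|$) /x

# Hypothetical helper: collect the text after each keyword into a hash.
# The block only uses the first two captures; extra groups are ignored.
build_table = lambda do |s, pattern|
  table = { :and => [], :not => [] }
  s.scan(pattern) { |keyword, text| table[keyword.to_sym].push(text.strip) }
  table
end

loose_result  = build_table.call(str, loose)
strict_result = build_table.call(str, strict)

p loose_result[:not]   # ["", "orious bad girls", "greasy boys"]
p strict_result[:not]  # ["notorious bad girls", "greasy boys"]
```

With the loose pattern, the lookahead stops at the 'not' inside
'notorious', which produces an empty entry and a truncated word; the
strict pattern treats 'notorious' as ordinary text.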
 
RichardOnRails

Peter said:
Raul,
Interesting solution. One question, how did you print the output? I'm
a newbie and the output I got when I tried your solution came out like:

andstuffnice thingsgirlsboysnotbad girlsgreasy boys

I used puts smoking_table. I'm assuming that's not the correct way to
do it.
Thanks,
PV

p smoking_table

(Same as 7stud's example).

HTH,
Richard
 
