Parsing text with regular expression

  • Thread starter Sebastian probst Eide

Sebastian probst Eide

Hi
I am writing a class that parses text. It checks each word and counts
how many times it occurs in the text. It also checks for 'special'
words, meaning words that are capitalized, all upper case, or in mixed
case, adds a flag to those words, and checks that the words that are
not special meet a certain length requirement. The information is
stored in a hash like this:

{'word' => {:count => 1, :special => false}, 'other_word' => {:count => 3, :special => true}}
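
A minimal sketch of how such a hash could be built, assuming a 'special' word is any word containing an upper-case ASCII letter. The names special?, count_words and MIN_LENGTH are hypothetical and not taken from the original class; the scan pattern is the one used later in this post.

```ruby
# Hypothetical minimum length for words that are not special.
MIN_LENGTH = 3

# Hypothetical helper: treats any word containing an upper-case ASCII
# letter (capitalized, all upper case, or mixed case) as special.
def special?(word)
  !!(word =~ /[A-Z]/)
end

# Builds a hash of the form
#   {'word' => {:count => 1, :special => false}, ...}
def count_words(text)
  words = {}
  text.scan(/\w{2,}[-\w]?/) do |word|
    special = special?(word)
    # Non-special words must meet the length requirement.
    next if !special && word.length < MIN_LENGTH
    entry = (words[word] ||= { :count => 0, :special => special })
    entry[:count] += 1
  end
  words
end

result = count_words("The cat is a NASA cat")
```

Here "cat" is counted twice and not flagged, "NASA" and "The" are flagged as special, and "is" is dropped by the assumed length requirement.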

Everything is working fine so far. The thing I am struggling to
implement though is the following:
I want to be able to check the context the 'special' words appear in, to
see whether a capitalized special word is capitalized only because it is
the first word of a new sentence, or something like that.

I thought I could check by looking for something like this:

text =~ /[[:punct:]]\s?WORD_I_AM_LOOKING_FOR/
and if I got something other than 0 as a result, it would mean that the
word is at the beginning of a sentence. But how do I insert a variable
into the regular expression? Or is there a different, much cleverer way
to do this sort of check?

Currently I am scanning for each word like this:

_inn.scan(/\w{2,}[-\w]?/i) do |word|
...
end

and then doing the checking of the words inside that iterator.

Hope you have understood my problem and that you can point me in the
right direction.

best regards
Sebastian
 
Timothy Hunter

Sebastian said:
I thought I could check by looking for something like this:

text =~ /[[:punct:]]\s?WORD_I_AM_LOOKING_FOR/
and if I got something other than 0 as a result, it would mean that the
word is at the beginning of a sentence. But how do I insert a variable
into the regular expression?

Use #{} interpolation, like this:

word = "hello"

text =~ /[[:punct:]]\s?#{word}/

The contents of "word" are inserted as regular expression source, so
"word" can hold any regular expression.
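
A hedged sketch of the check built on that interpolation. Because the interpolated string is treated as pattern source, Regexp.escape is used here so that a word containing characters such as "$" or "." is matched literally. Note also that =~ returns the index of the match or nil when nothing matches, so the result is compared against nil rather than 0. The helper name starts_sentence? is hypothetical, and the \A alternative (to catch the very first word of the text) and the (?!\w) lookahead are assumptions added for illustration.

```ruby
# Hypothetical helper: true if `word` occurs directly after a punctuation
# character (optionally followed by one whitespace character) or at the
# very start of the text.
def starts_sentence?(text, word)
  escaped = Regexp.escape(word)  # match the word literally, not as a pattern
  pattern = /(?:\A|[[:punct:]]\s?)#{escaped}(?!\w)/
  !(text =~ pattern).nil?        # =~ gives a match index or nil
end

text = "It rained. Oslo was wet, but Bergen was dry."

starts_sentence?(text, "Oslo")    # => true  (follows ". ")
starts_sentence?(text, "Bergen")  # => false (follows "but ")
starts_sentence?(text, "It")      # => true  (start of the text)
```

Since [[:punct:]] also matches commas and other marks, a word following ", " would be reported as starting a sentence too; a narrower class such as [.!?] may be a better fit, depending on the text.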
 
Sebastian probst Eide

Huh... that was the first thing I tried... must have done something else
wrong too in the same expression because it didn't work... I'll try
again.
Thanks Timothy

Sebastian
 
