Parsing text with regular expressions

Discussion in 'Ruby' started by Sebastian probst Eide, Apr 29, 2007.

  1. Hi
    I am writing a class that parses text. It checks each word and counts
    how many times it occurs in the text. It also checks for 'special'
    words, that is, words that are capitalized, all upper case, or in mixed
    case, and adds a flag to those words. It also checks that the words that
    are not special fulfill a certain length requirement. The information is
    stored in a hash like this:

    {'word' => {:count => 1, :special => false},
     'other_word' => {:count => 3, :special => true}}
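A minimal sketch of how a hash of this shape could be built. The names `MIN_LENGTH`, `special?` and `tally` are hypothetical and not taken from the original class, and the rule that "special" means "contains at least one upper-case letter" is an assumption based on the description above.

```ruby
# Hypothetical minimum length for words that are not 'special'.
MIN_LENGTH = 3

# Assumption: a word is 'special' if it contains an upper-case letter
# (capitalized, all upper case, or mixed case).
def special?(word)
  word.match?(/[A-Z]/)
end

# Build {'word' => {:count => n, :special => true/false}} for a text.
def tally(text)
  index = Hash.new do |hash, key|
    hash[key] = { :count => 0, :special => special?(key) }
  end
  text.scan(/\w{2,}/) do |word|
    # Non-special words must satisfy the length requirement.
    next if !special?(word) && word.length < MIN_LENGTH
    index[word][:count] += 1
  end
  index
end

result = tally("The cat saw the NASA van. The van left.")
```

Here "The" and "the" are counted as separate keys, which keeps the capitalized occurrences available for a later context check.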

    Everything is working fine so far. The thing I am struggling to
    implement, though, is the following:
    I want to check the context the 'special' words appear in, to see
    whether a capitalized word is perhaps capitalized only because it is
    the first word of a new sentence or something like that.

    I thought I could check by looking for something like this:

    text =~ /[[:punct:]]\s?WORD_I_AM_LOOKING_FOR/

    and if I got something other than nil as a result, it would mean that
    the word is at the beginning of a sentence. But how do I insert a
    variable into the regular expression? Or is there a different, much
    cleverer way to do this sort of check?

    Currently I am scanning for each word like this:

    _inn.scan(/\w{2,}[-\w]?/i) do |word|
    ...
    end

    and then doing the checking of the words inside that iterator.
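One possible way to do the context check inside the same iterator, sketched under assumptions: `String#scan` with a block sets the match data for each match, so `Regexp.last_match.pre_match` gives the text in front of the current word. The helper `sentence_initial?`, the `starts` hash, and the simplified word pattern are hypothetical, and "sentence start" is assumed to mean "start of the text, or right after ., ! or ? plus whitespace".

```ruby
# Hypothetical helper: does the text before a word end a sentence?
def sentence_initial?(preceding)
  preceding.strip.empty? || preceding.match?(/[.!?]\s+\z/)
end

text = "The cat met Anna. Then Anna left."

# For each capitalized word, record whether each occurrence starts a sentence.
starts = Hash.new { |hash, key| hash[key] = [] }

text.scan(/\w{2,}/) do |word|
  # Capture the preceding text first, before any other regex call.
  preceding = Regexp.last_match.pre_match
  next unless word.match?(/\A[A-Z]/)
  starts[word] << sentence_initial?(preceding)
end

# Words that are capitalized only when they start a sentence.
only_sentence_initial = starts.select { |_, flags| flags.all? }.keys
```

A word that is capitalized somewhere in mid-sentence (like "Anna" here) is left out of `only_sentence_initial`, so it can keep its 'special' flag.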

    Hope you have understood my problem and that you can point me in the
    right direction.

    best regards
    Sebastian

    --
    Posted via http://www.ruby-forum.com/.
     
    Sebastian probst Eide, Apr 29, 2007

  2. Sebastian probst Eide wrote:
    > I thought I could check by looking for something like this:
    >
    > text =~ /[[:punct:]]\s?WORD_I_AM_LOOKING_FOR/
    >
    > and if I got something other than nil as a result, it would mean that
    > the word is at the beginning of a sentence. But how do I insert a
    > variable into the regular expression?

    Use #{}, like this:

    word = "hello"

    text =~ /[[:punct:]]\s?#{word}/

    The string in "word" can contain any regular expression.
     
    Timothy Hunter, Apr 29, 2007
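A short sketch of the interpolation approach from the reply above, with two additions that are assumptions rather than part of that reply: `Regexp.escape` so that any metacharacters in the word are matched literally, and a trailing `\b` so that "hello" does not match inside a longer word. The helper name `after_punctuation?` is hypothetical. Note that `=~` returns the index of the match (which may be 0) or nil when nothing matches, so the test is against nil.

```ruby
# Hypothetical helper: is `word` found directly after a punctuation mark?
def after_punctuation?(text, word)
  # Regexp.escape makes characters such as . + ? in `word` literal.
  pattern = /[[:punct:]]\s?#{Regexp.escape(word)}\b/
  # =~ gives the match index, or nil when there is no match.
  !(text =~ pattern).nil?
end

text = "We saw it. Hello there, said Anna."
position = text =~ /[[:punct:]]\s?#{Regexp.escape("Hello")}\b/
```

A word at the very start of the text has no punctuation in front of it, so that case would need a separate check.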

  3. Timothy Hunter wrote:
    > Sebastian probst Eide wrote:
    >> I thought I could check by looking for something like this:
    >>
    >> text =~ /[[:punct:]]\s?WORD_I_AM_LOOKING_FOR/
    >>
    >> and if I got something other than nil as a result, it would mean that
    >> the word is at the beginning of a sentence. But how do I insert a
    >> variable into the regular expression?

    > Use #{}, like this:
    >
    > word = "hello"
    >
    > text =~ /[[:punct:]]\s?#{word}/
    >
    > The string in "word" can contain any regular expression.


    Huh... that was the first thing I tried. I must have done something
    else wrong in the same expression, because it didn't work. I'll try
    again.
    Thanks, Timothy

    Sebastian

     
    Sebastian probst Eide, Apr 29, 2007
