Re: trying to use regex(s) to clean up text

Discussion in 'Java' started by Jussi Piitulainen, Sep 9, 2008.

  1. Albretch Mueller writes:

    > I am trying to cleanse slashdot comments using regexes but some
    > patterns I am not getting right (see cases below)

    ....
    > // __ pattern: ^\w+\.\.\. \(Score:[1-5], \w\)$ should match:
    > An the solution is.... (Score:5, Insightful)
    > Foxconn? (Score:4, Informative)

    ....

    You seem to expect \w to match spaces and punctuation. It won't.
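
    A quick way to check this for yourself is sketched below. It assumes Java's default (non-Unicode) meaning of \w, which is [a-zA-Z_0-9]; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // \w is a class for ONE character; by default in Java it is [a-zA-Z_0-9].
    private static final Pattern WORD_CHAR = Pattern.compile("\\w");

    // Hypothetical helper: true if the single character c is matched by \w.
    static boolean isWordChar(char c) {
        return WORD_CHAR.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Letters, digits and underscore are word characters;
        // space, '?' and '.' are not.
        for (char c : new char[] {'a', 'Z', '7', '_', ' ', '?', '.'}) {
            System.out.println("'" + c + "' -> " + isWordChar(c));
        }
    }
}
```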
     
    Jussi Piitulainen, Sep 9, 2008
    #1

  2. Jussi Piitulainen wrote:
    > Albretch Mueller writes:
    >
    >> I am trying to cleanse slashdot comments using regexes but some
    >> patterns I am not getting right (see cases below)

    > ...
    >> // __ pattern: ^\w+\.\.\. \(Score:[1-5], \w\)$ should match:
    >> An the solution is.... (Score:5, Insightful)
    >> Foxconn? (Score:4, Informative)

    > ...
    > You seem to expect \w to match spaces and punctuation. It won't.


    Or, more precisely:
    \w does not match "a word", but only certain single characters.

    // __ pattern: ^\w+\.\.\. \(Score:[1-5], \w\)$ *would* match:
    Anthesolutionis... (Score:5, I)
    Foxconn... (Score:4, i)
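
    To see the difference in practice, here is a small sketch that runs the original pattern against the two intended inputs and the two strings above. The class and method names are only illustrative, and find() is an arbitrary choice (the pattern is anchored with ^ and $ anyway).

```java
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The pattern exactly as posted, with backslashes doubled for a Java string literal.
    private static final Pattern ORIGINAL =
            Pattern.compile("^\\w+\\.\\.\\. \\(Score:[1-5], \\w\\)$");

    // Hypothetical helper: true if the posted pattern matches the whole line.
    static boolean matchesOriginal(String line) {
        return ORIGINAL.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "An the solution is.... (Score:5, Insightful)", // spaces in title, long label: no match
            "Foxconn? (Score:4, Informative)",              // '?' instead of "...": no match
            "Anthesolutionis... (Score:5, I)",              // match
            "Foxconn... (Score:4, i)"                       // match
        };
        for (String line : lines) {
            System.out.println(matchesOriginal(line) + "  " + line);
        }
    }
}
```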
     
    Andreas Leitgeb, Sep 9, 2008
    #2

  3. Andreas Leitgeb writes:
    > Jussi Piitulainen wrote:
    >> Albretch Mueller writes:
    >>
    >>> I am trying to cleanse slashdot comments using regexes but some
    >>> patterns I am not getting right (see cases below)

    >> ...
    >>> // __ pattern: ^\w+\.\.\. \(Score:[1-5], \w\)$ should match:
    >>> An the solution is.... (Score:5, Insightful)
    >>> Foxconn? (Score:4, Informative)

    >> ...
    >> You seem to expect \w to match spaces and punctuation. It won't.

    >
    > Or, more precisely:
    > \w does not match "a word", but only certain single characters.


    You're right, I didn't notice the missing + in the final \w. Among
    other things.

    Perhaps this instead: [^()]+ \(Score:[1-5], \w+\)
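
    A rough sketch of how that suggestion behaves in Java, untested against real Slashdot pages. Using matches() so that the whole line has to fit is my assumption, since the suggested pattern carries no anchors of its own; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class RevisedPatternDemo {
    // Title: any run of characters other than parentheses.
    // Label: one or more word characters (note the + after \w).
    private static final Pattern REVISED =
            Pattern.compile("[^()]+ \\(Score:[1-5], \\w+\\)");

    // Hypothetical helper: matches() requires the pattern to cover the entire line.
    static boolean matchesRevised(String line) {
        return REVISED.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "An the solution is.... (Score:5, Insightful)", // match
            "Foxconn? (Score:4, Informative)",              // match
            "Foxconn? (Score:7, Informative)",              // no match: 7 is outside [1-5]
            "Just an ordinary line of comment text"         // no match: no score part
        };
        for (String line : lines) {
            System.out.println(matchesRevised(line) + "  " + line);
        }
    }
}
```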

    > // __ pattern: ^\w+\.\.\. \(Score:[1-5], \w\)$ *would* match:
    > Anthesolutionis... (Score:5, I)
    > Foxconn... (Score:4, i)
     
    Jussi Piitulainen, Sep 9, 2008
    #3
  4. Tom Anderson

    On Tue, 9 Sep 2008, Jussi Piitulainen wrote:

    > Andreas Leitgeb writes:
    >> Jussi Piitulainen wrote:
    >>> Albretch Mueller writes:
    >>>
    >>>> I am trying to cleanse slashdot comments using regexes but some
    >>>> patterns I am not getting right (see cases below)
    >>> ...
    >>>> // __ pattern: ^\w+\.\.\. \(Score:[1-5], \w\)$ should match:
    >>>> An the solution is.... (Score:5, Insightful)
    >>>> Foxconn? (Score:4, Informative)
    >>> ...
    >>> You seem to expect \w to match spaces and punctuation. It won't.

    >>
    >> Or, more precisely:
    >> \w does not match "a word", but only certain single characters.

    >
    > You're right, I didn't notice the missing + in the final \w. Among
    > other things.


    His REAL problem, of course, is that he's trying to extract information
    from Slashdot. This is a bit like hoping the right filter settings will
    produce sweet music from white noise. :)

    tom

    --
    Baby got a masterplan. A foolproof masterplan.
     
    Tom Anderson, Sep 10, 2008
    #4
  5. Tom Anderson

    On Wed, 10 Sep 2008, bugbear wrote:

    > Tom Anderson wrote:
    >> His REAL problem, of course, is that he's trying to extract information
    >> from Slashdot. This is a bit like hoping the right filter settings will
    >> produce sweet music from white noise. :)

    >
    > Many analogue synthesizers do exactly that, remarkably!


    That rather depends on your opinion of synthesizer music!

    But point taken - I should choose my analogies more carefully in future.

    tom

    --
    And dear lord, its like peaches in a lacy napkin. -- James Dearden
     
    Tom Anderson, Sep 10, 2008
    #5
