Remove short words from a string

Discussion in 'Perl Misc' started by Leif Wessman, Oct 25, 2006.

  1. Leif Wessman

    Leif Wessman Guest

    Hi all!

    How can I remove all words that have a length that is 3 or less?

    "a lot of words in this text";

    should become

    "words this text"

    Is it possible?

    Leif
    Leif Wessman, Oct 25, 2006
    #1

  2. Paul Lalli

    Paul Lalli Guest

    Leif Wessman wrote:
    > How can I remove all words that have a length that is 3 or less?
    >
    > "a lot of words in this text";
    >
    > should become
    >
    > "words this text"
    >
    > Is it possible?


    Yes, it's possible.

    There are two approaches which jump out at me. You could use a
    join/grep/split combination. Or you could use a regexp solution being
    sure to include word boundaries.
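
    A minimal sketch of both approaches, for reference. The sub names
    remove_short_split and remove_short_regex and the $max parameter are
    made up for illustration; neither handles punctuation specially.

```perl
use strict;
use warnings;

# Approach 1: split into words, keep the long ones, join them back.
# split ' ' splits on runs of whitespace and ignores leading whitespace.
sub remove_short_split {
    my ($text, $max) = @_;
    return join ' ', grep { length($_) > $max } split ' ', $text;
}

# Approach 2: substitution anchored on word boundaries. A removed word
# takes any whitespace that follows it along.
sub remove_short_regex {
    my ($text, $max) = @_;
    $text =~ s/\b\w{1,$max}\b\s*//g;
    return $text;
}

print remove_short_split('a lot of words in this text', 3), "\n";  # words this text
print remove_short_regex('a lot of words in this text', 3), "\n";  # words this text
```

    The two agree on this input, but they can differ as soon as the text
    contains punctuation or runs of spaces.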

    What have you tried so far? How did it not work as you expected?

    Paul Lalli
    Paul Lalli, Oct 25, 2006
    #2

  3. Ted Zlatanov

    Ted Zlatanov Guest

    On 25 Oct 2006, wrote:

    > "Leif Wessman" <> writes:
    >
    >> Hi all!
    >>
    >> How can I remove all words that have a length that is 3 or less?
    >>
    >> "a lot of words in this text";
    >>
    >> should become
    >>
    >> "words this text"
    >>
    >> Is it possible?
    >>
    >> Leif
    >>

    >
    > Here is your hint.
    > grep { length > 3 } @words;


    That's not a good hint.

    Ted
    Ted Zlatanov, Oct 25, 2006
    #3
  4. Ted Zlatanov

    Ted Zlatanov Guest

    On 25 Oct 2006, wrote:

    > How can I remove all words that have a length that is 3 or less?
    >
    > "a lot of words in this text";
    >
    > should become
    >
    > "words this text"


    Solution below. Note that your requirement ("remove all words...")
    does not match the expected result, since you are also removing
    whitespace around the words. That's why I added the second regex.
    Still, the leading space is preserved. You can either add a third
    regex to eliminate leading spaces, or you can split on ' '.

    Keep in mind that if you split on ' ' you still won't have "words"
    because punctuation will be included, for example. This is why I
    would recommend against a split()/grep()/join() approach for this,
    unless you are absolutely sure you don't need to worry about
    punctuation or preserving spaces.

    Ted

    #!/usr/bin/perl

    use warnings;
    use strict;

    my $text = "a lot of words in this text";
    # note \w may not work well for you, adjust accordingly
    $text =~ s/(\w+)/length $1 > 3 ? $1 : ''/eg;
    # if you need multiple spaces collapsed to just one
    $text =~ s/\s+/ /g;
    print $text;
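
    If the leading space matters, the "third regex" mentioned above could
    look like the sketch below. It wraps the same two substitutions in a
    sub (the name remove_short_words is hypothetical) and adds a trim
    step. It assumes that collapsing and trimming whitespace is acceptable
    for your data.

```perl
use strict;
use warnings;

sub remove_short_words {
    my ($text) = @_;
    # replace every run of word characters of length <= 3 with nothing
    $text =~ s/(\w+)/length($1) > 3 ? $1 : ''/eg;
    # collapse the whitespace left behind to single spaces
    $text =~ s/\s+/ /g;
    # third regex: strip leading and trailing whitespace
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

print remove_short_words("a lot of words in this text"), "\n";  # words this text
```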
    Ted Zlatanov, Oct 25, 2006
    #4
  5. Dr.Ruud

    Dr.Ruud Guest

    Mirco Wahab schreef:
    > Leif:


    >> How can I remove all words that have a length that is 3 or less?
    >> "a lot of words in this text";
    >> should become
    >> "words this text"

    >
    > I'll try to give an easy example and you'll
    > try to explain it line by line in your reply, ok?
    >
    > use strict;
    > use warnings;
    >
    > my $shortlen = 3;
    > my $fulltext = 'a lot of words in this text';
    > my $no_shorts = $fulltext;
    >
    > $no_shorts =~ s/ \b \w{1,$shortlen} \b \s+//gmx;


    1. Won't work well with a short last word, because "\s+" requires
    whitespace after the word. Maybe use "\s*" or "(?:\s+|$)".

    2. Maybe "\w" is too limited: it is just [[:alnum:]_], so it doesn't
    contain "-", which could lead to unwanted changes to hyphenated
    words like "non-essential".

    > print $fulltext, "\n";
    > print $no_shorts, "\n";
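
    A rough sketch of both points. The names strip_short_plain,
    strip_short_hyphen and $wc are only illustrative, and treating "-" as
    a word character is an assumption that may not suit every text.

```perl
use strict;
use warnings;

# plain \w with \s*: hyphenated words get chopped up
sub strip_short_plain {
    my ($text, $max) = @_;
    $text =~ s/\b\w{1,$max}\b\s*//g;
    return $text;
}

# word characters extended with "-" (an assumption, adjust as needed)
my $wc = qr/[\w-]/;

# lookarounds stand in for \b, since \b only knows about \w;
# (?:\s+|$) lets a short word at the very end match as well
sub strip_short_hyphen {
    my ($text, $max) = @_;
    $text =~ s/(?<!$wc)(?:$wc){1,$max}(?!$wc)(?:\s+|$)//g;
    return $text;
}

my $text = 'a non-essential follow-up in it';
print '[', strip_short_plain($text, 3),  "]\n";  # [-essential follow-]
print '[', strip_short_hyphen($text, 3), "]\n";  # [non-essential follow-up ]
```

    Note that the second version still leaves a trailing space when the
    last word is removed, because only the whitespace after a word is
    consumed.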


    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Oct 25, 2006
    #5
  6. -berlin.de

    -berlin.de Guest

    Ted Zlatanov <> wrote in comp.lang.perl.misc:
    > On 25 Oct 2006, wrote:
    >
    > > "Leif Wessman" <> writes:
    > >
    > >> Hi all!
    > >>
    > >> How can I remove all words that have a length that is 3 or less?


    [...]

    > > Here is your hint.
    > > grep { length > 3 } @words;

    >
    > That's not a good hint.


    What's wrong with it?

    Anno
    -berlin.de, Oct 26, 2006
    #6
  7. Ted Zlatanov

    Ted Zlatanov Guest

    On 26 Oct 2006, -berlin.de wrote:

    > Ted Zlatanov <> wrote in comp.lang.perl.misc:
    >> On 25 Oct 2006, wrote:
    >>
    >>> "Leif Wessman" <> writes:
    >>>
    >>>> Hi all!
    >>>>
    >>>> How can I remove all words that have a length that is 3 or less?

    >
    > [...]
    >
    >>> Here is your hint.
    >>> grep { length > 3 } @words;

    >>
    >> That's not a good hint.

    >
    > What's wrong with it?


    As I explained in my other post, the split/grep/join approach is not
    aware of punctuation and whitespace. Two spaces may become one, a
    period may count as a letter... It's just not a good solution unless
    we know for sure it's OK to use it.
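
    To make that concrete, here is a small comparison on a made-up
    sentence. The sub names via_split and via_regex are hypothetical,
    and the regex variant is the same \w-based substitution as in my
    other post, so it has its own limits.

```perl
use strict;
use warnings;

# split/grep/join: punctuation counts toward the length,
# and every run of whitespace comes back as one space
sub via_split {
    my ($text) = @_;
    return join ' ', grep { length($_) > 3 } split ' ', $text;
}

# substitution: only runs of \w are measured, and all whitespace
# and punctuation are left exactly where they were
sub via_regex {
    my ($text) = @_;
    $text =~ s/(\w+)/length($1) > 3 ? $1 : ''/eg;
    return $text;
}

my $text = "Go to the shop.  Then buy an egg.";
print '[', via_split($text), "]\n";  # [shop. Then egg.]
print '[', via_regex($text), "]\n";
```

    In the split version "egg." survives because the period makes it four
    characters long, and the double space after "shop." is reduced to one.
    The regex version removes "egg" but keeps its period and all of the
    original spacing.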

    Also the hint doesn't say anything about split() and join(). It's not
    very useful. At least say "split() before, join() after" in a
    comment. Takes 4 words, and may save the OP hours of work. If I
    didn't know Perl well and got this hint, I'd be puzzled for many
    reasons.

    Finally, the OP's requirements (as I mentioned in my other post too)
    contradict each other. He's removing words of length <= 3, but the
    example he gives also eliminates whitespace.

    Ted
    Ted Zlatanov, Oct 26, 2006
    #7
