Using split to count matches, but exclude certain patterns

Discussion in 'Perl Misc' started by surfitupdotcom@gmail.com, Aug 1, 2007.

  1. Guest

    I have a script that recursively greps for a term and counts its
    occurrences in each file. It works fine, but now I want to
    exclude matches where the term has an underscore directly before or
    after it. I have tried to keep using split on (not underscore)
    $search_term (not underscore), as in the examples below, but my results
    are not right yet. The input is a string in $grep_out, and I want to
    count any number of occurrences. I cannot break the string up into
    words, since a correct match may not have spaces or any particular
    character around it. Let me know if I have not provided enough info or
    should post the whole script. Thanks in advance for any help, John

    Attempts so far:
    # @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);
    # @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);
    @surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);
    # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);
     
    , Aug 1, 2007
    #1

  2. Paul Lalli Guest

    On Aug 1, 3:22 pm, wrote:
    > I have a script that recursively greps for a term and counts its
    > occurrences in each file. It works fine, but now I want to
    > exclude matches where the term has an underscore directly before or
    > after it. I have tried to keep using split on (not underscore)
    > $search_term (not underscore), as in the examples below, but my results
    > are not right yet. The input is a string in $grep_out, and I want to
    > count any number of occurrences. I cannot break the string up into
    > words, since a correct match may not have spaces or any particular
    > character around it. Let me know if I have not provided enough info or
    > should post the whole script. Thanks in advance for any help, John
    >
    > Attempts so far:
    > # @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);


    _ is not special in a regex, so there is no need to backslash it. This
    code says to split on any $search_term that is neither *immediately*
    preceded nor *immediately* followed by an underscore. Is that what you
    meant?
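
    To make that reading concrete, here is a minimal sketch. The sample
    string and the names $term and $text are made up for illustration and
    are not from the original script. The lookbehind and lookahead are
    zero-width, so the neighbouring characters are tested but never become
    part of the delimiter:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data, not from the original script.
    my $term = 'foo';
    my $text = 'a foo b _foo c foo_ d';

    # (?<!_) and (?!_) only inspect the neighbours; they consume nothing.
    # \Q...\E quotes any regex metacharacters in the term.
    my @pieces = split /(?<!_)\Q$term\E(?!_)/i, $text;

    print join('|', @pieces), "\n";   # a | b _foo c foo_ d
    ```

    Only the bare "foo" acts as a delimiter; "_foo" and "foo_" stay inside
    the second field.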

    > # @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);


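    A quantifier of {0} is a no-op: _{0} matches zero underscores, that is,
    the empty string, so this pattern behaves like plain ${search_term}.
    Frankly, I think that should be a syntax error, or at least a warning.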
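
    A quick way to check the {0} point, as a sketch with a made-up string
    ($term and $text are illustrative names): both patterns below split in
    exactly the same places, whether or not an underscore is next to the
    term.

    ```perl
    use strict;
    use warnings;

    my $term = 'foo';              # hypothetical search term
    my $text = 'a_foo_b foo c';    # hypothetical input

    my @with_zero = split /_{0}\Q$term\E_{0}/, $text;
    my @plain     = split /\Q$term\E/,         $text;

    # Both lists are ('a_', '_b ', ' c'): the {0} adds no constraint.
    print "same\n" if "@with_zero" eq "@plain";
    ```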

    > @surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);


    This requires one not-underscore character on each side of
    $search_term and includes both of those characters in the split
    delimiter.
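
    A small sketch of the consequences, using a made-up string ($term and
    $text are illustrative names). Because [^_] has to match a real
    character, a term at the very start or end of the string is not
    treated as a delimiter, and the neighbours of a matching term are
    swallowed:

    ```perl
    use strict;
    use warnings;

    my $term = 'foo';              # hypothetical search term
    my $text = 'foo xfooy foo';    # hypothetical input

    # [^_] must match an actual character on each side, and that
    # character becomes part of the delimiter.
    my @pieces = split /[^_]\Q$term\E[^_]/, $text;

    print join('|', @pieces), "\n";   # foo | foo
    ```

    Only "xfooy" is used as a delimiter here: the "x" and "y" disappear
    with it, and the first and last "foo" are left in the fields.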

    > # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);


    That's a modification of the above, allowing $search_term to come at
    the beginning or end of the string as well.
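
    One side effect worth knowing about with this variant: when the
    pattern contains capturing parentheses, split also returns what each
    group captured, so the result list grows. A minimal sketch with a
    made-up string ($term and $text are illustrative names) comparing
    capturing and non-capturing groups:

    ```perl
    use strict;
    use warnings;

    my $term = 'foo';        # hypothetical search term
    my $text = 'a foo b';    # hypothetical input

    # Capturing groups: each delimiter's captures are added to the result.
    my @captured = split /(^|[^_])\Q$term\E($|[^_])/, $text;

    # Non-capturing groups (?:...) return only the fields between delimiters.
    my @fields   = split /(?:^|[^_])\Q$term\E(?:$|[^_])/, $text;

    print scalar(@captured), " vs ", scalar(@fields), "\n";   # 4 vs 2
    ```

    The first list is ('a', ' ', ' ', 'b'), the second is ('a', 'b'), so
    any count based on the size of the list changes with the grouping.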


    Please provide some sample input and sample output, so people have a
    chance to know what it is you're trying to achieve. This and other
    good advice can be found in the Posting Guidelines, which are posted
    here twice a week.

    Paul Lalli
     
    Paul Lalli, Aug 1, 2007
    #2

  3. Paul Lalli Guest

    On Aug 1, 3:22 pm, wrote:
    > I have a script that recursively greps for a term and counts its
    > occurrences in each file. It works fine, but now I want to
    > exclude matches where the term has an underscore directly before or
    > after it. I have tried to keep using split


    As a side note to my other response, split() is a very bad way to
    attempt to count occurrences of a string:

    $ perl -le'
    print scalar(@foo = split /foo/, "barfoobazfoobiff");
    print scalar(@foo = split /foo/, "barfoobazbifffoo");
    print scalar(@foo = split /foo/, "barbazbifffoofoo");
    '
    3
    2
    1
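
    The three counts differ because split drops trailing empty fields by
    default. A sketch of that effect with the same three strings (the
    variable names are illustrative): a negative LIMIT keeps the empty
    fields, and the field count is then the same for all three inputs. It
    is shown only to explain the output above, not as a recommended way to
    count.

    ```perl
    use strict;
    use warnings;

    my @rows;
    for my $s (qw(barfoobazfoobiff barfoobazbifffoo barbazbifffoofoo)) {
        my @default = split /foo/, $s;        # trailing empty fields dropped
        my @keep    = split /foo/, $s, -1;    # negative LIMIT keeps them
        push @rows, [ scalar @default, scalar @keep ];
        printf "%-18s default=%d keep_all=%d\n", $s, scalar @default, scalar @keep;
    }
    # default: 3, 2, 1   keep_all: 3, 3, 3
    ```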


    I rather strongly suggest you read:
    $ perldoc -q count
    Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq4.pod
    How can I count the number of occurrences of a substring
    within a string?
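
    For reference, one common idiom in that spirit is a global match in
    list context. A minimal sketch on the same three strings (the variable
    names are illustrative):

    ```perl
    use strict;
    use warnings;

    my @counts;
    for my $s (qw(barfoobazfoobiff barfoobazbifffoo barbazbifffoofoo)) {
        # A /g match in list context returns every match; assigning that
        # list to () in scalar context yields how many there were.
        my $count = () = $s =~ /foo/g;
        push @counts, $count;
        print "$s: $count\n";    # 2 for each of the three strings
    }
    ```

    Unlike the split version, the result does not depend on where in the
    string the matches fall.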

    Paul Lalli
     
    Paul Lalli, Aug 1, 2007
    #3
  4. Guest

    On Aug 1, 2:41 pm, Paul Lalli <> wrote:
    > On Aug 1, 3:22 pm, wrote:
    >
    > > I have a script that recursively greps for a term and counts its
    > > occurrences in each file. It works fine, but now I want to
    > > exclude matches where the term has an underscore directly before or
    > > after it. I have tried to keep using split

    >
    > As a side note to my other response, split() is a very bad way to
    > attempt to count occurrences of a string:
    >
    > $ perl -le'
    > print scalar(@foo = split /foo/, "barfoobazfoobiff");
    > print scalar(@foo = split /foo/, "barfoobazbifffoo");
    > print scalar(@foo = split /foo/, "barbazbifffoofoo");
    > '
    > 3
    > 2
    > 1
    >
    > I rather strongly suggest you read:
    > $ perldoc -q count
    > Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq4.pod
    > How can I count the number of occurrences of a substring
    > within a string?
    >
    > Paul Lalli


    You read me correctly: the idea was to split on any occurrence of my
    search term that does not have an underscore before or after it.
    Counting matches using split worked fine until I tried to exclude
    certain patterns. I will look at the perldoc you suggested, but here
    is more info for the thread. Thanks, John

    Sample input: super _super_ _super super SUPER SUPER_ blahsuper
    Desired output: super super SUPER super

    Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
    Array contents- _super_ _super SUPER_ blah
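
    For comparison, a sketch of how the same lookaround pattern behaves
    when it is used to collect the matches rather than as a split
    delimiter. It reuses the sample input above and the names
    $search_term, $grep_out and @surewords from the first post; it is not
    necessarily what the final script ended up doing.

    ```perl
    use strict;
    use warnings;

    my $search_term = 'super';
    my $grep_out    = 'super _super_ _super super SUPER SUPER_ blahsuper';

    # Collect every occurrence with no underscore directly before or after.
    my @surewords = $grep_out =~ /(?<!_)\Q$search_term\E(?!_)/gi;

    print scalar(@surewords), ": @surewords\n";   # 4: super super SUPER super
    ```

    With split, the array holds the text between the matches; with a /g
    match in list context, it holds the matches themselves, which lines up
    with the desired output.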
     
    , Aug 1, 2007
    #4
  5. Tad McClellan Guest

    <> wrote:

    > I have a script that recursively greps



    > Attempts so far:
    > # @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);
    > # @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);
    > @surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);
    > # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);



    There is no recursion anywhere in that code.

    Perhaps you meant "repeatedly" instead?


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad McClellan, Aug 2, 2007
    #5
  6. Guest

    On Aug 1, 4:42 pm, Tad McClellan <> wrote:
    > <> wrote:
    > > I have a script that recursively greps
    > > Attempts so far:
    > > # @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);
    > > # @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);
    > > @surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);
    > > # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);

    >
    > There is no recursion anywhere in that code.
    >
    > Perhaps you meant "repeatedly" instead?
    >
    > --
    > Tad McClellan
    > email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


    The recursion is elsewhere in the script. By the time it gets to this
    split, each line of $grep_out has one or more hits of the search term.
     
    , Aug 2, 2007
    #6
  7. Guest

    On Aug 1, 8:55 pm, ""
    <> wrote:
    > On Aug 1, 1:02 pm, wrote:
    >
    > (snipped)
    >
    >
    >
    > > You read me correctly: the idea was to split on any occurrence of my
    > > search term that does not have an underscore before or after it.
    > > Counting matches using split worked fine until I tried to exclude
    > > certain patterns. I will look at the perldoc you suggested, but here
    > > is more info for the thread. Thanks, John

    >
    > > Sample input: super _super_ _super super SUPER SUPER_ blahsuper
    > > Desired output: super super SUPER super

    >
    > How did you plan on getting rid of the 'blah' substring by
    > doing a split?
    >
    >
    >
    > > Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
    > > Array contents- _super_ _super SUPER_ blah

    >
    > Your description said 'an underscore before ... OR
    > an underscore after', so you also need an "OR" in your
    > regular expression. This is known as "alternation"
    > (see perldoc perlre).
    >
    > use Data::Dumper;
    >
    > my $term = 'super';
    >
    > my $string = 'super _super_ _super super SUPER SUPER_ blahsuper';
    >
    > my @fragments = split(
    > /_\Q$term\E_? # exclude term with underscore in front
    > # (optional trailing _)
    > | # OR
    > _?\Q$term\E_/xi # exclude term with underscore afterward
    > # (optional leading _)
    > , $string);
    >
    > print Dumper \@fragments;
    >
    > __END__
    >
    > I get:
    >
    > $VAR1 = [
    > 'super ',
    > ' ',
    > ' super SUPER ',
    > ' blahsuper'
    > ];
    >
    > Is that what you wanted? As Paul said, there's
    > probably a better way to "count" things than
    > using split.
    >
    > --
    > Hope this helps,
    > Steven



    Thanks all for the help. After further experimentation I did switch
    to an option other than split for this task. I also sharpened my
    regexp along the way, so everything worked out. Take care, John
     
    , Aug 3, 2007
    #7
