Using split to count matches, but exclude certain patterns

surfitupdotcom · Aug 1, 2007

I have a script that recursively greps for a term and counts the
occurrences of it in each file. It works fine, but now I want to
exclude matches where the term has an underscore in front of or after
it. I have tried to keep using split on (not underscore)
$search_term(not underscore), as in the examples below, but my results
are not right yet. The input is a string in $grep_out and I want to
count any number of occurrences. I cannot break the string up into
words, since a correct match may not have spaces or any particular
character around it. Let me know if I have not provided enough info, or
if I should post the whole script. Thanks in advance for any
assistance, John

Attempts so far:
# @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);
# @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);
@surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);
# @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);

Paul Lalli · Aug 1, 2007

Attempts so far:
# @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);

_ is not special. No need to backslash it. This code says to split
on any $search_term that is not *immediately* preceded by or
*immediately* followed by an underscore. Is that what you meant?

# @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);

A quantifier of {0} is a no-op. Frankly, I think that should be a
syntax error, or at least a warning.

@surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);

This says to include the not-underscore character in the split
delimiter.

# @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);

That's a modification of the above, allowing $search_term to come at
the beginning or end of the string as well.
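
To make the differences between the four patterns concrete, here is a
minimal sketch that counts how often each one matches. The sample
string and the count_matches helper are assumptions made up for
illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only to show the differences.
my $term   = 'super';
my $string = 'super super _super';

# Count matches with a loop so that capture groups do not inflate the total.
sub count_matches {
    my ($re, $text) = @_;
    my $n = 0;
    $n++ while $text =~ /$re/g;
    return $n;
}

my @patterns = (
    [ 'lookaround'       => qr/(?<!_)\Q$term\E(?!_)/i ],
    [ '{0} quantifier'   => qr/_{0}\Q$term\E_{0}/i ],
    [ 'negated class'    => qr/[^_]\Q$term\E[^_]/i ],
    [ 'anchors or class' => qr/(^|[^_])\Q$term\E($|[^_])/im ],
);

for my $p (@patterns) {
    printf "%-17s %d\n", $p->[0], count_matches($p->[1], $string);
}
```

On this string the lookaround version finds 2 matches, the {0} version
finds all 3 (it is just /super/), and the two character-class versions
find only 1 each: the class consumes the neighbouring character, so the
space between two adjacent hits cannot serve both of them.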

Please provide some sample input and sample output, so people have a
chance to know what it is you're trying to achieve. This and other
good advice can be found in the Posting Guidelines, which are posted
here twice a week.

Paul Lalli

Paul Lalli · Aug 1, 2007

I have script that recursively greps for a term and counts the
occurrences of it in each file. It works fine but now I want to
exclude matches where the term has an underscore in front or after
it. I have tried to continue using split

As a side note to my other response, split() is a very bad way to
attempt to count occurrences of a string:

$ perl -le'
print scalar(@foo = split /foo/, "barfoobazfoobiff");
print scalar(@foo = split /foo/, "barfoobazbifffoo");
print scalar(@foo = split /foo/, "barbazbifffoofoo");
'
3
2
1

I rather strongly suggest you read:
$ perldoc -q count
Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq4.pod
How can I count the number of occurrences of a substring
within a string?

Paul Lalli
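
For comparison with the split results above, here is a small sketch of
counting with a global match in list context, which is one of the
approaches the perlfaq4 entry describes. The %count_for hash is just a
name introduced for this example.

```perl
use strict;
use warnings;

my %count_for;
for my $string (qw(barfoobazfoobiff barfoobazbifffoo barbazbifffoofoo)) {
    # A global match in list context returns every match; assigning that
    # list to () and then to a scalar yields the number of matches.
    my $count = () = $string =~ /foo/g;
    $count_for{$string} = $count;
    print "$string: $count\n";
}
```

All three strings contain "foo" twice, and this reports 2 for each of
them, where split gave 3, 2 and 1 because it counts the pieces between
the delimiters and drops trailing empty ones.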

surfitupdotcom · Aug 1, 2007

You read me correctly: the idea was to split on any occurrence of my
search term that does not have an underscore before or after it.
Counting matches using split worked fine until I tried to exclude
certain patterns. I will look at the perldoc you suggested, but here
is more info for the thread. Thanks, John

Sample input: super _super_ _super super SUPER SUPER_ blahsuper
Desired output: super super SUPER super

Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
Array contents- _super_ _super SUPER_ blah
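
The array above holds the text between the delimiters, not the
delimiters themselves, which is why the wanted words are missing from
it. A minimal sketch of collecting the matches directly, using the same
lookaround pattern with a global match instead of split (@hits and
$count are names assumed for this example):

```perl
use strict;
use warnings;

my $search_term = 'super';
my $grep_out    = 'super _super_ _super super SUPER SUPER_ blahsuper';

# With no capture groups, a /g match in list context returns each
# matched substring. \Q...\E quotes any regex metacharacters in the term.
my @hits  = $grep_out =~ /(?<!_)\Q$search_term\E(?!_)/gi;
my $count = @hits;

print "Matches: @hits\n";
print "Count:   $count\n";
```

On the sample input this yields super super SUPER super, a count of 4.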

Tad McClellan · Aug 2, 2007

I have script that recursively greps

Attempts so far:
# @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);
# @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);
@surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);
# @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);

There is no recursion anywhere in that code.

Perhaps you meant "repeatedly" instead?

surfitupdotcom · Aug 2, 2007

The recursion is elsewhere in the script. By the time it gets to this
split each line of $grep_out has one or more hits of the search term.

surfitupdotcom · Aug 3, 2007

On Aug 1, 1:02 pm, (e-mail address removed) wrote:

(snipped)

You read me correctly, idea was to split on any occurrence of my
search term that does not have an underscore before or after it.
Counting matches using split worked fine until I tried to exclude
certain patterns. I will look at the perldoc you suggested but here
is more info for the thread. Thanks, John


Sample input: super _super_ _super super SUPER SUPER_ blahsuper
Desired output: super super SUPER super


How did you plan on getting rid of the 'blah' substring by
doing a split?

Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
Array contents- _super_ _super SUPER_ blah


Your description said 'an underscore before ... OR
an underscore after', so you also need an "OR" in your
regular expression. This is known as "alternation"
(see perldoc perlre).

use Data::Dumper;

my $term = 'super';

my $string = 'super _super_ _super super SUPER SUPER_ blahsuper';

my @fragments = split(
    /_\Q$term\E_?      # exclude term with underscore in front
                       # (optional trailing _)
     |                 # OR
     _?\Q$term\E_/xi   # exclude term with underscore afterward
                       # (optional leading _)
    , $string);

print Dumper \@fragments;

__END__

I get:

$VAR1 = [
'super ',
' ',
' super SUPER ',
' blahsuper'
];

Is that what you wanted? As Paul said, there's
probably a better way to "count" things than
using split.
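
As a rough sketch of counting rather than splitting, the same
alternation can be used to count the occurrences to exclude, which are
then subtracted from the total. The variable names $total, $excluded
and $wanted are assumptions for this example, and the subtraction only
holds when no two occurrences share an underscore (as in
"super_super"); the lookaround pattern does not have that limitation.

```perl
use strict;
use warnings;

my $term   = 'super';
my $string = 'super _super_ _super super SUPER SUPER_ blahsuper';

# Every occurrence of the term, regardless of its neighbours.
my $total = () = $string =~ /\Q$term\E/gi;

# Occurrences with an underscore in front OR afterward (alternation).
my $excluded = () = $string =~ /_\Q$term\E_?|_?\Q$term\E_/gi;

my $wanted = $total - $excluded;

print "total=$total excluded=$excluded wanted=$wanted\n";
```

For the sample string this gives 7 in total, 3 excluded and 4 wanted.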

Thanks all for the assist. After further experimentation I did switch
to using an option other than split for this task. I did sharpen my
regexp along the way, so everything worked out. Take care, John
