Finding if something is in a list

ccc31807 · Nov 23, 2010

> Actually the data is more like IN2 and what I want is everyone who
> read book 4.

Replace the IN2 loop with something like this:

my %people;
while (<IN2>)
{
    chomp;
    my ($name, $date, $books) = split /,/;
    my @books = split(/;/, $books);
    foreach my $ele (@books) { push @{$people{$ele}}, $_; }
}

####print "People who read book 4:\n@{$people{4}}\n";

foreach my $k (keys %people)
{
    print "$k => @{$people{$k}}\n";
}

This prints every book number followed by the records of the people who
read it. To get the data for one specific book, uncomment the #### line.

What this does is create a hash indexed to your book number, with the
value of the hash an anonymous array to which you push the line (as a
scalar) containing the data from your IN2 file. You get the data out
by using the hash key of the book you want, dereferencing the value as
an anonymous array. It may look hairy, but the third time you do it
you won't even think about it.
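
A self-contained sketch of the same hash-of-arrays idea, for trying it
without the real IN2 file. The name,date,books layout (with ; between
book numbers) is inferred from the split calls above, and the sample
records are made up.

```perl
use strict;
use warnings;

# Hypothetical sample records in the assumed "name,date,books" format,
# with book numbers separated by semicolons.
my $sample = <<'END';
alice,2010-11-01,1;4;7
bob,2010-11-02,2;3
carol,2010-11-03,4
END

# Read from an in-memory filehandle so the sketch runs without IN2.
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my %people;    # book number => array ref of the matching records
while (my $line = <$in>) {
    chomp $line;
    my ($name, $date, $books) = split /,/, $line;
    push @{ $people{$_} }, $line for split /;/, $books;
}
close $in;

# Records for everyone who read book 4 (empty list if nobody did).
my @book4 = @{ $people{4} || [] };
print "People who read book 4:\n", map { "$_\n" } @book4;
```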

CC.

sln · Nov 23, 2010

>> I (almost) wish that the syntax for grouping without backrefs
>> were at least as terse as the syntax for grouping with backrefs.
>> Having to add extra punctuation to indicate *not* doing something
>> just seems counterintuitive.

> The (? syntax was a later addition to the language, when () was
> already well established, so that wasn't really an option. (? is
> also slightly harder to read for people familiar with regexp syntaxes
> other than Perl's (and for those of us who first learned Perl before
> it had (?). I'm not saying that's an excuse for creating backrefs
> unnecessarily, but there is some pressure to use () because it works.

I think that (?: ) was a logical step in the process, given ( ), which
doesn't make sense when combined with quantifiers. In that case capturing
really doesn't work and is basically useless, as in (\s([\w]+\s*)*\s)+,
where each group keeps only the text from its last repetition.
And it's very hard to read.
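
The point about captures under a quantifier can be checked directly: in
Perl, a capturing group inside a repeated group keeps only its last
iteration. A small sketch with made-up input:

```perl
use strict;
use warnings;

my $text = 'k1 k2 k3';

# A capturing group under a quantifier is overwritten on each iteration,
# so after the match $1 holds only the last repetition.
my ($last) = $text =~ /^(?:(k\d)\s*)+$/;
print "last captured: $last\n";

# To collect every repetition, match the inner pattern with /g instead.
my @all = $text =~ /(k\d)/g;
print "all: @all\n";
```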

In that sense, (?: ) is moderately easier to discern than a capture
grouping (as a bonus you get the extras, (?imsx-imsx: )), but imho all
groupings, especially nested ones, are hard to read.
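
For reference, a sketch of the inline-modifier form of (?: ), where the
flags apply only inside the group (the pattern and inputs are
illustrative):

```perl
use strict;
use warnings;

# (?i: ... ) is a non-capturing group that also turns on case-insensitive
# matching, but only for the text inside the group.
my $re = qr/(?i:abc)def/;

for my $s (qw(abcdef ABCdef ABCDEF)) {
    printf "%-6s %s\n", $s, ($s =~ $re ? 'matches' : 'no match');
}
```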

When modifying or reading a regex's groupings, it's sometimes more important
to me to single out the capturing ones, since altering them shifts the
numbering of the captured output. Most unique syntax is taken already.
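
A sketch of that renumbering problem, using an illustrative date
pattern: adding one capturing group moves every later group up by one,
while (?: ) leaves the numbers alone.

```perl
use strict;
use warnings;

my $date = '2010-11-23';

# Original pattern: the day is the 3rd capture ($3).
my @a = $date =~ /^(\d+)-(\d+)-(\d+)$/;

# Wrapping year-month in a capturing group renumbers everything after it:
# the day moves from $3 to $4.
my @b = $date =~ /^((\d+)-(\d+))-(\d+)$/;

# Wrapping it in (?: ) instead leaves the numbering unchanged.
my @c = $date =~ /^(?:(\d+)-(\d+))-(\d+)$/;

print "day: a=$a[2] b=$b[3] c=$c[2]\n";
```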

In need of a tool, I tried to cull out the starts of the capture groups,
separately from the non-capture ones. I didn't even attempt the closing
parentheses, although if the starts can be determined, I'd imagine the
ends can too, but I'm not sure.

-sln

-------------------
use strict;
use warnings;
require 5.010_000;

##
my $rxgroup = qr/
([[:cntrl:]] | $) # Formatting control character
| # or, the rest ..
(?:
(?<!\\) # Not an escape behind us
(?:\\.)* # 0 or more "escape + any char"
(?:
# Exclude character class'
\[
\]?
(?: \\.| \[:[a-z]*:\] | [^\]\n] )*
(\n?)
(?: \\.| \[:[a-z]*:\] | [^\]] )*
\]
|
(?# Exclude extended comments )
\(\?(\#) [^)]* \)
|
# Exclude free comments
(\#) (?:[^\n])*
|
# Start of a capture group
\( # (
(?:
(?!\?) # unnamed: not a ? in front of us
| # or (Perl 5.10 and above)
# named: a ?<name> or ?'name' is ok
(?= \?[<'][^\W\d][\w]*['>] )
)
)
)
/x;

my $testrx = qr/
\(\$th(\\(?:.) [(]
(?# Extended lines
of comment
)
\\\\(.\)\\\(.)(i(s))\t(i(s)) ] )
/x;

##
# Sample object
print FindRXCaptureGroups(
qr/ \(\$th(\\(?:.) [(] \\\\(.\)\\\(.)(i(s))\t(i(s)) ] )/x ), "\n";

# Sample reference
print FindRXCaptureGroups( \$testrx ), "\n";

# Show groups for that which finds the groups
print FindRXCaptureGroups( \$rxgroup ),"\n";

exit(0);

##
sub FindRXCaptureGroups
{
    @_ > 0 || die "Expected a parameter";
    my $sample;

    # Accept a string, a Regexp object, or a reference to either;
    # $sample ends up as a reference to the text being scanned.
    if    (ref( $_[0]) eq 'SCALAR' ) { $sample = $_[0] }
    elsif (ref(\$_[0]) eq 'SCALAR' ) { $sample = \$_[0] }
    elsif (ref( $_[0]) eq 'Regexp' ) { $sample = \$_[0] }
    elsif (ref( $_[0]) eq 'REF' &&
           ref(${$_[0]}) eq 'Regexp') { $sample = $_[0] }
    else {
        die "Not a string, Regexp object, or reference to one";
    }
    my ($All,
        $grpstring,
        $group,
        $lastpos ) = ('', '', 1, 0);

    while ($$sample =~ /$rxgroup/g )
    {
        # $1: a control character, or the end of the string (empty match).
        # Copy the text through it; at a line end, emit the marker line
        # if a group number was placed on it.
        if (defined $1) {
            my $cntrlen = length $1;
            my $cntrlcode = $cntrlen ? $1 : "\n";

            $All .= substr( $$sample, $lastpos, ($+[0]-$lastpos-$cntrlen) ) . $cntrlcode;
            $grpstring .= '-' x ($+[0]-$lastpos-$cntrlen) . $cntrlcode;
            $lastpos = $+[0];
            if ($cntrlcode eq "\n") {
                $All .= $grpstring if ($grpstring =~ /\d/);
                $grpstring = '';
            }
            next;
        }
        # $2: a character class; $2 holds a newline if the class spans
        # lines. Parentheses inside a class are never numbered.
        if (defined $2) {
            my ($cntrlcode, $match0, $match2) = ($2, $+[0], $+[2]);

            if (length( $2 ) && $grpstring =~ /\d/) {
                $All .= substr( $$sample, $lastpos, ($match2-$lastpos) );
                $grpstring .= '-' x ($match2-$lastpos-1) . $cntrlcode;
                $lastpos = $match2;
                $All .= $grpstring;
                $grpstring = '';
            }
            $All .= substr( $$sample, $lastpos, ($match0-$lastpos) );
            $grpstring .= '-' x ($match0-$lastpos);
            $lastpos = $match0;
            next;
        }
        # $3/$4: a (?# ) or # comment; copy it through unnumbered.
        if (defined $3 || defined $4) {
            $All .= substr( $$sample, $lastpos, ($+[0]-$lastpos) );
            $grpstring .= '-' x ($+[0]-$lastpos);
            $lastpos = $+[0];
            next;
        }

        # Otherwise: the start of a capture group. Put its number
        # (mod 10) under the opening parenthesis.
        $All .= substr( $$sample, $lastpos, ($+[0]-$lastpos) );
        $grpstring .= '-' x ($+[0]-$lastpos-1) . $group++ % 10;
        $lastpos = $+[0];
    }
    return $All;
}
__END__

(?x-ism: \(\$th(\\(?:.) [(] \\\\(.\)\\\(.)(i(s))\t(i(s)) ] ))
---------------1----------------2---------3-4-----5-6--------

(?x-ism:
\(\$th(\\(?:.) [(]
----------1-----------
(?# Extended lines
of comment
)
\\\\(.\)\\\(.)(i(s))\t(i(s)) ] )
--------2---------3-4-----5-6-------
)

(?x-ism:
([[:cntrl:]] | $) # Formatting control character
-----1------------------------------------------------
| # or, the rest ..
(?:
(?<!\\) # Not an escape behind us
(?:\\.)* # 0 or more "escape + any char"
(?:
# Exclude character class'
\[
\]?
(?: \\.| \[:[a-z]*:\] | [^\]\n] )*
(\n?)
-----------------2----
(?: \\.| \[:[a-z]*:\] | [^\]] )*
\]
|
(?# Exclude extended comments )
\(\?(\#) [^)]* \)
-------------------3------------
|
# Exclude free comments
(\#) (?:[^\n])*
--------------4--------------
|
# Start of a capture group
\( # (
(?:
(?!\?) # unnamed: not a ? in front of us
| # or (Perl 5.10 and above)
# named: a ?<name> or ?'name' is ok
(?= \?[<'][^\W\d][\w]*['>] )
)
)
)
)
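
If only the number of capture groups is needed (not their positions),
Perl can report it itself: after a successful match, $#+ holds the number
of groups in the pattern, including those that did not take part in the
match. A sketch, using a hypothetical helper name count_capture_groups,
that could serve as a cross-check on listings like the ones above:

```perl
use strict;
use warnings;

# Count the capture groups in a compiled pattern. The empty alternative
# in front always matches the empty string, and $#+ then reports how many
# groups the whole pattern has, whether or not they participated.
sub count_capture_groups {
    my ($re) = @_;
    '' =~ /|$re/ or die "empty alternative failed to match";
    return $#+;
}

my $sample = qr/ \(\$th(\\(?:.) [(] \\\\(.\)\\\(.)(i(s))\t(i(s)) ] )/x;
print count_capture_groups($sample), "\n";
```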

sln · Nov 24, 2010

On Tue, 23 Nov 2010 13:52:05 -0800, (e-mail address removed) wrote:
[snip preamble]

|
# Exclude free comments
(\#) (?:[^\n])*
     ^^^
In the interest of readability, this (?: ) grouping can be removed:
(\#) [^\n]*
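
A quick sketch checking, on a few made-up strings, that dropping the
(?: ) leaves the sub-pattern's matches unchanged:

```perl
use strict;
use warnings;

my $with_group    = qr/(\#) (?:[^\n])*/x;
my $without_group = qr/(\#) [^\n]*/x;

my @samples = ("# a comment\nnext line", "no hash here", "x # tail");

my @results;
for my $s (@samples) {
    # Record where each pattern matched (start, end), or nothing on failure.
    my @a = $s =~ $with_group    ? ($-[0], $+[0]) : ();
    my @b = $s =~ $without_group ? ($-[0], $+[0]) : ();
    push @results, "@a" eq "@b" ? 'same' : 'different';
}
print "@results\n";
```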

[snip lines and lines of stuff]

-sln

Peter Scott · Nov 24, 2010

> I (almost) wish that the syntax for grouping without backrefs were at
> least as terse as the syntax for grouping with backrefs. Having to add
> extra punctuation to indicate *not* doing something just seems
> counterintuitive.

Understand that the basic syntax for regular expressions was created
first by mathematicians as a notation for formal languages, then
transliterated for computers before Perl existed, and so the Perl
developers were starting from a syntax that had already been developed
according to certain assumptions, and this constrained their choices as
they extended the syntax.

If you want to see what pattern matching syntax can look like when all
the legacy is thrown out or up for debate, see rules in Perl 6.

John W. Krahn · Nov 24, 2010

Uri said:
> DS> Can't change the string - it is coming from another application and I
> DS> can't change the data format.
>
> that makes no sense. you CAN always change it for internal use like
> searching. are you looking into this string many times? if so, splitting
> the values out to a hash and searching that will be much faster and
> simpler. no need for much other than split and a hash lookup:
>
> my %is_book_num = map { $_ => 1 } split /;/, $string ;
>
> that will create a leading empty field which shouldn't matter in your
> lookups. if you are worried about it, then either grep that out or use a
> different way to grab them (\d+ comes to mind) in a regex:
>
> my %is_book_num = map { $_ => 1 } $string =~ /(\d+)/ ;

That only captures the first \d+ value, which is OK if that is all that
is required. To capture all \d+ values:

my %is_book_num = map { $_ => 1 } $string =~ /\d+/g;
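
Putting the suggestions together, a sketch with a hypothetical $string
that has the leading ; mentioned above:

```perl
use strict;
use warnings;

# Hypothetical books field with a leading separator.
my $string = ';1;4;12';

# split leaves an empty first field; grep it out with length.
my %from_split = map { $_ => 1 } grep { length } split /;/, $string;

# Without /g, a list-context match returns only the first capture.
my %first_only = map { $_ => 1 } $string =~ /(\d+)/;

# With /g in list context, every \d+ match is returned.
my %from_regex = map { $_ => 1 } $string =~ /\d+/g;

print "book 4 read\n" if $from_regex{4};
printf "first_only has %d key(s), from_regex has %d\n",
    scalar(keys %first_only), scalar(keys %from_regex);
```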

John

Dave Saville · Nov 25, 2010

> What this does is create a hash indexed to your book number, with the
> value of the hash an anonymous array to which you push the line (as a
> scalar) containing the data from your IN2 file. You get the data out
> by using the hash key of the book you want, dereferencing the value as
> an anonymous array. It may look hairy, but the third time you do it
> you won't even think about it.

Well what I think about is all the work the box is doing, making a
hash and then processing it, keeping the whole file in memory -
assuming it fits, and thus possibly causing a paging problem, when all
it needs is a loop reading one record at a time and either passing it
over or processing it based on a test on what in reality is a pretty
short list of possible numbers. I appreciate the clever idea - and can
use it for a similar, but more complicated, requirement.

I know what you are going to say - It doesn't matter - But I am afraid
I come from *way* back in programming when one had to worry about such
things and wrote tight code. Usually in assembler.
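
For comparison, a minimal sketch of the one-record-at-a-time approach,
assuming the same name,date,books layout with ;-separated book numbers
(the sample data and the %wanted set are hypothetical). Splitting the
book list before testing avoids matching 4 inside 14:

```perl
use strict;
use warnings;

# The book numbers of interest (hypothetical); a small hash makes the
# membership test a single lookup per book.
my %wanted = map { $_ => 1 } (4);

# Hypothetical input in the assumed "name,date,books" format.
my $data = <<'END';
alice,2010-11-01,1;4;7
bob,2010-11-02,2;3
carol,2010-11-03,14
END
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

# Read one record at a time and keep only the matching ones.
my @matches;
while (my $line = <$in>) {
    chomp $line;
    my (undef, undef, $books) = split /,/, $line;
    next unless grep { $wanted{$_} } split /;/, $books;
    push @matches, $line;    # or process/print the record right here
}
close $in;

print "$_\n" for @matches;
```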

ccc31807 · Nov 26, 2010

> Well what I think about is all the work the box is doing, making a
> hash and then processing it, keeping the whole file in memory -
> assuming it fits, and thus possibly causing a paging problem, when all
> it needs is a loop reading one record at a time and either passing it
> over or processing it based on a test on what in reality is a pretty
> short list of possible numbers.

I agree that you should use the simplest, easiest tool that gets the job
done. If your job requires producing a report from the analysis of two
(or more) input files, you will arrange your work in (at least) three
steps: (1) reading the data into specific data structures, (2)
processing the data contained in those structures, and (3) writing
out the processed data to an output file.

One of the great strengths of Perl is that it gives you sophisticated
structures, arrays of hashes, hashes of hashes, hashes of arrays,
etc., nested to arbitrary depths, and doesn't require you to travel
around your elbow to get to your nose.

CC.
