Finding if something is in a list

Dave Saville · Nov 22, 2010

I am processing records, one field of which is a "book number": this is
the "book" that the item belongs to. However, it can belong to
multiple books, so the field can be either "1" or a semicolon-separated
list, "1;5;7" for example.

I want a simple test to see if an item belongs to a given book. Pseudo
code: "next if list does not contain book #".

Assuming $req holds the requested book number and $books holds the
string of books, two ways I have come up with that work are:

next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;

or

next if ! grep { $req == $_ } split /;/, $books;

Neither of which I am that pleased with. Is there a better way?

TIA

Dave Saville · Nov 22, 2010

next unless $books =~ /(^|;)$req(;|$)/;

That's neat Tad - Thank you.

Willem · Nov 22, 2010

Dave Saville wrote:
) I am processing records, one field of which is a "book number" this is
) the "book" that the item belongs to. However it can belong to
) multiple books. So the field can either be "1" or a semi colon
) separated list "1;5;7" for example.
)
) I want a simple test to see if an item belongs to a given book. Pseudo
) code: "next if list does not contain book #".
)
) Assuming $req holds the requested book number and $books holds the
) string of books, two ways I have come up with that work are:
)
) next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;
)
) or
)
) next if ! grep { $req == $_ } split /;/, $books;
)
) Neither of which I am that pleased with. Is there a better way?

One trick to do this is this:

next if ";$books;" !~ m/;$req;/;
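A minimal sketch of the delimiter-wrapping trick above, written as a standalone helper. The sub name `has_book` and the sample strings are assumptions made for illustration, and `\Q...\E` is added only to guard against regex metacharacters in `$req`:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $req is one of the ';'-separated ids in $books.
sub has_book {
    my ($books, $req) = @_;
    # Wrapping the list in delimiters makes every id look like ";id;",
    # so a single pattern covers the first, middle, last and only element.
    return ";$books;" =~ /;\Q$req\E;/ ? 1 : 0;
}

for my $books ('1', '1;5;7', '11;15') {
    printf "%-6s contains 1? %s\n", $books, has_book($books, 1) ? 'yes' : 'no';
}
```

In a record loop this would read `next unless has_book($books, $req);`.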

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

sln · Nov 22, 2010

I am processing records, one field of which is a "book number" this is
the "book" that the item belongs to. However it can belong to
multiple books. So the field can either be "1" or a semi colon
separated list "1;5;7" for example.

I want a simple test to see if an item belongs to a given book. Pseudo
code: "next if list does not contain book #".

Assuming $req holds the requested book number and $books holds the
string of books, two ways I have come up with that work are:

next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;

or

next if ! grep { $req == $_ } split /;/, $books;

Neither of which I am that pleased with. Is there a better way?

TIA

Are you talking book strings or book numbers (integer)?

If numbers, it's better to keep what you have '==', otherwise using
$req in a regex will be evaluated as a string.
If $req starts out as a string, say '012', it stays as a string
in the regex, and vice versa for the match, the delimiter could be
voided with any leading '0' characters.

So for brevity, use an '==' condition if you're not sure,
don't use regex at all (other than for the split delimiter).

Otherwise, for regex, it should be an all string proposition.

-sln

use strict;
use warnings;

my @books = (
    '3;4;05;010,0000',
    '01;3;09;020',
    '2;1;099;030,0',
);

my $req = '00';
$req =~ s/^0*(\d+)/$1/;

for my $book (@books) {
    next unless $book =~ /\b0*($req)\b/;
    print "found '$1' -> '$book'\n";
}

Dave Saville · Nov 22, 2010

Dave Saville wrote:
) I am processing records, one field of which is a "book number" this is
) the "book" that the item belongs to. However it can belong to
) multiple books. So the field can either be "1" or a semi colon
) separated list "1;5;7" for example.
)
) I want a simple test to see if an item belongs to a given book. Pseudo
) code: "next if list does not contain book #".
)
) Assuming $req holds the requested book number and $books holds the
) string of books, two ways I have come up with that work are:
)
) next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;
)
) or
)
) next if ! grep { $req == $_ } split /;/, $books;
)
) Neither of which I am that pleased with. Is there a better way?

One trick to do this is this:

next if ";$books;" !~ m/;$req;/;

Nice. Thanks, I should have thought of that. Used to have to use a
similar trick in shell scripts that did not like empty strings in
compares

Dave Saville · Nov 22, 2010

Are you talking book strings or book numbers (integer)?

If numbers, it's better to keep what you have '==', otherwise using
$req in a regex will be evaluated as a string.
If $req starts out as a string, say '012', it stays as a string
in the regex, and vice versa for the match, the delimiter could be
voided with any leading '0' characters.

All numbers and no leading zeros.

So for brevity, use an '==' condition if you're not sure,
don't use regex at all (other than for the split delimiter).

Otherwise, for regex, it should be an all string proposition.

-sln

use strict;
use warnings;

my @books = (
    '3;4;05;010,0000',
    '01;3;09;020',
    '2;1;099;030,0',
);

my $req = '00';
$req =~ s/^0*(\d+)/$1/;

for my $book (@books) {
    next unless $book =~ /\b0*($req)\b/;
    print "found '$1' -> '$book'\n";
}

Hmm, works - but I wanted to "next" to the loop reading the items.

Dave Saville · Nov 22, 2010

If the data is exactly as you have described, then even this will do it:

next unless $books =~ /\b$req\b/;
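A small sketch of why the word-boundary version is enough for this data, assuming (as stated in the thread) that ids are plain digits with no leading zeros. The sub name `in_list_wb` and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: \b sits between a digit and ';' (or the string edge),
# but never between two digits, so 2 cannot match inside 12, 20 or 22.
sub in_list_wb {
    my ($books, $req) = @_;
    return $books =~ /\b\Q$req\E\b/ ? 1 : 0;
}

for my $books ('2', '1;2;7', '12;20;22') {
    printf "%-9s %s\n", $books, in_list_wb($books, 2) ? 'match' : 'no match';
}
```

This relies on every id consisting only of word characters; an id containing something like `-` would create extra boundaries and could match partially.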

Yes it is as described. Thanks. I also learnt something about or's and
brackets from your first suggestion. I had only used () to use $1 etc.
Did not realise you could group logically like that. I really must
read all of the regex book.

sln · Nov 22, 2010

On Mon, 22 Nov 2010 16:23:19 UTC, (e-mail address removed) wrote:
[snip problem info]

Are you talking book strings or book numbers (integer)?

If numbers, it's better to keep what you have '==', otherwise using
$req in a regex will be evaluated as a string.
If $req starts out as a string, say '012', it stays as a string
in the regex, and vice versa for the match, the delimiter could be
voided with any leading '0' characters.

All numbers and no leading zeros.

Literal numbers aren't numbers in regular expressions; they
are digits, and a variable interpolated into a pattern is likewise
converted to its digits. Look up the FAQ on how to determine if a
variable is a number.

Hmm, works - but I wanted to "next" to the loop reading the items.

The analogy is the same ..

# data file open code ..
#
$req =~ s/^0*(\d+)/$1/;

while (defined (my $books = <DATA>)) {
    chomp $books;    # drop the newline so it does not end up in the output
    next unless $books =~ /\b0*($req)\b/;
    print "found '$1' -> '$books'\n";
}
# data file close code ..
#
----------------------------------

At a minimum, /\b0*$req\b/ can't hurt.
Your way is not a solid defense when comparing numbers as strings.
The numeric equality test '==' is a solution when a numeric test
is called for.
Otherwise, they should both be treated as a string when using a
regex solution. It doesn't matter how pretty you think it is or not.

-sln

ccc31807 · Nov 22, 2010

I am processing records, one field of which is a "book number" this is
the "book" that the item belongs to. However it can belong to
multiple books. So the field can either be "1" or a semi colon
separated list "1;5;7" for example.

You are not searching a list, but a string.

Searching a string to see if it matches a pattern is easy, as long as
you have unique values. If you use digits as identifiers in your
string, then '2' will match '2', '12', '20', '21', etc.

It strikes me that, if you want to use a unique identifier to match a
book number, you would likely want something like the ISBN. These will
always be unique strings.

Depending on your requirements, you might want to play with using an
anonymous array to hold your book values. You can set a scalar
variable to an anonymous array like this: $arref = []; and then push/
pop your identifiers.

CC.

sln · Nov 22, 2010

I am processing records, one field of which is a "book number" this is
the "book" that the item belongs to. However it can belong to
multiple books. So the field can either be "1" or a semi colon
separated list "1;5;7" for example.

I want a simple test to see if an item belongs to a given book. Pseudo
code: "next if list does not contain book #".

Assuming $req holds the requested book number and $books holds the
string of books, two ways I have come up with that work are:

next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;

or

next if ! grep { $req == $_ } split /;/, $books;

IMO, this ^^^^ is the best way to go
if you are sure that $req is actually a 'number' and not
a string (like a book code, where every digit matters and
which might contain no non-digits),

next if ! grep { $req eq $_ } split /;/, $books;

otherwise.
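To make the difference between the two grep variants concrete, here is a short sketch. The sub names `has_book_num` and `has_book_str` and the sample value '012' are illustrative assumptions, not part of the original posts:

```perl
use strict;
use warnings;

# Numeric comparison: '012' and 12 are the same number.
sub has_book_num {
    my ($books, $req) = @_;
    my $count = grep { $req == $_ } split /;/, $books;
    return $count;
}

# String comparison: every character matters, so '012' is not '12'.
sub has_book_str {
    my ($books, $req) = @_;
    my $count = grep { $req eq $_ } split /;/, $books;
    return $count;
}

printf "numeric: %d, string: %d\n",
    has_book_num('012;5', 12), has_book_str('012;5', 12);
# prints "numeric: 1, string: 0"
```

With warnings enabled, `==` will also complain about non-numeric fields, which can be a useful early signal that the data is not what was expected.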

-sln

Bart Lateur · Nov 22, 2010

Tad said:
next unless $books =~ /(^|;)$req(;|$)/;

Using double negation, that can be rewritten as

/(?<![^;])$req(?![^;])/
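A sketch of the double-negation pattern in use, assuming the field has already been chomped (a trailing newline counts as a non-';' character and would defeat the lookahead). The sub name `has_book_look` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper built on the lookaround pattern.
sub has_book_look {
    my ($books, $req) = @_;
    # (?<![^;]) : not preceded by a non-';' char, i.e. at the start or after ';'
    # (?![^;])  : not followed by a non-';' char, i.e. at the end or before ';'
    return $books =~ /(?<![^;])\Q$req\E(?![^;])/ ? 1 : 0;
}

for my $books ('7', '1;5;7', '17;71') {
    printf "%-6s %s\n", $books, has_book_look($books, 7) ? 'match' : 'no match';
}
```

Because both assertions are zero-width, the match consumes only the id itself, with no alternation over anchors and delimiters.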

Or, using the inverted approach, you could convert the word list to a
single regex (possibly optimized with Regex::PreSuf) and match the book
id against it.

my $books = "1;5;7";
$books =~ tr/;/|/;
$req =~ /^($books)$/ or next;
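The inverted approach could be wrapped up roughly as follows. This is a sketch rather than the poster's exact code: it uses split and join instead of tr so each id can be passed through quotemeta, and the sub name `has_book_alt` is an assumption:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "1;5;7" into the alternation 1|5|7 and
# match the requested id against it, anchored at both ends.
sub has_book_alt {
    my ($books, $req) = @_;
    my $alt = join '|', map { quotemeta($_) } split /;/, $books;
    return $req =~ /\A(?:$alt)\z/ ? 1 : 0;
}

print has_book_alt('1;5;7', 5)  ? "5 found\n"  : "5 missing\n";
print has_book_alt('1;5;7', 57) ? "57 found\n" : "57 missing\n";
```

The anchors matter here: without them, 57 would match because it contains 5.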

Dave Saville · Nov 22, 2010

You are not searching a list, but a string.

True, but logically it is a list of identifiers. I could have
formulated the question better, but the question "finding if something
is in a string" smacks of very newbie - not worth answering - RTFM
area.

Searching a string to see if it matches a pattern is easy, as long as
you have unique values. If you use digits as identifiers in your
string, then '2' will match '2', '12', '20', '21', etc.

Quite, but that is exactly what I don't want. The answers from others,
and my original, cope with that, ie 2 will match 2 but not 22.

It strikes me that, if you want to use a unique identifier to match a
book number, you would likely want something like the ISBN. These will
always be unique strings.

Can't change the string - it is coming from another application and I
can't change the data format.

Depending on your requirements, you might want to play with using an
anonymous array to hold your book values. You can set a scalar
variable to an anonymous array like this: $arref = []; and then push/
pop your identifiers.

CC.

Uri Guttman · Nov 22, 2010

DS> Can't change the string - it is coming from another application and I
DS> can't change the data format.

that makes no sense. you CAN always change it for internal use like
searching. are you looking into this string many times? if so, splitting
the values out to a hash and searching that will be much faster and
simpler. no need for much other than split and a hash lookup:

my %is_book_num = map { $_ => 1 } split /;/, $string ;

that will create a leading empty field which shouldn't matter in your
lookups. if you are worried about it, then either grep that out or use a
different way to grab them (\d+ comes to mind) in a regex:

my %is_book_num = map { $_ => 1 } $string =~ /(\d+)/g ;
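a minimal sketch of the split-and-hash lookup, for one string that is queried many times. the sub name `book_set` and the sample string are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup set from one ';'-separated string.
sub book_set {
    my ($string) = @_;
    my %is_book_num = map { $_ => 1 } split /;/, $string;
    return \%is_book_num;
}

my $set = book_set('1;5;7');
for my $req (5, 57) {
    print "$req: ", ($set->{$req} ? 'in list' : 'not in list'), "\n";
}
```

the set costs one pass over the string to build, after which each test is a single hash lookup with no partial-match concerns.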

and yes, you should have said string and not list. you could have said
finding in a string of separated values.

uri

Dave Saville · Nov 22, 2010

DS> Can't change the string - it is coming from another application and I
DS> can't change the data format.

that makes no sense. you CAN always change it for internal use like
searching. are you looking into this string many times? if so, splitting
the values out to a hash and searching that will be much faster and
simpler. no need for much other than split and a hash lookup:

No, searching many records that contain a similar string.

and yes, you should have said string and not list. you could have said
finding in a string of separated values.

Hindsight is a wonderful thing

C.DeRykus · Nov 22, 2010

Yes it is as described. Thanks. I also learnt something about or's and
brackets from your first suggestion. I had only used () to use $1 etc.
Did not realise you could group logically like that.

Not a biggie which I'm sure is why Tad
just used () instead of (?:) but any ()
creates a backref. (?:) though groups
without capturing. So (?:^|;) in this
case is faster since no capture/copy
is needed. Also, clearer about what's
going on which is important in longer
regexes.
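A short sketch of the difference, using a list-context match to show what each form captures. The variable names and sample data are assumptions for illustration:

```perl
use strict;
use warnings;

my $books = '1;5;7';
my $req   = 5;

# Capturing groups: the delimiters themselves land in $1 and $2.
my @with_caps = $books =~ /(^|;)$req(;|$)/;

# Non-capturing groups: only the explicitly parenthesised id is captured.
my @no_caps = $books =~ /(?:^|;)($req)(?:;|$)/;

print "capturing groups returned:     ", scalar(@with_caps), " value(s)\n";
print "non-capturing groups returned: ", scalar(@no_caps), " value(s)\n";
```

With (?:...) the numbering of any real captures stays stable when grouping is added or removed, which is much of the readability gain in longer patterns.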

ccc31807 · Nov 22, 2010

True, but logically it is a list of identifiers. I could have
formulated the question better, but the question "finding if something
is in a string" smacks of very newbie - not worth answering - RTFM
area.

Okay, let's abstract a couple of levels up. Matching might not be an
end, but merely a means to an end.

You have an identifier, and other pieces of data. You are munging the
data. Data munging is real easy to do with hashes. I don't know what
your data looks like, but a typical scenario would involve creating a
new batch of records (call it information) from two input files. For
example, suppose you have IN1 and IN2, like this:

IN1
1,Learning Perl,Schwartz
2,Beginning Perl,Lee
3,Sam's Teach Yourself Perl,Lemay
4,Perl for Dummies,Hoffman

IN2
Saville,2010-10-2,1
McClellan,1020-10-5,1;2
sln,2010-10-16,3;4
Guttman,2010-10-23,1;2;3;4

and you want a report of each book your person read. You can do
something like this:

#build a hash of books from IN1
while(<IN1>)
{
    chomp;    # strip the newline so it does not end up in $author
    ($id,$title,$author) = split(/,/);
    $books{$id} = {
        title => $title,
        author => $author,
    };
}

#print information from IN2
while(<IN2>)
{
    chomp;    # strip the newline so the last book id is a clean key
    ($name,$date,$books) = split(/,/);
    @books = split(/;/, $books);
    print "PERSON: $name = $date\n";
    foreach my $ele (@books)
    {
        print "BOOK: $books{$ele}{title} $books{$ele}{author}\n";
    }
}

If you split the string into an array, and use the array elements as
an index into another data structure, you don't need to match the
string. Depending on the structure and amount of your data, splitting
the books string may be quicker, and potentially a lot more useful. In
any case, the string elements are merely keys, and the key is valuable
because of the access it gives you, not because it's valuable in and
of itself.

CC.

Uri Guttman · Nov 22, 2010

DS> Can't change the string - it is coming from another application and I
DS> can't change the data format.
DS> No, searching many records that contain a similar string.

you want to find which string has the number? hash to the rescue again!

just make a hash of the numbers of all the strings with the value being
the string. if multiple strings have the same number either make it a
hash of arrays or hashes. do that once and then finding which string has
a number is again a simple hash lookup (maybe with a second lookup).
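a rough sketch of that index, as a hash of arrays keyed by book number. the sub name `build_index` and the sample strings are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: map each book number to the strings that contain it.
sub build_index {
    my (@strings) = @_;
    my %strings_for;    # book number => [ strings listing that number ]
    for my $books (@strings) {
        push @{ $strings_for{$_} }, $books for split /;/, $books;
    }
    return \%strings_for;
}

my $index = build_index('1', '1;2', '3;4', '1;2;3;4');
print "book 4 appears in: @{ $index->{4} }\n";
# prints "book 4 appears in: 3;4 1;2;3;4"
```

the index is built once in a single pass; after that, finding every string for a given number is one hash lookup plus an array dereference.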

DS> Hindsight is a wonderful thing

well, consider it a teaching moment! specifying your problem clearly in
the subject is a skill all posters need. you know the fun kind of newbie
subject that says nothing at all like "doesn't work"!

uri

Keith Thompson · Nov 23, 2010

C.DeRykus said:
Not a biggie which I'm sure is why Tad
just used () instead of (?:) but any ()
creates a backref. (?:) though groups
without capturing. So (?:^|;) in this
case is faster since no capture/copy
is needed. Also, clearer about what's
going on which is important in longer
regexes.

I (almost) wish that the syntax for grouping without backrefs
were at least as terse as the syntax for grouping with backrefs.
Having to add extra punctuation to indicate *not* doing something
just seems counterintuitive.

The (?:) syntax was a later addition to the language, when () was
already well established, so that wasn't really an option. (?:) is
also slightly harder to read for people familiar with regexp syntaxes
other than Perl's (and for those of us who first learned Perl before
it had (?:)). I'm not saying that's an excuse for creating backrefs
unnecessarily, but there is some pressure to use () because it works.

C.DeRykus · Nov 23, 2010

[...]

I (almost) wish that the syntax for grouping without backrefs
were at least as terse as the syntax for grouping with backrefs.
Having to add extra punctuation to indicate *not* doing something
just seems counterintuitive.

The (?:) syntax was a later addition to the language, when () was
already well established, so that wasn't really an option. (?:) is
also slightly harder to read for people familiar with regexp syntaxes
other than Perl's (and for those of us who first learned Perl before
it had (?:)). I'm not saying that's an excuse for creating backrefs
unnecessarily, but there is some pressure to use () because it works.

A hasty* thought occurs to me that <> seems
metacharacter-ish enough that it could be
used for a non-capturing cluster. Of course
having to backwhack every literal <,> would
be a pain but non-capturing groups are so
frequent, the gain might be worth the pain.

* to cover any obvious objections that'd
make this suggestion seem totally silly

Dave Saville · Nov 23, 2010

Okay, let's abstract a couple of levels up. Matching might not be an
end, but merely a means to an end.

You have an identifier, and other pieces of data. You are munging the
data. Data munging is real easy to do with hashes. I don't know what
your data looks like, but a typical scenario would involve creating a
new batch of records (call it information) from two input files. For
example, suppose you have IN1 and IN2, like this:

IN1
1,Learning Perl,Schwartz
2,Beginning Perl,Lee
3,Sam's Teach Yourself Perl,Lemay
4,Perl for Dummies,Hoffman

IN2
Saville,2010-10-2,1
McClellan,1020-10-5,1;2
sln,2010-10-16,3;4
Guttman,2010-10-23,1;2;3;4

and you want a report of each book your person read. You can do
something like this:

Actually the data is more like IN2 and what I want is everyone who
read book 4.
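For that case, a minimal sketch using the IN2-style sample lines from the thread. The sub name `readers_of` and the in-memory array are assumptions; real code would read the records from a file and chomp each line first:

```perl
use strict;
use warnings;

# Sample records in the name,date,books layout shown in the thread.
my @in2 = (
    'Saville,2010-10-2,1',
    'McClellan,1020-10-5,1;2',
    'sln,2010-10-16,3;4',
    'Guttman,2010-10-23,1;2;3;4',
);

# Hypothetical helper: names of everyone whose book field lists $book.
sub readers_of {
    my ($book, @lines) = @_;
    my @names;
    for my $line (@lines) {
        my ($name, undef, $books) = split /,/, $line;
        push @names, $name if grep { $_ == $book } split /;/, $books;
    }
    return @names;
}

print join(', ', readers_of(4, @in2)), "\n";
# prints "sln, Guttman"
```

Any of the regex tests from earlier in the thread could replace the grep; the numeric comparison is used here because the ids are said to be plain numbers.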
