Finding if something is in a list

Dave Saville

I am processing records, one field of which is a "book number": the
"book" that the item belongs to. However, an item can belong to
multiple books, so the field can be either "1" or a semicolon-separated
list, "1;5;7" for example.

I want a simple test to see if an item belongs to a given book. Pseudo
code: "next if list does not contain book #".

Assuming $req holds the requested book number and $books holds the
string of books, two ways I have come up with that work are:

next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;

or

next if ! grep { $req == $_ } split /;/, $books;

Neither of which I am that pleased with. Is there a better way?

TIA
 
Willem

Dave Saville wrote:
) I am processing records, one field of which is a "book number": the
) "book" that the item belongs to. However, an item can belong to
) multiple books, so the field can be either "1" or a semicolon-separated
) list, "1;5;7" for example.
)
) I want a simple test to see if an item belongs to a given book. Pseudo
) code: "next if list does not contain book #".
)
) Assuming $req holds the requested book number and $books holds the
) string of books, two ways I have come up with that work are:
)
) next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;
)
) or
)
) next if ! grep { $req == $_ } split /;/, $books;
)
) Neither of which I am that pleased with. Is there a better way?

One trick to do this is this:

next if ";$books;" !~ m/;$req;/;
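
A minimal sketch of the wrapping trick above, assuming plain decimal book numbers. The helper name has_book is made up for illustration, and the \Q...\E quoting is an extra precaution that is not part of the one-liner:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if $req is one of the
# semicolon-separated entries in $books.
sub has_book {
    my ($books, $req) = @_;
    # Wrapping the list in ';' means every entry, including the first and
    # the last, has a ';' on both sides, so one pattern covers all cases.
    # \Q...\E quotes any regex metacharacters that $req might contain.
    return ";$books;" =~ /;\Q$req\E;/ ? 1 : 0;
}

for my $books ('1', '1;5;7', '12;22', '7;2') {
    printf "%-6s contains 2? %s\n", $books, has_book($books, 2) ? 'yes' : 'no';
}
```

Only the last list contains book 2; '12;22' does not, because the pattern needs a ';' on each side of the number.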


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
sln

Are you talking book strings or book numbers (integer)?

If numbers, it's better to keep what you have ('=='); otherwise, $req
will be treated as a string when it is used in a regex.
If $req starts out as a string, say '012', it stays a string
in the regex, and vice versa for the text being matched: any leading
'0' characters there could defeat the delimiter match.

So for brevity, use an '==' condition if you're not sure;
don't use a regex at all (other than for the split delimiter).

Otherwise, for regex, it should be an all string proposition.

-sln

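use strict;
use warnings;

# Sample records; note the leading zeros and the stray commas.
my @books = (
    '3;4;05;010,0000',
    '01;3;09;020',
    '2;1;099;030,0',
);

my $req = '00';
# Strip leading zeros from the request, keeping at least one digit.
$req =~ s/^0*(\d+)/$1/;

for my $book (@books) {
    # Allow leading zeros in the data; \b keeps '2' from matching '22'.
    next unless $book =~ /\b0*($req)\b/;
    print "found '$1' -> '$book'\n";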
}
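
The difference between the numeric and the string comparison can be seen in a small sketch. The sample string is made up, in the style of the data above:

```perl
use strict;
use warnings;

my $books = '3;4;05;010';

# Numeric comparison: '05' == 5 is true, because both sides are
# converted to numbers before they are compared.
my $numeric = grep { $_ == 5 } split /;/, $books;

# String comparison: '05' eq '5' is false, because every character counts.
my $string = grep { $_ eq '5' } split /;/, $books;

print "numeric matches: $numeric, string matches: $string\n";
```

In scalar context grep returns the number of matching entries, so the two variables hold counts rather than lists.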
 
Dave Saville

Willem wrote:
) One trick to do this is this:
)
) next if ";$books;" !~ m/;$req;/;

Nice. Thanks, I should have thought of that. Used to have to use a
similar trick in shell scripts that did not like empty strings in
compares :)
 
Dave Saville

sln wrote:
) Are you talking book strings or book numbers (integer)?
)
) If numbers, it's better to keep what you have ('=='); otherwise, $req
) will be treated as a string when it is used in a regex.
) If $req starts out as a string, say '012', it stays a string
) in the regex, and vice versa for the text being matched: any leading
) '0' characters there could defeat the delimiter match.

All numbers and no leading zeros.

) So for brevity, use an '==' condition if you're not sure;
) don't use a regex at all (other than for the split delimiter).
)
) Otherwise, for regex, it should be an all string proposition.
)
) -sln
)
) use strict;
) use warnings;
)
) my @books = (
) '3;4;05;010,0000',
) '01;3;09;020',
) '2;1;099;030,0',
) );
)
) my $req = '00';
) $req =~ s/^0*(\d+)/$1/;
)
) for my $book (@books) {
) next unless $book =~ /\b0*($req)\b/;
) print "found '$1' -> '$book'\n";
) }

Hmm, works - but I wanted to "next" to the loop reading the items.
 
Dave Saville

) If the data is exactly as you have described, then even this will do it:
)
) next unless $books =~ /\b$req\b/;

Yes it is as described. Thanks. I also learnt something about or's and
brackets from your first suggestion. I had only used () to use $1 etc.
Did not realise you could group logically like that. I really must
read all of the regex book :)
 
sln

On Mon, 22 Nov 2010 16:23:19 UTC, (e-mail address removed) wrote:
[snip problem info]

) All numbers and no leading zeros.

Literal numbers aren't numbers in regular expressions; they
are digits, and a variable is interpolated as its digits too.
Look up the FAQ on how to determine if a variable is a
number.

) Hmm, works - but I wanted to "next" to the loop reading the items.

The analogy is the same ..

# data file open code ..
#
$req =~ s/^0*(\d+)/$1/;

while (defined(my $books = <DATA>)) {
    chomp $books;    # drop the newline so it does not end up inside the quotes
    next unless $books =~ /\b0*($req)\b/;
    print "found '$1' -> '$books'\n";
}
# data file close code ..
#
----------------------------------

At a minimum, /\b0*$req\b/ can't hurt.
Your way is not a solid defense when comparing numbers as strings.
The numeric equality test '==' is a solution when a numeric test
is called for.
Otherwise, they should both be treated as a string when using a
regex solution. It doesn't matter how pretty you think it is or not.

-sln
 
ccc31807

) I am processing records, one field of which is a "book number": the
) "book" that the item belongs to. However, an item can belong to
) multiple books, so the field can be either "1" or a semicolon-separated
) list, "1;5;7" for example.

You are not searching a list, but a string.

Searching a string to see if it matches a pattern is easy, as long as
you have unique values. If you use digits as identifiers in your
string, then '2' will match '2', '12', '20', '21', etc.

It strikes me that, if you want to use a unique identifier to match a
book number, you would likely want something like the ISBN. These will
always be unique strings.

Depending on your requirements, you might want to play with using an
anonymous array to hold your book values. You can set a scalar
variable to an anonymous array like this: $arref = []; and then push/
pop your identifiers.

CC.
 
sln

) Assuming $req holds the requested book number and $books holds the
) string of books, two ways I have come up with that work are:
)
) next if $books !~ m/^$req$|^$req;|;$req;|;$req$/;
)
) or
)
) next if ! grep { $req == $_ } split /;/, $books;

IMO, this ^^^^ is the best way to go
if you are sure that $req is actually a 'number' and not
a string (like a book code, where every digit matters even
if it contains nothing but digits). Otherwise, use:

next if ! grep { $req eq $_ } split /;/, $books;

-sln
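
As a sketch of the same two tests, here they are wrapped in helpers that stop at the first hit. This assumes List::Util 1.33 or later (which provides any); the helper names are made up for illustration:

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical helper: numeric membership test, so '05' counts as 5.
sub in_list_num {
    my ($req, $books) = @_;
    my $found = any { $_ == $req } split /;/, $books;
    return $found ? 1 : 0;
}

# Hypothetical helper: string membership test, so '05' and '5' differ.
sub in_list_str {
    my ($req, $books) = @_;
    my $found = any { $_ eq $req } split /;/, $books;
    return $found ? 1 : 0;
}

print "numeric: ", in_list_num(5, '3;05;7'), "\n";
print "string:  ", in_list_str(5, '3;05;7'), "\n";
```

Unlike grep, any returns as soon as one entry matches, which only matters for long lists.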
 
Bart Lateur

Tad said:
next unless $books =~ /(^|;)$req(;|$)/;

Using double negation, that can be rewritten as

/(?<![^;])$req(?![^;])/


Or, using the inverted approach, you could convert the word list to a
single regex (possibly optimized with Regex::PreSuf) and match the book
id against it.

my $books = "1;5;7";
$books =~ tr/;/|/;
$req =~ /^($books)$/ or next;
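
A small sketch comparing the anchored form with the double-negation form on a few made-up strings. The \Q...\E quoting and the (?:) groups are additions for the example, not part of the patterns as posted:

```perl
use strict;
use warnings;

my $req = 2;

# Anchored form: $req must sit between start-of-string or ';' on the
# left and ';' or end-of-string on the right.
my $anchored = qr/(?:^|;)\Q$req\E(?:;|$)/;

# Double-negation form: no non-';' character directly before or after.
my $lookaround = qr/(?<![^;])\Q$req\E(?![^;])/;

for my $books ('2', '1;2;3', '12;22', '20;21') {
    my $first  = $books =~ $anchored   ? 'match' : 'no match';
    my $second = $books =~ $lookaround ? 'match' : 'no match';
    printf "%-6s anchored: %-8s lookaround: %s\n", $books, $first, $second;
}
```

The two forms agree on these strings. They can differ when the string still has its trailing newline: $ also matches just before a final newline, while the lookahead sees the newline as a non-';' character, so chomp the input first.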
 
Dave Saville

) You are not searching a list, but a string.

True, but logically it is a list of identifiers. I could have
formulated the question better, but the question "finding if something
is in a string" smacks of very newbie - not worth answering - RTFM
area. :)

) Searching a string to see if it matches a pattern is easy, as long as
) you have unique values. If you use digits as identifiers in your
) string, then '2' will match '2', '12', '20', '21', etc.

Quite, but that is exactly what I don't want. The answers from others,
and my original, cope with that, i.e. 2 will match 2 but not 22.

) It strikes me that, if you want to use a unique identifier to match a
) book number, you would likely want something like the ISBN. These will
) always be unique strings.

Can't change the string - it is coming from another application and I
can't change the data format.

) Depending on your requirements, you might want to play with using an
) anonymous array to hold your book values. You can set a scalar
) variable to an anonymous array like this: $arref = []; and then push/
) pop your identifiers.
)
) CC.
 
Uri Guttman

DS> Can't change the string - it is coming from another application and I
DS> can't change the data format.

that makes no sense. you CAN always change it for internal use like
searching. are you looking into this string many times? if so, splitting
the values out to a hash and searching that will be much faster and
simpler. no need for much other than split and a hash lookup:

my %is_book_num = map { $_ => 1 } split /;/, $string ;

if the string ever starts with a separator, that will create a leading
empty field, which shouldn't matter in your lookups. if you are worried
about it, then either grep that out or use a different way to grab them
(\d+ comes to mind) in a regex with /g:

my %is_book_num = map { $_ => 1 } $string =~ /(\d+)/g ;

and yes, you should have said string and not list. you could have said
finding in a string of separated values.

uri
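
A short sketch of the hash lookup described above, assuming the same string is tested against several book numbers. The sample value of $string is made up:

```perl
use strict;
use warnings;

my $string = '1;5;7';

# One key per book number; the value 1 just marks membership.
my %is_book_num = map { $_ => 1 } split /;/, $string;

for my $req (1, 2, 7) {
    print "$req: ", ($is_book_num{$req} ? 'found' : 'not found'), "\n";
}
```

Building the hash costs one pass over the string, so it pays off only when the same string is queried more than once.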
 
Dave Saville

) DS> Can't change the string - it is coming from another application and I
) DS> can't change the data format.
)
) that makes no sense. you CAN always change it for internal use like
) searching. are you looking into this string many times? if so, splitting
) the values out to a hash and searching that will be much faster and
) simpler. no need for much other than split and a hash lookup:

No, searching many records that contain a similar string.

) and yes, you should have said string and not list. you could have said
) finding in a string of separated values.

Hindsight is a wonderful thing :)
 
C.DeRykus

) Yes it is as described. Thanks. I also learnt something about or's and
) brackets from your first suggestion. I had only used () to use $1 etc.
) Did not realise you could group logically like that.

Not a biggie which I'm sure is why Tad
just used () instead of (?:) but any ()
creates a backref. (?:) though groups
without capturing. So (?:^|;) in this
case is faster since no capture/copy
is needed. Also, clearer about what's
going on which is important in longer
regexes.
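
To make the difference concrete, here is a minimal sketch that counts the capture groups in each form. The sample values are made up; $#+ gives the number of groups in the last successful match:

```perl
use strict;
use warnings;

my ($books, $req) = ('1;5;7', 5);

# With plain (), the separators on either side are captured as a side effect.
my ($lead, $trail) = $books =~ /(^|;)$req(;|$)/;
my $groups_capturing = $#+;

# With (?:), the same string matches but nothing is captured.
my $matched = $books =~ /(?:^|;)$req(?:;|$)/;
my $groups_plain = $#+;

print "capturing:     $groups_capturing groups ('$lead', '$trail')\n";
print "non-capturing: $groups_plain groups\n";
```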
 
ccc31807

) True, but logically it is a list of identifiers. I could have
) formulated the question better, but the question "finding if something
) is in a string" smacks of very newbie - not worth answering - RTFM
) area. :)

Okay, let's abstract a couple of levels up. Matching might not be an
end, but merely a means to an end.

You have an identifier, and other pieces of data. You are munging the
data. Data munging is real easy to do with hashes. I don't know what
your data looks like, but a typical scenario would involve creating a
new batch of records (call it information) from two input files. For
example, suppose you have IN1 and IN2, like this:

IN1
1,Learning Perl,Schwartz
2,Beginning Perl,Lee
3,Sam's Teach Yourself Perl,Lemay
4,Perl for Dummies,Hoffman

IN2
Saville,2010-10-2,1
McClellan,1020-10-5,1;2
sln,2010-10-16,3;4
Guttman,2010-10-23,1;2;3;4

and you want a report of each book each person read. You can do
something like this:

# build a hash of books from IN1, keyed by book id
my %books;
while (<IN1>)
{
    chomp;
    my ($id, $title, $author) = split(/,/);
    $books{$id} = {
        title  => $title,
        author => $author,
    };
}

# print information from IN2
while (<IN2>)
{
    chomp;
    my ($name, $date, $books) = split(/,/);
    my @books = split(/;/, $books);
    print "PERSON: $name = $date\n";
    foreach my $ele (@books)
    {
        print "BOOK: $books{$ele}{title} $books{$ele}{author}\n";
    }
}

If you split the string into an array, and use the array elements as
an index into another data structure, you don't need to match the
string. Depending on the structure and amount of your data, splitting
the books string may be quicker, and potentially a lot more useful. In
any case, the string elements are merely keys, and the key is valuable
because of the access it gives you, not because it's valuable in and
of itself.

CC.
 
Uri Guttman

DS> Can't change the string - it is coming from another application and I
DS> can't change the data format.
DS> No, searching many records that contain a similar string.

you want to find which string has the number? hash to the rescue again!

just make a hash of the numbers of all the strings with the value being
the string. if multiple strings have the same number either make it a
hash of arrays or hashes. do that once and then finding which string has
a number is again a simple hash lookup (maybe with a second lookup).
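
a rough sketch of that idea, as a hash of arrays keyed by book number. the records below are made up, in the "1;5;7" format from the original question:

```perl
use strict;
use warnings;

# Made-up records in the thread's semicolon-separated format.
my @records = ('1', '1;5;7', '5', '7;2');

# Map each book number to the list of records that mention it.
my %records_for;
for my $record (@records) {
    push @{ $records_for{$_} }, $record for split /;/, $record;
}

# Finding every record for a book is now a single hash lookup.
my @hits = @{ $records_for{5} // [] };
print "book 5: @hits\n";
```

the // [] fallback (Perl 5.10 or later) keeps the lookup safe for a book number that appears in no record.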

DS> Hindsight is a wonderful thing :)

well, consider it a teaching moment! specifying your problem clearly in
the subject is a skill all posters need. you know the fun kind of newbie
subject that says nothing at all like "doesn't work"! :)

uri
 
Keith Thompson

C.DeRykus said:
) Not a biggie which I'm sure is why Tad
) just used () instead of (?:) but any ()
) creates a backref. (?:) though groups
) without capturing. So (?:^|;) in this
) case is faster since no capture/copy
) is needed. Also, clearer about what's
) going on which is important in longer
) regexes.

I (almost) wish that the syntax for grouping without backrefs
were at least as terse as the syntax for grouping with backrefs.
Having to add extra punctuation to indicate *not* doing something
just seems counterintuitive.

The (?:) syntax was a later addition to the language, when () was
already well established, so that wasn't really an option. (?:) is
also slightly harder to read for people familiar with regexp syntaxes
other than Perl's (and for those of us who first learned Perl before
it had (?:)). I'm not saying that's an excuse for creating backrefs
unnecessarily, but there is some pressure to use () because it works.
 
C.DeRykus

[...]

) I (almost) wish that the syntax for grouping without backrefs
) were at least as terse as the syntax for grouping with backrefs.
) Having to add extra punctuation to indicate *not* doing something
) just seems counterintuitive.
)
) The (?:) syntax was a later addition to the language, when () was
) already well established, so that wasn't really an option. (?:) is
) also slightly harder to read for people familiar with regexp syntaxes
) other than Perl's (and for those of us who first learned Perl before
) it had (?:)). I'm not saying that's an excuse for creating backrefs
) unnecessarily, but there is some pressure to use () because it works.

A hasty* thought occurs to me that <> seems
metacharacter-ish enough that it could be
used for a non-capturing cluster. Of course
having to backwhack every literal <,> would
be a pain but non-capturing groups are so
frequent, the gain might be worth the pain.

* to cover any obvious objections that'd
make this suggestion seem totally silly
 
Dave Saville

) Okay, let's abstract a couple of levels up. Matching might not be an
) end, but merely a means to an end.
)
) You have an identifier, and other pieces of data. You are munging the
) data. Data munging is real easy to do with hashes. I don't know what
) your data looks like, but a typical scenario would involve creating a
) new batch of records (call it information) from two input files. For
) example, suppose you have IN1 and IN2, like this:
)
) IN1
) 1,Learning Perl,Schwartz
) 2,Beginning Perl,Lee
) 3,Sam's Teach Yourself Perl,Lemay
) 4,Perl for Dummies,Hoffman
)
) IN2
) Saville,2010-10-2,1
) McClellan,1020-10-5,1;2
) sln,2010-10-16,3;4
) Guttman,2010-10-23,1;2;3;4
)
) and you want a report of each book each person read. You can do
) something like this:

Actually the data is more like IN2 and what I want is everyone who
read book 4.
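
For data shaped like the IN2 sample, that query might be sketched as follows. The helper name readers_of and the name,date,books field order are assumptions taken from the sample, not from the real application:

```perl
use strict;
use warnings;

# Hypothetical helper: names of everyone whose record lists book $req.
# Assumes lines shaped like the IN2 sample: name,date,books
sub readers_of {
    my ($req, @lines) = @_;
    my @names;
    for my $line (@lines) {
        chomp $line;
        my ($name, $date, $books) = split /,/, $line;
        next unless defined $books;
        # Numeric comparison, so 4 matches 4 but not 14 or 40.
        next unless grep { $_ == $req } split /;/, $books;
        push @names, $name;
    }
    return @names;
}

my @in2 = (
    'Saville,2010-10-2,1',
    'McClellan,1020-10-5,1;2',
    'sln,2010-10-16,3;4',
    'Guttman,2010-10-23,1;2;3;4',
);

print join(', ', readers_of(4, @in2)), "\n";
```

With the sample lines this lists sln and Guttman for book 4.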
 
