"between" function equivalent in Perl?

Alexandra

Hello All-

I'm attempting to extract a substring of characters from an alphanumeric
text string. I've read a lot of Perl documentation on the 'index' and
'substring' functions; however, I cannot find a regular expression (or an
example of one) that is the equivalent of a "between" function in Perl. Perl
is very good at string manipulation so I assume there must be a way to do
this. Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h';

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

Rather than use a series of clunky index and substring calls, does anyone
have a better suggestion? If anyone can recommend a good Perl language
reference website (or book) that has some excellent examples of regular
expressions, that would be helpful too.

Thanks in advance,

Alexandra
 
Matija Papec

Alexandra said:
example of one) that is the equivalent of a "between" function in Perl. Perl
is very good at string manipulation so I assume there must be a way to do
this. Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h'

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

my($match) = $mystring =~ /=(\S+)/;

Rather than use a series of clunky index and substring calls, does anyone
have a better suggestion? If anyone can recommend a good Perl language
reference website (or book) that has some excellent examples of regular
expressions, that would be helpful too.

I've heard that "Mastering Regular Expressions" is very good.
 
Brian Wakem

Alexandra said:
Hello All-

I'm attempting to extract a substring of characters from an alphanumeric
text string. I've read a lot of Perl documentation on the 'index' and
'substring' functions; however, I cannot find a regular expression (or an
example of one) that is the equivalent of a "between" function in Perl. Perl
is very good at string manipulation so I assume there must be a way to do
this. Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h'

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

Rather than use a series of clunky index and substring calls, does anyone
have a better suggestion? If anyone can recommend a good Perl language
reference website (or book) that has some excellent examples of regular
expressions, that would be helpful too.


print $mystring =~ m/=([^\s]+)/;
 
Anno Siegel

Alexandra said:
Hello All-

I'm attempting to extract a substring of characters from an alphanumeric
text string. I've read a lot of Perl documentation on the 'index' and
'substring' functions; however, I cannot find a regular expression (or an
example of one) that is the equivalent of a "between" function in Perl. Perl
is very good at string manipulation so I assume there must be a way to do
this. Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h'

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

The notion of "between" can very well be expressed in regular expressions.
If $from and $to are regular expressions, the expression /$from(.*)$to/
catches everything between the first match of $from and the last
match of $to. If you want to delimit from the first match of $from
to the first match of $to after that point, use a non-greedy pattern in the
middle: /$from(.*?)$to/. In your case the non-greedy form is the one to use,
because more white space follows the value:

my $from = qr/=/; # a "="
my $to = qr/\s/; # any white space
my ( $between) = $mystring =~ /$from(.*?)$to/;
print "$between\n";

That prints "xxxx".
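
A quick way to see the difference between the two forms, sketched against the string from the original question (the names $greedy and $lazy are made up for the illustration). Because more white space follows the value, the greedy form runs on to the last white space in the string:

```perl
use strict;
use warnings;

my $mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h';

# Greedy: from the first "=" up to the LAST white space character.
my ($greedy) = $mystring =~ /=(.*)\s/;

# Non-greedy: from the first "=" up to the NEXT white space character.
my ($lazy) = $mystring =~ /=(.*?)\s/;

print "greedy: $greedy\n";   # greedy: xxxx @ ggg @
print "lazy:   $lazy\n";     # lazy:   xxxx
```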

Anno
 
Alexandra

David Oswald said:
You read a "lot" of Perl documentation, and regular expression discussion,
and didn't find anything therein that inspired a solution?

Actually, it inspired me to take up hieroglyphics.

Assuming there is only one occurrence per string, and that there are no
newlines in the string, this works.
my $matched = $mystring =~ m/=(.*?)\s/;

This also evaluated to 1. I had tried this statement with and without the
leading 'm' and came up with the same result. I'm wondering if the
expression is evaluating at all or if Perl is just seeing it as a true or
valid expression. When I do type in something incorrect I do get a Server
Error. So the expression must be evaluating on some level. Any insight or
further suggestions would be appreciated.

If there is more than one occurrence per string, and equals may also be
embedded in the portion of the string you're attempting to extract, it
becomes more complicated. Assuming that the space character is unique in
that it is always a delimiter, then you probably should make your life a
little easier by first splitting on space, just so that you don't have to
write a regexp that handles embedded equals differently from
string-initiated equals signs, and stuff like that.
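
A rough sketch of that split-first idea, assuming fields are separated by white space and that only the first '=' in a field separates the name from the value. The sample record and the %pairs hash are invented for the example and are not from the original posts:

```perl
use strict;
use warnings;

# Hypothetical record: one plain pair and one whose value contains "=".
my $record = 'aaa @ FFFFF=xxxx @ opt=a=b @ h';

my %pairs;
for my $field (split ' ', $record) {             # split on runs of white space
    next unless $field =~ /=/;                   # skip fields without "="
    my ($name, $value) = split /=/, $field, 2;   # limit 2: later "=" stay in the value
    $pairs{$name} = $value;
}

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

With one '=' per string, as in the original question, a single regular expression is simpler; splitting first only pays off when the value itself may contain '='.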

There is only one occurrence of the '=' per string. I'm not sure what
embedded means in Perl terms. (Does that mean the '=' is not preceded by a
word boundary or space?) I looked in the "Programming Perl" reference and
didn't find any index or glossary references for embedded or
string-initiated, and I don't have enough sense of Perl yet to distinguish
this intuitively. The '=' in the string does not have any leading or
trailing spaces; it is smashed between two characters.

I've played with the index and substr functions to work with a smaller
initial value for $mystring, but I still get the same Boolean result. I will
try playing with splitting to see if I can get it to work that way, but I
suspect it may return the same result.

Have you tried the perldocs, specifically 'perldoc perlbook' for book
suggestions? The bible is the Camel book (Programming Perl), and the Sunday
school lesson manual is Learning Perl (the Llama book).

We do have Learning Perl and Programming Perl. I find them excellent
references but slow to digest. There are simply not enough examples for me
to work backwards and easily "decode" the language explanations. When
something clicks (from the examples provided) and I re-read that section in
the O'Reilly references, the explanations then make perfect sense. <g> I
suppose it will just mean sheer time and effort in seeking more examples out
on the web as well as helpful hints from groups such as this one.

Alexandra
 
Alexandra

Anno Siegel said:
Alexandra wrote in comp.lang.perl.misc:
I'm attempting to extract a substring of characters from an alphanumeric
text string. [...] Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h'

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

The notion of "between" can very well be expressed in regular expressions.
If $from and $to are regular expressions, the expression /$from(.*)$to/
catches everything between the first match of $from and the last
match of $to. If you want to delimit from the first match of $from
to the first match of $to after that point, use a non-greedy pattern in the
middle: /$from(.*?)$to/. In your case the non-greedy form is the one to use,
because more white space follows the value:

my $from = qr/=/; # a "="
my $to = qr/\s/; # any white space
my ( $between) = $mystring =~ /$from(.*?)$to/;
print "$between\n";

That prints "xxxx".

If only it would... <g>. For some reason, I'm receiving a 1 as the return
value. I did some debugging to ensure the $to and $from values are correct
and that the string is a valid string value. I also tried using other valid
character values in the $to and $from fields to see if the '=' was the
culprit and still the expression evaluated to 1.

Alexandra
 
Alexandra

Brian Wakem said:
this. Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h'

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

Rather than use a series of clunky index and substring calls, does anyone
have a better suggestion? If anyone can recommend a good Perl language
reference website (or book) that has some excellent examples of regular
expressions, that would be helpful too.


print $mystring =~ m/=([^\s]+)/;

Thanks for your reply. For some reason this also evaluated to 1, instead of
the desired substring. And there is not a "1" character in the initial
string ($mystring). I tried placing brackets [] around the equal sign and
still no dice.

The expression seems, somehow, to be evaluating for a truth value. (??)

Alexandra
 
Alexandra

Matija Papec said:
my($match) = $mystring =~ /=(\S+)/;

Thanks for your reply. Unfortunately, I only received a "1" as the return
value, as I also did with some of the other suggestions. I'm not sure if
this means it's evaluating to Boolean (or why). I did some debugging to
ensure $mystring is a valid text value, and it seems to be.

I've heard that "Mastering Regular Expressions" is very good.

We do have some of the O'Reilly books in the office but not that one. Thank
you, I'll check it out.

Alexandra
 
Uri Guttman

A> Thanks for your reply. Unfortunately, I only received a "1" as the return
A> value, as I also did with some of the other suggestions. I'm not sure if
A> this means it's evaluating to Boolean (or why). I did some debugging to
A> ensure $mystring is a valid text value, and it seems to be.

the above will not evaluate to 1. your code is not the same as that line
of code. wanna bet you don't have () around your $match var?

A> We do have some of the O'Reilly books in the office but not that one. Thank
A> you, I'll check it out.

this is covered in perlre, perlretut and perlrequick. you need to learn
how regexes work in different contexts.

uri
 
Uri Guttman

A> This also evaluated to 1. I had tried this statement with and without the
A> leading 'm' and came up with the same result. I'm wondering if the
A> expression is evaluating at all or if Perl is just seeing it as a true or
A> valid expression. When I do type in something incorrect I do get a Server
A> Error. So the expression must be evaluating on some level. Any insight or
A> further suggestions would be appreciated.

and that line of code is wrong. see my other post.

uri
 
Uri Guttman

A> If only it would... <g>. For some reason, I'm receiving a 1 as the return
A> value. I did some debugging to ensure the $to and $from values are correct
A> and that the string is a valid string value. I also tried using other valid
A> character values in the $to and $from fields to see if the '=' was the
A> culprit and still the expression evaluated to 1.

SHOW YOUR CODE. saying it doesn't work without showing your code is a
waste of everyone's time. the above code is fine. obviously yours is not
but we can't fix it without seeing it.

uri
 
Uri Guttman

Alexandra said:
print $mystring =~ m/=([^\s]+)/;

A> Thanks for your reply. For some reason this also evaluated to 1, instead of
A> the desired substring. And there is not a "1" character in the initial
A> string ($mystring). I tried placing brackets [] around the equal sign and
A> still no dice.

gack, you don't get context. print is returning the 1. or you are not
realizing print provides list context. this is a poor example since it
isn't clear why the grabbed string is printed.

A> The expression seems, somehow, to be evaluating for a truth value. (??)

again, SHOW YOUR CODE.

uri
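
To make the context point concrete, here is a small sketch (the names $concat and $forced are invented). The same match yields the captured text when the surrounding code asks for a list, and 1 when it asks for a scalar, which is what happens under string concatenation or an explicit scalar():

```perl
use strict;
use warnings;

my $mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h';

# print takes a list, so the match runs in list context and returns its captures.
print $mystring =~ m/=([^\s]+)/;
print "\n";

# The "." operator wants scalars, so the match only reports success.
my $concat = 'got: ' . ($mystring =~ m/=([^\s]+)/);

# scalar() forces the same thing explicitly.
my $forced = scalar($mystring =~ m/=([^\s]+)/);

print "$concat\n";   # got: 1
print "$forced\n";   # 1
```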
 
Dave Saville

I just cut n pasted the code from previous posts

use strict;
use warnings;
my $mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h';
my $from = qr/=/; # a "="
my $to = qr/\s/; # any white space
my ( $between) = $mystring =~ /$from(.*?)$to/;
print "$between\n";


Prints xxxx here. Perl 5.8.0 on OS/2. Something odd your end I feel.



Regards

Dave Saville

NB switch saville for nospam in address
 
Alexandra

Uri Guttman said:
A> Thanks for your reply. Unfortunately, I only received a "1" as the return
A> value, as I also did with some of the other suggestions. I'm not sure if
A> this means it's evaluating to Boolean (or why). I did some debugging to
A> ensure $mystring is a valid text value, and it seems to be.

the above will not evaluate to 1. your code is not the same as that line
of code. wanna bet you don't have () around your $match var?

Yes! That was it, thank you. All of the following 'mysubstr#' expressions
now work perfectly well. (My apologies to original posters for not
understanding this aspect.)

$mystr = `grep $in{lookup_key} /AAA/bbbb/ccc/dd/eeee.ff`;
$in{mystr} = $mystr; #debug
$from = '=';
$in{myfrom} = $from; #debug
$to = ' ';
$in{myto} = $to; #debug

($in{mysubstr0}) = $mystr =~ /=(\S+)/;
($in{mysubstr1}) = $mystr =~ /=(.*?)\s/;
($in{mysubstr2}) = $mystr =~ m/[=]([^\s]+)/;
($in{mysubstr3}) = $mystr =~ /$from(.*?)$to/;


I found that most all of the regular expression examples in the Programming
Perl book use shorthand of referring to $_ (thus, not providing an
explicitly named variable at all). The examples that do refer to a named
variable have something like "($foo = $bar) =~ s/this/that/;". So, I'd
assumed the parentheses were only needed when using multiple explicitly
named variables. Good for non-beginners, otherwise difficult to decipher.

A> We do have some of the O'Reilly books in the office but not that one. Thank
A> you, I'll check it out.

this is covered in perlre, perlretut and perlrequick. you need to learn
how regexes work in different contexts.

Great, I found them on perldoc.com and will read them.

Btw, the only reference to contexts in Programming Perl focused on lists and
scalar contexts. Nowhere did I see mention of enclosing an assignment
variable in parentheses or why. Though I did find this quote: "You will be
miserable until you learn the difference between scalar and list context,
because certain operators know which context they are in, and return lists
in contexts wanting a list, and scalar values in contexts wanting a scalar."

It's all so clear now...

Anyway, thanks for replying and for the fix.

Alexandra
 
Alexandra

Dave Saville said:
I just cut n pasted the code from previous posts

use strict;
use warnings;
my $mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h';
my $from = qr/=/; # a "="
my $to = qr/\s/; # any white space
my ( $between) = $mystring =~ /$from(.*?)$to/;
print "$between\n";


Prints xxxx here. Perl 5.8.0 on OS/2. Something odd your end I feel.

Yes, thank you for taking the time. It was indeed the lack of () on my end.

All of the initial suggestions worked once I added them:

($in{login0}) = $user_rec =~ /=(\S+)/;
($in{login1}) = $user_rec =~ /=(.*?)\s/;
($in{login2}) = $user_rec =~ m/[=]([^\s]+)/;
($in{login3}) = $user_rec =~ /$from(.*?)$to/;

Alexandra
 
Brian Wakem

Alexandra said:
Brian Wakem said:
this. Here's what I'm trying to do:

$mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h'

The goal is to extract the 'xxxx' substring between the '=' sign and the
next whitespace character. (There is no fixed length for the 'xxxx'
substring.)

Rather than use a series of clunky index and substring calls, does anyone
have a better suggestion? If anyone can recommend a good Perl language
reference website (or book) that has some excellent examples of regular
expressions, that would be helpful too.


print $mystring =~ m/=([^\s]+)/;

Thanks for your reply. For some reason this also evaluated to 1, instead of
the desired substring. And there is not a "1" character in the initial
string ($mystring). I tried placing brackets [] around the equal sign and
still no dice.

The expression seems, somehow, to be evaluating for a truth value. (??)


All of the examples given in the replies to your original message work
absolutely fine. You are doing something you aren't telling us. Post your
code.
 
Alexandra

Brian Wakem said:
"Alexandra" wrote:
All of the examples given in the replies to your original message work
absolutely fine. You are doing something you aren't telling us. Post your
code.

Yes, they do work (see other replies). It was my error with lack of
parentheses around the assignment var. Next time I'll include my code in
initial replies.

Regards,

Alexandra
 
Tad McClellan

Alexandra said:
Btw, the only reference to contexts in Programming Perl focused on lists and
scalar contexts.


That is what we are focusing on here as well!

You needed a m// in list context to get what you wanted.

Nowhere did I see mention of enclosing an assignment
variable in parentheses or why.


From p69 in the 3rd Camel:

Assignment to a list of scalars also provides list context
to the righthand side, even if there's only one element
in the list.


($foo) = ...list context...

has only one element in the list.
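
A minimal sketch of that rule applied to the string from this thread. The pattern with two capture groups and the names $flag, $first, $name and $value are only for illustration:

```perl
use strict;
use warnings;

my $mystring = 'aaa @ bbbb @ c @ dd @ eeeeee @ FFFFF=xxxx @ ggg @ h';

# No parentheses on the left: scalar context, m// reports only success.
my $flag = $mystring =~ /(\w+)=(\S+)/;

# One-element list on the left: list context, first capture is kept.
my ($first) = $mystring =~ /(\w+)=(\S+)/;

# Two-element list on the left: both captures are kept.
my ($name, $value) = $mystring =~ /(\w+)=(\S+)/;

print "flag:  $flag\n";          # flag:  1
print "first: $first\n";         # first: FFFFF
print "pair:  $name $value\n";   # pair:  FFFFF xxxx
```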
 
John W. Krahn

Alexandra said:
Yes! That was it, thank you. All of the following 'mysubstr#' expressions
now work perfectly well. (My apologies to original posters for not
understanding this aspect.)

$mystr = `grep $in{lookup_key} /AAA/bbbb/ccc/dd/eeee.ff`;

Instead of running an external program for this you can do it in perl
and have better control of error reporting:

my $file = '/AAA/bbbb/ccc/dd/eeee.ff';
open my $fh, '<', $file or die "Cannot open $file: $!";
# keep only the lines that contain the lookup key (\Q...\E matches it literally)
my $mystr = join '', grep { /\Q$in{lookup_key}\E/ } <$fh>;
close $fh;

$in{mystr} = $mystr; #debug
$from = '=';
$in{myfrom} = $from; #debug
$to = ' ';
$in{myto} = $to; #debug

($in{mysubstr0}) = $mystr =~ /=(\S+)/;
($in{mysubstr1}) = $mystr =~ /=(.*?)\s/;
($in{mysubstr2}) = $mystr =~ m/[=]([^\s]+)/;
($in{mysubstr3}) = $mystr =~ /$from(.*?)$to/;

I found that most all of the regular expression examples in the Programming
Perl book use shorthand of referring to $_ (thus, not providing an
explicitly named variable at all).

Whenever you see "/regex/" or "$var = /regex/" you can expand them to
"$_ =~ /regex/" and "$var = $_ =~ /regex/" then the "$_ =~" part can be
replaced with whatever scalar variable you want.
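
As a small illustration of that expansion (the word list and variable names are invented for the example), both forms below run the same match against $_ and give the same result:

```perl
use strict;
use warnings;

my @found;
for (qw(FFFFF=xxxx ggg)) {               # each word is aliased to $_ in turn
    my ($implicit) = /=(\S+)/;           # no target named, so $_ is searched
    my ($explicit) = $_ =~ /=(\S+)/;     # the same match with $_ written out
    push @found, [ $implicit, $explicit ];
}

for my $pair (@found) {
    # A failed match in list context leaves the variable undefined.
    my ($implicit, $explicit) = map { defined $_ ? $_ : '(no match)' } @$pair;
    print "$implicit / $explicit\n";
}
```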

The examples that do refer to a named
variable have something like "($foo = $bar) =~ s/this/that/;".

Because the binding operator (=~) has higher precedence than the
assignment operator (=), the parentheses are required to assign the
contents of $bar to $foo before the substitution is performed on the
result so that $foo is changed and $bar is not. Without the parentheses
the substitution would be performed on $bar first and the result of that
(true or false) would be assigned to $foo.
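
A short sketch of both cases, with made-up strings and variable names, in case it helps to see them side by side:

```perl
use strict;
use warnings;

# With parentheses: copy $bar into $foo, then substitute on $foo only.
my $bar = 'this one';
(my $foo = $bar) =~ s/this/that/;

# Without parentheses: substitute on $src, then assign the outcome of s///.
my $src = 'this one';
my $result = $src =~ s/this/that/;

print "foo=$foo bar=$bar\n";         # foo=that one bar=this one
print "result=$result src=$src\n";   # result=1 src=that one
```

The value assigned in the second case is the number of substitutions made, which is 1 here.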

So, I'd
assumed the parentheses were only needed when using multiple explicitly
named variables. Good for non-beginners, otherwise difficult to decipher.


Great, I found them on perldoc.com and will read them.

Btw, the only reference to contexts in Programming Perl focused on lists and
scalar contexts. Nowhere did I see mention of enclosing an assignment
variable in parentheses or why. Though I did find this quote: "You will be
miserable until you learn the difference between scalar and list context,
because certain operators know which context they are in, and return lists
in contexts wanting a list, and scalar values in contexts wanting a scalar."

It's all so clear now...

Is that irony?

perlsub.pod explains a bit about the difference between scalar and list
context.

perldoc perlsub



John
 
Alexandra

Matija Papec said:
Now when you know that you've been missing (), try to figure out why you've
received "1".

Okay, I read up a bit more (and re-read), and I still do not fully
understand the implications of scalar v. list context. And I tried to get
the expression to evaluate to 0 but couldn't achieve that either. My best guess until
I read/experiment more and it clicks, is that without the parentheses the
$match variable simply cares whether the regular expression evaluated
successfully so it returns True. The parentheses force $match to remember
the matched value.
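
For what it is worth, the scalar-context result can be observed directly. In this sketch (the sample strings and variable names are invented), a successful match gives 1 and a failed one gives an empty string rather than 0; the captured text is still reachable through $1 after a successful match:

```perl
use strict;
use warnings;

# Scalar context: the match operator returns a true or false value.
my $hit  = 'FFFFF=xxxx'     =~ /=(\S+)/;   # true, which is 1
my $miss = 'no equals sign' =~ /=(\S+)/;   # false, which is the empty string

# Even in scalar (boolean) context the capture is saved in $1 on success.
my $captured;
if ('FFFFF=xxxx' =~ /=(\S+)/) {
    $captured = $1;
}

print "hit=[$hit] miss=[$miss] captured=[$captured]\n";
# hit=[1] miss=[] captured=[xxxx]
```
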
There are also docs that come bundled with perl(perldoc perlrequick) if you
don't mind staring at your monitor. :)

Nope, don't mind. Do that all day anyway. Didn't realize they were bundled
though. Thanks again!

Alexandra
 
