search "window" pattern matching

Cheez

Hello, hard to describe my question in a clear way. I want to process
a string that looks like this:

$mystring = "thetextinherewillbefairlyrandom";

I want to capture chunks of text and place them in an array or hash
table. If possible, I want to make a regex that will start at the
first letter and capture letters 1 - 5, in this case $capture =
"thete". Then, I want this window to shift 1 letter so that the next
captured string is letters 2 - 6, or $capture= "hetex" and so on until
the end of the line. Can anyone offer up a sample regex that would
accomplish this task?

Thanks,
Cheez

==============================================

My idea is this (although it doesn't work):

$mystring = "thetextinherewillbefairlyrandom";

$length = scalar ($mystring);

while ($counter < $length) {

$_ =~ /\w[$counter-$counter+4]/; # 'capture' regex

push @newarray; $counter++; # regex capture window increments by 1
# pushing chunks into array
}

foreach (@newarray) { #sample output

print "$newarray";

}
 
Randal L. Schwartz

Cheez> Hello, hard to describe my question in a clear way. I want to process
Cheez> a string that looks like this:

Cheez> $mystring = "thetextinherewillbefairlyrandom";

Cheez> I want to capture chunks of text and place them in an array or hash
Cheez> table. If possible, I want to make a regex that will start at the
Cheez> first letter and capture letters 1 - 5, in this case $capture =
Cheez> "thete". Then, I want this window to shift 1 letter so that the next
Cheez> captured string is letters 2 - 6, or $capture= "hetex" and so on until
Cheez> the end of the line. Can anyone offer up a sample regex that would
Cheez> accomplish this task?

Use string lookahead, so they can be overlapping:

while ($mystring =~ /(?=.{5})/sg) {
push @result, $1;
}

print "Just another Perl hacker,"
 
Toby

Cheez said:
Hello, hard to describe my question in a clear way. I want to process
a string that looks like this:

$mystring = "thetextinherewillbefairlyrandom";

I want to capture chunks of text and place them in an array or hash

perldoc -f substr

may be what you're looking for.
 
gnari

Randal L. Schwartz said:
Cheez> Hello, hard to describe my question in a clear way. I want to process
Cheez> a string that looks like this:

Cheez> $mystring = "thetextinherewillbefairlyrandom";

Cheez> I want to capture chunks of text and place them in an array or hash
Cheez> table. If possible, I want to make a regex that will start at the
Cheez> first letter and capture letters 1 - 5, in this case $capture =
Cheez> "thete". Then, I want this window to shift 1 letter so that the next
Cheez> captured string is letters 2 - 6, or $capture= "hetex" and so on until
Cheez> the end of the line. Can anyone offer up a sample regex that would
Cheez> accomplish this task?

Use string lookahead, so they can be overlapping:

while ($mystring =~ /(?=.{5})/sg) {
push @result, $1;
}

or use pos(),
or more likely, use substr()

gnari
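For the pos() route mentioned above, a minimal sketch might look like the following. It assumes the same sample string and a fixed window of 5 characters; the @result name is reused from the earlier post, and stepping pos() back by 4 is just one way to make consecutive matches overlap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my @result;

# \G anchors each match at the current pos(). After a successful match,
# pos() sits just past the 5 matched characters, so move it back by 4
# to start the next window one character further along.
while ($mystring =~ /\G(.{5})/gs) {
    push @result, $1;
    pos($mystring) = pos($mystring) - 4;
}

print "$_\n" for @result;
```

The loop ends on its own once fewer than 5 characters remain after pos(), because the match then fails.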
 
Marc Bissonnette

(e-mail address removed) (Cheez) wrote in @posting.google.com:
Hello, hard to describe my question in a clear way. I want to process
a string that looks like this:

$mystring = "thetextinherewillbefairlyrandom";

I want to capture chunks of text and place them in an array or hash
table. If possible, I want to make a regex that will start at the
first letter and capture letters 1 - 5, in this case $capture =
"thete". Then, I want this window to shift 1 letter so that the next
captured string is letters 2 - 6, or $capture= "hetex" and so on until
the end of the line. Can anyone offer up a sample regex that would
accomplish this task?

Thanks,
Cheez

==============================================

My idea is this (although it doesn't work):

$mystring = "thetextinherewillbefairlyrandom";

$length = scalar ($mystring);

while ($counter < $length) {

$_ =~ /\w[$counter-$counter+4]/; # 'capture' regex

push @newarray; $counter++; # regex capture window increments by 1
# pushing chunks into array
}

foreach (@newarray) { #sample output

print "$newarray";

}

Lemme take a crack at it:

#!/usr/bin/perl
use strict;
use warnings;
my $mystring = "thetextinherewillbefairlyrandom";
# get the length of $mystring:
my $length = length $mystring;
# set / declare the counter:
my $counter=0;
# set / declare the array:
my @newarray;
# while the counter is less than the length of $mystring, grab bits of text:
while ($counter < $length) {
    # grab 5 characters from the last position used within $mystring
    my $tempstring = substr $mystring,$counter,5;
    # dump it into @newarray:
    push @newarray,$tempstring;
    # increment the counter and loop again
    ++ $counter;
}
for (@newarray) {
    print "$_\n";
}

output:

thete
hetex
etext
texti
extin
xtinh
tinhe
inher
nhere
herew
erewi
rewil
ewill
willb
illbe
llbef
lbefa
befai
efair
fairl
airly
irlyr
rlyra
lyran
yrand
rando
andom
ndom
dom
om
m
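
Note that the loop above runs the counter to the end of the string, so the last four entries are shorter than five characters. If only full-width windows are wanted, one possible variation is sketched below. The $width and @windows names are illustrative, not from the post, and the window size is assumed to stay at 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my $width    = 5;    # assumed window size, as in the question
my @windows;

# Stop at the last offset where a full window still fits.
# If the string is shorter than $width, the range is empty.
for my $offset (0 .. length($mystring) - $width) {
    push @windows, substr($mystring, $offset, $width);
}

print "$_\n" for @windows;
```

With the sample string this stops at "andom" and leaves out "ndom", "dom", "om" and "m".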
 
Randal L. Schwartz

gnari> or use pos(),
gnari> or more likely, use substr()

Uh, why? Any solution with pos and substr is likely to be a lot
more complex than this simple regex.

Or are you of the habit of replacing simple solutions with complex
ones for the helluvit? :)

print "Just another Perl hacker,"
 
Tad McClellan

Marc Bissonnette said:
# get the length of $mystring:
my $length = length $mystring;
# set / declare the counter:
my $counter=0;
# set / declare the array:
my @newarray;


Comments that repeat what is already said in the code are worse
than no comments.

They are distracting, plus you have to remember to change stuff
in 2 places, the code and the comment that repeats the code.
(they have a very good chance of getting out-of-sync)
 
gnari

Randal L. Schwartz said:
gnari> or use pos(),
gnari> or more likely, use substr()

Uh, why? Any solution with pos and substr is likely to be a lot
more complex than this simple regex.

Or are you of the habit of replacing simple solutions with complex
ones for the helluvit? :)

sometimes :)

I just have the impression that a substr() solution is
easier for a beginner to understand and change, if
necessary.
Also, it is always good to rub in the TMTOWTDI.

On the other hand, maybe the OP really just wanted
to know if there was a *regexp* solution. In that case,
he will just ignore my comment.

gnari
 
John W. Krahn

Randal L. Schwartz said:
Cheez> Hello, hard to describe my question in a clear way. I want to process
Cheez> a string that looks like this:

Cheez> $mystring = "thetextinherewillbefairlyrandom";

Cheez> I want to capture chunks of text and place them in an array or hash
Cheez> table. If possible, I want to make a regex that will start at the
Cheez> first letter and capture letters 1 - 5, in this case $capture =
Cheez> "thete". Then, I want this window to shift 1 letter so that the next
Cheez> captured string is letters 2 - 6, or $capture= "hetex" and so on until
Cheez> the end of the line. Can anyone offer up a sample regex that would
Cheez> accomplish this task?

Use string lookahead, so they can be overlapping:

while ($mystring =~ /(?=.{5})/sg) {
push @result, $1;
}

(?=) doesn't capture. You probably meant /(?=(.{5}))/sg


:)

John
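
Putting the correction together with the original loop, a runnable sketch could look like this. It assumes the same sample string; the only change from the loop quoted above is the extra pair of capturing parentheses inside the lookahead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my @result;

# The lookahead (?=...) consumes nothing, so the overall match is empty
# and the next /g match starts one position further along. The inner
# parentheses are what actually fill $1 with the 5-character window.
while ($mystring =~ /(?=(.{5}))/sg) {
    push @result, $1;
}

print "$_\n" for @result;
```

Because .{5} must match in full, positions with fewer than 5 characters left produce no entry.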
 
Cheez

Blown away at how useful c.l.p.m is for a newbie perl dude. I thank
all again for the replies. I think Gnari made a point about $substr
being easier to understand for newbies... Yes! I have Java
background so it's always nice to see a friendly face (substring)!

God is in the regex's though ;)

Cheers,
Cheez

Hello, hard to describe my question in a clear way. I want to process
a string that looks like this:
[snip]
 
Randal L. Schwartz

John> (?=) doesn't capture. You probably meant /(?=(.{5}))/sg

Brainlapse. yes. Thanks.
 
gnari

Cheez said:
Blown away at how useful c.l.p.m is for a newbie perl dude. I thank
all again for the replies. I think Gnari made a point about $substr

minor nitpick #1: it is substr() not $substr (function, not variable)
being easier to understand for newbies... Yes! I have Java
background so it's always nice to see a friendly face (substring)!

God is in the regex's though ;)
indeed.

(e-mail address removed) (Cheez) wrote in message
Hello, hard to describe my question in a clear way. I want to process
a string that looks like this:
[snip]

minor nitpick #2:
what you did here is called top-posting: you made a follow-up,
and quoted the message you are following-up on below.
this practice is frowned-upon in this newsgroup.
in this case it is not serious, because you did not actually quote the whole
article below.

gnari
 
Anno Siegel

Randal L. Schwartz said:
gnari> or use pos(),
gnari> or more likely, use substr()

Uh, why? Any solution with pos and substr is likely to be a lot
more complex than this simple regex.

Or are you of the habit of replacing simple solutions with complex
ones for the helluvit? :)

Are you? Why loop when list context does the same thing?

my @result2 = $mystring =~ /(?=(.{5}))/sg;

Anno
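
As a rough illustration of the list-context form, the sketch below also pulls the window size into a variable. The $width name is hypothetical and not part of the original post; the regex is otherwise the one shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my $width    = 5;    # hypothetical parameter; the thread uses a literal 5

# In list context, a /g match returns every captured group from every
# match, so no explicit loop or push is needed.
my @result2 = $mystring =~ /(?=(.{$width}))/sg;

print scalar(@result2), " windows\n";
print "$_\n" for @result2;
```

This should produce the same list as the while loop, just without the loop body.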
 
Marc Bissonnette

(e-mail address removed) (Tad McClellan) wrote in
Comments that repeat what is already said in the code are worse
than no comments.

They are distracting, plus you have to remember to change stuff
in 2 places, the code and the comment that repeats the code.
(they have a very good chance of getting out-of-sync)

Good point; I was trying to be extra-thorough in showing the OP what I was
trying to do (which was, of course, way longer than Randal's one-liner).

I comment my own code usually with only a single comment for each
subroutine, or blocks that I know I'd need a reminder on in the future.

Out of curiosity, is there a resource or guideline on the web for 'proper'
perl commenting ?

A google search for
perl "proper comment" code
didn't seem to turn anything up that was completely relevant.
 
A. Sinan Unur

Out of curiosity, is there a resource or guideline on the web for
'proper' perl commenting ?

A google search for
perl "proper comment" code
didn't seem to turn anything up that was completely relevant.

How about perldoc perlstyle?
 
Marc Bissonnette

Well, I must have been confused because it says nothing about
comments. I found the following page the contents of which I thought
came from perldoc perlstyle.

http://www.perl.com/language/style/slide5.html

I think that bit is complementary to perldoc perlstyle - or the other way
around. From what I get out of the two - if one follows the advice of
perldoc perlstyle along with decent perl itself, then excessive, or even
frequent, comments should be completely avoidable, as they will be
unnecessary.
 
Ben Morrow

[article references removed 'cos it was getting silly :)]

Marc Bissonnette said:
I think that bit is complementary to perldoc perlstyle - or the other way
around. From what I get out of the two - if one follows the advice of
perldoc perlstyle along with decent perl itself, then excessive, or even
frequent, comments should be completely avoidable, as they will be
unnecessary.

This was written wrt C, not Perl, but I tend to follow this from
/usr/src/linux/Documentation/CodingStyle:
| Chapter 5: Commenting
|
| Comments are good, but there is also a danger of over-commenting.
| NEVER try to explain HOW your code works in a comment: it's much
| better to write the code so that the _working_ is obvious, and it's
| a waste of time to explain badly written code.
|
| Generally, you want your comments to tell WHAT your code does, not
| HOW. Also, try to avoid putting comments inside a function body: if
| the function is so complex that you need to separately comment parts
| of it, you should probably go back to chapter 4 for a while. You
| can make small comments to note or warn about something particularly
| clever (or ugly), but try to avoid excess. Instead, put the
| comments at the head of the function, telling people what it does,
| and possibly WHY it does it.

Ben
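
Applied to the windowing code from earlier in the thread, that guideline might look something like the sketch below: one comment at the head of the sub saying what it does, and none inside the body. The windows() name and its two-argument interface are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every overlapping substring of $width characters in $text,
# in order of starting position. Strings shorter than $width yield
# an empty list.
sub windows {
    my ($text, $width) = @_;
    return map { substr($text, $_, $width) } 0 .. length($text) - $width;
}

print join(",", windows("abcdefg", 5)), "\n";    # prints abcde,bcdef,cdefg
```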
 
Marc Bissonnette

[article references removed 'cos it was getting silly :)]

Marc Bissonnette said:
I think that bit is complementary to perldoc perlstyle - or the other
way around. From what I get out of the two - if one follows the
advice of perldoc perlstyle along with decent perl itself, then
excessive, or even frequent, comments should be completely avoidable,
as they will be unnecessary.

This was written wrt C, not Perl, but I tend to follow this from
/usr/src/linux/Documentation/CodingStyle:
| Chapter 5: Commenting
|
| Comments are good, but there is also a danger of over-commenting.
| NEVER try to explain HOW your code works in a comment: it's much
| better to write the code so that the _working_ is obvious, and it's
| a waste of time to explain badly written code.
|
| Generally, you want your comments to tell WHAT your code does, not
| HOW. Also, try to avoid putting comments inside a function body: if
| the function is so complex that you need to separately comment parts
| of it, you should probably go back to chapter 4 for a while. You
| can make small comments to note or warn about something particularly
| clever (or ugly), but try to avoid excess. Instead, put the
| comments at the head of the function, telling people what it does,
| and possibly WHY it does it.

That's a good guideline and pretty much what I've been following to date
- i.e. comments at the beginning of subroutines that go into more detail
than what the subroutine name already suggests.

My over-commenting in the NG was my own fault - I should have known better
and simply followed what works best in the real app, too :)

I'm going to re-review the perldoc perlstyle, just to see if there's
anything I've been missing. Overall, I think my code is fairly decent - I
can go back to almost all of my stuff over the years and still have
relatively few problems understanding what I was getting at in the
code, even if I've since learned much more efficient manners of doing it.
 
