search "window" pattern matching

Discussion in 'Perl Misc' started by Cheez, Jan 11, 2004.

  1. Cheez

    Cheez Guest

    Hello, hard to describe my question in a clear way. I want to process
    a string that looks like this:

    $mystring = "thetextinherewillbefairlyrandom";

    I want to capture chunks of text and place them in an array or hash
    table. If possible, I want to make a regex that will start at the
    first letter and capture letters 1 - 5, in this case $capture =
    "thete". Then, I want this window to shift 1 letter so that the next
    captured string is letters 2 - 6, or $capture= "hetex" and so on until
    the end of the line. Can anyone offer up a sample regex that would
    accomplish this task?

    Thanks,
    Cheez

    ==============================================

    My idea is this (although it doesn't work):

    $mystring = "thetextinherewillbefairlyrandom";

    $length = scalar ($mystring);

    while ($counter < $length) {

    $_ =~ /\w[$counter-$counter+4]/; # 'capture' regex

    push @newarray; $counter++; # regex capture window increments by 1
    # pushing chunks into array
    }

    foreach (@newarray) { #sample output

    print "$newarray";

    }
    Cheez, Jan 11, 2004
    #1
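The attempt above never enters its loop body: `scalar($mystring)` returns the string itself rather than its length (that is `length`), and the string numifies to 0. There are other problems too. `[$counter-$counter+4]` is a character class, not a range of positions. `$_` is never set to `$mystring`. `push @newarray` is given nothing to push. The printing loop prints `$newarray`, which is a different variable from the loop's `$_`. The sketch below keeps the OP's idea of a regex driven by a counter. It assumes only full 5-letter windows are wanted, and `$width` is an added, illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my $width    = 5;    # illustrative name for the window size
my @newarray;

# Offsets 0 .. length-5 are the only starting points with a full window.
for my $counter (0 .. length($mystring) - $width) {
    # Skip exactly $counter characters, then capture the next $width.
    # /s lets . match any character, including a newline.
    if ($mystring =~ /^.{$counter}(.{$width})/s) {
        push @newarray, $1;
    }
}

print "$_\n" for @newarray;
```

Re-running the regex from the start of the string for every offset is wasteful on long strings. The replies below show cheaper ways to do the same job.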

  2. >>>>> "Cheez" == Cheez <> writes:

    Cheez> Hello, hard to describe my question in a clear way. I want to process
    Cheez> a string that looks like this:

    Cheez> $mystring = "thetextinherewillbefairlyrandom";

    Cheez> I want to capture chunks of text and place them in an array or hash
    Cheez> table. If possible, I want to make a regex that will start at the
    Cheez> first letter and capture letters 1 - 5, in this case $capture =
    Cheez> "thete". Then, I want this window to shift 1 letter so that the next
    Cheez> captured string is letters 2 - 6, or $capture= "hetex" and so on until
    Cheez> the end of the line. Can anyone offer up a sample regex that would
    Cheez> accomplish this task?

    Use string lookahead, so they can be overlapping:

    while ($mystring =~ /(?=.{5})/sg) {
    push @result, $1;
    }

    print "Just another Perl hacker,"

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
    Randal L. Schwartz, Jan 11, 2004
    #2

  3. Cheez

    Toby Guest

    Cheez wrote:
    > Hello, hard to describe my question in a clear way. I want to process
    > a string that looks like this:
    >
    > $mystring = "thetextinherewillbefairlyrandom";
    >
    > I want to capture chunks of text and place them in an array or hash


    perldoc -f substr

    may be what you're looking for.
    Toby, Jan 11, 2004
    #3
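Here is a short sketch of what `perldoc -f substr` describes, using the OP's string. The offset 28 is there only to show the end-of-string behavior. `substr EXPR, OFFSET, LENGTH` returns up to LENGTH characters, so it quietly returns fewer near the end. Stopping the offsets at `length - 5` keeps only the full 5-letter windows the OP asked for.

```perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my $width    = 5;

my $first = substr $mystring, 0, $width;     # the first window
my $tail  = substr $mystring, 28, $width;    # only 3 characters remain here

# The last full window starts at length - width.
my @windows;
for my $offset (0 .. length($mystring) - $width) {
    push @windows, substr($mystring, $offset, $width);
}

print "$first\n$tail\n";
print "$_\n" for @windows;
```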
  4. Cheez

    gnari Guest

    "Randal L. Schwartz" <> wrote in message
    news:...
    > >>>>> "Cheez" == Cheez <> writes:

    >
    > Cheez> Hello, hard to describe my question in a clear way. I want to process
    > Cheez> a string that looks like this:
    >
    > Cheez> $mystring = "thetextinherewillbefairlyrandom";
    >
    > Cheez> I want to capture chunks of text and place them in an array or hash
    > Cheez> table. If possible, I want to make a regex that will start at the
    > Cheez> first letter and capture letters 1 - 5, in this case $capture =
    > Cheez> "thete". Then, I want this window to shift 1 letter so that the next
    > Cheez> captured string is letters 2 - 6, or $capture= "hetex" and so on until
    > Cheez> the end of the line. Can anyone offer up a sample regex that would
    > Cheez> accomplish this task?
    >
    > Use string lookahead, so they can be overlapping:
    >
    > while ($mystring =~ /(?=.{5})/sg) {
    > push @result, $1;
    > }


    or use pos(),
    or more likely, use substr()

    gnari
    gnari, Jan 11, 2004
    #4
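gnari does not spell out a pos() solution. One possibility, offered as a sketch rather than anything posted in the thread, anchors each match with `\G` and then moves pos() back so the next match starts one character later. `$width` is an illustrative name. This version is a little more involved than the lookahead form, because pos() has to be adjusted by hand.

```perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my $width    = 5;
my @windows;

# \G anchors each match where pos() points (offset 0 at first, since pos()
# starts out undefined). Each match consumes $width characters; moving pos()
# back by $width - 1 makes the next window start one character later.
# /c keeps pos() from being reset when the final match fails.
while ($mystring =~ /\G(.{$width})/sgc) {
    push @windows, $1;
    pos($mystring) -= $width - 1;
}

print "$_\n" for @windows;
```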
  5. (Cheez) wrote in news:1e85f7c8.0401111026.52915a71@posting.google.com:

    > Hello, hard to describe my question in a clear way. I want to process
    > a string that looks like this:
    >
    > $mystring = "thetextinherewillbefairlyrandom";
    >
    > I want to capture chunks of text and place them in an array or hash
    > table. If possible, I want to make a regex that will start at the
    > first letter and capture letters 1 - 5, in this case $capture =
    > "thete". Then, I want this window to shift 1 letter so that the next
    > captured string is letters 2 - 6, or $capture= "hetex" and so on until
    > the end of the line. Can anyone offer up a sample regex that would
    > accomplish this task?
    >
    > Thanks,
    > Cheez
    >
    > ==============================================
    >
    > My idea is this (although it doesn't work):
    >
    > $mystring = "thetextinherewillbefairlyrandom";
    >
    > $length = scalar ($mystring);
    >
    > while ($counter < $length) {
    >
    > $_ =~ /\w[$counter-$counter+4]/; # 'capture' regex
    >
    > push @newarray; $counter++; # regex capture window increments by 1
    > # pushing chunks into array
    > }
    >
    > foreach (@newarray) { #sample output
    >
    > print "$newarray";
    >
    > }


    Lemme take a crack at it:

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $mystring = "thetextinherewillbefairlyrandom";
    # get the length of $mystring:
    my $length = length $mystring;
    # set / declare the counter:
    my $counter=0;
    # set / declare the array:
    my @newarray;
    # while the counter is less than the length of $mystring, grab bits of text:
    while ($counter < $length) {
    # grab 5 characters from the last position used within $mystring
    my $tempstring = substr $mystring,$counter,5;
    # dump it into @newarray:
    push @newarray,$tempstring;
    # increment the counter and loop again
    ++ $counter;
    }
    for (@newarray) {
    print "$_\n";
    }

    output:

    thete
    hetex
    etext
    texti
    extin
    xtinh
    tinhe
    inher
    nhere
    herew
    erewi
    rewil
    ewill
    willb
    illbe
    llbef
    lbefa
    befai
    efair
    fairl
    airly
    irlyr
    rlyra
    lyran
    yrand
    rando
    andom
    ndom
    dom
    om
    m

    --
    Marc Bissonnette
    CGI / Database / Web Management Tools: http://www.internalysis.com
    Something To Sell? Looking To Buy? http://www.whitewaterclassifieds.ca
    Looking for a new ISP? http://www.canadianisp.com
    Marc Bissonnette, Jan 11, 2004
    #5
  6. >>>>> "gnari" == gnari <> writes:

    >> Use string lookahead, so they can be overlapping:
    >>
    >> while ($mystring =~ /(?=.{5})/sg) {
    >> push @result, $1;
    >> }


    gnari> or use pos(),
    gnari> or more likely, use substr()

    Uh, why? Any solution with pos and substr is likely to be a lot
    more complex than this simple regex.

    Or are you of the habit of replacing simple solutions with complex
    ones for the helluvit? :)

    print "Just another Perl hacker,"

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
    Randal L. Schwartz, Jan 11, 2004
    #6
  7. Marc Bissonnette <> wrote:

    > # get the length of $mystring:
    > my $length = length $mystring;
    > # set / declare the counter:
    > my $counter=0;
    > # set / declare the array:
    > my @newarray;



    Comments that repeat what is already said in the code are worse
    than no comments.

    They are distracting, plus you have to remember to change stuff
    in 2 places, the code and the comment that repeats the code.
    (they have a very good chance of getting out-of-sync)


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jan 11, 2004
    #7
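The following sketch illustrates Tad's point; it is not code from the thread. Marc's loop can be wrapped in a subroutine with a single comment at its head that says what it returns and why, leaving the body uncommented. `sliding_windows` is a hypothetical name. Unlike Marc's version, it drops the short trailing pieces ("ndom", "dom", and so on).

```perl
use strict;
use warnings;

# sliding_windows($string, $width) (hypothetical helper): return every
# overlapping substring of exactly $width characters, in order of position.
# Shorter pieces at the end are left out because callers expect fixed-size
# windows.
sub sliding_windows {
    my ($string, $width) = @_;
    return map { substr($string, $_, $width) } 0 .. length($string) - $width;
}

print "$_\n" for sliding_windows("thetextinherewillbefairlyrandom", 5);
```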
  8. Cheez

    gnari Guest

    "Randal L. Schwartz" <> wrote in message
    news:...
    > >>>>> "gnari" == gnari <> writes:

    >
    > >> Use string lookahead, so they can be overlapping:
    > >>
    > >> while ($mystring =~ /(?=.{5})/sg) {
    > >> push @result, $1;
    > >> }

    >
    > gnari> or use pos(),
    > gnari> or more likely, use substr()
    >
    > Uh, why? Any solution with pos and substr is likely to be a lot
    > more complex than this simple regex.
    >
    > Or are you of the habit of replacing simple solutions with complex
    > ones for the helluvit? :)


    sometimes :)

    I just have the impression that a substr() solution is
    easier for a beginner to understand and change, if
    necessary.
    Also, it is always good to rub in TMTOWTDI.

    On the other hand, maybe the OP really just wanted
    to know if there was a *regexp* solution. In that case,
    he will just ignore my comment.

    gnari
    gnari, Jan 11, 2004
    #8
  9. "Randal L. Schwartz" wrote:
    >
    > >>>>> "Cheez" == Cheez <> writes:

    >
    > Cheez> Hello, hard to describe my question in a clear way. I want to process
    > Cheez> a string that looks like this:
    >
    > Cheez> $mystring = "thetextinherewillbefairlyrandom";
    >
    > Cheez> I want to capture chunks of text and place them in an array or hash
    > Cheez> table. If possible, I want to make a regex that will start at the
    > Cheez> first letter and capture letters 1 - 5, in this case $capture =
    > Cheez> "thete". Then, I want this window to shift 1 letter so that the next
    > Cheez> captured string is letters 2 - 6, or $capture= "hetex" and so on until
    > Cheez> the end of the line. Can anyone offer up a sample regex that would
    > Cheez> accomplish this task?
    >
    > Use string lookahead, so they can be overlapping:
    >
    > while ($mystring =~ /(?=.{5})/sg) {
    > push @result, $1;
    > }


    (?=) doesn't capture. You probably meant /(?=(.{5}))/sg


    :)

    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Jan 11, 2004
    #9
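Here is a complete, runnable sketch of Randal's loop with John's fix applied. The capturing group sits inside the lookahead, so `$1` is actually set. The string and array names follow the thread.

```perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";
my @result;

# (?=...) matches without consuming any characters; the inner (...) captures
# the next five characters into $1. Perl will not accept a second zero-length
# match at the same position, so each successive match is found one character
# later and the windows overlap.
while ($mystring =~ /(?=(.{5}))/sg) {
    push @result, $1;
}

print "$_\n" for @result;
```

Once fewer than five characters remain, the lookahead can no longer succeed. So, unlike a plain substr loop, this version produces no short trailing pieces.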
  10. Cheez

    Cheez Guest

    Blown away at how useful c.l.p.m is for a newbie perl dude. I thank you
    all again for the replies. I think Gnari made a point about $substr
    being easier to understand for newbies... Yes! I have Java
    background so it's always nice to see a friendly face (substring)!

    God is in the regexes though ;)

    Cheers,
    Cheez

    (Cheez) wrote in message news:<>...
    > Hello, hard to describe my question in a clear way. I want to process
    > a string that looks like this:

    [snip]
    Cheez, Jan 12, 2004
    #10
  11. >>>>> "John" == John W Krahn <> writes:

    John> (?=) doesn't capture. You probably meant /(?=(.{5}))/sg

    Brainlapse. yes. Thanks.

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
    Randal L. Schwartz, Jan 12, 2004
    #11
  12. Cheez

    gnari Guest

    "Cheez" <> wrote in message
    news:...
    > Blown away at how useful c.l.p.m is for a newbie perl dude. I thank you
    > all again for the replies. I think Gnari made a point about $substr


    minor nitpick #1: it is substr() not $substr (function, not variable)

    > being easier to understand for newbies... Yes! I have Java
    > background so it's always nice to see a friendly face (substring)!
    >
    > God is in the regexes though ;)


    indeed.

    > (Cheez) wrote in message

    news:<>...
    > > Hello, hard to describe my question in a clear way. I want to process
    > > a string that looks like this:

    > [snip]


    minor nitpick #2:
    what you did here is called top-posting: you made a follow-up,
    and quoted the message you are following-up on below.
    this practice is frowned-upon in this newsgroup.
    in this case it is not serious, because you did not actually quote the whole
    article below.

    gnari
    gnari, Jan 12, 2004
    #12
  13. Cheez

    Anno Siegel Guest

    Randal L. Schwartz <> wrote in comp.lang.perl.misc:
    > >>>>> "gnari" == gnari <> writes:

    >
    > >> Use string lookahead, so they can be overlapping:
    > >>
    > >> while ($mystring =~ /(?=.{5})/sg) {
    > >> push @result, $1;
    > >> }

    >
    > gnari> or use pos(),
    > gnari> or more likely, use substr()
    >
    > Uh, why? Any solution with pos and substr is likely to be a lot
    > more complex than this simple regex.
    >
    > Or are you of the habit of replacing simple solutions with complex
    > ones for the helluvit? :)


    Are you? Why loop when list context does the same thing?

    my @result2 = $mystring =~ /(?=(.{5}))/sg;

    Anno
    Anno Siegel, Jan 12, 2004
    #13
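This sketch uses Anno's list-context form. It also adds a hash, because the OP mentioned "an array or hash table". Counting how often each window occurs is an assumption about what that hash was meant for.

```perl
use strict;
use warnings;

my $mystring = "thetextinherewillbefairlyrandom";

# In list context, m//g returns the $1 of every match, so no loop is needed.
my @result2 = $mystring =~ /(?=(.{5}))/sg;

# Hypothetical use of a hash: count how many times each window occurs.
my %count;
$count{$_}++ for @result2;

for my $window (sort keys %count) {
    print "$window $count{$window}\n";
}
```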
  14. (Tad McClellan) wrote in
    news::

    > Marc Bissonnette <> wrote:
    >
    >> # get the length of $mystring:
    >> my $length = length $mystring;
    >> # set / declare the counter:
    >> my $counter=0;
    >> # set / declare the array:
    >> my @newarray;

    >
    >
    > Comments that repeat what is already said in the code are worse
    > than no comments.
    >
    > They are distracting, plus you have to remember to change stuff
    > in 2 places, the code and the comment that repeats the code.
    > (they have a very good chance of getting out-of-sync)


    Good point; I was trying to be extra-thorough in showing the OP what I was
    trying to do (which was, of course, way longer than Randal's one-liner).

    I comment my own code usually with only a single comment for each
    subroutine, or blocks that I know I'd need a reminder on in the future.

    Out of curiosity, is there a resource or guideline on the web for 'proper'
    perl commenting ?

    A google search for
    perl "proper comment" code
    didn't seem to turn anything up that was completely relevant.



    --
    Marc Bissonnette
    CGI / Database / Web Management Tools: http://www.internalysis.com
    Something To Sell? Looking To Buy? http://www.whitewaterclassifieds.ca
    Looking for a new ISP? http://www.canadianisp.com
    Marc Bissonnette, Jan 12, 2004
    #14
  15. Marc Bissonnette <> wrote in
    news:Xns946E6152E70B8dragnetinternalysisc@206.172.150.14:

    > Out of curiosity, is there a resource or guideline on the web for
    > 'proper' perl commenting ?
    >
    > A google search for
    > perl "proper comment" code
    > didn't seem to turn anything up that was completely relevant.


    How about perldoc perlstyle?

    --
    A. Sinan Unur
    (reverse each component for email address)
    A. Sinan Unur, Jan 12, 2004
    #15
  16. "A. Sinan Unur" <> wrote in
    news:Xns946E6CC421675asu1cornelledu@132.236.56.8:

    > Marc Bissonnette <> wrote in
    > news:Xns946E6152E70B8dragnetinternalysisc@206.172.150.14:
    >
    >> Out of curiosity, is there a resource or guideline on the web for
    >> 'proper' perl commenting ?
    >>
    >> A google search for
    >> perl "proper comment" code
    >> didn't seem to turn anything up that was completely relevant.

    >
    > How about perldoc perlstyle?


    Thank you - Reading it now :)

    --
    Marc Bissonnette
    CGI / Database / Web Management Tools: http://www.internalysis.com
    Something To Sell? Looking To Buy? http://www.whitewaterclassifieds.ca
    Looking for a new ISP? http://www.canadianisp.com
    Marc Bissonnette, Jan 12, 2004
    #16
  17. Marc Bissonnette <> wrote in
    news:Xns946E6DBAEB8AFdragnetinternalysisc@207.35.177.135:

    > "A. Sinan Unur" <> wrote in
    > news:Xns946E6CC421675asu1cornelledu@132.236.56.8:
    >
    >> Marc Bissonnette <> wrote in
    >> news:Xns946E6152E70B8dragnetinternalysisc@206.172.150.14:
    >>
    >>> Out of curiosity, is there a resource or guideline on the web for
    >>> 'proper' perl commenting ?
    >>>
    >>> A google search for
    >>> perl "proper comment" code
    >>> didn't seem to turn anything up that was completely relevant.

    >>
    >> How about perldoc perlstyle?

    >
    > Thank you - Reading it now :)


    Well, I must have been confused, because it says nothing about comments. I
    found the following page, the contents of which I thought came from
    perldoc perlstyle.

    http://www.perl.com/language/style/slide5.html



    --
    A. Sinan Unur
    (reverse each component for email address)
    A. Sinan Unur, Jan 12, 2004
    #17
  18. "A. Sinan Unur" <> wrote in
    news:Xns946E720126D96asu1cornelledu@132.236.56.8:

    > Marc Bissonnette <> wrote in
    > news:Xns946E6DBAEB8AFdragnetinternalysisc@207.35.177.135:
    >
    >> "A. Sinan Unur" <> wrote in
    >> news:Xns946E6CC421675asu1cornelledu@132.236.56.8:
    >>
    >>> Marc Bissonnette <> wrote in
    >>> news:Xns946E6152E70B8dragnetinternalysisc@206.172.150.14:
    >>>
    >>>> Out of curiosity, is there a resource or guideline on the web for
    >>>> 'proper' perl commenting ?
    >>>>
    >>>> A google search for
    >>>> perl "proper comment" code
    >>>> didn't seem to turn anything up that was completely relevant.
    >>>
    >>> How about perldoc perlstyle?

    >>
    >> Thank you - Reading it now :)

    >
    > Well, I must have been confused, because it says nothing about
    > comments. I found the following page, the contents of which I thought
    > came from perldoc perlstyle.
    >
    > http://www.perl.com/language/style/slide5.html


    I think that bit is complementary to perldoc perlstyle - or the other way
    around. From what I get out of the two - if one follows the advice of
    perldoc perlstyle along with decent perl itself, then excessive, or even
    frequent, comments should be completely avoidable, as they will be
    unnecessary.



    --
    Marc Bissonnette
    CGI / Database / Web Management Tools: http://www.internalysis.com
    Something To Sell? Looking To Buy? http://www.whitewaterclassifieds.ca
    Looking for a new ISP? http://www.canadianisp.com
    Marc Bissonnette, Jan 12, 2004
    #18
  19. Cheez

    Ben Morrow Guest

    [article references removed 'cos it was getting silly :)]

    Marc Bissonnette <> wrote:
    > "A. Sinan Unur" <> wrote in
    > > Marc Bissonnette <> wrote in
    > >> "A. Sinan Unur" <> wrote in
    > >>> Marc Bissonnette <> wrote in
    > >>>
    > >>>> Out of curiosity, is there a resource or guideline on the web for
    > >>>> 'proper' perl commenting ?
    > >>>>
    > >>>> A google search for
    > >>>> perl "proper comment" code
    > >>>> didn't seem to turn anything up that was completely relevant.
    > >>>
    > >>> How about perldoc perlstyle?
    > >>
    > >> Thank you - Reading it now :)

    > >
    > > Well, I must have been confused, because it says nothing about
    > > comments. I found the following page, the contents of which I thought
    > > came from perldoc perlstyle.
    > >
    > > http://www.perl.com/language/style/slide5.html

    >
    > I think that bit is complementary to perldoc perlstyle - or the other way
    > around. From what I get out of the two - if one follows the advice of
    > perldoc perlstyle along with decent perl itself, then excessive, or even
    > frequent, comments should be completely avoidable, as they will be
    > unnecessary.


    This was written wrt C, not Perl, but I tend to follow this from
    /usr/src/linux/Documentation/CodingStyle:
    | Chapter 5: Commenting
    |
    | Comments are good, but there is also a danger of over-commenting.
    | NEVER try to explain HOW your code works in a comment: it's much
    | better to write the code so that the _working_ is obvious, and it's
    | a waste of time to explain badly written code.
    |
    | Generally, you want your comments to tell WHAT your code does, not
    | HOW. Also, try to avoid putting comments inside a function body: if
    | the function is so complex that you need to separately comment parts
    | of it, you should probably go back to chapter 4 for a while. You
    | can make small comments to note or warn about something particularly
    | clever (or ugly), but try to avoid excess. Instead, put the
    | comments at the head of the function, telling people what it does,
    | and possibly WHY it does it.

    Ben

    --
    perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
    qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
    1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
    2047502190/' #
    Ben Morrow, Jan 12, 2004
    #19
  20. Ben Morrow <> wrote in
    news:btupjv$bi0$:

    > [article references removed 'cos it was getting silly :)]
    >
    > Marc Bissonnette <> wrote:
    >> "A. Sinan Unur" <> wrote in
    >> > Marc Bissonnette <> wrote in
    >> >> "A. Sinan Unur" <> wrote in
    >> >>> Marc Bissonnette <> wrote in
    >> >>>
    >> >>>> Out of curiosity, is there a resource or guideline on the web
    >> >>>> for 'proper' perl commenting ?
    >> >>>>
    >> >>>> A google search for
    >> >>>> perl "proper comment" code
    >> >>>> didn't seem to turn anything up that was completely relevant.
    >> >>>
    >> >>> How about perldoc perlstyle?
    >> >>
    >> >> Thank you - Reading it now :)
    >> >
    >> > Well, I must have been confused, because it says nothing about
    >> > comments. I found the following page, the contents of which I
    >> > thought came from perldoc perlstyle.
    >> >
    >> > http://www.perl.com/language/style/slide5.html

    >>
    >> I think that bit is complementary to perldoc perlstyle - or the other
    >> way around. From what I get out of the two - if one follows the
    >> advice of perldoc perlstyle along with decent perl itself, then
    >> excessive, or even frequent, comments should be completely avoidable,
    >> as they will be unnecessary.

    >
    > This was written wrt C, not Perl, but I tend to follow this from
    > /usr/src/linux/Documentation/CodingStyle:
    >| Chapter 5: Commenting
    >|
    >| Comments are good, but there is also a danger of over-commenting.
    >| NEVER try to explain HOW your code works in a comment: it's much
    >| better to write the code so that the _working_ is obvious, and it's
    >| a waste of time to explain badly written code.
    >|
    >| Generally, you want your comments to tell WHAT your code does, not
    >| HOW. Also, try to avoid putting comments inside a function body: if
    >| the function is so complex that you need to separately comment parts
    >| of it, you should probably go back to chapter 4 for a while. You
    >| can make small comments to note or warn about something particularly
    >| clever (or ugly), but try to avoid excess. Instead, put the
    >| comments at the head of the function, telling people what it does,
    >| and possibly WHY it does it.


    That's a good guideline and pretty much what I've been following to date
    - i.e. comments at the beginning of subroutines that go into more detail
    than what the subroutine name already suggests.

    My over-commenting in the NG was my own fault - should have known better
    and simply followed what works best in the real app, too :)

    I'm going to re-review the perldoc perlstyle, just to see if there's
    anything I've been missing. Overall, I think my code is fairly decent - I
    can go back to almost all of my stuff over the years and still have
    relatively few problems understanding what I was getting at in the
    code, even if I've since learned much more efficient manners of doing it.



    --
    Marc Bissonnette
    CGI / Database / Web Management Tools: http://www.internalysis.com
    Something To Sell? Looking To Buy? http://www.whitewaterclassifieds.ca
    Looking for a new ISP? http://www.canadianisp.com
    Marc Bissonnette, Jan 12, 2004
    #20