s modifier doesn't seem to work

Discussion in 'Perl Misc' started by fmassion@web.de, Aug 10, 2013.

  1. Guest

    Hi everybody,

    I am currently testing a string search over line breaks.

    My file is UTF-8 encoded.

    This is my test text (with linebreaks at the end):
    ----------
    Das ist ein Beispiel mit 3 Sätzen
    Das ist ein 1122-22-11 Format
    Hier ist keine Zahl.
    Hier ist kein Punkt
    nur Text Hier ist nur Text ist aber nur Text
    ----------

    This is a code extract:

    foreach $satz (@satz) {
        chomp $satz;
        if ($satz =~ m/\d(?s)(.*)keine/g) {
            $satz =~ s/$&/xxxx/g;
        }
        print "$satz\n";
    }

    I would expect the following result for the first three lines:
    'Das ist ein Beispiel mit xxxxx Zahl.'

    With this search string, however, I get no match. I have entered the same expression in UltraEdit (Regex-Perl-Search) and it works correctly there.

    What is wrong here?
    , Aug 10, 2013
    #1

  2. On 2013-08-10 09:16, <> wrote:
    > I am currently testing a string search over line breaks.

    [...]
    > This is my test text (with linebreaks at the end):
    > ----------
    > Das ist ein Beispiel mit 3 Sätzen
    > Das ist ein 1122-22-11 Format
    > Hier ist keine Zahl.
    > Hier ist kein Punkt
    > nur Text Hier ist nur Text ist aber nur Text
    > ----------

    [...]
    > if ($satz =~ m/\d(?s)(.*)keine/g) {

    [...]
    > With this search string, I get however no match.

    [...]
    > What is wrong here?


    Read the section "Modifiers" in perldoc perlre.

    hp

    --
    _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung:
    |_|_) | Sysadmin WSR | Man feilt solange an seinen Text um, bis
    | | | | die Satzbestandteile des Satzes nicht mehr
    __/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
    Peter J. Holzer, Aug 10, 2013
    #2

  3. > I would expect the following result for the first three lines:
    > 'Das ist ein Beispiel mit xxxxx Zahl.'
    >
    > With this search string, I get however no match. I have entered the same expression in UltraEdit (Regex-Perl-Search) and it works correctly.
    >
    > What is wrong here?
    >




    while (<DATA>)
    {
        s/(\d|-|keine)+/xxxx/g;
        print "$_"
    }

    __DATA__
    Das ist ein Beispiel mit 3 Sätzen
    Das ist ein 1122-22-11 Format
    Hier ist keine Zahl.
    Hier ist kein Punkt
    nur Text Hier ist nur Text ist aber nur Text
    George Mpouras, Aug 10, 2013
    #3
  4. On 2013-08-10 11:17, Ben Morrow <> wrote:
    > Quoth "Peter J. Holzer" <>:
    >> On 2013-08-10 09:16, <> wrote:
    >> > if ($satz =~ m/\d(?s)(.*)keine/g) {

    >> [...]
    >> > With this search string, I get however no match.

    >> [...]
    >> > What is wrong here?

    >>
    >> Read the section "Modifiers" in perldoc perlre.

    >
    > Read the section '(?adlupimsx-imsx)' in perldoc perlre :).


    I've cancelled that article. Either I wasn't fast enough or your
    Newsserver doesn't honor cancels (without cancel-lock).

    hp


    Peter J. Holzer, Aug 10, 2013
    #4
  5. Guest

    I think Ben has the right hint. Indeed, I read the file into the array (@satz) and then loop over it with
    'foreach $satz (@satz)'.
    George's code doesn't work, though. It returns the following result for the first 3 lines:

    Das ist ein Beispiel mit xxxx Sätzen
    Das ist ein xxxx Format
    Hier ist xxxx Zahl.

    The solution is still pending but thanks for the help.

    , Aug 10, 2013
    #5
  6. Guest

    This works as expected, but I don't quite understand what happens:


    undef $/;
    while (<DATA>) {
        chomp;
        print "$_<<\n";
        s/\d(.*)Zahl/xxxx/sg;
        print "\n$_\n"
    }

    It searches over the first 3 lines and outputs as expected:
    'Das ist ein Beispiel mit xxxx'


    , Aug 10, 2013
    #6
  7. >>
    >>
    >> What is wrong here?

    >


    Please explain the requirements again in more detail. I cannot
    understand what you expect.
    George Mpouras, Aug 10, 2013
    #7
  8. On 8/10/2013 8:39 AM, wrote:
    > This works as expected, but I don't quite understand what happens
    >
    >
    > undef $/;


    > while (<DATA>) {
    > chomp;
    > print "$_<<\n";
    > s/\d(.*)Zahl/xxxx/sg;
    > print "\n$_\n"
    > }
    > It searches over the first 3 lines and outputs as expected:
    > 'Das ist ein Beispiel mit xxxx'
    >
    >


    See: perldoc perlvar --> $/

    See: perldoc perlretut --> why '.' matches everything but "\n"
    or
    See: perldoc perlre -> Modifiers --> s Treat string as single line
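
    As a quick illustration of what those sections describe, here is a minimal sketch. It assumes two of the sample lines from the first post sit in ONE string; the variable names are made up for the example. It compares a plain '.', the /s modifier, and the inline (?s) form:

```perl
use strict;
use warnings;

# Two lines in a single string (sample text from the first post).
my $text = "Das ist ein Beispiel mit 3 Sätzen\nHier ist keine Zahl.\n";

# Without /s, '.' does not match "\n", so the match cannot get from
# the digit on the first line to 'keine' on the second line.
my $without_s = ( $text =~ /\d.*keine/ )     ? 1 : 0;

# With /s, '.' matches "\n" as well.
my $with_s    = ( $text =~ /\d.*keine/s )    ? 1 : 0;

# The inline form (?s) turns the same behaviour on for the rest of the pattern.
my $inline_s  = ( $text =~ /\d(?s).*keine/ ) ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
print "with (?s):  $inline_s\n";
```

    Note that all three tests look at the same two-line string; the modifier only changes what '.' may match inside that string.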

    --
    Charles DeRykus
    Charles DeRykus, Aug 10, 2013
    #8
  9. Guest

    On Saturday, 10 August 2013 21:57:07 UTC+2, Ben Morrow wrote:
    > [Please quote properly: that is, put your reply underneath the bit of
    > text you are replying to. It's also not helpful to keep replying to
    > yourself; instead you should reply to the article you are, um, replying
    > to. You appear to be using Google Groups, which has recently started
    > inserting extra blank lines whenever it quotes something; if you can't
    > find any way of turning this off you need to remove them by hand before
    > posting.]
    >
    > Quoth :
    > > On Saturday, 10 August 2013 11:16:58 UTC+2, ... wrote:
    > > > I am currently testing a string search over line breaks.
    > [...]
    > > > This is a code extract:
    > > >
    > > > foreach $satz (@satz) {
    > > > chomp $satz;
    > > > if ($satz =~ m/\d(?s)(.*)keine/g) {
    > > > $satz =~ s/$&/xxxx/g;
    > > > }
    > > > print "$satz\n";
    > > > }
    > > >
    > > > I would expect the following result for the first three lines:
    > > > 'Das ist ein Beispiel mit xxxxx Zahl.'
    > > >
    > > > With this search string, I get however no match. I have entered the
    > >
    > > This works as expected, but I don't quite understand what happens
    > >
    > > undef $/;
    >
    > This is documented in perldoc perlvar, under $/. Setting $/ to undef
    > causes <> to read the whole file in one go. This means you now have your
    > whole file in one string, so the s/// works over multiple lines.
    >
    > > while (<DATA>) {
    >
    > Since you are reading the whole file, there will only ever be one entry
    > to loop over, so you don't really need a loop.
    >
    > > chomp;
    >
    > With $/=undef chomp doesn't do anything.
    >
    > > print "$_<<\n";
    > > s/\d(.*)Zahl/xxxx/sg;
    > > print "\n$_\n"
    > > }
    > > It searches over the first 3 lines and outputs as expected:
    > > 'Das ist ein Beispiel mit xxxx'
    >
    > Since you're only doing one substitution it would be better to use an
    > ordinary named variable and no loop:
    >
    > my $text = <DATA>;
    > print "$text<<\n";
    >
    > $text =~ s/\d(.*)Zahl/xxxx/sg;
    > print "\n$text\n";
    >
    > Ben

    [Sorry for not replying properly. I hope this is OK now]

    I understand what 'undef $/' does, but it seems to be a workaround. Basically, my goal is:

    1) Read a text in an array
    2) Iterate through the variables of the array: 'foreach $satz (@satz)'
    3) Test various search and replace Regex (as a matter of fact, I am working through the Regex Cookbook of Jan Goyvaerts & Steven Levithan). In this context, one of several tests concerns the s modifier. I just wonder why it isn't possible to search for an expression which spreads over more than one line if I add this modifier. It works in UltraEdit. It works in a few other tools as well, but I can't make it work in my Perl script. If I use the undef workaround, other search expressions (e.g. with $ to mark the end of the string) won't work.

    In one of the tools I use (Expresso), I see that the EOL is coded as [CR][LF]. Is this a reason for the problem with the s modifier?
    , Aug 11, 2013
    #9
  10. On 2013-08-11 09:49, <> wrote:
    [...]
    > [Sorry for not replying properly. I hope this is OK now]


    Not really. You are still quoting everything (whether it is relevant or
    not) and you haven't removed the empty lines inserted by Google. So we
    have to scroll/read through 130 lines of quotes which may or may not be
    relevant. I dare say that not every one of us has the patience.

    Do yourself and us a favour, get a real Newsreader and use one of the
    free news servers (e.g. albasani).


    > I understand what 'undef $/' does but it seems to be a workaround.
    > Basically my goal is:
    >
    > 1) Read a text in an array


    What are the elements of the array? Lines?


    > 2) Iterate through the variables of the array: 'foreach $satz (@satz)'


    So in each iteration of the loop you are looking at one line in
    isolation.


    > 3) Test various search and replace Regex (as a matter of fact I am
    > working through the Regex Cookbook of Jan Goyvaerts & Steven
    > Levithan). In this context, one of several tests concerns the s
    > modifier. I just wonder why it isn't possible to search for an
    > expression which spreads over more than one line if I add this
    > modifier.


    That's what the /s modifier does. But there actually have to be several
    lines in the variable you are looking at for this to work. If the other
    lines are in different variables, how can perl know that you would want
    to match those other variables, too, especially if you tell it
    explicitly to look only at this variable?

    > It works in UltraEdit. It works in a few other tools as well


    That's because UltraEdit and those other tools treat the whole text as
    a unit. But your script (not Perl - *your* script) splits it into many
    small units and looks at each of them in isolation. None of these small
    units matches.
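
    A minimal sketch of that difference, using the first three sample lines from the first post. The names $hits_per_line and $ersetzt are made up for the illustration, and the pattern uses /s in both cases:

```perl
use strict;
use warnings;

# The text as the script sees it: one array element per line.
my @satz = (
    "Das ist ein Beispiel mit 3 Sätzen\n",
    "Das ist ein 1122-22-11 Format\n",
    "Hier ist keine Zahl.\n",
);

# Each element on its own: no single line contains a digit that is
# later followed by 'keine', so nothing matches, even with /s.
my $hits_per_line = grep { /\d.*keine/s } @satz;

# The text as one unit: now there is something to match across.
my $alles = join '', @satz;
( my $ersetzt = $alles ) =~ s/\d.*keine/xxxx/s;

print "lines matching on their own: $hits_per_line\n";
print $ersetzt;
```

    The modifier is identical in both tests; only the string being searched changes.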

    hp


    Peter J. Holzer, Aug 11, 2013
    #10
  11. Guest

    On Saturday, 10 August 2013 11:16:58 UTC+2, ... wrote:
    >> [Sorry for not replying properly. I hope this is OK now]

    >Nope, the blank lines are still there.

    Sorry to Peter, Ben and all of you. I hope it's fine now
    [...]
    >> 2) Iterate through the variables of the array: 'foreach $satz (@satz)'

    >Why do you want it in an array, rather than a single string?

    Because I may want to do things only with the $satz variables which meet the regex, e.g. send them to another array or whatever. This isn't possible when I read only one big string.
    , Aug 11, 2013
    #11
  12. On 2013-08-11 15:42, <> wrote:
    > On Saturday, 10 August 2013 11:16:58 UTC+2, ... wrote:
    >>> [Sorry for not replying properly. I hope this is OK now]

    >>Nope, the blank lines are still there.

    > Sorry to Peter, Ben and all of you. I hope it's fine now


    Not quite, but a lot better, thanks.


    >>> 2) Iterate through the variables of the array: 'foreach $satz (@satz)'

    >>Why do you want it in an array, rather than a single string?

    > Because I may want to do things only with the $satz variables which
    > meet the regex.


    Apparently none of them does.

    hp


    Peter J. Holzer, Aug 11, 2013
    #12
  13. On 8/11/2013 8:42 AM, wrote:
    > On Saturday, 10 August 2013 11:16:58 UTC+2, ... wrote:
    >>> [Sorry for not replying properly. I hope this is OK now]

    >> Nope, the blank lines are still there.

    > Sorry to Peter, Ben and all of you. I hope it's fine now
    > [...]
    >>> 2) Iterate through the variables of the array: 'foreach $satz (@satz)'

    >> Why do you want it in an array, rather than a single string?

    > Because I may want to do things only with the $satz variables which meet the regex.
    > E.g. send them to another array or whatever. This isn't possible when I
    > read only one big string.


    The problem is the match may extend over several $satz. If you wanted
    to identify those individual $satz which are part of the match, you
    could do something like this:

    my @satz = <DATA>;
    my $alles = join('', @satz);

    my $match;
    if ( $alles =~ /^.*\d.*Zahl.*?\n/gsmap ) {
        $match = ${^MATCH};
        foreach my $satz (@satz) {
            if ( $match =~ /$satz/ ) {
                #print "sentence is part of match: $satz"
                ...
            }
        }
    }

    --
    Charles DeRykus
    Charles DeRykus, Aug 11, 2013
    #13
  14. On 8/11/2013 11:59 AM, Charles DeRykus wrote:
    > ...
    >
    > my @satz = <DATA>;
    > my $alles = join('', @satz);
    >
    > my $match;
    > if ( $alles =~ /^.*\d.*Zahl.*?\n/gsmap ) {
    > $match = ${^MATCH};
    > foreach my $satz (@satz) {
    > if ( $match =~ /$satz/ ) {
    > #print "sentence is part of match: $satz"
    > ...
    > }
    > }
    > }
    >


    You could omit /p too:

    my $match;
    if ( $alles =~ /^(.*\d.*Zahl.*?\n)/gsma ) {
        $match = $1;
        foreach my $satz (@satz) {
            if ( $match =~ /$satz/ ) {
                #print "sentence is part of match: $satz"
                ...
            }
        }
    }
    Charles DeRykus, Aug 11, 2013
    #14
  15. On 8/11/2013 2:34 PM, Ben Morrow wrote:
    >
    > Quoth Charles DeRykus <>:


    >...


    >>
    >> The problem is the match may extend over several $satz. If you wanted
    >> to identify those individual $satz which are part of the match, you
    >> could do something like this:
    >>
    >> my @satz = <DATA>;
    >> my $alles = join('', @satz);
    >>
    >> my $match;
    >> if ( $alles =~ /^.*\d.*Zahl.*?\n/gsmap ) {
    >> $match = ${^MATCH};
    >> foreach my $satz (@satz) {
    >> if ( $match =~ /$satz/ ) {

    >
    > /^\Q$satz/m
    >
    > Also, this may pick up lines that were not part of the originally-
    > matched text. Given that the match is anchored to a whole line fore-and-
    > aft ($satz will contain a trailing newline) this can only happen with
    > whole duplicated lines, but it may still be a problem.


    And it's been bothering me since posting... here's a messier solution
    to ensure sentences overlap the target begin/end positions:

    use 5.012; # so each will work on arrays
    ....
    my @satz = <DATA>;
    my $alles = join('', @satz);

    my ($b, $e) = (0, 0);
    my @pos = map { $b= $e+1 if $e; $e += (length($_)-1); [$b,$e] } @satz;

    if ( $alles =~ /^(.*\d.*Zahl.*?\n)/gsma ) {
        my($match, $begin, $end) = ($1, $-[0], $+[0]);

        while( my($i,$satz) = each @satz ) {
            next unless ($pos[$i][0] >= $begin and $pos[$i][0] <= $end)
                or ($pos[$i][1] >= $begin and $pos[$i][1] <= $end);

            if ( $match =~ /$satz/ ) {
                print "sentence is part of match: $satz\n\n";
                ...
            }
        }
    }



    >
    > I would rather do this by matching on the whole string and then
    > splitting the result into lines if necessary, though it's still not
    > clear (at least to me) what the OP is trying to achieve here. fmassion,
    > can you explain a little more what you're trying to do? What does the
    > rest of the code surrounding this bit look like?
    >
    >



    --
    Charles DeRykus
    Charles DeRykus, Aug 12, 2013
    #15
  16. On 8/11/2013 5:42 PM, Ben Morrow wrote:
    >
    >> ...
    >> use 5.012; # so each will work on arrays
    >> ...
    >> my @satz = <DATA>;
    >> my $alles = join('', @satz);
    >>
    >> my ($b, $e) = (0, 0);
    >> my @pos = map { $b= $e+1 if $e; $e += (length($_)-1); [$b,$e] } @satz;
    >>
    >> if ( $alles =~ /^(.*\d.*Zahl.*?\n)/gsma ) {

    >
    > That .* will match across newlines, so the ^ (and the /m) does nothing.
    >
    >> my($match, $begin, $end) = ($1, $-[0], $+[0]);
    >>
    >> while( my($i,$satz) = each @satz ) {
    >> next unless ($pos[$i][0] >= $begin and $pos[$i][0] <= $end)
    >> or ($pos[$i][1] >= $begin and $pos[$i][1] <= $end);
    >>
    >> if ( $match =~ /$satz/ ) {

    >
    > You're still not quoting $satz. It's really important to quote user data
    > before interpolating it into a pattern. You're also not anchoring the
    > match at the beginning, so a line will match if it only ends with $satz.
    >


    Thanks, I missed that... it's very important to have it there.
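
    To make the quoting point concrete, here is a small sketch. It assumes a line taken from the sample text; the test string "Hier ist keine ZahlX" and the variable names are made up. It shows what changes once $satz is wrapped in \Q...\E and anchored with ^ under /m:

```perl
use strict;
use warnings;

my $match = "Hier ist keine Zahl.\nHier ist kein Punkt\n";
my $satz  = "Hier ist keine Zahl.\n";

# Unquoted, the '.' inside $satz is a metacharacter and matches any
# character, so a different line is accepted by mistake.
my $loose  = ( "Hier ist keine ZahlX\n" =~ /$satz/ )      ? 1 : 0;

# \Q...\E makes every character of $satz literal.
my $quoted = ( "Hier ist keine ZahlX\n" =~ /\Q$satz\E/ )  ? 1 : 0;

# ^ together with /m ties the comparison to the start of a line.
my $anchored = ( $match =~ /^\Q$satz\E/m ) ? 1 : 0;

print "unquoted: $loose, quoted: $quoted, anchored: $anchored\n";
```

    With real input the unquoted form can also fail outright, for example when a line contains an unbalanced parenthesis.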

    >> print "sentence is part of match: $satz\n\n"
    >> ...
    >> }
    >> }
    >> }

    >
    > But this is still a great deal more complicated than
    >
    > my $alles = slurp \*DATA;
    >
    > while (my ($match) =
    > $alles =~ /([^\n]* \d .* Zahl [^\n]*)/gsx
    > # or perhaps /(.* \d (?s:.)* Zahl .*)/gx
    > # or /(\N* \d .* Zahl \N*)/gsx if you've got 5.12
    > ) {
    > for my $satz (split /\n/, $match) {
    > # make that /(?<=\n)/ if you don't want to chomp
    > print "sentence is part of match: $satz\n\n";
    > }
    > }
    >


    Yes, I agree that's more concise and clearer to many.
    But it'll loop endlessly :)

    I think you probably meant:

    while ( $alles =~ /([^\n]* \d .* Zahl [^\n]*)/gsx;
    my $match = $1;
    ...
    }

    --
    Charles DeRykus
    Charles DeRykus, Aug 12, 2013
    #16
  17. On 8/11/2013 6:58 PM, Charles DeRykus wrote:
    > ...
    > I think you probably meant:
    >
    > while ( $alles =~ /([^\n]* \d .* Zahl [^\n]*)/gsx;

    ^^^^

    And of course that should be: "/gsx ) {" rather than: ";"
    --
    Charles DeRykus
    Charles DeRykus, Aug 12, 2013
    #17
  18. Guest

    Thanks to all of you for your efforts and ideas. Let me summarize the lessons I've learned in this discussion.
    The task was: Import a text, apply a regex which extends over a linebreak and display/modify the lines matching the expression.
    The original approach failed because the text was not read in one string, but split into lines in an array.
    I then wanted to be able to print each individual line of the array and to use ^ and $ in line-based regular expressions.
    I have tried all the suggested code. Not everything has worked. This is my current code with which I manage to get the matched lines and the entire text:

    use utf8; # allows UTF-8 files to be processed
    binmode STDIN, ":utf8"; # input
    binmode STDOUT, ":utf8"; # output

    #undef $/; # is not required as <DATA> read into array and then joined
    open(DATA,'D:\temp\a.txt') || die("Datei kann nicht geöffnet werden!\n");
    seek(DATA, 3, 0);
    my @satz = <DATA>;
    my $alles = join('', @satz);
    my $match;
    if ( $alles =~ /^.*\d.*Zahl.*?\n/gsma ) {
        $match = ${^MATCH}; # I don't understand what is this ${^MATCH}
        # $match = $1; # doesn't work
        print "$match<<\n"; # prints only the match
        foreach my $satz (@satz) {
            # if ( $match =~ /$satz/ ) { # if activated prints nothing
            print "sentence is part of match: $satz\n"; # prints the entire text
            # }
        }
    }
    , Aug 13, 2013
    #18
  19. writes:

    [...]

    > use utf8; # damit lassen sich UTF8 Dateien bearbeiten


    This is only needed if your source code contains UTF-8.

    > binmode STDIN, ":utf8"; # input
    > binmode STDOUT, ":utf8"; # output
    >
    > #undef $/; # is not required as <DATA> read into array and then joined


    Except in 'short files' (as here), it is usually better to use

    local $/;

    instead. This creates a new binding for $/ while preserving the old
    one which will be restored after the containing block.
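
    A minimal sketch of that scoping. It assumes an in-memory string standing in for the file so that it runs without touching the disk; the helper name slurp_fh is made up:

```perl
use strict;
use warnings;

# Hypothetical in-memory "file" used instead of a real path.
my $inhalt = "erste Zeile\nzweite Zeile\n";

sub slurp_fh {
    my ($fh) = @_;
    local $/;               # $/ is undef only until this sub returns
    return scalar <$fh>;    # a single read returns the whole stream
}

open my $fh, '<', \$inhalt or die "cannot open in-memory handle: $!";
my $alles = slurp_fh($fh);
close $fh;

# Outside the sub $/ has its old value again, so line-wise reads work.
open $fh, '<', \$inhalt or die "cannot open in-memory handle: $!";
my @zeilen = <$fh>;
close $fh;

print length($alles), " characters, ", scalar(@zeilen), " lines\n";
```

    With a plain 'undef $/;' at file level, the second read would also return everything in one element.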

    > open(DATA,'D:\temp\a.txt') || die("Datei kann nicht geöffnet werden!\n");


    This "it didn't work" style of error reporting is a bit useless. The
    message should also contain the system error code/ message.
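
    As a sketch of what such a message could look like: the path below is hypothetical and assumed not to exist, and the eval is only there so the example can show the text instead of terminating:

```perl
use strict;
use warnings;

# Hypothetical path, assumed to be missing, to provoke the failure.
my $datei = 'no-such-dir/a.txt';

my $ok = eval {
    # $! carries the system error message for the failed open.
    open my $fh, '<', $datei or die "Cannot open '$datei': $!\n";
    close $fh;
    1;
};
my $fehler = $ok ? '' : $@;

print $fehler;
```

    The exact wording after the colon comes from the operating system and depends on platform and locale.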

    > seek(DATA, 3, 0);
    > my @satz = <DATA>;
    > my $alles = join('', @satz);
    > my $match;
    > if ( $alles =~ /^.*\d.*Zahl.*?\n/gsma ) {
    > $match = ${^MATCH}; # I don't understand what is this ${^MATCH}


    As 'perldoc perlvar' could have told you: The text which matched the
    regex. At least for the perl version I'm using (5.10.1), the
    documentation also says the /p match modifier is needed in order to
    use this builtin variable.
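
    A short sketch of the two ways to get at the matched text, assuming two of the sample lines joined into one string; the variable names are made up:

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} are available from Perl 5.10 on

my $alles = "Das ist ein 1122-22-11 Format\nHier ist keine Zahl.\n";

my ( $mit_p, $mit_klammern );

# /p asks perl to keep the matched text available in ${^MATCH}.
if ( $alles =~ /\d.*Zahl/sp ) {
    $mit_p = ${^MATCH};
}

# Alternatively, put parentheses around the part of interest and read $1.
if ( $alles =~ /(\d.*Zahl)/s ) {
    $mit_klammern = $1;
}

print "same text either way\n" if $mit_p eq $mit_klammern;
```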

    > # $match = $1; # doesn't work


    Since the regex isn't capturing anything, that is to be expected.

    --
    To anybody who feels inclined to flame me for
    'attention seeking by gratuitious e-mail address changes':
    Chances are I don't care for you attention but as people
    change employers, their e-mail addresses also change.
    Rainer Weikusat, Aug 13, 2013
    #19
  20. On 8/13/2013 12:05 AM, wrote:
    > Thanks to all of you for your efforts and ideas. Let me summarize the lessons I've learned in this discussion.
    > The task was: Import a text, apply a regex which extends over a linebreak and display/modify the lines matching the expression.
    > The original approach failed because the text was not read in one string, but split into lines in an array.
    > I then wanted to be able to print each individual line of the array and to use ^ and $ in line-based regular expression.
    > I have tried all the suggested code. Not everything has worked. This is my current code with which I manage to get the matched lines and the entire text:
    >
    > use utf8; # damit lassen sich UTF8 Dateien bearbeiten
    > binmode STDIN, ":utf8"; # input
    > binmode STDOUT, ":utf8"; # output
    >
    > #undef $/; # is not required as <DATA> read into array and then joined
    > open(DATA,'D:\temp\a.txt') || die("Datei kann nicht geöffnet werden!\n");
    > seek(DATA, 3, 0);
    > my @satz = <DATA>;
    > my $alles = join('', @satz);
    > my $match;
    > if ( $alles =~ /^.*\d.*Zahl.*?\n/gsma ) {
    > $match = ${^MATCH}; # I don't understand what is this ${^MATCH}
    > # $match = $1; # doesn't work
    > print "$match<<\n"; # prints only the match
    > foreach my $satz (@satz) {
    > # if ( $match =~ /$satz/ ) { # if activated prints nothing
    > print "sentence is part of match: $satz\n"; # prints the entire text
    > # }
    > }
    > }
    >


    The ${^MATCH} variable is only valid with /p and was not needed. I'm not
    certain it's at all relevant to what you're doing now either.

    I think Ben's suggestion is the most promising if you want to identify
    the sentences over which the match extends:

    # scalar-context match, with the correction from #16/#17 applied
    while ( $alles =~ /([^\n]* \d .* Zahl [^\n]*)/gsx
        # or perhaps /(.* \d (?s:.)* Zahl .*)/gx
        # or /(\N* \d .* Zahl \N*)/gsx if you've got 5.12
    ) {
        my $match = $1;
        for my $satz (split /\n/, $match) {
            # make that /(?<=\n)/ if you don't want to chomp
            print "sentence is part of match: $satz\n\n";
        }
    }
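
    For anyone who wants to try this directly, here is a self-contained sketch. It assumes the sample text from the first post is held in a string rather than read from a file, and it uses the scalar-context form of the loop from #16/#17; the array name @treffer is made up:

```perl
use strict;
use warnings;

# Sample text from the first post, already joined into one string.
my $alles = join '',
    "Das ist ein Beispiel mit 3 Sätzen\n",
    "Das ist ein 1122-22-11 Format\n",
    "Hier ist keine Zahl.\n",
    "Hier ist kein Punkt\n",
    "nur Text Hier ist nur Text ist aber nur Text\n";

my @treffer;

# In scalar context /g continues after the previous match on each
# pass, so the loop ends once no further match is found.
while ( $alles =~ /([^\n]* \d .* Zahl [^\n]*)/gsx ) {
    push @treffer, split /\n/, $1;
}

print "sentence is part of match: $_\n" for @treffer;
```

    The [^\n]* at both ends extends the match to whole lines, which is why splitting on "\n" afterwards yields complete sentences.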

    --
    Charles DeRykus
    Charles DeRykus, Aug 13, 2013
    #20