Simple Regex Doubt

Donato Azevedo

Hi everyone,

I've got a simple question that I haven't, so far, been able to solve:

I have these regexes which I want to convert into a single one:

if ( $raw_content =~ /Doc1(?:=rev)?:(?<document1>.*?)\r\n
                      Doc2(?:=rev)?:(?<document2>.*?)\r\n
                      Item:(?<item>.*?)\r\n
                      Data\s+doc1:(?<data1>.*?)\r\n
                      Data\s+doc2:(?<data2>.*?)\r\n
                      Obs:(?<observation>.*?)\r\n
                      Critic:(?<criticality>.*?)\r\n
                      Comments:(?<comments>.*)
                     /isx ||
     $raw_content =~ /Doc1(?:=rev)?:(?<document1>.*?)\r\n
                      Doc2(?:=rev)?:(?<document2>.*?)\r\n
                      Item:(?<item>.*?)\r\n
                      Data\s+doc1:(?<data1>.*?)\r\n
                      Data\s+doc2:(?<data2>.*?)\r\n
                      Obs:(?<observation>.*?)\r\n
                      Critic:(?<criticality>.*)
                     /isx ) {
    # work with %+ here, e.g. $+{criticality} and $+{comments}
}

This is to match text that can end either in:

Critic:foobartext

or

Critic:foo
Comments:bar

The problem seems to be the greediness of the last captures: I tried
doing

Critic:(?<criticality>.*?)(\r\nComments:(?<comments>.*))?

and

Critic:(?<criticality>.*)(\r\nComments:(?<comments>.*))?

but I must be missing something... It must be something quite simple,
I'd say.

Well, any ideas?
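A short Perl sketch of what the two attempts above actually capture, using a made-up tail `Critic:foo\r\nComments:bar`. In both cases nothing after the optional group forces the match to reach the end of the string, so the engine accepts the first overall match it finds: the lazy version stops at an empty criticality, and the greedy version (under /s) runs to the end of the string, leaving nothing for the optional group.

```perl
use strict;
use warnings;

# Made-up sample: the tail of a record that does have a Comments line.
my $text = "Critic:foo\r\nComments:bar";

# Attempt 1 (lazy): nothing after the optional group forces it to match,
# so the engine settles for an empty criticality right after "Critic:".
my ( $lazy_crit, $lazy_comm );
if ( $text =~ /Critic:(?<criticality>.*?)(\r\nComments:(?<comments>.*))?/isx ) {
    ( $lazy_crit, $lazy_comm ) = ( $+{criticality}, $+{comments} );
}

# Attempt 2 (greedy): under /s, .* runs to the end of the string and
# swallows the Comments line; the optional group then matches nothing.
my ( $greedy_crit, $greedy_comm );
if ( $text =~ /Critic:(?<criticality>.*)(\r\nComments:(?<comments>.*))?/isx ) {
    ( $greedy_crit, $greedy_comm ) = ( $+{criticality}, $+{comments} );
}

( my $shown = $greedy_crit ) =~ s/\r\n/\\r\\n/g;    # make the CRLF visible
print "lazy:   criticality=[$lazy_crit]\n";         # lazy:   criticality=[]
print "greedy: criticality=[$shown]\n";             # greedy: criticality=[foo\r\nComments:bar]
```

In both attempts $+{comments} ends up undef, which is why the Comments text never shows up.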
 
C.DeRykus

You might want to post a simple, minimal example to
demo what is/isn't working. The following worked
for me:

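use strict;
use warnings;

$_ = <<'END';
one line
another line
Critic: foobartext
Comments: bunches of comments
END

# Under /x the spaces in the pattern are ignored, so the space after
# "Critic:" and "Comments:" ends up inside the captures. If there is
# no Comments line, $+{comments} is undef.
my $regex = qr/.*? Critic: (?<criticality>.*?)\n
               (?:Comments: (?<comments>.*))?
              /isx;

if ( /$regex/ ) {
    print "criticality: $+{criticality}", "\n",
          "comments: $+{comments}";
}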
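A sketch of one way to fold the OP's two regexes into one for the CRLF layout, assuming the Critic/Comments part is always the last thing in the string (as the OP's `Comments:(?<comments>.*)` under /s already implies). Unlike the example above, it does not depend on a newline after the criticality text. The sample records and the names `@records`, `$record_re` and `@results` are made up for illustration.

```perl
use strict;
use warnings;

# Made-up records in the layout described above, with CRLF line endings.
my @records = (
    "Doc1:A\r\nDoc2:B\r\nItem:1\r\nData doc1:x\r\nData doc2:y\r\n"
      . "Obs:ok\r\nCritic:foobartext\r\n",
    "Doc1:A\r\nDoc2:B\r\nItem:1\r\nData doc1:x\r\nData doc2:y\r\n"
      . "Obs:ok\r\nCritic:foo\r\nComments:bar\r\n",
);

# One pattern for both endings: the Comments section is optional, and the
# end anchor stops the lazy captures from settling for an early match.
my $record_re = qr/
    Doc1(?:=rev)?:(?<document1>.*?)\r\n
    Doc2(?:=rev)?:(?<document2>.*?)\r\n
    Item:(?<item>.*?)\r\n
    Data\s+doc1:(?<data1>.*?)\r\n
    Data\s+doc2:(?<data2>.*?)\r\n
    Obs:(?<observation>.*?)\r\n
    Critic:(?<criticality>.*?)            # lazy, but cannot stop early because of the anchor
    (?:\r\nComments:(?<comments>.*?))?    # optional Comments section
    \s*\z                                 # must reach the end; a trailing CRLF is allowed
/isx;

my @results;
for my $rec (@records) {
    next unless $rec =~ $record_re;
    push @results, [ $+{criticality}, $+{comments} ];
    printf "criticality=[%s] comments=%s\n",
        $+{criticality}, $+{comments} // '(none)';
}
```

The end anchor is the design choice that matters here: it makes the lazy `.*?` keep growing until what follows is either a `Comments:` section or just trailing whitespace.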
 
sln

Wow, looks complicated, but isn't. Yes, as DeRykus says,
you need a quantifier '?' (0 or 1) around a non-capturing grouping
of --> Critic:(?<criticality>.*) in the first regex.

This will at least assign $+{criticality} a '' if there is a 'Critic:' with no
data (.*), and will assign undef (just like the $n vars, I think) if there is no 'Critic:' at all.

I haven't checked 5.10 much, but $+{criticality} may not even exist if the '?'
for the group matched 0 times. The regex is satisfied, but who knows how the %+ hash is reset.
It probably exists, but is set to undef, like its unnamed capture counterpart.
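To illustrate the '' versus undef distinction with the Comments group (per sln's later correction), here is a small sketch using three made-up tails. A group that takes part in the match but captures nothing gives '', while a group skipped by the `?` gives undef. perlvar also documents that `keys %+` lists only the groups that actually captured, so `defined $+{comments}` is a simple check.

```perl
use strict;
use warnings;

# Made-up tails: Comments with data, Comments with no data, no Comments line.
my @tails = ( "Critic:foo\r\nComments:bar", "Critic:foo\r\nComments:", "Critic:foo" );

my @seen;
for my $tail (@tails) {
    next unless $tail =~
        /Critic:(?<criticality>.*?)(?:\r\nComments:(?<comments>.*))?\z/s;

    # Distinguish "group matched nothing" ('') from "group skipped" (undef).
    my $state = !defined $+{comments} ? 'undef'
              : $+{comments} eq ''    ? 'empty string'
              :                         "'" . $+{comments} . "'";
    push @seen, $state;
}
print join( ', ', @seen ), "\n";    # 'bar', empty string, undef
```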

Btw, what's this bizz: /(.*?)\r\n/s ??
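This question points at the crux: /s lets `.` match newlines as well. Dropping /s alone does not fully fix CRLF input, though, because without /s `.` still matches \r. A small sketch on a made-up string; `show` is a hypothetical helper that makes the line-ending characters visible.

```perl
use strict;
use warnings;

# Hypothetical helper: make \r and \n visible when printing.
sub show {
    ( my $s = shift ) =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return "[$s]";
}

my $text = "Critic:foo\r\nComments:bar";    # made-up sample

# With /s, "." matches \r and \n, so the greedy capture runs to the end.
my ($with_s) = $text =~ /Critic:(.*)/s;

# Without /s, "." stops at \n but still matches \r.
my ($without_s) = $text =~ /Critic:(.*)/;

# A class that excludes both line-ending characters stops before the CRLF.
my ($no_crlf) = $text =~ /Critic:([^\r\n]*)/;

print 'with /s:    ', show($with_s),    "\n";    # [foo\r\nComments:bar]
print 'without /s: ', show($without_s), "\n";    # [foo\r]
print 'no CR/LF:   ', show($no_crlf),   "\n";    # [foo]
```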

-sln
 
sln

^^
Oh, I'm sorry, s/comments/criticality/g in the above reply-post.

-sln
 
sln

Warning!! Ignore that man behind the curtain...
The saga continues, s/criticality/comments/g
Dyslexia is a terrible thing to waste.

-sln
 
