match string by re using some pattern

frytaz · Jul 25, 2007

Hi, I'm trying to match a few strings using a pattern.
For instance, I have this in my file:

#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

and in perl
....
$re =~ s/#one#/\(\.\+\?\)/;
$re =~ s/#two#/\(\.\+\?\)/;
$re =~ s/#three#/\(\.\+\?\)/;

if ($line =~ m/^$re/) {
$one = $1;
$two = $2;
$three = $3;
}

It's all right for the first line of the file, where #one# #two# #three#
are in order.
How can I check which word was in the line before replacing it by
(.+?)?
Or maybe there is a different way to do this?

Please help
Thanks

anno4000 · Jul 25, 2007

Hi, I'm trying to match few strings using some pattern
for instance i have in my file

#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

and in perl
...
$re =~ s/#one#/\(\.\+\?\)/;
$re =~ s/#two#/\(\.\+\?\)/;
$re =~ s/#three#/\(\.\+\?\)/;

What did $re contain before the replacements?

Why do you have to apply substitutions on the regex instead of
specifying it right away?

if ($line =~ m/^$re/) {
$one = $1;
$two = $2;
$three = $3;
}

Since we don't know what $re contains at this point, there's no way
to assess what it might be doing.

It's all right for first file line when #one# #two# #three# are in
order,

Please explain what "all right" means. I'm sure it's obvious to you,
but it isn't to us.

How i can check which word was it in line before replacing it by
(.+?) ??
Or maybe there is some different way to do this

What is "this"?

You have given us (incomplete) code that, you say, doesn't do what
you want. Without an explanation from you, how are we supposed to
guess what you want?

Anno

Paul Lalli · Jul 25, 2007

Hi, I'm trying to match few strings using some pattern
for instance i have in my file

#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

and in perl
...
$re =~ s/#one#/\(\.\+\?\)/;
$re =~ s/#two#/\(\.\+\?\)/;
$re =~ s/#three#/\(\.\+\?\)/;

No need for all those backslashes. The replacement part of a s/// is
just a string, and none of those characters are special in a string.

if ($line =~ m/^$re/) {
$one = $1;
$two = $2;
$three = $3;

}

It's all right for first file line when #one# #two# #three# are in
order,
How i can check which word was it in line before replacing it by
(.+?) ??
Or maybe there is some different way to do this

You're basically trying to use a variable as a variable name, albeit
in a different way than most people attempt to do so. Regardless, the
advice in `perldoc -q "variable name"` still holds. Please read it.

One solution would be to save the ordering of "one", "two", "three" as
you're replacing them, and then use that ordering in your
replacement. Rather than setting the variables $one, $two, $three,
instead create a hash that has corresponding keys 'one', 'two', and
'three'. Here's a short-but-complete example:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $re = '#one# some text #two# more t*xt #three#';
my $re2 = '#two# text #one# some text #three#';

my (@pos, @pos2);
while ($re =~ s/#(one|two|three)#/(.+?)/) {
    push @pos, $1;
}
while ($re2 =~ s/#(one|two|three)#/(.+?)/) {
    push @pos2, $1;
}
my (%h1, %h2);
my $line = "foo some text bar more tttxt baz";
if ($line =~ m/^$re$/) {
    $h1{$pos[0]} = $1;
    $h1{$pos[1]} = $2;
    $h1{$pos[2]} = $3;
}
my $line2 = "foo text bar some text baz";
if ($line2 =~ m/^$re2$/) {
    $h2{$pos2[0]} = $1;
    $h2{$pos2[1]} = $2;
    $h2{$pos2[2]} = $3;
}

print Dumper(\%h1, \%h2);
__END__

$VAR1 = {
          'three' => 'baz',
          'one' => 'foo',
          'two' => 'bar'
        };
$VAR2 = {
          'three' => 'baz',
          'one' => 'bar',
          'two' => 'foo'
        };

Hope that helps,
Paul Lalli
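
A minimal sketch of the same idea wrapped in reusable subs, for templates that are not known in advance. The names compile_template and match_template are hypothetical, not from the thread. One deliberate difference from the example above: the literal text of the template is passed through quotemeta, so a character such as "*" is matched literally instead of acting as a regex quantifier.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn a template such as 'some #one# test #two#'
# into a compiled regex plus the placeholder names in order of appearance.
sub compile_template {
    my ($template) = @_;
    my @names;
    my $regex = '';
    # The capturing group makes split return the marker names too,
    # so literal text sits at even indexes and names at odd indexes.
    my @parts = split /#(\w+)#/, $template;
    for my $i (0 .. $#parts) {
        if ($i % 2) {
            push @names, $parts[$i];
            $regex .= '(.+?)';
        }
        else {
            $regex .= quotemeta $parts[$i];   # literal text, escaped
        }
    }
    return (qr/^$regex$/, \@names);
}

# Hypothetical helper: match a line against a compiled template and
# return a hash keyed by placeholder name (empty list if no match).
sub match_template {
    my ($line, $regex, $names) = @_;
    my @values = $line =~ $regex or return;
    my %fields;
    @fields{@$names} = @values;
    return %fields;
}

my ($regex, $names) = compile_template('#two# text #one# some text #three#');
my %got = match_template('foo text bar some text baz', $regex, $names);
print "$_ => $got{$_}\n" for sort keys %got;
```

The anchors ^ and $ matter here: without the trailing $, the last lazy (.+?) would stop after a single character.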

anno4000 · Jul 25, 2007

Paul Lalli said:
No need for all those backslashes. The replacement part of a s/// is
just a string, and none of those characters are special in a string.

You're basically trying to use a variable as a variable name, albeit
in a different way than most people attempt to do so. Regardless, the
advice in `perldoc -q "variable name"` still holds. Please read it.

Ah, you decoded the mystery question. Congrats.

One solution would be to save the ordering of "one", "two", "three" as
you're replacing them, and then use that ordering in your
replacement. Rather than setting the variables $one, $two, $three,
instead create a hash that has corresponding keys 'one', 'two', and
'three'.

Once you have the hash, you can even set the variables. Nothing wrong
with that.

Here's a short-but-complete example:

[snip]

Here is my take. I'm assuming that the delimiter "#" can't appear in
the text:

while ( <DATA> ) {
    print;
    my %h = /#(one|two|three)#([^#\n]*)/g;
    my ( $one, $two, $three) = @h{ qw( one two three)};
    # do things with $one, $two, $three
    print "one: '$one', two: '$two', three: '$three'\n";
}

__DATA__
#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

Anno

frytaz · Jul 25, 2007


Thank You very much Paul Lalli

frytaz · Jul 25, 2007


How could I recognize those patterns #one# #two# #three#?

Can I do it like in C#, for instance:

$match =~ s/#one#/(?<one>.*?)/;
$match =~ s/#two#/(?<two>.*?)/;
$match =~ s/#three#/(?<three>.*?)/;

then

$one = ${one};
$two = ${two};
$three = ${three};

But that doesn't work...

Thanks

Paul Lalli · Jul 25, 2007

How I could recognize those patterns #one# #two# #three#

Can i do it like in c#, for instance

$match =~ s/#one#/(?<one>.*?)/;
$match =~ s/#two#/(?<two>.*?)/;
$match =~ s/#three#/(?<three>.*?)/;

then

$one = ${one};
$two = ${two};
$three = ${three};

But that doesn't work...

Not until Perl 5.10, no it doesn't.

We've already given you an implementable solution. What issue are you
still having with it?

Paul Lalli
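
For readers on Perl 5.10 or later, here is a small sketch of how the named-capture approach asked about above can look. It assumes the template text is itself safe to use as a regex, and that the marker names are valid group names (starting with a letter or underscore). The captured values are read from the special hash %+ rather than from variables like ${one}.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ require Perl 5.10 or later

my $match = '#two# text #one# some text #three#';
# Turn each #name# marker into a named group such as (?<two>.*?).
$match =~ s/#(\w+)#/(?<$1>.*?)/g;

my $line = 'foo text bar some text baz';
my %captures;
if ($line =~ /^$match$/) {
    %captures = %+;    # copy the named captures into an ordinary hash
    my ($one, $two, $three) = @captures{qw(one two three)};
    print "one: $one, two: $two, three: $three\n";
}
```

Copying %+ right after the match is a cautious choice, since the next successful match would overwrite it.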

frytaz · Jul 25, 2007

Not until Perl 5.10, no it doesn't.

We've already given you an implementable solution. What issue are you
still having with it?

Paul Lalli

There is an issue when there are two lines in the file:

1 some #one# test #two# text #three#
2 some #two# test #one# text #three#

After the replacement s/#(one|two|three)#/(.+?)/ they become:

1 some (.+?) test (.+?) text (.+?)
2 some (.+?) test (.+?) text (.+?)

If I then try to match them against

some O!N!E test T!W!O text T!H!R!E!E

the script works correctly only for the first line of the file.
For instance, #one# should match O!N!E.

anno4000 · Jul 25, 2007

There is an issue when, theres two lines in file

1 some #one# test #two# text #three#
2 some #two# test #one# text #three#

Okay, these are data from a file.

then after replace by s/#(one|two|three)#/(.+?)/

1 some (.+?) test (.+?) text (.+?)
2 some (.+?) test (.+?) text (.+?)

Now you generate what looks like regex patterns out of the first
two lines from your file.

then if I'm trying to match it with

some O!N!E test T!W!O text T!H!R!E!E

Where is this from? Is that another line from the file?

script will work fine only for 1st file line
#one# should match O!N!E for instance

It becomes increasingly unclear how you would distinguish normal
text from markers such as "#one#" or "O!N!E". How is the program
supposed to identify them?

That said, the first of your patterns at least makes an honorable
attempt at matching and even capturing the mutated markers:

my $str = 'some O!N!E test T!W!O text T!H!R!E!E';
$str =~ /some (.+?) test (.+?) text (.+?)/ or die "no match\n";
print "$_\n" for $1, $2, $3;

That prints

O!N!E
T!W!O
T

The last capture is incomplete. That may take some fiddling to correct.

I am still at a loss guessing what you are really up to.

Anno
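
A short sketch of one way to do that fiddling, assuming the whole string is supposed to fit the pattern: anchoring the regex at both ends forces the final lazy group to extend to the end of the string.

```perl
use strict;
use warnings;

my $str = 'some O!N!E test T!W!O text T!H!R!E!E';

# Unanchored: the trailing lazy group is satisfied by a single character.
my @loose = $str =~ /some (.+?) test (.+?) text (.+?)/;

# Anchored with \A and \z: the last group must reach the end of the string.
my @strict = $str =~ /\Asome (.+?) test (.+?) text (.+?)\z/;

print "loose:  $loose[2]\n";
print "strict: $strict[2]\n";
```

The first print shows "T" and the second shows "T!H!R!E!E".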

frytaz · Jul 26, 2007


OK, I'll try to explain it.

For instance, we parse an HTTP web page:

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
$pattern = "section #SECTION# * title #TITLE# - price #PRICE#";
$regex_patern = "section (.+?) . title (.+?) - price (.+?)";
if ($line =~ m/^$regex_pattern/) {
$section = $1;
#$section -> BOOKS
$title = $2;
#$title -> SOME_BOOK_TITLE
$price = $3;
#$price -> 20
}

And it's all fine, because the fields are in the order section, title, price.

Now we try to parse another page, where

$line = "title CD_TITLE - price 50 - section MUSIC";
$pattern = "title #TITLE# - price #PRICE# - section #SECTION#";
$regex_pattern = "title (.+?) - price (.+?) - section (.+?)";

if ($line =~ m/^$regex_pattern/) {
#now its wrong
$section = $1;
#$section -> CD_TITLE
$title = $2;
#$title -> 50
$price = $3;
#$price -> MUSIC
}

In this example I need a different order of section, title, price.

I want the script to know the pattern: when it replaces #TITLE#, the result
should capture the title, but in all cases the replacement is just (.+?).

Martien verbruggen · Jul 26, 2007

On Thu, 26 Jul 2007 02:21:42 -0000,

[huge snip]

[snip]

I am still at a loss guessing what you are really up to.

OK, I'll try to explain it

for instance we parse http web page

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
$pattern = "section #SECTION# * title #TITLE# - price #PRICE#";
$regex_patern = "section (.+?) . title (.+?) - price (.+?)";
if ($line =~ m/^$regex_pattern/) {
$section = $1;
#$section -> BOOKS
$title = $2;
#$title -> SOME_BOOK_TITLE
$price = $3;
#$price -> 20
}

You should really consider using real code. Something that compiles and
runs, under strict and warnings.

The above doesn't.

Allowing that maybe $regex_patern and $regex_pattern are supposed to be
the same, what is $pattern doing in there? It isn't a pattern as far as
Perl is concerned, and it's actually not used anywhere.

and its all fine because its in order section,title,price

now we try to parse other page where

$line = "title CD_TITLE - price 50 - section MUSIC";
$pattern = "title #TITLE# - price #PRICE# - section #SECTION#";
$regex_pattern = "title (.+?) - price (.+?) - section (.+?)";

if ($line =~ m/^$regex_pattern/) {
#now its wrong
$section = $1;
#$section -> CD_TITLE
$title = $2;
#$title -> 50
$price = $3;
#$price -> MUSIC
}

in this example, need to put different order of section,title,price

I want to make script know pattern when replacing #TITLE# make it a
title match pattern, but its in all cases (.+?)

It's still completely unclear to me what you want, and what $pattern has
to do with it. The only thing I can partly parse of the above is that
you're trying to get the three fields title, price and section from a
string, and that they could be in any of the 'slots' in $line. One way
that would work is something like:

#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    "section BOOKS - title SOME_BOOK_TITLE - price 20",
    "title CD_TITLE - price 50 - section MUSIC",
    "price 3 - title BARF BANANA - section RANDOM",
);

my $kv_pattern = qr/(title|price|section) (.+?)/;

for my $line (@lines)
{
    print "$line\n";
    my %kv = $line =~ /$kv_pattern - $kv_pattern - $kv_pattern/;
    while (my ($key, $value) = each %kv)
    {
        print "\t$key -> $value\n";
    }
}

As you can see, after a successful match, the hash %kv contains the keys
and values as stored in the line you put in. Simply access them as
$kv{section}, $kv{price} and $kv{title}. This still requires a fairly
rigorous input, but allows the order to be different, which SEEMS to be
what you're asking, although I'm not 100% certain.

If you need to support more or different fields, simply change
$kv_pattern.

Martien

Gunnar Hjalmarsson · Jul 26, 2007


my @lines = (
    'section BOOKS - title SOME_BOOK_TITLE - price 20',
    'title CD_TITLE - price 50 - section MUSIC',
);
my @items;

foreach ( @lines ) {
    my $item;
    ( $item->{section} ) = /section (.+?)(?: -|$)/;
    ( $item->{title} ) = /title (.+?)(?: -|$)/;
    ( $item->{price} ) = /price (.+?)(?: -|$)/;
    push @items, $item;
}

foreach my $item ( @items ) {
    print "Section: $item->{section}\n";
    print "Title: $item->{title}\n";
    print "Price: $item->{price}\n\n";
}

Gunnar Hjalmarsson · Jul 26, 2007

Martien said:
my @lines = (
"section BOOKS - title SOME_BOOK_TITLE - price 20",
"title CD_TITLE - price 50 - section MUSIC",
"price 3 - title BARF BANANA - section RANDOM",
);

my $kv_pattern = qr/(title|price|section) (.+?)/;

for my $line (@lines)
{
print "$line\n";
my %kv = $line =~ /$kv_pattern - $kv_pattern - $kv_pattern/;
while (my ($key, $value) = each %kv)
{
print "\t$key -> $value\n";
}
}

Did you run it?

anno4000 · Jul 26, 2007

On Jul 25, 11:26 pm, (e-mail address removed)-berlin.de wrote:
[...]

I am still at a loss guessing what you are really up to.

Anno

OK, I'll try to explain it

for instance we parse http web page

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
[...]

now we try to parse other page where

$line = "title CD_TITLE - price 50 - section MUSIC";

The technique has been shown to you repeatedly. Here it is
again, adapted to your newest variant of the problem:

my @lines = (
    "section BOOKS - title SOME_BOOK_TITLE - price 20",
    "title CD_TITLE - price 50 - section MUSIC",
);

my $mark = qr/section|title|price|$/;

for ( @lines ) {
    print "$_\n";
    my %h = /($mark)(.+?)(?=$mark)/g;
    my ( $section, $title, $price) = @h{ qw( section title price)};
    print "section: $section, title: $title, price: $price\n";
}

Anno

frytaz · Jul 26, 2007

OK, I'll try to explain it

for instance we parse http web page

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20"; [snip]

now we try to parse other page where

$line = "title CD_TITLE - price 50 - section MUSIC";
[snip]

in this example, need to put different order of section,title,price

I'm a little lost trying to figure out what you want to do also, but I'm
going to guess that you want to extract the title, price, and section from
each line, no matter what order they are in.

Continuing on with what Anno showed you earlier, how about something like this:

---------------------------------------------------------------
use strict;
use warnings;
#use Data::Dumper;

my @items;
while ( <DATA> ) {
    my %hash = /(section|title|price)\s+?(.+?)\s*?[-\n]/g;
    push(@items,\%hash) if keys(%hash);
}

#print Dumper(\@items);

#Now you have a list of items, each of which is a hash ref
#that you can access to grab the info:

foreach (@items) {
    my ($title, $price, $section) = @{$_}{'title','price','section'};
    print
        "Title: '$title'\n".
        "Price: '$price'\n".
        "Section: '$section'\n\n";
}

__DATA__
section BOOKS - title Green Eggs and Ham - price 6.95
price 6.95 - section BOOKS - title The Cat In The Hat

I'm trying to create a script that listens to IRC channel messages and
logs three values (title, price, section), but only for messages that
match a defined pattern, like:

$msg = 'Some MyTitleToLog price blah text MyPriceToLog with some text MySectionToLog-';
$pattern = 'Some #TITLE# price blah text #PRICE# with some text #SECTION#-';

After a match I'll have the values in $title, $price, $section, or in an array:
#$title -> MyTitleToLog
#$price -> MyPriceToLog
#$section -> MySectionToLog

The patterns are stored in a local file that the script reads first.
When $msg changes and mixes up title, price, and section, the script
will try another pattern, chosen by IRC nick.
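
A rough sketch of that workflow, under several assumptions: Perl 5.10 or later is available, the templates are inlined here instead of being read from a file, and the nick "shopbot", the second template, and the sub names compile and parse_message are all invented for illustration. Each #NAME# marker becomes a named capture group, and the templates for a nick are tried in turn until one matches.

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

# Hypothetical data: templates keyed by an invented IRC nick.
my %templates_for = (
    shopbot => [
        'Some #TITLE# price blah text #PRICE# with some text #SECTION#-',
        'Section #SECTION# has #TITLE# for #PRICE#-',
    ],
);

# Compile one template: literal text is escaped with quotemeta and
# each #NAME# marker becomes a named group with a lower-cased name.
sub compile {
    my ($template) = @_;
    my $re = '';
    for my $part (split /(#[A-Za-z_]\w*#)/, $template) {
        if ($part =~ /\A#(\w+)#\z/) {
            $re .= '(?<' . lc($1) . '>.+?)';
        }
        else {
            $re .= quotemeta $part;
        }
    }
    return qr/\A$re\z/;
}

# Try every template registered for the nick; return a hash ref of
# the captured fields, or nothing if no template fits the message.
sub parse_message {
    my ($nick, $msg) = @_;
    for my $template (@{ $templates_for{$nick} || [] }) {
        my $re = compile($template);
        if ($msg =~ $re) {
            my %fields = %+;    # copy before another match clobbers %+
            return \%fields;
        }
    }
    return;
}

my $msg = 'Some MyTitleToLog price blah text MyPriceToLog with some text MySectionToLog-';
my $fields = parse_message('shopbot', $msg);
print "$_ -> $fields->{$_}\n" for sort keys %$fields;
```

In a long-running script it would make sense to compile each template once when the file is read, rather than on every message as this sketch does.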

Martien verbruggen · Jul 26, 2007

Did you run it?

I did, but I didn't study the output well enough

One line change should fix the problem you spotted.

my %kv = $line =~ /^$kv_pattern - $kv_pattern - $kv_pattern$/;

Sorry,
Martien
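
An alternative sketch that does not fix the number of fields in the regex at all. It assumes the separator " - " never occurs inside a value and that every chunk has the form "key value"; both are assumptions about the input, not something the thread guarantees.

```perl
use strict;
use warnings;

my @lines = (
    'section BOOKS - title SOME_BOOK_TITLE - price 20',
    'price 3 - title BARF BANANA - section RANDOM',
);

my @records;
for my $line (@lines) {
    my %kv;
    for my $chunk (split / - /, $line) {
        # The limit of 2 keeps spaces inside the value intact,
        # e.g. "title BARF BANANA" gives ("title", "BARF BANANA").
        my ($key, $value) = split ' ', $chunk, 2;
        $kv{$key} = $value;
    }
    push @records, \%kv;
}

for my $record (@records) {
    print "section: $record->{section}, title: $record->{title}, price: $record->{price}\n";
}
```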
