match string by re using some pattern

frytaz

Hi, I'm trying to match a few strings using a pattern.
For instance, I have in my file

#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

and in perl
....
$re =~ s/#one#/\(\.\+\?\)/;
$re =~ s/#two#/\(\.\+\?\)/;
$re =~ s/#three#/\(\.\+\?\)/;

if ($line =~ m/^$re/) {
$one = $1;
$two = $2;
$three = $3;
}

It works for the first file line, where #one# #two# #three# are in
order.
How can I check which word was in the line before replacing it with
(.+?)?
Or maybe there is a different way to do this :)

Please help.
Thanks
 
anno4000

Hi, I'm trying to match few strings using some pattern
for instance i have in my file

#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

and in perl
...
$re =~ s/#one#/\(\.\+\?\)/;
$re =~ s/#two#/\(\.\+\?\)/;
$re =~ s/#three#/\(\.\+\?\)/;

What did $re contain before the replacements?

Why do you have to apply substitutions on the regex instead of
specifying it right away?
if ($line =~ m/^$re/) {
$one = $1;
$two = $2;
$three = $3;
}

Since we don't know what $re contains at this point, there's no way
to assess what it might be doing.
It's all right for first file line when #one# #two# #three# are in
order,

Please explain what "all right" means. I'm sure it's obvious to you,
but it isn't to us.
How i can check which word was it in line before replacing it by (.
+?) ??
Or maybe there is some different way to do this :)

What is "this"?

You have given us (incomplete) code that, you say, doesn't do what
you want. Without an explanation from you, how are we supposed to
guess what you want?

Anno
 
Paul Lalli

Hi, I'm trying to match few strings using some pattern
for instance i have in my file

#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

and in perl
...
$re =~ s/#one#/\(\.\+\?\)/;
$re =~ s/#two#/\(\.\+\?\)/;
$re =~ s/#three#/\(\.\+\?\)/;

No need for all those backslashes. The replacement part of a s/// is
just a string, and none of those characters are special in a string.
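
A quick way to check this is the small sketch below; the variable names are only for illustration. Both substitutions should leave the same string behind, because the replacement side is interpolated like a double-quoted string, where a backslash in front of punctuation just yields that character:

```perl
use strict;
use warnings;

my $with    = '#one# some text';
my $without = '#one# some text';

# Replacement written with the backslashes from the question ...
$with =~ s/#one#/\(\.\+\?\)/;
# ... and written plainly.
$without =~ s/#one#/(.+?)/;

print $with eq $without ? "identical: $with\n" : "different\n";
```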
if ($line =~ m/^$re/) {
$one = $1;
$two = $2;
$three = $3;

}

It's all right for first file line when #one# #two# #three# are in
order,
How i can check which word was it in line before replacing it by (.
+?) ??
Or maybe there is some different way to do this :)

You're basically trying to use a variable as a variable name, albeit
in a different way than most people attempt to do so. Regardless, the
advice in `perldoc -q "variable name"` still holds. Please read it.

One solution would be to save the ordering of "one", "two", "three" as
you're replacing them, and then use that ordering in your
replacement. Rather than setting the variables $one, $two, $three,
instead create a hash that has corresponding keys 'one', 'two', and
'three'. Here's a short-but-complete example:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $re = '#one# some text #two# more t*xt #three#';
my $re2 = '#two# text #one# some text #three#';

my (@pos, @pos2);
while ($re =~ s/#(one|two|three)#/(.+?)/) {
push @pos, $1;
}
while ($re2 =~ s/#(one|two|three)#/(.+?)/) {
push @pos2, $1;
}
my (%h1, %h2);
my $line = "foo some text bar more tttxt baz";
if ($line =~ m/^$re$/) {
$h1{$pos[0]} = $1;
$h1{$pos[1]} = $2;
$h1{$pos[2]} = $3;
}
my $line2 = "foo text bar some text baz";
if ($line2 =~ m/^$re2$/) {
$h2{$pos2[0]} = $1;
$h2{$pos2[1]} = $2;
$h2{$pos2[2]} = $3;
}

print Dumper(\%h1, \%h2);
__END__

$VAR1 = {
'three' => 'baz',
'one' => 'foo',
'two' => 'bar'
};
$VAR2 = {
'three' => 'baz',
'one' => 'bar',
'two' => 'foo'
};


Hope that helps,
Paul Lalli
 
anno4000

Paul Lalli said:
No need for all those backslashes. The replacement part of a s/// is
just a string, and none of those characters are special in a string.


You're basically trying to use a variable as a variable name, albeit
in a different way than most people attempt to do so. Regardless, the
advice in `perldoc -q "variable name"` still holds. Please read it.

Ah, you decoded the mystery question. Congrats.
One solution would be to save the ordering of "one", "two", "three" as
you're replacing them, and then use that ordering in your
replacement. Rather than setting the variables $one, $two, $three,
instead create a hash that has corresponding keys 'one', 'two', and
'three'.

Once you have the hash, you can even set the variables. Nothing wrong
with that.
Here's a short-but-complete example:

[snip]

Here is my take. I'm assuming that the delimiter "#" can't appear in
the text:

while ( <DATA> ) {
print;
my %h = /#(one|two|three)#([^#\n]*)/g;
my ( $one, $two, $three) = @h{ qw( one two three)};
# do things with $one, $two, $three
print "one: '$one', two: '$two', three: '$three'\n";
}

__DATA__
#one# some text #two# more t*xt #three#
#two# text #one# some text #three#

Anno
 
frytaz

Thank You very much Paul Lalli
 
frytaz

How could I recognize those patterns #one# #two# #three#?

Can I do it like in C#, for instance

$match =~ s/#one#/(?<one>.*?)/;
$match =~ s/#two#/(?<two>.*?)/;
$match =~ s/#three#/(?<three>.*?)/;

then

$one = ${one};
$two = ${two};
$three = ${three};

But that doesn't work...

Thanks
 
Paul Lalli

How I could recognize those patterns #one# #two# #three#

Can i do it like in c#, for instance

$match =~ s/#one#/(?<one>.*?)/;
$match =~ s/#two#/(?<two>.*?)/;
$match =~ s/#three#/(?<three>.*?)/;

then

$one = ${one};
$two = ${two};
$three = ${three};

But that doesn't work...

Not until Perl 5.10, no it doesn't.

We've already given you an implementable solution. What issue are you
still having with it?

Paul Lalli
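
For reference, here is a minimal sketch of how the C#-style idea could look on Perl 5.10 or later, where named groups are read back through the %+ hash rather than through ${one}. The helper name template_to_regex is made up for illustration, and the sketch assumes that the text between the markers is meant literally, so it is passed through quotemeta:

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Turn a template such as '#two# text #one# some text #three#' into a
# regex with one named group per marker (hypothetical helper).
sub template_to_regex {
    my ($template) = @_;
    my $re = '';
    for my $part (split /(#[A-Za-z_]\w*#)/, $template) {
        if ($part =~ /^#(\w+)#$/) {
            $re .= "(?<$1>.+?)";       # marker becomes a named group
        }
        else {
            $re .= quotemeta $part;    # everything else is literal text
        }
    }
    return qr/^$re$/;
}

my $re = template_to_regex('#two# text #one# some text #three#');
if ('foo text bar some text baz' =~ $re) {
    my %h = %+;    # copy now: %+ changes with the next successful match
    say "$_ => $h{$_}" for sort keys %h;
}
```

On older Perls the position-list approach shown earlier in the thread remains the way to go.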
 
frytaz

Not until Perl 5.10, no it doesn't.

We've already given you an implementable solution. What issue are you
still having with it?

Paul Lalli

There is an issue when there are two lines in the file:

1 some #one# test #two# text #three#
2 some #two# test #one# text #three#

then after replacing with s/#(one|two|three)#/(.+?)/

1 some (.+?) test (.+?) text (.+?)
2 some (.+?) test (.+?) text (.+?)

then if I try to match them against

some O!N!E test T!W!O text T!H!R!E!E

the script works correctly only for the 1st file line.
#one# should match O!N!E, for instance.
 
anno4000

There is an issue when, theres two lines in file

1 some #one# test #two# text #three#
2 some #two# test #one# text #three#

Okay, these are data from a file.
then after replace by s/#(one|two|three)#/(.+?)/

1 some (.+?) test (.+?) text (.+?)
2 some (.+?) text (.+?) text (.+?)

Now you generate what looks like regex patterns out of the first
two lines from your file.
then if I'm trying to match it with

some O!N!E test T!W!O text T!H!R!E!E

Where is this from? Is that another line from the file?
script will work fine only for 1st file line
#one# should match O!N!E for instance

It becomes increasingly unclear how you would distinguish normal
text from markers such as "#one#" or "O!N!E". How is the program
supposed to identify them?

That said, the first of your patterns at least makes an honorable
attempt at matching and even capturing the mutated markers:

my $str = 'some O!N!E test T!W!O text T!H!R!E!E';
$str =~ /some (.+?) test (.+?) text (.+?)/ or die "no match\n";
print "$_\n" for $1, $2, $3;

That prints

O!N!E
T!W!O
T

The last capture is incomplete. That may take some fiddling to correct.

I am still at a loss guessing what you are really up to.

Anno
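
One possible bit of fiddling, sketched here on the assumption that the last marker always runs to the end of the line: anchor the pattern with $ so the final non-greedy group cannot stop after a single character.

```perl
use strict;
use warnings;

my $str = 'some O!N!E test T!W!O text T!H!R!E!E';

# With the trailing $, the last (.+?) has to extend to the end of the
# string before the overall match can succeed.
my @fields = $str =~ /^some (.+?) test (.+?) text (.+?)$/
    or die "no match\n";
print "$_\n" for @fields;
```

With the anchor in place, the third line printed is the full T!H!R!E!E.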
 
frytaz

OK, I'll try to explain it.

For instance, we parse an HTTP web page:

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
$pattern = "section #SECTION# * title #TITLE# - price #PRICE#";
$regex_patern = "section (.+?) . title (.+?) - price (.+?)";
if ($line =~ m/^$regex_pattern/) {
$section = $1;
#$section -> BOOKS
$title = $2;
#$title -> SOME_BOOK_TITLE
$price = $3;
#$price -> 20
}

and it's all fine because the fields are in the order section, title, price.

Now we try to parse another page, where

$line = "title CD_TITLE - price 50 - section MUSIC";
$pattern = "title #TITLE# - price #PRICE# - section #SECTION#";
$regex_pattern = "title (.+?) - price (.+?) - section (.+?)";

if ($line =~ m/^$regex_pattern/) {
#now its wrong
$section = $1;
#$section -> CD_TITLE
$title = $2;
#$title -> 50
$price = $3;
#$price -> MUSIC
}

In this example, I need a different order of section, title, price.

I want the script to know which marker it is replacing, so that
#TITLE# becomes a title-matching pattern, but in all cases it is just (.+?)
 
Martien verbruggen

On Thu, 26 Jul 2007 02:21:42 -0000,

[huge snip]
[snip]
I am still at a loss guessing what you are really up to.
OK, I'll try to explain it

for instance we parse http web page

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
$pattern = "section #SECTION# * title #TITLE# - price #PRICE#";
$regex_patern = "section (.+?) . title (.+?) - price (.+?)";
if ($line =~ m/^$regex_pattern/) {
$section = $1;
#$section -> BOOKS
$title = $2;
#$title -> SOME_BOOK_TITLE
$price = $3;
#$price -> 20
}

You should really consider using real code. Something that compiles and
runs, under strict and warnings.

The above doesn't.

Allowing that maybe $regex_patern and $regex_pattern are supposed to be
the same, what is $pattern doing in there? It isn't a pattern as far as
Perl is concerned, and it's actually not used anywhere.
and its all fine because its in order section,title,price

now we try to parse other page where

$line = "title CD_TITLE - price 50 - section MUSIC";
$pattern = "title #TITLE# - price #PRICE# - section #SECTION#";
$regex_pattern = "title (.+?) - price (.+?) - section (.+?)";

if ($line =~ m/^$regex_pattern/) {
#now its wrong
$section = $1;
#$section -> CD_TITLE
$title = $2;
#$title -> 50
$price = $3;
#$price -> MUSIC
}

in this example, need to put different order of section,title,price

I want to make script know pattern when replacing #TITLE# make it a
title match pattern, but its in all cases (.+?)

It's still completely unclear to me what you want, and what $pattern has
to do with it. The only thing I can partly parse of the above is that
you're trying to get the three fields title, price and section from a
string, and that they could be in any of the 'slots' in $line. One way
that would work is something like:

#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
"section BOOKS - title SOME_BOOK_TITLE - price 20",
"title CD_TITLE - price 50 - section MUSIC",
"price 3 - title BARF BANANA - section RANDOM",
);

my $kv_pattern = qr/(title|price|section) (.+?)/;

for my $line (@lines)
{
print "$line\n";
my %kv = $line =~ /$kv_pattern - $kv_pattern - $kv_pattern/;
while (my ($key, $value) = each %kv)
{
print "\t$key -> $value\n";
}
}

As you can see, after a successful match, the hash %kv contains the keys
and values as stored in the line you put in. Simply access them as
$kv{section}, $kv{price} and $kv{title}. This still requires fairly
rigorous input, but allows the order to be different, which SEEMS to be
what you're asking, although I'm not 100% certain.

If you need to support more or different fields, simply change
$kv_pattern.

Martien
 
Gunnar Hjalmarsson

for instance we parse http web page

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
$pattern = "section #SECTION# * title #TITLE# - price #PRICE#";
$regex_patern = "section (.+?) . title (.+?) - price (.+?)";
if ($line =~ m/^$regex_pattern/) {
$section = $1;
#$section -> BOOKS
$title = $2;
#$title -> SOME_BOOK_TITLE
$price = $3;
#$price -> 20
}

and its all fine because its in order section,title,price

now we try to parse other page where

$line = "title CD_TITLE - price 50 - section MUSIC";
$pattern = "title #TITLE# - price #PRICE# - section #SECTION#";
$regex_pattern = "title (.+?) - price (.+?) - section (.+?)";

if ($line =~ m/^$regex_pattern/) {
#now its wrong
$section = $1;
#$section -> CD_TITLE
$title = $2;
#$title -> 50
$price = $3;
#$price -> MUSIC
}

in this example, need to put different order of section,title,price

I want to make script know pattern when replacing #TITLE# make it a
title match pattern, but its in all cases (.+?)

my @lines = (
'section BOOKS - title SOME_BOOK_TITLE - price 20',
'title CD_TITLE - price 50 - section MUSIC',
);
my @items;

foreach ( @lines ) {
my $item;
( $item->{section} ) = /section (.+?)(?: -|$)/;
( $item->{title} ) = /title (.+?)(?: -|$)/;
( $item->{price} ) = /price (.+?)(?: -|$)/;
push @items, $item;
}

foreach my $item ( @items ) {
print "Section: $item->{section}\n";
print "Title: $item->{title}\n";
print "Price: $item->{price}\n\n";
}
 
Gunnar Hjalmarsson

Martien said:
my @lines = (
"section BOOKS - title SOME_BOOK_TITLE - price 20",
"title CD_TITLE - price 50 - section MUSIC",
"price 3 - title BARF BANANA - section RANDOM",
);

my $kv_pattern = qr/(title|price|section) (.+?)/;

for my $line (@lines)
{
print "$line\n";
my %kv = $line =~ /$kv_pattern - $kv_pattern - $kv_pattern/;
while (my ($key, $value) = each %kv)
{
print "\t$key -> $value\n";
}
}

Did you run it?
 
anno4000

On Jul 25, 11:26 pm, (e-mail address removed)-berlin.de wrote:
[...]
I am still at a loss guessing what you are really up to.

Anno

OK, I'll try to explain it

for instance we parse http web page

$line = "section BOOKS - title SOME_BOOK_TITLE - price 20";
[...]

now we try to parse other page where

$line = "title CD_TITLE - price 50 - section MUSIC";

The technique has been shown to you repeatedly. Here it is
again, adapted to your newest variant of the problem:

my @lines = (
"section BOOKS - title SOME_BOOK_TITLE - price 20",
"title CD_TITLE - price 50 - section MUSIC",
);

my $mark = qr/section|title|price|$/;

for ( @lines ) {
print "$_\n";
my %h = /($mark)(.+?)(?=$mark)/g;
my ( $section, $title, $price) = @h{ qw( section title price)};
print "section: $section, title: $title, price: $price\n";
}

Anno
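
Note that with the pattern as posted, each captured value still carries the surrounding blanks and the " - " separator (for example " BOOKS - "). A possible variant that stops each value before the separator is sketched below. The sub name parse_fields is invented for illustration, and the sketch assumes fields are always separated by " - " and that no value itself contains " - " followed by a field name:

```perl
use strict;
use warnings;

my $mark = qr/section|title|price/;

# Parse "key value - key value - ..." in any order into a hash
# (hypothetical helper, not from the thread).
sub parse_fields {
    my ($line) = @_;
    # Each value runs up to the next " - key" or to the end of the line.
    my %h = $line =~ /\b($mark)\s+(.+?)(?=\s+-\s+(?:$mark)\b|\s*$)/g;
    return %h;
}

for my $line (
    'section BOOKS - title SOME_BOOK_TITLE - price 20',
    'title CD_TITLE - price 50 - section MUSIC',
) {
    my %h = parse_fields($line);
    print "section: $h{section}, title: $h{title}, price: $h{price}\n";
}
```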
 
frytaz

OK, I'll try to explain it
for instance we parse http web page
$line = "section BOOKS - title SOME_BOOK_TITLE - price 20"; [snip]

now we try to parse other page where
$line = "title CD_TITLE - price 50 - section MUSIC";
[snip]

in this example, need to put different order of section,title,price

I'm a little lost trying to figure out what you want to do also, but I'm
going to guess that you want to extract the title, price, and section from
each line, no matter what order they are in.

Continuing on with what Anno showed you earlier, how about something like this:

---------------------------------------------------------------
use strict;
use warnings;
#use Data::Dumper;

my @items;
while ( <DATA> ) {
my %hash = /(section|title|price)\s+?(.+?)\s*?[-\n]/g;
push(@items,\%hash) if keys(%hash);
}

#print Dumper(\@items);

#Now you have a list of items, each of which is a hash ref
#that you can access to grab the info:

foreach (@items) {
my ($title, $price, $section) = @{$_}{'title','price','section'};
print
"Title: '$title'\n".
"Price: '$price'\n".
"Section: '$section'\n\n";

}

__DATA__
section BOOKS - title Green Eggs and Ham - price 6.95
price 6.95 - section BOOKS - title The Cat In The Hat

I'm trying to create a script that listens to IRC channel messages and
logs three values (Title, Price, Section), but only for messages that
match a defined pattern, like

$msg = 'Some MyTitleToLog price blah text MyPriceToLog with some text MySectionToLog-';
$pattern = 'Some #TITLE# price blah text #PRICE# with some text #SECTION#-';

Then after a match I'll have the values in $title, $price, $section, or in an array:
#$title -> MyTitleToLog
#$price -> MyPriceToLog
#$section -> MySectionToLog

The patterns are stored in a local file that the script reads first.
When $msg changes and mixes up Title, Price, Section, it will try
another pattern to match, chosen by IRC nick.
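
A rough sketch of how that could be wired up with the position-list technique from earlier in the thread follows. The sub names compile_template and match_message are invented for illustration, the second template is a made-up example, and the sketch assumes that the message is a single line, that markers are upper-case words between # signs, and that all other template text is literal. Selecting templates by IRC nick is left out:

```perl
use strict;
use warnings;

# Compile a template into a regex plus the marker names in the order
# they appear (hypothetical helper).
sub compile_template {
    my ($template) = @_;
    my @names;
    my $re = '';
    for my $part (split /(#[A-Z]+#)/, $template) {
        if ($part =~ /^#([A-Z]+)#$/) {
            push @names, lc $1;        # remember which marker sits here
            $re .= '(.+?)';
        }
        else {
            $re .= quotemeta $part;    # literal text from the template
        }
    }
    return (qr/^$re$/, \@names);
}

# Try each compiled template in turn and return the named fields for the
# first one that matches, or an empty list if none does.
sub match_message {
    my ($msg, @templates) = @_;
    for my $t (@templates) {
        my ($re, $names) = @$t;
        if (my @captures = $msg =~ $re) {
            my %fields;
            @fields{@$names} = @captures;
            return %fields;
        }
    }
    return;
}

my @templates = map { [ compile_template($_) ] } (
    'Some #TITLE# price blah text #PRICE# with some text #SECTION#-',
    'Some #SECTION# text #TITLE# costs #PRICE#-',    # made-up second template
);

my %f = match_message(
    'Some MyTitleToLog price blah text MyPriceToLog with some text MySectionToLog-',
    @templates,
);
print "title: $f{title}, price: $f{price}, section: $f{section}\n" if %f;
```

Because the marker names are recorded while the template is compiled, the order of #TITLE#, #PRICE# and #SECTION# in a template no longer matters.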
 
Martien verbruggen

Did you run it?

I did, but I didn't study the output well enough

One line change should fix the problem you spotted.

my %kv = $line =~ /^$kv_pattern - $kv_pattern - $kv_pattern$/;

Sorry,
Martien
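
For completeness, here is a sketch of the earlier script with that one-line change folded in. It is wrapped in a sub named parse_line (a name chosen here for illustration) and prints the keys in sorted order:

```perl
use strict;
use warnings;

my $kv_pattern = qr/(title|price|section) (.+?)/;

# Return the key/value pairs of one line as a hash (hypothetical helper).
sub parse_line {
    my ($line) = @_;
    # ^ and $ force the final non-greedy (.+?) to run to the end of the line.
    my %kv = $line =~ /^$kv_pattern - $kv_pattern - $kv_pattern$/;
    return %kv;
}

for my $line (
    "section BOOKS - title SOME_BOOK_TITLE - price 20",
    "title CD_TITLE - price 50 - section MUSIC",
    "price 3 - title BARF BANANA - section RANDOM",
) {
    print "$line\n";
    my %kv = parse_line($line);
    print "\t$_ -> $kv{$_}\n" for sort keys %kv;
}
```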
 
