filemask to regex

George Mpouras

I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
$mask=~s|\.|\\.|g; # . -> \.
$mask=~s|\?|.|g; # ? -> .
$mask=~s|\*|.*?|g; # * -> .*?
$mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
$mask = qr/^$mask$/i;
 
Rainer Weikusat

George Mpouras said:
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
[...]

$mask=~s|\*|.*?|g; # * -> .*?

This sequence of conversions is wrong because it will translate *.* to
.*?, i.e., something which matches a string with no . in it.
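
To make the objection concrete, here is a minimal sketch that runs the substitution chain from the original post on the mask *.* and checks the result against a name without a dot. The sub name mask_via_substitutions is made up for this illustration; the substitutions themselves are copied from the post.

```perl
use strict;
use warnings;

# The substitution chain from the original post, wrapped in a sub.
# (The name mask_via_substitutions is hypothetical, not from the thread.)
sub mask_via_substitutions {
    my ($mask) = @_;
    $mask =~ s|\*\.\*|\*|g;   # *.* -> *
    $mask =~ s|\.|\\.|g;      # .   -> \.
    $mask =~ s|\?|.|g;        # ?   -> .
    $mask =~ s|\*|.*?|g;      # *   -> .*?
    $mask =~ s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g;   # escape ()+^[]{}$@%
    return $mask;
}

my $pattern = mask_via_substitutions('*.*');
print "pattern: $pattern\n";                                 # pattern: .*?
print "abc     matches\n" if 'abc'     =~ /^$pattern$/i;     # printed
print "abc.def matches\n" if 'abc.def' =~ /^$pattern$/i;     # printed
```

Both names match, because the first substitution has already thrown the literal dot away before it could be escaped.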
 
Rainer Weikusat

George Mpouras said:
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
$mask=~s|\.|\\.|g; # . -> \.
$mask=~s|\?|.|g; # ? -> .
$mask=~s|\*|.*?|g; # * -> .*?
$mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
$mask = qr/^$mask$/i;

I think I would again prefer to do a part-by-part lexical analysis of
the input, mainly because this means that quotemeta can be used to
quote metacharacters in the 'text' parts:

------------------
sub xlate_tin_pattern
{
    my $out = '';

    for ($_[0]) {
        # a run of '?': one '.' per question mark
        /\G(\?+)/gc && do {
            $out .= '.' x length($1);
            redo;
        };

        # a run of '*': a single non-greedy '.*?'
        /\G\*+/gc && do {
            $out .= '.*?';
            redo;
        };

        # literal text: quote any regex metacharacters
        /\G([^?*]+)/g && do {
            $out .= quotemeta($1);
            redo;
        };
    }

    return $out;
}

print(xlate_tin_pattern($_), "\n") for @ARGV;
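
The sub above only returns a pattern string. As a rough sketch of how it could be used for actual matching, the following wraps it in an anchored, case-insensitive regex in the spirit of the qr/^$mask$/i line from the original post. The helper name mask_to_qr and the sample file names are assumptions made for this example.

```perl
use strict;
use warnings;

# The translator from the post above, reproduced so the sketch runs on its own.
sub xlate_tin_pattern {
    my $out = '';
    for ($_[0]) {
        /\G(\?+)/gc   && do { $out .= '.' x length($1); redo };
        /\G\*+/gc     && do { $out .= '.*?';            redo };
        /\G([^?*]+)/g && do { $out .= quotemeta($1);    redo };
    }
    return $out;
}

# Hypothetical helper: anchor the translated pattern and ignore case.
sub mask_to_qr {
    my ($mask) = @_;
    my $pat = xlate_tin_pattern($mask);
    return qr/\A$pat\z/i;
}

my $re = mask_to_qr('??-media*.wm?');
for my $name ('01-media-intro.wmv', 'AB-MEDIA.WMA', 'media.wmv', '01-media.wmvx') {
    printf "%-20s %s\n", $name, ($name =~ $re ? 'match' : 'no match');
}
```

The first two names match; media.wmv lacks the two leading characters and 01-media.wmvx has one character too many after .wm, so neither of those matches.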
 
George Mpouras

On 25/8/2013 at 7:52 PM, Rainer Weikusat wrote:
I think I would again prefer to do a part-by-part lexical analysis of
the input, mainly because this means that quotemeta can be used to
quote metacharacters in the 'text' parts:

[...]

Very good!
But the line
/\G(\?+)/gc && do { $out .= '.' x length($1); redo };
fries my brain.
So I think I will stick with the equivalent f1():

print xlate_tin_pattern('@s??im..pl%e.???a'), "\n";
print f1('@s??im..pl%e.???a'), "\n";


sub xlate_tin_pattern
{
    my $out = '';
    for ($_[0]) {
        /\G(\?+)/gc   && do { $out .= '.' x length($1); redo };
        /\G\*+/gc     && do { $out .= '.*?';            redo };
        /\G([^?*]+)/g && do { $out .= quotemeta($1);    redo };
    }
    $out
}


sub f1
{
    my $out = $_[0];
    $out =~ s/([^?*]+)/\Q$1\E/g;  # quote the literal parts first
    $out =~ s|\?|.|g;             # ? -> .
    $out =~ s|\*+|.*?|g;          # run of * -> .*?
    $out
}
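
For anyone else whose brain is fried by the /\G.../gc && do { ...; redo } idiom, the same scan can be spelled out as an explicit loop. This is only a sketch of the technique, not code from the thread, and the name mask_to_pattern is made up. The point is that \G anchors each match at pos() of the string, a successful /g match moves pos() forward, and /c keeps pos() where it was when a match fails.

```perl
use strict;
use warnings;

# Hypothetical explicit-loop variant of the \G scanner.
sub mask_to_pattern {
    my ($mask) = @_;
    my $out = '';

    # pos($mask) is undefined (treated as 0) before the first match.
    while ((pos($mask) // 0) < length($mask)) {
        if ($mask =~ /\G(\?+)/gc) {           # run of '?'
            $out .= '.' x length($1);
        }
        elsif ($mask =~ /\G\*+/gc) {          # run of '*'
            $out .= '.*?';
        }
        elsif ($mask =~ /\G([^?*]+)/gc) {     # literal text
            $out .= quotemeta($1);
        }
        else {
            last;                             # not reachable, kept as a safety net
        }
    }
    return $out;
}

print mask_to_pattern('@s??im..pl%e.???a'), "\n";   # \@s..im\.\.pl\%e\....a
```

Every character is either a ?, a * or something else, so one of the three branches always matches and the loop advances until pos() reaches the end of the string.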
 
Ben Bacarisse

Rainer Weikusat said:
George Mpouras said:
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
[...]

$mask=~s|\*|.*?|g; # * -> .*?

This sequence of conversions is wrong because it will translate *.* to
.*?, i.e., something which matches a string with no . in it.

I think that may be deliberate. I was going to ask "what OS?", but when
I saw that, I remembered that in MS-DOS (and maybe others), *.* means
all files. Similarly, X*.* means all files beginning with X. (The reason
being that the . is not in the file name, just in the presentation of
it, though I still think that's a weak argument.)

I'm not saying the translation is correct -- I can't remember all of
MS-DOS's rules, and it's likely to be wrong if the target is not an
MS-DOS-like OS.
 
Rainer Weikusat

Ben Bacarisse said:
Rainer Weikusat said:
George Mpouras said:
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
[...]

$mask=~s|\*|.*?|g; # * -> .*?

This sequence of conversions is wrong because it will translate *.* to
.*?, i.e., something which matches a string with no . in it.

I think that may be deliberate. I was going to ask "what OS?", but when
I saw that, I remembered that in MS-DOS (and maybe others), *.* means
all files. Similarly, X*.* means all files beginning with X. (The reason
being that the . is not in the file name, just in the presentation of
it, though I still think that's a weak argument.)

'DOS filenames' (and very likely VMS filenames as well) are not plain
strings but consist of two components, a 'name' part and a 'type'
part, and because of this, *.* means 'all names and all types', i.e.,
'every file'. If the input these patterns are supposed to be matched
against is really a list of 'DOS filenames', translating *.* to .*\..*
(or .+\..+) instead of .* (or .+) will make no difference because the
extension is always going to be there. But when the input is just a
list of strings, making '*.*' match both abc and abc.def is IMHO
counterintuitive. It also precludes some possibly useful applications
such as 'match everything which has an extension'.
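
As an illustration of the 'match everything which has an extension' use case, here is a small sketch of a literal translation in which *.* keeps its dot and becomes .*\..* as described above. It is a compact single-pass variant written for this example, not the code posted earlier in the thread; the sub name strict_mask_to_qr and the sample names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical literal translation: '?' -> '.', a run of '*' -> '.*',
# everything else is quoted with quotemeta. No special case for '*.*'.
sub strict_mask_to_qr {
    my ($mask) = @_;
    (my $pat = $mask) =~ s{(\?)|(\*+)|([^?*]+)}
                          { defined $1 ? '.' : defined $2 ? '.*' : quotemeta $3 }gex;
    return qr/\A$pat\z/i;
}

my @names = qw(abc abc.def README notes.txt);
my $re    = strict_mask_to_qr('*.*');
print join(' ', grep { $_ =~ $re } @names), "\n";   # abc.def notes.txt
```

With this reading, plain strings such as abc and README are filtered out, which is exactly what the DOS-style collapse of *.* to * makes impossible.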
 
George Mpouras

On 26/8/2013 at 1:50 AM, Ben Bacarisse wrote:
I'm not saying the translation is correct -- I can't remember all of
MS-DOS's rules, and it's likely to be wrong if the target is not an
MS-DOS-like OS.

Yes, you are correct, it was intended: on Windows, *.* means *!
But Rainer's approach in his other answer is very clever and correct.
 
George Mpouras

Leaving Windows aside, bash also has the interesting brace expansion,
including a range operator!

ls -l somefile{01,02,03,07}
ls -l somefile{01..05}
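
For comparison, a rough Perl counterpart of the two bash lines: the built-in glob expands comma lists in braces even when no such files exist, but it has no {01..05} range form, so the sketch below builds that list with Perl's own range operator and sprintf instead. The file names are just the ones from the bash example.

```perl
use strict;
use warnings;

# List form: glob expands csh-style braces without touching the file system
# when braces are the only wildcard characters in the pattern.
my @listed = glob('somefile{01,02,03,07}');

# Range form: glob does not understand {01..05}, so generate the names directly.
my @ranged = map { sprintf 'somefile%02d', $_ } 1 .. 5;

print "@listed\n";
print "@ranged\n";   # somefile01 somefile02 somefile03 somefile04 somefile05
```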
 
Ben Bacarisse

Rainer Weikusat said:
Ben Bacarisse said:
George Mpouras said:
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *

[...]

$mask=~s|\*|.*?|g; # * -> .*?

This sequence of conversions is wrong because it will translate *.* to
.*?, i.e., something which matches a string with no . in it.

I think that may be deliberate. I was going to ask "what OS?", but when
I saw that, I remembered that in MS-DOS (and maybe others), *.* means
all files. Similarly, X*.* means all files beginning with X. (The reason
being that the . is not in the file name, just in the presentation of
it, though I still think that's a weak argument.)

'DOS filenames' (and very likely VMS filenames as well) are not plain
strings but consist of two components, a 'name' part and a 'type'
part, and because of this, *.* means 'all names and all types', i.e.,
'every file'. If the input these patterns are supposed to be matched
against is really a list of 'DOS filenames', translating *.* to .*\..*
(or .+\..+) instead of .* (or .+) will make no difference because the
extension is always going to be there.

I don't follow. If I get a list of DOS file names using, say, DIR,
those with no extension have no dot. .*\..* won't match them but .*
will. You can write a file with no extension as "XYZ." as well as "XYZ"
but, IIRC, many programs dropped the '.' if there was no extension.

Rainer Weikusat said:
But when it was just a list of
strings, making '*.*' match both abc and abc.def is IMHO
counterintuitive. It also precludes some possibly useful applications
such as 'match everything which has an extension'.

I must be missing your point because I don't follow this either. The
DOS way to match names with an extension was to write *.?*, and the
suggested translation will work for that.
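
A quick way to check this claim is to run both masks through the substitution chain from the first post. The sketch below does that against three sample names; the sub name dos_mask_to_qr and the names are made up for the illustration.

```perl
use strict;
use warnings;

# The substitution chain from the first post, returning the compiled regex.
# (The name dos_mask_to_qr is hypothetical.)
sub dos_mask_to_qr {
    my ($mask) = @_;
    $mask =~ s|\*\.\*|\*|g;   # *.* -> *
    $mask =~ s|\.|\\.|g;      # .   -> \.
    $mask =~ s|\?|.|g;        # ?   -> .
    $mask =~ s|\*|.*?|g;      # *   -> .*?
    $mask =~ s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g;
    return qr/^$mask$/i;
}

for my $mask ('*.*', '*.?*') {
    my $re   = dos_mask_to_qr($mask);
    my @hits = grep { $_ =~ $re } qw(XYZ XYZ.TXT abc.def);
    print "$mask => @hits\n";
}
# *.* => XYZ XYZ.TXT abc.def
# *.?* => XYZ.TXT abc.def
```

The mask *.?* does not contain the sequence *.* and so escapes the first substitution; it ends up as .*?\...*?, which requires a dot followed by at least one character.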
 
Dr.Ruud

George Mpouras said:
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
$mask=~s|\.|\\.|g; # . -> \.
$mask=~s|\?|.|g; # ? -> .
$mask=~s|\*|.*?|g; # * -> .*?
$mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
$mask = qr/^$mask$/i;

Also check out `perldoc -f glob`.
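
A minimal sketch of what that looks like in practice, assuming the goal is to find files on disk rather than to filter an arbitrary list of strings. The scratch directory and the file names are invented for the example, and the sketch assumes the temporary directory path contains no whitespace, since the built-in glob splits its argument on spaces.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Scratch directory with a few made-up file names, removed on exit.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(ab-media1.wmv xy-media-final.wma media.wmv ab-media.txt)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# glob understands * and ? itself, so no regex translation is needed,
# but it matches against the file system, not against a list of strings.
my @found = map { basename($_) } glob("$dir/??-media*.wm?");
print "$_\n" for @found;
```

Only ab-media1.wmv and xy-media-final.wma fit the mask. Note that glob follows the case rules of the underlying file system, unlike the /i regex in the original post.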
 
Rainer Weikusat

[...]
I was under the not entirely correct expression that

[...]

Personal remark which seems appropriate here: except when tightly
supervised, my mind has a tendency to invert things, as can be seen
here, where I thought 'under the impression' and thus, without
noticing it, typed 'under the expression'. It's been a while since
I accidentally implemented an algorithm doing the exact opposite of
what it was supposed to do, probably because these days I spend more
time thinking about the code I'm planning to write and less time typing
away and hashing stuff out as the need arises. But it still happens
fairly often in 'ordinary text'.
 
