reverse a glob expansion

topher67

Hello,

I need a piece of code that can "unexpand" a glob pattern. For
example, given the following list:

Foo-1-Bar
Foo-2-Bar
Foo-3-Bar

I would like to get back:

Foo-{1,2,3}-Bar

Any help would be greatly appreciated.

Thanks,
-Topher
 
xhoster

topher67 said:
> Hello,
>
> I need a piece of code that can "unexpand" a glob pattern. For
> example, given the following list:
>
> Foo-1-Bar
> Foo-2-Bar
> Foo-3-Bar
>
> I would like to get back:
>
> Foo-{1,2,3}-Bar
>
> Any help would be greatly appreciated.

This could be either very easy or very hard.

Are curlies the only specials allowed, are the things inside the curlies
always exactly one character long, and is there only ever going to be
exactly one set of curlies per pattern?
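
To show what the easy end looks like: if every input string shares one prefix and one suffix and differs only in the middle, the whole job might be little more than finding the longest common prefix and suffix. A minimal sketch under that assumption (the names common_prefix and collapse_simple are made up for illustration):

```perl
use strict;
use warnings;

# Longest common prefix of a list of strings.
sub common_prefix {
    my ($prefix, @others) = @_;
    for my $w (@others) {
        chop $prefix while length($prefix) && index($w, $prefix) != 0;
    }
    return $prefix;
}

# Collapse a list into PREFIX{a,b,c}SUFFIX. Assumes exactly one varying
# region; it does not group unrelated strings.
sub collapse_simple {
    my @words = @_;
    return $words[0] if @words == 1;

    my $prefix = common_prefix(@words);

    # Compute the suffix on what is left after the prefix, so the two
    # can never overlap.
    my @rest   = map { substr($_, length $prefix) } @words;
    my $suffix = reverse common_prefix(map { scalar reverse $_ } @rest);

    my @middles = map { substr($_, 0, length($_) - length($suffix)) } @rest;
    return $prefix . '{' . join(',', @middles) . '}' . $suffix;
}

print collapse_simple(qw(Foo-1-Bar Foo-2-Bar Foo-3-Bar)), "\n";   # prints Foo-{1,2,3}-Bar
```

Anything beyond that single varying region (several groups, several sets of curlies per pattern) is where it stops being easy.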

Xho
 
topher67

xhoster said:
> This could be either very easy or very hard.
>
> Are curlies the only specials allowed, and are the things in the curly
> always to be exactly one character long, and is there only going to be
> exactly one set of curlies per pattern?
>
> Xho

Let's assume the following:
* curlies are the only specials allowed
* the substrings inside the curlies can be of differing lengths
* there may be more than one expanded set in the input list
* we won't handle nested curlies (e.g. Foo{A{1,2,3}Z,XY}Bar )

Here's another example:

FooZZZBar
FooYBar
FooXXBar
Baz11
Baz222
Nop

Becomes:

Foo{ZZZ,Y,XX}Bar
Baz{11,222}
Nop

I realize that this is a hard problem to solve. Any help is greatly
appreciated.
 
topher67

topher67 said:
> I realize that this is a hard problem to solve. Any help is greatly
> appreciated.

I think I might be able to make use of this module:

Regexp::List - builds regular expressions out of a list of words
 
xhoster

topher67 said:
> Let's assume the following:
> * curlies are the only specials allowed
> * the substrings inside the curlies can be of differing lengths

Ah, that makes it harder than I had hoped...

> * there may be more than one expanded set in the input list

Do you mean like "abc{d,e,f}ghi{j,k,l}mn" where you have a Cartesian join,
or do you mean like in your example below, where there is more than one
"line" of pattern but any given one of them has at most one set of
curlies?

> * we won't handle nested curlies (e.g. Foo{A{1,2,3}Z,XY}Bar )

Nesting probably wouldn't actually be so bad to implement, at least
compared to Cartesian joins. In fact, the example you give below is just a
special kind of nesting, equivalent to {Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}.
It is a special kind because you can only have two levels, and the outer
level cannot have any fixed characters before or after it, but it is still
nested.
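
The equivalence is easy to check by expanding the nested form and comparing it with the flat list. A small sketch using bsd_glob from File::Glob; GLOB_NOCHECK is there so that each expansion comes back as-is even though no such files exist:

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

my @flat   = qw(FooZZZBar FooYBar FooXXBar Baz11 Baz222 Nop);
my $nested = '{Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}';

# Brace expansion only; nothing here needs to exist on disk.
my @expanded = bsd_glob($nested, GLOB_BRACE | GLOB_NOCHECK);

print "@expanded\n";
print "same list\n" if "@expanded" eq "@flat";
```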

> Here's another example:
>
> FooZZZBar
> FooYBar
> FooXXBar
> Baz11
> Baz222
> Nop
>
> Becomes:
>
> Foo{ZZZ,Y,XX}Bar
> Baz{11,222}
> Nop
>
> I realize that this is a hard problem to solve. Any help is greatly
> appreciated.

There are many possible solutions, and it is not obvious how to assign a
score to each so that we can choose a single best one. Also, once a
scoring system is designed, the best-scoring solution may be
computationally expensive to find. So some kind of heuristic is probably
needed. In the example you give, the best matching at the front (Foo)
corresponds to the best matching at the rear (Bar). Is that likely to be a
common occurrence in your data, or was it just a coincidence?
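
To make the scoring problem concrete, here are three ways of writing your example, measured with two arbitrary scores that I picked purely for illustration: total characters and number of patterns. Neither is claimed to be the right score.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Three descriptions of the same six names.
my %candidates = (
    per_line => [ 'Foo{ZZZ,Y,XX}Bar', 'Baz{11,222}', 'Nop' ],
    nested   => [ '{Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}' ],
    raw      => [ qw(FooZZZBar FooYBar FooXXBar Baz11 Baz222 Nop) ],
);

# Two illustrative scores; lower is "better" for both.
sub total_chars   { my ($pats) = @_; return sum(map { length $_ } @$pats) }
sub pattern_count { my ($pats) = @_; return scalar @$pats }

for my $name (sort keys %candidates) {
    printf "%-8s chars=%d patterns=%d\n", $name,
        total_chars($candidates{$name}), pattern_count($candidates{$name});
}
```

Counting characters prefers the per-line form (30, against 34 for the nested form and 38 for the raw list), while counting patterns prefers the nested form (1 against 3 and 6). So even on six names the two scores disagree.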

Does Regexp::List come up with a regex which matches all of the given words
*and nothing else*? The docs didn't seem to address that issue.

Anyway, if your goal is to condense, say, a large directory listing down to
a handful of patterns that a human could easily discern, I'm not sure that
something optimized for a regex engine would do a good job. (Although
looking at the techniques it uses could certainly be informative.)

If this is for human consumption, I would have a preference for patterns
in which the curlies occur at natural boundaries, such as transitions
from letter to number or number to letter or punctuation to
non-punctuation, etc.
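
A rough, untested-in-anger sketch of that idea: split each name into runs of letters, digits, and everything else, and only put curlies around a whole run. The helper names tokens and collapse_at_boundaries are made up, and the sketch gives up unless exactly one run differs, which sidesteps the Cartesian-join question entirely.

```perl
use strict;
use warnings;

# Split a name into runs of letters, digits, or other characters.
sub tokens {
    my ($s) = @_;
    return $s =~ /([A-Za-z]+|[0-9]+|[^A-Za-z0-9]+)/g;
}

# Brace the single token position at which the names differ.
# Returns nothing if the names do not line up that neatly.
sub collapse_at_boundaries {
    return unless @_;
    my @rows = map { [ tokens($_) ] } @_;
    my $n    = @{ $rows[0] };
    my $same = grep { @$_ == $n } @rows;
    return if $same != @rows;            # token counts differ: give up

    # Token positions where at least one name disagrees with the first.
    my @varying = grep {
        my $i = $_;
        grep { $_->[$i] ne $rows[0][$i] } @rows;
    } 0 .. $n - 1;
    return if @varying != 1;             # zero or several varying positions

    my $v    = $varying[0];
    my @out  = @{ $rows[0] };
    $out[$v] = '{' . join(',', map { $_->[$v] } @rows) . '}';
    return join '', @out;
}

print collapse_at_boundaries(qw(file10.log file11.log file12.log)), "\n";
# prints file{10,11,12}.log
```

A plain longest-common-prefix approach would give file1{0,1,2}.log for the same input, which is shorter but splits the number in the middle. For all-letter names like FooZZZBar there is only one run, so this heuristic has nothing to work with.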

As someone who frequently looks at very long directory listings of
computer-generated file names, this is something I've often thought about,
but never actually attempted.

Xho
 
topher67

While it may not always return a human-friendly result, this does seem
to work:

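# refactor a glob
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);
use Regexp::List;    # from CPAN

sub reglob {
    my ($pat) = @_;

    # glob2list: expand the braces. GLOB_NOCHECK returns each expansion
    # as-is even when no such file exists.
    my @list = bsd_glob($pat, GLOB_BRACE | GLOB_NOCHECK);
    @list = ($pat) unless @list;

    # list2re: build one regex that matches every word in the list
    my $rl = Regexp::List->new(lookahead => 0, quotemeta => 0);
    my $re = $rl->list2re(@list);

    # re2glob: strip the qr// wrapper, which stringifies as (?-xism:...)
    # before Perl 5.14 and as (?^:...) from 5.14 on
    $re =~ s/\(\?(?:-xism|\^):(.*)\)/$1/g;
    $re =~ s/\(\?:/(/g;                   # (?:...) -> (...)
    $re =~ s/^\(// and $re =~ s/\)$//;    # drop one outer pair of parens
    $re =~ tr/()|/{},/;                   # regex alternation -> brace list

    return $re;
}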

Sample in: aaa{11,22},aaa33

Sample out: aaa{11,22,33}
 
