reverse a glob expansion

topher67 · May 25, 2007

Hello,

I need a piece of code that can "unexpand" a glob pattern. For
example, given the following list:

Foo-1-Bar
Foo-2-Bar
Foo-3-Bar

I would like to get back:

Foo-{1,2,3}-Bar

Any help would be greatly appreciated.

Thanks,
-Topher

xhoster · May 25, 2007

This could be either very easy or very hard.

Are curlies the only specials allowed, and are the things in the curly
always to be exactly one character long, and is there only going to be
exactly one set of curlies per pattern?

Xho

topher67 · May 25, 2007

Let's assume the following:
* curlies are the only specials allowed
* the substrings inside the curlies can be of differing lengths
* there may be more than one expanded set in the input list
* we won't handle nested curlies (e.g. Foo{A{1,2,3}Z,XY}Bar )

Here's another example:

FooZZZBar
FooYBar
FooXXBar
Baz11
Baz222
Nop

Becomes:

Foo{ZZZ,Y,XX}Bar
Baz{11,222}
Nop

I realize that this is a hard problem to solve. Any help is greatly
appreciated.

topher67 · May 25, 2007

I think I might be able to make use of this module:

Regexp::List - builds regular expressions out of a list of words

xhoster · May 25, 2007

topher67 said:
Let's assume the following:
* curlies are the only specials allowed
* the substrings inside the curlies can be of differing lengths

Ah, that makes it harder than I had hoped...

* there may be more than one expanded set in the input list

Do you mean like "abc{d,e,f}ghi{j,k,l}mn" where you have a Cartesian join,
or do you mean like in your example below, where there is more than one
"line" of pattern but any given one of them has at most one set of
curlies?

* we won't handle nested curlies (e.g. Foo{A{1,2,3}Z,XY}Bar )

Nesting actually probably wouldn't be so bad to implement, at least
compared to Cartesian joins. In fact, the example you give below is just a
special kind of nesting, equivalent to {Foo{ZZZ,Y,XX}Bar,Baz{11,222},Nop}.
A special kind because you can only have two levels, and the outer level
cannot have any fixed characters before or after it, but it is still
nested.
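
To make the two cases concrete, here is a small sketch that only runs the forward direction. It uses bsd_glob from the core File::Glob module to expand one Cartesian pattern and one nested pattern, both taken from this thread. The variable names are just for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# GLOB_NOCHECK makes bsd_glob return the expanded names even though no such
# files exist on disk.
my $flags = GLOB_BRACE | GLOB_NOCHECK;

# Two brace sets in one pattern: a Cartesian join, 3 x 3 = 9 names.
my @cartesian = bsd_glob('abc{d,e,f}ghi{j,k,l}mn', $flags);

# One brace set inside another: nesting, 3 + 1 = 4 names.
my @nested = bsd_glob('Foo{A{1,2,3}Z,XY}Bar', $flags);

print scalar(@cartesian), " names: @cartesian\n";
print scalar(@nested), " names: @nested\n";
```

Reversing the first pattern means noticing that nine names factor into two independent sets, which is why the Cartesian case looks like the harder one.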

Here's another example:

FooZZZBar
FooYBar
FooXXBar
Baz11
Baz222
Nop

Becomes:

Foo{ZZZ,Y,XX}Bar
Baz{11,222}
Nop

I realize that this is a hard problem to solve. Any help is greatly
appreciated.

There are many possible solutions, and it is not obvious how to assign a
score to each so that we can choose a single best one. Also, once a
scoring system is designed, finding the best-scoring solution may be
computationally expensive. So some kind of heuristic is probably needed.
In the example you give, the best matching at the front (Foo) corresponds
to the best matching at the rear (Bar). Is that likely to be a common
occurrence in your data, or was it just a coincidence?
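
As one possible heuristic (a rough sketch, not a claim about the best scoring), the code below sorts the names, greedily groups neighbours that share at least $min leading characters, and then factors each group into prefix{...}suffix. The names common_prefix, common_suffix, unglob and $min are made up for this example, and it assumes the input names contain no '{', '}' or ',' characters.

```perl
use strict;
use warnings;

# Longest common prefix of a list of strings.
sub common_prefix {
    my @s = @_;
    my $p = shift @s;
    for my $s (@s) {
        chop $p while index($s, $p) != 0;   # index(..., '') is 0, so this ends
    }
    return $p;
}

# Longest common suffix: reverse everything and reuse common_prefix.
sub common_suffix {
    return scalar reverse common_prefix(map { scalar reverse $_ } @_);
}

sub unglob {
    my ($min, @names) = @_;
    my %seen;
    @names = sort grep { !$seen{$_}++ } @names;

    # Greedy pass over the sorted list: extend the current group while all
    # of its members still share at least $min leading characters.
    my @groups;
    for my $name (@names) {
        if (@groups && length(common_prefix(@{ $groups[-1] }, $name)) >= $min) {
            push @{ $groups[-1] }, $name;
        }
        else {
            push @groups, [$name];
        }
    }

    my @patterns;
    for my $g (@groups) {
        if (@$g == 1) { push @patterns, $g->[0]; next; }
        my $pre  = common_prefix(@$g);
        # Look for the suffix only in what is left after the prefix, so the
        # two can never overlap.
        my @rest = map { substr($_, length $pre) } @$g;
        my $suf  = common_suffix(@rest);
        my @mid  = map { substr($_, 0, length($_) - length($suf)) } @rest;
        push @patterns, $pre . '{' . join(',', @mid) . '}' . $suf;
    }
    return @patterns;
}

print "$_\n" for unglob(1, qw(FooZZZBar FooYBar FooXXBar Baz11 Baz222 Nop));
# Baz{11,222}
# Foo{XX,Y,ZZZ}Bar
# Nop
```

The weakness is the grouping step: with $min set to 1, unrelated names such as Baz11 and Bar would be merged into Ba{r,z11}, so a real scoring rule would have to decide when a group is worth keeping.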

Does Regexp::List come up with a regex which matches all of the given words
*and nothing else*? The docs didn't seem to address that issue.

Anyway, if your goal is to condense, say, a large directory listing down
to a handful of patterns that a human could easily discern, I'm not sure
that something optimized for a regex engine would do a good job. (Although
looking at the techniques used by it could certainly be informative.)

If this is for human consumption, I would have a preference for patterns
in which the curlies occur at natural boundaries, such as transitions
from letter to number or number to letter or punctuation to
non-punctuation, etc.
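
A minimal sketch of that preference, under the assumption that a "natural boundary" means a switch between runs of ASCII letters, digits, and everything else: split each name into such runs and compare whole runs instead of single characters. The helper names tokens, shared_leading and token_glob are invented for this example, and it expects distinct names that belong in one pattern.

```perl
use strict;
use warnings;

# Split a name into runs of letters, digits, or anything else.
sub tokens {
    my ($name) = @_;
    return $name =~ /([A-Za-z]+|[0-9]+|[^A-Za-z0-9]+)/g;
}

# Number of leading tokens shared by every token list.
sub shared_leading {
    my @lists = @_;
    my $n = 0;
    TOKEN: while (1) {
        for my $l (@lists) {
            last TOKEN if $n >= @$l || $l->[$n] ne $lists[0][$n];
        }
        $n++;
    }
    return $n;
}

# Collapse names into one brace pattern, cutting only at token boundaries.
sub token_glob {
    my @names = @_;
    return $names[0] if @names < 2;
    my @lists = map { [ tokens($_) ] } @names;
    my $pre   = shared_leading(@lists);
    # Drop the shared prefix and reverse what is left, so the shared suffix
    # can be counted the same way.
    my @rest  = map { my @t = @$_; [ reverse @t[ $pre .. $#t ] ] } @lists;
    my $suf   = shared_leading(@rest);
    my $head  = join '', @{ $lists[0] }[ 0 .. $pre - 1 ];
    my $tail  = join '', reverse @{ $rest[0] }[ 0 .. $suf - 1 ];
    my @mid   = map { my @r = @$_; join '', reverse @r[ $suf .. $#r ] } @rest;
    return $head . '{' . join(',', @mid) . '}' . $tail;
}

print token_glob(qw(file10.txt file11.txt)), "\n";   # file{10,11}.txt
```

A character-by-character comparison would give file1{0,1}.txt for the same input, which is correct but harder to read.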

As someone who frequently looks at very long directory listings of
computer-generated file names, this is something I've often thought about,
but never actually attempted.

Xho

topher67 · May 26, 2007

While it may not always return a human-friendly result, it does seem
to work:

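use File::Glob qw(bsd_glob GLOB_NOCHECK GLOB_BRACE);
use Regexp::List;

# refactor a glob
sub reglob {
    my ($pat) = @_;

    # glob2list: expand the braces; with GLOB_NOCHECK the names come back
    # even if no matching files exist
    my @list = bsd_glob($pat, GLOB_NOCHECK | GLOB_BRACE);
    @list = ($pat) unless @list;

    # list2re
    my $rl = Regexp::List->new(lookahead => 0, quotemeta => 0);
    my $re = $rl->list2re(@list);

    # re2glob: strip the qr// wrapper, which stringifies as (?-xism:...)
    # on older perls and as (?^:...) on Perl 5.14 and later
    $re =~ s/\(\?(?:\^|-xism):(.*)\)/$1/g;
    # turn (?:a|b) groups into plain (a|b)
    $re =~ s/\(\?:/(/g;
    # drop one pair of outermost parentheses, if present
    $re =~ s/^\(// and $re =~ s/\)$//;
    # map regex grouping and alternation to brace syntax
    $re =~ tr/()|/{},/;

    $re;
}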

Sample in: aaa{11,22},aaa33

Sample out: aaa{11,22,33}
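
One way to check a result like this, and to address the earlier "and nothing else" question, is a round trip: expand the generated pattern again and compare it with the original list. The sketch below does that with bsd_glob. The name same_expansion is made up for this example, and it assumes the names contain no other glob metacharacters such as '*', '?', '[' or backslashes.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOCHECK);

# True if $pattern expands to exactly the names in @expected (order ignored).
sub same_expansion {
    my ($pattern, @expected) = @_;
    my @got = bsd_glob($pattern, GLOB_BRACE | GLOB_NOCHECK);
    return join("\0", sort @got) eq join("\0", sort @expected);
}

print same_expansion('aaa{11,22,33}', qw(aaa11 aaa22 aaa33))
    ? "round trip ok\n"
    : "round trip FAILED\n";
```

A pattern that expands to extra names, or misses some, fails the comparison, so the check catches both kinds of error.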
