Bash-like brace expansion

P

Peter Waller

Okay, I got fed up with there not being any (obvious) good examples of
how to do bash-like brace expansion in Python, so I wrote it myself.
Here it is for all to enjoy!

If anyone has any better solutions or any other examples of how to do
this, I'd be glad to hear from them.

#~ BraceExpand.py - Bash-like brace expansion in Python
#~ Copyright (C) 2009 <[email protected]>

#~ This program is free software: you can redistribute it and/or
modify
#~ it under the terms of the GNU Affero General Public License as
#~ published by the Free Software Foundation, either version 3 of the
#~ License, or (at your option) any later version.

#~ This program is distributed in the hope that it will be useful,
#~ but WITHOUT ANY WARRANTY; without even the implied warranty of
#~ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#~ GNU Affero General Public License for more details.

#~ You should have received a copy of the GNU Affero General Public
License
#~ along with this program. If not, see <http://www.gnu.org/licenses/

import re
from itertools import imap

class NoBraces(Exception): pass

def BraceExpand(text, ordering = []):
"""Recursively brace expand text in a bash-like fashion.

See 'man 2 bash' under the heading 'Brace Expansion'

Example: "/data/file{1..10}.root" expands to file1.root ...
file9.root

Ordering allows changing the order of iteration. The rules for
this are
a little complicated. Ordering is a list of boolean values.
If it is possible to expand the postamble, the first value from
the list
is removed, and used to determine whether the postamble should be
iterated
over before iterating over the amble. The list is then passed
down
recursively to the next Brace.

What does this code do?

It is simpler than it looks.

There are three main steps:
1) Split the string into three parts, consisting of pre-amble,
amble and
post-amble. (This requires keeping track of nested {}'s.)
2) In the amble, figure out which commas are not stuck in {}'s,
and
split the string by those commas.
3) For each part of this split string, Add together the pre-
amble, the
string and the post-amble. Call BraceExpand() on this string
to deal
with nested braces and any expansion required in the
postamble

Other things this code does which make it look more complicated:
* Expand ranges along the lines of {1..10}
* Allow for re-arranging the order of iteration

Todo/Not Implemented:
* Escaping/quoting

Example C implementation from bash (inspiration for this):
http://www.oldlinux.org/Linux.old/bin/old/bash-1.11/braces.c
"""

def FindMatchedBraces(text, position = -1):
"Search for nested start and end brace in text starting from
position"
braceDepth = 0
nextClose = -1
first = True

# Search for a {
# is it closer than the nearest } ?
# Yes : increase brace depth
# No : decrease brace depth
# When we reach braceDepth == 0, we have found our close brace
while braceDepth or first:

nextOpen = text.find("{", position+1)
nextClose = text.find("}", position+1)

if first and nextOpen >= 0:
startBrace = nextOpen
first = False

if nextOpen < nextClose and nextOpen >= 0:
braceDepth += 1
position = nextOpen
elif nextClose >= 0:
braceDepth -= 1
position = nextClose
else:
raise NoBraces()

return startBrace, position

try: start, end = FindMatchedBraces(text)
except NoBraces:
# There are no braces! Nothing to expand!
return [text]

# Split the text into three bits, '{pre,,post}amble'. The 'pre' is
anything
# before expansion, the '' is the bit that needs expanding and
gluing to
# the pre and the post. After gluing together, we can recursively
expand
# again
preamble = text[:start]
amble = text[start+1:end]
postamble = text[end+1:]

def BareCommaSearcher(amble):
"Search for commas which are not encapsulated in {}"

haveBraces = True
try: start, end = FindMatchedBraces(amble)
except NoBraces: haveBraces = False

position = -1
while True:
position = amble.find(",", position+1)

if position < 0:
# We didn't find any comma after 'position', finish
searching.
break

if haveBraces and start < position < end:
# We're inside some braces, skip to the end of them,
find the
# next set.
position = end

haveBraces = True
try: start, end = FindMatchedBraces(amble, position)
except NoBraces: haveBraces = False

continue

yield position

# Reached the end of the string. (Conveniently
"text"[pos:None] is the
# same as "text"[pos:]
yield None

parts = []
rangeMatcher = re.compile(r"^(\d+)..(\d+)$")

# Find each segment of the amble, splitting by 'bare' commas
position = 0
for commaPos in BareCommaSearcher(amble):
# Get a bit between bare commas
part = amble[position:commaPos]

# Found a matched range, expand it!
matchedRange = rangeMatcher.match(part)
if matchedRange:
matchedIndices = map(int, matchedRange.groups())
parts.extend(imap(str,xrange(*matchedIndices)))
else:
parts.append(part)

if not commaPos:
# Avoid None + 1 at the end of iteration
break

position = commaPos + 1

# Is the postamble expandable? (Does it contain braces?)
postambleExpandable = True
try: start, end = FindMatchedBraces(postamble)
except NoBraces: postambleExpandable = False

# Check to see if we should iterate over postamble first, also
sort out
# iteration ordering
postambleFirst = ordering
thisPostambleFirst = False
if postambleExpandable and postambleFirst:
postambleFirst = list(postambleFirst)
thisPostambleFirst = postambleFirst.pop(0)

result = []

if postambleExpandable and thisPostambleFirst:
# Iterate of expanded postamble first
for postpart in BraceExpand(postamble):
for part in parts:
part = "".join([preamble, part, postpart])
result.extend(BraceExpand(part, postambleFirst))

else:
for part in parts:
part = "".join([preamble, part, postamble])
result.extend(BraceExpand(part, postambleFirst))

return result

if __name__ == "__main__":
from pprint import pprint
pprint(BraceExpand("electron_{n,{pt,eta,phi}[{1,2}]}", ordering =
[True]))

#~ pprint(BraceExpand("Myfile{1,3..10}.root"))

#~ pprint(BraceExpand("{pre,,post}amble"))

#~ pprint(BraceExpand("amble{a,b,}}"))
 
P

Peter Waller

Okay, yuck. I didn't realise that posting would mangle the code so
badly. Is there any better way to attach code? I'm using google
groups.
 
T

Tino Wildenhain

Peter said:
Okay, I got fed up with there not being any (obvious) good examples of
how to do bash-like brace expansion in Python, so I wrote it myself.
Here it is for all to enjoy!

If anyone has any better solutions or any other examples of how to do
this, I'd be glad to hear from them.

It may be a funny experiment but I really fail to see the value in your
proposal.

The simple {foo} expansion you mention should be quite easily handled
with re.sub and a function as argument. So not much more then a few
lines of code.

Interesting could be to have {foo#bar} and {foo%bar} as well but again
I don't think the whole stuff would be very usefull anyway given the
%(foo)s form works quite well and has a host of options (for aligning
for example).

Cheers
Tino
 
P

Peter Waller

Heh, thanks :)

Unit tests did cross my mind. I was kicking myself for not starting
out with them, there were several regressions during development, and
there could well still be lurking corner cases ;)

I've since heard that a 'better way' would be to use pyparsing. Also,
I saw that python has dropped the idea of having recursive regular
expressions at the moment.

http://bugs.python.org/msg83993

Maybe I might re-implement this with pyparsing and some unit tests.
 
P

Peter Waller

I've since heard that a 'better way' would be to use pyparsing. Also,
I saw that python has dropped the idea of having recursive regular
expressions at the moment.

I've written a pyparsing parser which is very much shorter and nicer
than the original parser. I have not yet figured out how to process
the result yet though, I'm having a hard time dealing with the
recursion in my brain. Here it is anyway, in case someone smarter than
me can figure out how to turn the output of this into the relevant
list of expansions.

from pyparsing import (Suppress, Optional, CharsNotIn, Forward,
Group,
ZeroOrMore, delimitedList)

lbrack, rbrack = map(Suppress, "{}")

optAnything = Group(Optional(CharsNotIn("{},")))



braceExpr = Forward()

listItem = optAnything ^ Group(braceExpr)

bracketedList = Group(lbrack + delimitedList(listItem) + rbrack)

braceExpr << optAnything + ZeroOrMore(bracketedList + optAnything)



result = braceExpr.parseString(text).asList()
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top