RegEx for matching brackets

N

NevilleDNZ

Below is a (flawed) one line RegEx that checks curly brackets (from
awk/c/python input) are being matched. Is there a one liner for doing
this in python?

ThanX
N

re_open_close="(((\{))[^{}]*((?(0)\})))+"
re_open_close=re.compile(re_open_close)
tests="""
{ this is a test BAD
{ this is a test } OK
{ this is a test } { this is a test } OK
{ this is a test } { this { this is a test } is a test } OK
{ this is a test { this { this is a test } is a test } missing
close BAD
""".splitlines()[1:]
for test in tests:
if bool(re_open_close.search(test)) == bool(re.search("OK",test)):
print "DETECTED:",test
else:
print "MISSED:",test
[linux]$ python ./re_matching.py
DETECTED: { this is a test BAD
DETECTED: { this is a test } OK
DETECTED: { this is a test } { this is a test } OK
DETECTED: { this is a test } { this { this is a test } is a test }
OK
MISSED: { this is a test { this { this is a test } is a test }
missing close BAD
 
J

John Machin

Below is a (flawed) one line RegEx that checks curly brackets (from
awk/c/python input) are being matched. Is there a one liner for doing
this in python?

ThanX
N

re_open_close="(((\{))[^{}]*((?(0)\})))+"
re_open_close=re.compile(re_open_close)
tests="""
{ this is a test BAD
{ this is a test } OK
{ this is a test } { this is a test } OK
{ this is a test } { this { this is a test } is a test } OK
{ this is a test { this { this is a test } is a test } missing
close BAD
""".splitlines()[1:]
for test in tests:
if bool(re_open_close.search(test)) == bool(re.search("OK",test)):
print "DETECTED:",test
else:
print "MISSED:",test
[linux]$ python ./re_matching.py
DETECTED: { this is a test BAD
DETECTED: { this is a test } OK
DETECTED: { this is a test } { this is a test } OK
DETECTED: { this is a test } { this { this is a test } is a test }
OK
MISSED: { this is a test { this { this is a test } is a test }
missing close BAD

Regexes can't count.

To check for balanced braces, you need to write code e.g.

# untested
def balanced_braces(s):
n = 0
for c in s:
if c == '{':
n += 1
elif c == '}':
n -= 1
if n < 0:
return False
return n == 0

HTH,
John
 
G

George Sakkis

Below is a (flawed) one line RegEx that checks curly brackets (from
awk/c/python input) are being matched. Is there a one liner for doing
this in python?

There is not even a 1000-liner regular expression for this; it's a
context-free language [1], not a regular one [2]. Either do it
manually or use a parser generator [3].

George

[1] http://en.wikipedia.org/wiki/Context-free_language
[2] http://en.wikipedia.org/wiki/Regular_language
[3] http://wiki.python.org/moin/LanguageParsing
 
N

NevilleDNZ


Thanx for the link to these parsers. ANTLR looks interesting.
Yoyo: http://www-users.cs.york.ac.uk/~fisher/software/yoyovwg/readme

I figured out a way to do it in python.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import re
tests="""
{ a test BAD
{ a test } OK
{ a test } { a test } OK
{ a test } { this { a test } is a test } OK
{ a test { this { a test } is a test } missing close BAD
a test } { this { a test } is a test } BAD
{ a test } this { a test } is a test } BAD
""".splitlines()[1:]

def referee(str):
return bool(re.search("OK",test))

def check_open_close(str):
try:
eval("".join({"{":"[","}":"],"}[c] for c in re.findall( "([{}])|(?:
[^{}]+)", str) if c))
except SyntaxError:
return False
else:
return True

for test in tests:
if check_open_close(test) == referee(test):
print "DETECTED:",test
else:
print "MISSED:",test
[linux]$ ./matchall_open_close.py
DETECTED: { a test BAD
DETECTED: { a test } OK
DETECTED: { a test } { a test } OK
DETECTED: { a test } { this { a test } is a test } OK
DETECTED: { a test { this { a test } is a test } missing close BAD
DETECTED: a test } { this { a test } is a test } BAD
DETECTED: { a test } this { a test } is a test } BAD

It's not exactly what I planned, but as a python one/two-liner it
works fine.
eval("".join({"{":"[","}":"],"}[c] for c in re.findall( "([{}])|(?:
[^{}]+)", str) if c))

ThanX again
N
 
D

Derek Martin

Thanx for the link to these parsers. ANTLR looks interesting.
Yoyo: http://www-users.cs.york.ac.uk/~fisher/software/yoyovwg/readme

I figured out a way to do it in python. [...]

def check_open_close(str):
try:
eval("".join({"{":"[","}":"],"}[c] for c in re.findall( "([{}])|(?:
[^{}]+)", str) if c))

Ouchie... It may be short, but if I had to maintain this code, and I
saw this, your name would be used in sentences with lots of curse
words. ;-) That's one hard-to-read line of code... Also this may or
may not do what you want it to do -- I think it doesn't...

This problem is the classical example of when to use a stack (Last-In,
First-Out) data structure. If all you want to do is make sure that
the line has the same number of opens and closes in a line, your code
does that. But if you actually want to verify the syntax (i.e. make
sure that there are the same number of open brackets as close
brackets, AND make sure that they occur in the correct order, opens
first, closes last, AND that the closes come in the same (reverse)
order as the opens), your code does not do that.

I changed the tests in your code (specifically the brackets, and
nothing else) to demonstrate this:

DETECTED: { a test BAD
DETECTED: { a test } OK
# This should fail, because the closing ] comes before the open [
MISSED: { a test ] [ a test } BAD
DETECTED: { a test } { this { a test } is a test } OK
# this should fail, for the above reason, and because the order is wrong
MISSED: { a test { this { a test ] is a test } missing close [}} BAD
DETECTED: { a test { this { a test ] is a test } missing close } BAD
# this should also fail for both reasons
MISSED: { a test ] this { a test } is a test } missing close [ BAD
DETECTED: a test } { this { a test } is a test } BAD
DETECTED: { a test } this { a test } is a test } BAD

It doesn't detect the brackets in the right order (opens before
closes), nor does it detect that they occur in the correct sequence.

Clever code isn't always so clever... I think there's something to be
said for writing code that's a little bit more lengthy, but easier to
understand. This version is only a few lines longer than yours (in
total, not counting the comments), but it is a bit clearer and easier
to follow. Note that I added angle brackets and mixed the bracket
types in the tests. I also didn't use your referee... In my code,
the test simply succeeds if the brackets match, and fails if they are
unbalanced or out of order.

#!/usr/bin/python

# define the matching pairs
bm = { '}': '{',
']': '[',
')': '(',
'>': '<' }

def bracket_balance(str):
# initialize the stack to an empty list
blist = []
for c in str:
# If c is an open bracket of any type, place on stack
if c in bm.values():
blist.append(c)
# If it is a close bracket, pull one off the stack and
# see if they are matching open-close pairs. If the stack
# is empty, there was no matching open. Return false in that
# case, or if they don't match.
if c in bm.keys():
try:
foo = blist.pop()
except IndexError:
return False
if foo != bm[c]:
return False
# End of the line: if we still have brackets on the stack, we
# didn't have enough close brackets. Return false.
if blist != []:
return False
# If we got here, the stack is empty, and there are no brackets
# left unmatched. we're good!
return True

tests="""
{ this is a test BAD
< this is a test > OK
{ this is a test } { this is a test } OK
{ this is a test } [ this { this is a test } is a test ] OK
{ this is a test { this { this is a test } is a test } missing close BAD
""".splitlines()[1:]

for test in tests:
print "Testing %s:" % test
if bracket_balance(test):
print "-> OK"
else:
print "-> FAILED"

Testing with your original set of tests:

$ ./brackets.py
Testing { this is a test BAD:
-> FAILED
Testing < this is a test > OK:
-> OK
Testing { this is a test } { this is a test } OK:
-> OK
Testing { this is a test } [ this { this is a test } is a test ] OK:
-> OK
Testing { this is a test { this { this is a test } is a test } missing close BAD:
-> FAILED

Testing with my modified set of tests:

$ ./brackets.py
Testing { a test BAD:
-> FAILED
Testing { a test } OK:
-> OK
Testing { a test ] [ a test } BAD:
-> FAILED
Testing { a test } { this { a test } is a test } OK:
-> OK
Testing { a test { this { a test ] is a test } missing close [}} BAD:
-> FAILED
Testing { a test { this { a test ] is a test } missing close } BAD:
-> FAILED
Testing { a test ] this { a test } is a test } missing close [ BAD:
-> FAILED
Testing a test } { this { a test } is a test } BAD:
-> FAILED
Testing { a test } this { a test } is a test } BAD:
-> FAILED

In all cases, this code correctly identifies when the brackets are out
of order, or unbalanced.

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQFIG+6xHEnASN++rQIRAnVjAKCbJ+bBdkVLOkNACyXPEEl9Z/X/EQCfev+k
e+Bqu8InxKKD01TNJdputKo=
=gjS7
-----END PGP SIGNATURE-----
 
N

NevilleDNZ

To check a complete python expression use:

def check_open_close(expr):
try:
eval(expr)
except SyntaxError:
return False
else:
return True

This also ignores brackets in quotes, and checks <= & >= operators are
syntatically correct etc...
But is may have side effects... ;-)
eg.
check_open_close('__import__("os").system("echo rm -rf /") # OK')

c.f http://en.wikipedia.org/wiki/Scope_creep

N
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,009
Latest member
GidgetGamb

Latest Threads

Top