Parsing Strings Enclosed in Curly Braces

xamalek

How do you parse a string enclosed in Curly Braces?

For instance:

x = "{ABC EFG IJK LMN OPQ}"

I want to do x.split('{} ') and it does not work. Why does it not work
and what are EXCEPTIONS to using the split method?

That is, I want to split on '{', '}' and whitespace.

Please advise.

Regards,
Xav
 
Chris Rebert

How do you parse a string enclosed in Curly Braces?

For instance:

x = "{ABC EFG IJK LMN OPQ}"

I want to do x.split('{} ') and it does not work. Why does it not work
and what are EXCEPTIONS to using the split method?

.split() takes a *substring* to split on, *not* a set of individual
characters to split on. Read the Fine Docs.

That I want to split based on '{', '}' and WHITESPACE.

Well, you could use a regex, or you could just .find() where the
braces are, slice them off the ends of the string, and then split the
result on space.
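
A minimal sketch of both suggestions, assuming Python 3 and assuming the string contains exactly one opening and one closing brace. The names regex_words and sliced_words are made up for illustration:

```python
import re

x = "{ABC EFG IJK LMN OPQ}"

# Approach 1: a regex character class treats '{', '}' and whitespace as
# interchangeable delimiters; the filter drops the empty strings that
# re.split() leaves at both ends.
regex_words = [w for w in re.split(r"[{}\s]+", x) if w]

# Approach 2: locate the braces, slice out what lies between them, then
# split the remainder on whitespace.
start = x.find("{")
end = x.rfind("}")
sliced_words = x[start + 1:end].split()

print(regex_words)   # ['ABC', 'EFG', 'IJK', 'LMN', 'OPQ']
print(sliced_words)  # ['ABC', 'EFG', 'IJK', 'LMN', 'OPQ']
```

Note that .find() and .rfind() return -1 when a brace is missing, so real code would want to check for that before slicing.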

Cheers,
Chris
 
Mike Driscoll

The reason it doesn't work is that your string doesn't contain the
substring "{} " (open brace, close brace, space). Instead, it has one of
each, in different places. If your string were like this, then it
would split:

x = "{} ABC EFG"
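
To make the difference concrete, here is a small sketch (Python 3 assumed; the variable names no_match and with_match are only for illustration) of what split() returns in each case:

```python
x = "{ABC EFG IJK LMN OPQ}"

# The three-character substring '{} ' never occurs in x, so nothing is
# split and the whole string comes back as a one-element list.
no_match = x.split("{} ")

# Here the substring does occur (once, at the start), so the result is an
# empty string before the separator and the remainder after it.
with_match = "{} ABC EFG".split("{} ")

print(no_match)    # ['{ABC EFG IJK LMN OPQ}']
print(with_match)  # ['', 'ABC EFG']
```

So split() does not fail here; it simply finds no separator to split on.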

In the meantime, you can just use some string slicing like this:

y = x[1:-1]

That will remove the braces and allow you to manipulate the text
therein.

- Mike
 
MRAB

Chris said:
.split() takes a *substring* to split on, *not* a set of individual
characters to split on. Read the Fine Docs.


Well, you could use a regex, or you could just .find() where the
braces are, slice them off the ends of the string, and then split the
result on space.

Here's a function which splits on multiple characters:

def split_many(string, delimiters):
    parts = [string]
    for d in delimiters:
        parts = sum((p.split(d) for p in parts), [])
    return parts
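
For comparison, a hedged variant of the same idea, assuming Python 3. The name split_many_chain is hypothetical; it swaps sum() for itertools.chain.from_iterable, which flattens the per-part results without repeated list addition:

```python
from itertools import chain


def split_many_chain(text, delimiters):
    """Split text on each single-character delimiter in turn."""
    parts = [text]
    for d in delimiters:
        # Split every current part on d and flatten the resulting lists.
        parts = list(chain.from_iterable(p.split(d) for p in parts))
    return parts


x = "{ABC EFG IJK LMN OPQ}"
pieces = split_many_chain(x, "{} ")
words = [p for p in pieces if p]  # drop the empty strings at the ends

print(pieces)  # ['', 'ABC', 'EFG', 'IJK', 'LMN', 'OPQ', '']
print(words)   # ['ABC', 'EFG', 'IJK', 'LMN', 'OPQ']
```

Either version leaves empty strings where the string starts or ends with a delimiter, hence the filtering step.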
 
kay

# Python 2 syntax (print statement, string.maketrans)
import string
ttable = string.maketrans("{} ", "\1\1\1")
print x.translate(ttable).split("\1")
# -> ['', 'ABC', 'EFG', 'IJK', 'LMN', 'OPQ', '']

The validity of the translation+split depends on \1 not being present
in the original string, of course.
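
A rough Python 3 equivalent, as a sketch only: in Python 3 the table is built with str.maketrans rather than string.maketrans. The second string, risky, is a made-up input that shows the caveat about \1:

```python
x = "{ABC EFG IJK LMN OPQ}"

# Map each delimiter character to the placeholder "\1", then split on it.
ttable = str.maketrans("{} ", "\1\1\1")
pieces = x.translate(ttable).split("\1")
print(pieces)  # ['', 'ABC', 'EFG', 'IJK', 'LMN', 'OPQ', '']

# If the data already contains "\1", it is indistinguishable from a
# translated delimiter, so the word "A\1B" is wrongly cut in two.
risky = "{A\1B}".translate(ttable).split("\1")
print(risky)  # ['', 'A', 'B', '']
```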
 
Peter Otten

ttable = string.maketrans("{} ", "\1\1\1")
print x.translate(ttable).split("\1")
# -> ['', 'ABC', 'EFG', 'IJK', 'LMN', 'OPQ', '']

The validity of the translation+split depends on \1 not being present
in the original string, of course.

Keep one of the splitting characters to avoid that (theoretical) risk:

ttable = string.maketrans("{} ", "{{{")
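
As a sketch of why this helps (Python 3 assumed, using str.maketrans; the input for safe is made up): every delimiter becomes "{", which by definition cannot survive as data, so a stray \1 in the input is left alone.

```python
x = "{ABC EFG IJK LMN OPQ}"

# Translate all three delimiters to '{', then split on '{'.
ttable = str.maketrans("{} ", "{{{")
pieces = x.translate(ttable).split("{")
print(pieces)  # ['', 'ABC', 'EFG', 'IJK', 'LMN', 'OPQ', '']

# A "\1" inside the data is no longer treated as a separator.
safe = "{A\1B}".translate(ttable).split("{")
print(safe == ["", "A\1B", ""])  # True
```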

Peter
 
John Machin

.split() takes a *substring* to split on, *not* a set of individual
characters to split on. Read the Fine Docs.
Well, you could use a regex, or you could just .find() where the
braces are, slice them off the ends of the string, and then split the
result on space.

Here's a function which splits on multiple characters:

def split_many(string, delimiters):
     parts = [string]
     for d in delimiters:
         parts = sum((p.split(d) for p in parts), [])
     return parts

Neat trick. However, from 2.6.2:
Help on built-in function sum in module __builtin__:

sum(...)
    sum(sequence[, start]) -> value

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the sequence is
    empty, returns start.

Since when is a list a number? Perhaps the help needs clarification,
in line with the docs.
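
For what it's worth, here is a small sketch of the behaviour in question, assuming Python 3 (the names nested, flat and rejected are illustrative only): lists are accepted when the start value is a list, while strings are refused outright.

```python
nested = [["a", "b"], ["c"], ["d", "e"]]

# With a list as the start value, sum() concatenates the lists via "+".
flat = sum(nested, [])
print(flat)  # ['a', 'b', 'c', 'd', 'e']

# Strings are explicitly rejected, even with a string start value.
rejected = False
try:
    sum(["a", "b"], "")
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```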

Cheers,
John
 
Rhodri James

How do you parse a string enclosed in Curly Braces?

For instance:

x = "{ABC EFG IJK LMN OPQ}"

I want to do x.split('{} ') and it does not work. Why does it not work
and what are EXCEPTIONS to using the split method?

Other people have already done this bit :)

That I want to split based on '{', '}' and WHITESPACE.

This is not the same as what you first said, and I have a horrid feeling
you're trying to conflate two steps into one. That way lies madness.

First, a question. Is 'x' truly a good representation of your original
data? Could it instead look more like:

x = "Ignore this {but not this} and completely forget about this"

Can you have braces legitimately lying around in the string, escaped
somehow:

x = r"{Parse including the \} escaped close brace}"

Can you have nested braces, and what are you supposed to do with them:

x = "{Some text to parse {as well as an aside} and so on}"

Parsing is not an entirely trivial subject, particularly when users
can futz about with the strings you're parsing. That's one reason there
are so many lexers and parsers out there!
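
To illustrate how quickly the rules pile up, here is a minimal sketch in Python 3. The function brace_groups and its rules are assumptions made for this example, not anything the original poster specified: text outside braces is ignored, a backslash escapes the next character, and nested braces are kept verbatim inside their enclosing group.

```python
def brace_groups(text):
    """Return the contents of each top-level {...} group in text.

    Hypothetical rules: backslash escapes the next character, nested
    braces stay inside their enclosing group, and unbalanced braces
    raise ValueError.
    """
    groups = []
    current = []
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: keep the pair as-is when inside a group.
            if depth > 0:
                current.append(text[i:i + 2])
            i += 2
            continue
        if ch == "{":
            if depth > 0:
                current.append(ch)  # nested opener is part of the group
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError("unmatched '}' at index %d" % i)
            depth -= 1
            if depth == 0:
                groups.append("".join(current))
                current = []
            else:
                current.append(ch)  # nested closer is part of the group
        elif depth > 0:
            current.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unmatched '{'")
    return groups


print(brace_groups("Ignore this {but not this} and completely forget about this"))
print(brace_groups(r"{Parse including the \} escaped close brace}"))
print(brace_groups("{Some text to parse {as well as an aside} and so on}"))
```

Each returned group can then be split on whitespace separately, which keeps the two steps (finding the braces, splitting the contents) apart.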
 
Aahz

Neat trick. However, from 2.6.2:
Help on built-in function sum in module __builtin__:

sum(...)
    sum(sequence[, start]) -> value

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the sequence is
    empty, returns start.

Since when is a list a number? Perhaps the help needs clarification,
in line with the docs.

The primary use-case for sum() is numbers, with a special exception to
prohibit using strings. Only strings are prohibited to allow using sum()
with user-defined classes. That makes it a little difficult to document
precisely in a summary; unless you can come up with a specific better
wording, the docs will probably stay as-is. If you do come up with
something better, please file it on bugs.python.org.
--
Aahz <*> http://www.pythoncraft.com/

"In 1968 it took the computing power of 2 C-64's to fly a rocket to the moon.
Now, in 1998 it takes the Power of a Pentium 200 to run Microsoft Windows 98.
Something must have gone wrong." --/bin/fortune
 
John Machin

Neat trick. However, from 2.6.2:
Help on built-in function sum in module __builtin__:

sum(...)
    sum(sequence[, start]) -> value

    Returns the sum of a sequence of numbers (NOT strings) plus the value
    of parameter 'start' (which defaults to 0).  When the sequence is
    empty, returns start.

Since when is a list a number? Perhaps the help needs clarification,
in line with the docs.

The primary use-case for sum() is numbers, with a special exception to
prohibit using strings.  Only strings are prohibited to allow using sum()
with user-defined classes.  That makes it a little difficult to document

Non sequitur.

precisely in a summary; unless you can come up with a specific better
wording, the docs will probably stay as-is.  If you do come up with
something better, please file it on bugs.python.org.

OK let's start with getting the docs right and then summarise that for
the help.
URI: http://docs.python.org/library/functions.html#sum
Contents:
"""
sum(iterable[, start])¶

Sums start and the items of an iterable from left to right and
returns the total. start defaults to 0. The iterable‘s items are
normally numbers, and are not allowed to be strings. The fast, correct
way to concatenate a sequence of strings is by calling ''.join
(sequence). Note that sum(range(n), m) is equivalent to reduce
(operator.add, range(n), m) To add floating point values with extended
precision, see math.fsum().

New in version 2.3
"""
Suggestions:
(1) fix what should be an apostrophe in "iterable's"
(2) s/sequence/iterable/g
(3) Either replace the sentence about "reduce" by "Note that
sum(iterable, start) is equivalent to reduce(operator.add, iterable,
start) except for the prohibition of strings" or explain why not or
what's so special about range(n).
(4) Add a full stop after the sentence about "reduce".
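
A quick sketch of the equivalence claimed in (3), assuming Python 3, where reduce lives in functools rather than being a builtin. The sample values are made up:

```python
from functools import reduce
import operator

values = [[1], [2, 3], [4]]
start = [0]

# For non-string items, both calls fold "+" over the items from left to
# right, beginning with start.
via_sum = sum(values, start)
via_reduce = reduce(operator.add, values, start)
print(via_sum == via_reduce)  # True

# reduce() has no special case for strings, whereas sum() rejects them.
joined = reduce(operator.add, ["a", "b"], "")
print(joined)  # ab
```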

Help from 2.6.2:
"""
sum(sequence[, start]) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start' (which defaults to 0). When the sequence is
empty, returns start.
"""
Suggested replacement:
"""
sum(iterable[, start]) -> value

Returns the sum of the contents of the iterable plus the value
of parameter 'start' (which defaults to 0). When the iterable is
empty, returns start. Strings are disallowed; use
''.join(iterable) to concatenate strings efficiently.
"""

Cheers,
John
 
