splitting a string into 2 new strings

Mark Light

Hi,
I have a string, e.g. 'C6 H12 O6', that I wish to split up to give 2 strings,
'C H O' and '6 12 6'. I have played with string.split() and the re module,
but can't quite get there.

Any help would be greatly appreciated.

Thanks,

Mark.
 
trp

Mark said:
Hi,
I have a string, e.g. 'C6 H12 O6', that I wish to split up to give 2 strings,
'C H O' and '6 12 6'. I have played with string.split() and the re module,
but can't quite get there.

Any help would be greatly appreciated.

Thanks,

Mark.

I'm assuming that these are chemical compounds, so you're not limited to
one-character symbols.

Here's how I'd do it:

import re

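# raw string, so the backslash in \d is passed through to the regex engine
re_pat = re.compile(r'([A-Z]+)(\d+)')
text = 'C6 H12 O6'

# find each component; returns a list of tuples, e.g. [('C', '6'), ...]
component = re_pat.findall(text)

# split into separate tuples of symbols and counts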
symbols, counts = zip(*component)

# create the strings
symbols = ' '.join(symbols)
counts = ' '.join(counts)
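
The same steps can be wrapped in a function. This is only a sketch: the name split_formula and the empty-input guard are additions for illustration, not part of the snippet above. The guard is there because zip(*component) yields nothing when findall() returns an empty list, so the two-name unpacking would raise a ValueError.

```python
import re

# Same pattern as in the post, written as a raw string.
COMPONENT_RE = re.compile(r'([A-Z]+)(\d+)')

def split_formula(text):
    """Return (symbols, counts) as two space-separated strings.

    split_formula is a hypothetical helper name used for this sketch.
    """
    components = COMPONENT_RE.findall(text)
    if not components:
        # Nothing matched: zip(*[]) would give nothing to unpack.
        return '', ''
    symbols, counts = zip(*components)
    return ' '.join(symbols), ' '.join(counts)

print(split_formula('C6 H12 O6'))
```

For 'C6 H12 O6' this gives the two strings asked for, 'C H O' and '6 12 6'.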

--Andy
 
Mark Light

That works great, many thanks.

 
P

Mark said:
Hi,
I have a string, e.g. 'C6 H12 O6', that I wish to split up to give 2 strings,
'C H O' and '6 12 6'. I have played with string.split() and the re module,
but can't quite get there.

Any help would be greatly appreciated.

import re

molecule_re = re.compile("(.+?)([0-9]+)")

def processMolecule(molecule):
    elements = []
    numbers = []

    for item in molecule.split():
        # each item looks like 'C6': a symbol followed by a count
        element, number = molecule_re.findall(item)[0]
        elements.append(element)
        numbers.append(number)

    elements = ' '.join(elements)
    numbers = ' '.join(numbers)

    return (elements, numbers)

print(processMolecule('C6 H12 O6'))
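
One thing to watch in the function above: findall(item)[0] raises an IndexError for any item without digits (a bare 'O', say). The variant below is a sketch of one way to handle that, under two assumptions that are not in the original post: a missing count means 1, and symbols consist of letters only. The names TOKEN_RE and process_molecule are made up for this example.

```python
import re

# Hypothetical variant: letters-only symbol, optional count.
TOKEN_RE = re.compile(r'([A-Za-z]+)([0-9]*)')

def process_molecule(molecule):
    elements = []
    numbers = []
    for item in molecule.split():
        # fullmatch() requires the whole item to fit the pattern
        match = TOKEN_RE.fullmatch(item)
        if match is None:
            raise ValueError('unrecognised item: %r' % item)
        element, number = match.groups()
        elements.append(element)
        # Assumption: an item with no digits stands for a count of 1.
        numbers.append(number or '1')
    return ' '.join(elements), ' '.join(numbers)

print(process_molecule('C6 H12 O6'))
print(process_molecule('Na Cl'))
```

Whether a default of 1 is right depends on where the formulas come from; raising an error for such items may be the safer choice.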
 
Andrew Dalke

trp:
I'm assuming that these are chemical compounds, so you're not limited to
one-character symbols.

The problem is underspecified. Symbols of 2 characters (or 3 characters for
some elements with high atomic number, if you aren't using the newer IUPAC
names like "Dubnium", which was also called Unnilpentium (Unp) or, depending
on your political persuasion, Joliotium (Jl) or Hahnium (Ha)) usually have
the first letter capitalized and the rest in lower case.
re_pat = re.compile('([A-Z]+)(\d+)')

So this should be written ([A-Z][A-Za-z]*)(\d+), where I explicitly allow
both lower and upper case trailing letters to be more accepting. (In some
systems, "CU" is "1 carbon + 1 uranium" and in others it's an alternate way
to write "1 copper". Though I suspect it's not allowed in the OP's problem.)
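
To see the difference between the two patterns, here is a small comparison. The input string 'Na2 S1 O4' is made up for the example, and the names strict and relaxed are just labels for this sketch.

```python
import re

strict = re.compile(r'([A-Z]+)(\d+)')             # upper case letters only
relaxed = re.compile(r'([A-Z][A-Za-z]*)(\d+)')    # capital, then any letters

text = 'Na2 S1 O4'  # hypothetical input with a two-letter symbol

# The strict pattern cannot get past the lower case 'a', so 'Na2' is skipped.
strict_result = strict.findall(text)

# The relaxed pattern picks up 'Na' as a single symbol.
relaxed_result = relaxed.findall(text)

print(strict_result)
print(relaxed_result)
```

Note that the relaxed pattern reads "CU2" as one symbol 'CU', which matches the "alternate way to write copper" reading rather than carbon plus uranium.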

Andrew
 
Andrew Dalke

Anton Vredegoor:
The issue seems to be resolved already, but I haven't seen the split
and strip combination:

from string import letters,digits

Use "ascii_letters" instead of "letters". The latter is based on the locale,
so it might not work on some machines where "C" (or rather, byte 67) isn't
a letter in the local alphabet.
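
Anton's code is not reproduced in this thread, so the following is only a guess at what a split-and-strip approach could look like with ascii_letters. The function name split_and_strip is made up. As a side note, string.letters no longer exists in Python 3, while ascii_letters is available in both Python 2 and 3.

```python
from string import ascii_letters, digits

def split_and_strip(formula):
    """Sketch of a split-and-strip approach (hypothetical helper name)."""
    tokens = formula.split()
    # Strip the trailing digits to leave the symbol, e.g. 'H12' -> 'H'.
    symbols = [token.rstrip(digits) for token in tokens]
    # Strip the leading letters to leave the count, e.g. 'H12' -> '12'.
    counts = [token.lstrip(ascii_letters) for token in tokens]
    return ' '.join(symbols), ' '.join(counts)

print(split_and_strip('C6 H12 O6'))
```

This needs no regular expression, but it assumes every item is letters followed by digits and does no validation.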

Andrew
 
