splitting a string into 2 new strings

Discussion in 'Python' started by Mark Light, Jul 2, 2003.

  1. Mark Light

    Mark Light Guest

    Hi,
    I have a string, e.g. 'C6 H12 O6', that I wish to split up to give 2 strings,
    'C H O' and '6 12 6'. I have played with string.split() and the re module
    but can't quite get there.

    Any help would be greatly appreciated.

    Thanks,

    Mark.
    Mark Light, Jul 2, 2003
    #1

  2. Mark Light

    trp Guest

    Mark Light wrote:

    > Hi,
    > I have a string e.g. 'C6 H12 O6' that I wish to split up to give 2
    > strings
    > 'C H O' and '6 12 6'. I have played with string.split() and the re module
    > - but can't quite get there.
    >
    > Any help would be greatly appreciated.
    >
    > Thanks,
    >
    > Mark.


    I'm assuming that these are chemical compounds, so you're not limited to
    one-character symbols.

    Here's how I'd do it:

    import re

    re_pat = re.compile(r'([A-Z]+)(\d+)')
    text = 'C6 H12 O6'

    # find each component; returns a list of tuples, e.g. [('C', '6'), ...]
    component = re_pat.findall(text)

    # split into separate tuples of symbols and counts
    symbols, counts = zip(*component)

    # create the strings
    symbols = ' '.join(symbols)
    counts = ' '.join(counts)
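
The zip(*component) step above transposes the list of (symbol, count) pairs into one tuple of symbols and one tuple of counts. One edge case worth noting: if the regex finds nothing, the two-name unpacking raises a ValueError. A minimal sketch of the same approach with a guard follows; the function name split_formula is hypothetical and not from the thread.

```python
import re

def split_formula(text):
    """Return (symbols, counts) as space-joined strings.

    Returns ('', '') when no symbol/count pair is found, because
    unpacking zip(*[]) into two names would raise ValueError.
    """
    pairs = re.findall(r'([A-Z]+)(\d+)', text)
    if not pairs:
        return '', ''
    # zip(*pairs) turns [('C', '6'), ('H', '12')] into ('C', 'H'), ('6', '12')
    symbols, counts = zip(*pairs)
    return ' '.join(symbols), ' '.join(counts)

print(split_formula('C6 H12 O6'))
```

Whether an empty result should be returned silently or reported as an error depends on how the input is produced; the guard here is only one option.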

    --Andy
    trp, Jul 2, 2003
    #2

  3. Mark Light

    Mark Light Guest

    that works great - many thanks.
    Mark Light, Jul 2, 2003
    #3
  4. Mark Light

    Guest

    Mark Light wrote:
    > Hi,
    > I have a string e.g. 'C6 H12 O6' that I wish to split up to give 2
    > strings
    > 'C H O' and '6 12 6'. I have played with string.split() and the re module -
    > but can't quite get there.
    >
    > Any help would be greatly appreciated.


    import re

    molecule_re = re.compile(r"(.+?)([0-9]+)")

    def processMolecule(molecule):
        elements = []
        numbers = []

        # each whitespace-separated item is a symbol followed by a count
        for item in molecule.split():
            element, number = molecule_re.findall(item)[0]
            elements.append(element)
            numbers.append(number)

        elements = ' '.join(elements)
        numbers = ' '.join(numbers)

        return (elements, numbers)

    print(processMolecule('C6 H12 O6'))
    , Jul 2, 2003
    #4
  5. Mark Light

    Andrew Dalke Guest

    trp:
    > I'm, assuming that these are chemical compounds, so you're not limited to
    > one-character symbols.


    The problem is underspecified. Usually 2-character symbols (or 3-character
    ones for some elements with high atomic number, if you don't assume the
    newer IUPAC names like "Dubnium", which was also called Unnilpentium (Unp)
    or, depending on your political persuasion, Joliotium (Jl) or Hahnium (Ha))
    have the first letter capitalized and the rest in lower case.

    > re_pat = re.compile('([A-Z]+)(\d+)')


    So this should be written ([A-Z][A-Za-z]*)(\d+), where I explicitly allow
    both lower and upper case trailing letters to be more accepting. (In some
    systems, "CU" is "1 carbon + 1 uranium" and in others it's an alternate way
    to write "1 copper". Though I suspect it's not allowed in the OP's problem.)
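
To make the difference concrete, here is a small sketch comparing the two patterns on a formula with a two-letter symbol. The input 'Na2 C1 O3' and the variable names are made up for illustration; both patterns still require every symbol to be followed by a count.

```python
import re

# Pattern from the earlier reply: upper-case letters only.
upper_only_re = re.compile(r'([A-Z]+)(\d+)')
# Pattern suggested above: a capital followed by any letters.
element_re = re.compile(r'([A-Z][A-Za-z]*)(\d+)')

text = 'Na2 C1 O3'  # hypothetical input with a two-letter symbol

# 'Na2' is skipped: 'N' is not directly followed by a digit.
print(upper_only_re.findall(text))
# 'Na' is captured as a single symbol.
print(element_re.findall(text))
```

Note that the more accepting pattern reads 'CU2' as one symbol 'CU', so it does not settle the carbon-plus-uranium versus copper question; that still depends on the convention of the data source.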

    Andrew
    Andrew Dalke, Jul 2, 2003
    #5
  6. Mark Light

    Andrew Dalke Guest

    Anton Vredegoor:
    > The issue seems to be resolved already, but I haven't seen the split
    > and strip combination:
    >
    > from string import letters,digits


    Use "ascii_letters" instead of "letters". The latter is based on the locale
    so might not work on some machines where "C" (or rather, byte 67) isn't
    a letter in the local alphabet.
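
Only the import line of the split-and-strip code is quoted here, so the following is a guess at what such an approach might look like, not a reconstruction of the original post. It uses ascii_letters as recommended above, which is also the only one of the two names that exists in Python 3. The function name split_and_strip is hypothetical.

```python
from string import ascii_letters, digits

def split_and_strip(text):
    """Split on whitespace, then strip digits or letters from each item."""
    symbols = []
    counts = []
    for item in text.split():
        # 'H12'.rstrip(digits) leaves the symbol 'H'
        symbols.append(item.rstrip(digits))
        # 'H12'.lstrip(ascii_letters) leaves the count '12'
        counts.append(item.lstrip(ascii_letters))
    return ' '.join(symbols), ' '.join(counts)

print(split_and_strip('C6 H12 O6'))
```

This assumes every item is letters followed by digits; an item with no count would contribute an empty string to the counts.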

    Andrew
    Andrew Dalke, Jul 2, 2003
    #6
