Splitting a string on non-alpha characters


stdazi

Hello!

I'm relatively new to Python, but I've already noticed that many lines of
Python code can be reduced to a one-liner by some clever coder. As
the topic says, I'm trying to split lines like this:

'foo bar- blah/hm.lala' -> [foo, bar, blah, hm, lala]

'foo////bbbar.. xyz' -> [foo, bbbar, xyz]

Obviously a for loop that collects just the letters could do the trick, but I'm
looking for a more elegant way. Can anyone help?
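For comparison, the plain for loop mentioned in the question might look like the following minimal Python 3 sketch (the function name `split_on_non_alpha` is hypothetical). It accumulates consecutive letters and closes off the current word whenever a non-letter appears:

```python
def split_on_non_alpha(s):
    """Return the runs of alphabetic characters in s (hypothetical helper)."""
    words = []
    current = []
    for ch in s:
        if ch.isalpha():
            current.append(ch)               # extend the current word
        elif current:
            words.append("".join(current))   # a non-letter ends the word
            current = []
    if current:                              # flush a word at the end of s
        words.append("".join(current))
    return words


print(split_on_non_alpha('foo bar- blah/hm.lala'))  # ['foo', 'bar', 'blah', 'hm', 'lala']
print(split_on_non_alpha('foo////bbbar.. xyz'))     # ['foo', 'bbbar', 'xyz']
```

It works, but it is the kind of bookkeeping the one-liners in the replies avoid.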
 

Tim Chase

First, I presume you mean that you want back

['foo', 'bar', 'blah', 'hm', 'lala']

instead of

[foo, bar, blah, hm, lala]

(which would presume you have variables named as such, which is
kinda funky)

That said...

Well, I'm sure there are scads of ways to do this. I know
regexps can do it fairly cleanly:
>>> import re
>>> r = re.compile(r'\w+')
>>> s = 'foo bar- blah/hm.lala'
>>> s2 = 'foo////bbbar.. xyz'
>>> r.findall(s)
['foo', 'bar', 'blah', 'hm', 'lala']
>>> r.findall(s2)
['foo', 'bbbar', 'xyz']

The regexp in question (r'\w+') translates to "one or more 'word'
characters". The definition of a 'word' character depends on your
locale/encoding, but at a minimum it includes the standard
alphabet, the digits, and the underscore.
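A quick check of that definition, as a Python 3 sketch on made-up strings (in Python 3, str patterns are Unicode-aware by default): \w keeps digits and underscores inside a match, and the re.ASCII flag narrows it back to [a-zA-Z0-9_].

```python
import re

WORD = re.compile(r'\w+')

# Digits and the underscore count as "word" characters.
print(WORD.findall('foo_bar 42x-baz'))          # ['foo_bar', '42x', 'baz']

# Unicode letters count too, unless re.ASCII is given.
print(WORD.findall('café au lait'))             # ['café', 'au', 'lait']
print(re.findall(r'\w+', 'café', re.ASCII))     # ['caf']
```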

If you're not interested in digits (or underscores), and only want
the 26*2 ASCII letters, you can use
>>> r = re.compile(r'[a-zA-Z]+')

instead (which would be "one or more letters in the set [a-zA-Z]").
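The trade-off is that [a-zA-Z] is strictly ASCII, so accented letters split a word. If that matters, one common regex idiom (not one mentioned in this thread) is [^\W\d_]+, meaning "a word character that is neither a digit nor an underscore", which in Python 3 str patterns is roughly "any Unicode letter". A comparison on a made-up string:

```python
import re

ascii_letters = re.compile(r'[a-zA-Z]+')
# Negated class: not a non-word char, not a digit, not an underscore.
any_letters = re.compile(r'[^\W\d_]+')

s = 'foo_bar 42x café'
print(ascii_letters.findall(s))  # ['foo', 'bar', 'x', 'caf']
print(any_letters.findall(s))    # ['foo', 'bar', 'x', 'café']
```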

-tkc
 

Mark Peters

A simple regular expression would work:

['foo', 'bar', 'blah', 'hm', 'lala']
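Besides findall(), re.split() can split directly on runs of non-letters, which matches the wording of the original question. A minimal Python 3 sketch (the helper name `split_words` is hypothetical, and this is not necessarily the expression used above). Note that re.split() leaves an empty string at either end when the input starts or ends with a separator, so those are filtered out:

```python
import re

NON_ALPHA = re.compile(r'[^a-zA-Z]+')

def split_words(s):
    # Split on runs of non-letters; drop the empty strings produced at the edges.
    return [part for part in NON_ALPHA.split(s) if part]

print(split_words('foo bar- blah/hm.lala'))  # ['foo', 'bar', 'blah', 'hm', 'lala']
print(split_words('--foo////bbbar.. xyz!'))  # ['foo', 'bbbar', 'xyz']
```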
 

bearophileHUGS

stdazi:

The RE-based solutions look good. Here is a pair of alternative
solutions:

s1 = 'foo bar- blah/hm.lala'
r1 = ['foo', 'bar', 'blah', 'hm', 'lala']

s2 = 'foo////bbbar.. xyz'
r2 = ['foo', 'bbbar', 'xyz']

# Map every non-letter byte to a space (conditional expression needs Py2.5+)
table = "".join((c if c.isalpha() else " ") for c in map(chr, range(256)))
#table = "".join((" "+c)[c.isalpha()] for c in map(chr, range(256))) # Py2.4
print s1.translate(table).split()
print s2.translate(table).split()
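The translate() version above is Python 2, where the table is a 256-character string. In Python 3, str.translate() takes a mapping from code points to replacement strings instead. A minimal sketch, assuming only code points 0 to 255 need remapping (the helper name `split_words` is hypothetical):

```python
# Map each non-alphabetic code point in 0..255 to a space.
# Assumption: input is ASCII/Latin-1; code points above 255 pass through unchanged.
NON_ALPHA_TO_SPACE = {i: " " for i in range(256) if not chr(i).isalpha()}

def split_words(s):
    return s.translate(NON_ALPHA_TO_SPACE).split()

print(split_words('foo bar- blah/hm.lala'))  # ['foo', 'bar', 'blah', 'hm', 'lala']
print(split_words('foo////bbbar.. xyz'))     # ['foo', 'bbbar', 'xyz']
```

Because of that 0 to 255 assumption, a non-letter such as '→' (U+2192) is left in place, so 'a→b' stays one "word".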

Or:

from itertools import groupby
print ["".join(gr) for he,gr in groupby(s1, str.isalpha) if he]
print ["".join(gr) for he,gr in groupby(s2, str.isalpha) if he]
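The groupby() version works because itertools.groupby(s, str.isalpha) yields one (key, group) pair per maximal run of characters sharing the same isalpha() result; keeping the runs whose key is True gives the words. A Python 3 sketch that exposes the intermediate runs (the helper name `alpha_runs` is hypothetical):

```python
from itertools import groupby

def alpha_runs(s):
    # Each run is (True, 'letters') or (False, 'non-letters').
    return [(is_alpha, "".join(group)) for is_alpha, group in groupby(s, str.isalpha)]

print(alpha_runs('foo bar- x'))
# [(True, 'foo'), (False, ' '), (True, 'bar'), (False, '- '), (True, 'x')]

words = [run for is_alpha, run in alpha_runs('foo////bbbar.. xyz') if is_alpha]
print(words)  # ['foo', 'bbbar', 'xyz']
```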

Bye,
bearophile