Trouble splitting strings with consecutive delimiters

deuteros

I'm using regular expressions to split a string using multiple delimiters.
But if two or more of my delimiters occur next to each other in the
string, it puts an empty string in the resulting list. For example:

re.split(':|;|px', "width:150px;height:50px;float:right")

Results in

['width', '150', '', 'height', '50', '', 'float', 'right']

Is there any way to avoid getting '' in my list without adding px; as a
delimiter?
 
Jussi Piitulainen

deuteros said:
I'm using regular expressions to split a string using multiple
delimiters. But if two or more of my delimiters occur next to each
other in the string, it puts an empty string in the resulting
list. For example:

re.split(':|;|px', "width:150px;height:50px;float:right")

Results in

['width', '150', '', 'height', '50', '', 'float', 'right']

Is there any way to avoid getting '' in my list without adding px;
as a delimiter?

You could use a sequence of such delimiters.
['width', '150', 'height', '50', 'float', 'right']
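A sketch of one pattern that produces this list (not necessarily the exact expression used above): wrap the alternatives in a non-capturing group and add `+`, so a run of adjacent delimiters such as `px;` counts as a single separator.

```python
import re

style = "width:150px;height:50px;float:right"

# (?:...) groups the alternatives without capturing them, so re.split
# does not return the delimiters; + lets "px;" match as one separator.
parts = re.split(r"(?:[:;]|px)+", style)
print(parts)
```

Note that a delimiter at the very start or end of the string would still leave an empty string at that end of the list.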

Consider splitting twice instead: first into key-value substrings at
semicolons, and those into key-value pairs at colons. Here as a dict.
Better handle the units after that.
{'width': '150px', 'float': 'right', 'height': '50px'}
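A minimal sketch of the two-step split, assuming every declaration contains exactly one colon and the string has no trailing semicolon. The helper `strip_px` is a hypothetical name, added only to illustrate handling the units afterwards.

```python
style = "width:150px;height:50px;float:right"

# Step 1: split into "key:value" substrings at semicolons.
declarations = style.split(";")

# Step 2: split each substring once at the colon and build a dict.
props = dict(d.split(":", 1) for d in declarations)
print(props)


def strip_px(value):
    """Hypothetical helper: turn '150px' into 150, leave other values alone."""
    if value.endswith("px"):
        return int(value[:-2])
    return value


sizes = {key: strip_px(value) for key, value in props.items()}
print(sizes)
```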

You might also want to accept whitespace as part of the delimiters.
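One way this might look, as a sketch: allow optional whitespace around each delimiter inside the repeated group. The input string with spaces is made up for illustration.

```python
import re

# Hypothetical input with spaces after the colons and semicolons.
spaced_style = "width: 150px; height: 50px; float: right"

# \s* on both sides absorbs whitespace around each delimiter, and the
# outer + still merges adjacent delimiters such as "px; ".
spaced_parts = re.split(r"(?:\s*(?:[:;]|px)\s*)+", spaced_style)
print(spaced_parts)
```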

(There might be a parser for such data formats somewhere in the
library already. CSV?)
 
Steven D'Aprano

I'm using regular expressions to split a string using multiple
delimiters. But if two or more of my delimiters occur next to each other
in the string, it puts an empty string in the resulting list.

As I would expect. After all, there *is* an empty string between two
delimiters.

For example:

re.split(':|;|px', "width:150px;height:50px;float:right")

Results in

['width', '150', '', 'height', '50', '', 'float', 'right']

Is there any way to avoid getting '' in my list without adding px; as a
delimiter?

Probably. But why not do it the easy way?


import re

items = re.split(':|;|px', "width:150px;height:50px;float:right")
items = filter(None, items)

In Python 3, the second line will need to be list(filter(None, items)).
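If you would rather not depend on how filter behaves in each version, a list comprehension is one alternative that returns a list in both Python 2 and Python 3. A small sketch:

```python
import re

items = re.split(':|;|px', "width:150px;height:50px;float:right")

# Empty strings are falsy, so "if item" keeps only the non-empty pieces.
items = [item for item in items if item]
print(items)
```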
 
rusi

I'm using regular expressions to split a string using multiple delimiters.
But if two or more of my delimiters occur next to each other in the
string, it puts an empty string in the resulting list. For example:

        re.split(':|;|px', "width:150px;height:50px;float:right")

Results in

        ['width', '150', '', 'height', '50', '', 'float', 'right']

Is there any way to avoid getting '' in my list without adding px; as a
delimiter?

Are you parsing CSS?
If so, have you tried something like cssutils (http://cthedot.de/cssutils/)?
[There are other such libraries, and I don't know which is best.]
 
