Beginner's question about the return value of re.split

klaus

Hello,

I have a question regarding the return value of re.split() since I have
been unable to find any answers in the regular sources of documentation.

Please consider the following:

#!/usr/bin/env python

import re

if __name__ == "__main__":
    datum = "2008-03-14"
    the_date = re.split('^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum, 3)
    print the_date

Now the result that is printed is:
['', '2008', '03', '14', '']

My question: what are the empty strings doing there at the beginning and
at the end? Is this due to a faulty regular expression?

Thank you !

KL.
 
Diez B. Roggisch

klaus said:
Hello,

I have a question regarding the return value of re.split() since I have
been unable to find any answers in the regular sources of documentation.

Please consider the following:

#!/usr/bin/env python

import re

if __name__ == "__main__":
    datum = "2008-03-14"
    the_date = re.split('^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum, 3)
    print the_date

Now the result that is printed is:
['', '2008', '03', '14', '']

My question: what are the empty strings doing there at the beginning and
at the end? Is this due to a faulty regular expression?

Read the manual:

"""
split( pattern, string[, maxsplit = 0])
Split string by the occurrences of pattern. If capturing
parentheses are used in pattern, then the text of all groups in the
pattern are also returned as part of the resulting list. If maxsplit is
nonzero, at most maxsplit splits occur, and the remainder of the string
is returned as the final element of the list. (Incompatibility note: in
the original Python 1.5 release, maxsplit was ignored. This has been
fixed in later releases.)

"""

The key issue here being "If capturing parentheses are used in pattern,
then the text of all groups in the pattern are also returned as part of
the resulting list."

Consider this:
>>> re.compile("a").split("bab")
['b', 'b']
>>> re.compile("(a)").split("bab")
['b', 'a', 'b']

Consider using match or search if split isn't what you actually want.

Diez
 
Tim Chase

datum = "2008-03-14"
the_date = re.split('^([0-9]{4})-([0-9]{2})-([0-9]{2})$', datum, 3)
print the_date

Now the result that is printed is:
['', '2008', '03', '14', '']

My question: what are the empty strings doing there at the beginning and
at the end? Is this due to a faulty regular expression?


I think in this case, you just want the standard string .split()
method:

the_date = datum.split('-')

which will return you

['2008', '03', '14']

The re.split() splits your string using your regexp as the way to
find the divider. It finds emptiness before, emptiness after,
and returns the tagged matches for each part. It would be similar to
['', '']

only you get your tagged matches in there too. Or, if you need
more precision in your matching (in your case, ensuring that
they're digits, and with the right number of digits), you can do
something like
>>> r = re.compile('^([0-9]{4})-([0-9]{2})-([0-9]{2})$')
>>> m = r.match(datum)
>>> m.groups()
('2008', '03', '14')
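
To make the "emptiness before, emptiness after" point concrete, here is a small sketch. It is written for Python 3 (so print is a function), and the variable names are only illustrative. Because the anchored pattern matches the whole string, the "divider" is the entire input; what remains on either side is empty, and the captured groups are placed between those two empty pieces.

```python
import re

datum = "2008-03-14"
pattern = r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

# The pattern matches the entire string, so the "separator" is everything:
# re.split returns the empty text before it, the three captured groups,
# and the empty text after it.
split_result = re.split(pattern, datum)
print(split_result)

# Plain-string analogue: splitting a string on itself leaves only the
# empty pieces on either side of the separator.
self_split = datum.split(datum)
print(self_split)

# match() returns just the captured groups, without the empty strings.
m = re.match(pattern, datum)
groups = m.groups() if m else None
print(groups)
```

If the input does not match at all, re.split simply returns the whole string as a one-element list, while match() returns None, which is usually easier to test for when validating.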

-tkc
 
klaus

On Fri, 21 Mar 2008 10:31:20 -0500, Tim Chase wrote:

<..........>

Ok thank you !

I think I got a bit lost in all the possibilities Python has to offer.
But your answers did the trick.

Thank you all again for responding and elaborating.

Cheers,

KL.
 
John Machin

On Fri, 21 Mar 2008 10:31:20 -0500, Tim Chase wrote:

<..........>

Ok thank you !

I think I got a bit lost in all the possibilities Python has to offer.

IMHO you got more than a bit lost. You seem to have stumbled on a
possibly unintended side effect of re.split.

What is your underlying goal?

If you want merely to split on '-', use datum.split('-').

If you want to verify the split results as matching patterns (4
digits, 2 digits, 2 digits), use something like this:

| >>> import re
| >>> datum = '2008-03-14'
| >>> pattern = r'^(\d\d\d\d)-(\d\d)-(\d\d)\Z'
You may notice two differences between my pattern and yours ...
| >>> mobj = re.match(pattern, datum)
| >>> mobj.groups()
| ('2008', '03', '14')
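
One of the differences is \Z in place of $. A minimal sketch of why that can matter when verifying input, assuming a hypothetical value that still carries a trailing newline (for example a line read from a file); the variable names are illustrative and the code is written for Python 3:

```python
import re

with_dollar = r'^(\d\d\d\d)-(\d\d)-(\d\d)$'
with_big_z = r'^(\d\d\d\d)-(\d\d)-(\d\d)\Z'

# Hypothetical input with a trailing newline, e.g. a line read from a file.
datum = '2008-03-14\n'

# '$' also matches just before a newline at the very end of the string,
# so this input is accepted.
dollar_ok = re.match(with_dollar, datum) is not None

# '\Z' matches only at the true end of the string, so it is rejected.
big_z_ok = re.match(with_big_z, datum) is not None

print(dollar_ok, big_z_ok)
```

Whether accepting the trailing newline is harmless depends on what happens to the string afterwards; for strict validation, \Z is the safer choice.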

But what are you going to do with the result? If the resemblance
between '2008-03-14' and a date is not accidental, you may wish to
consider going straight from a string to a datetime or date object,
e.g.

| >>> import datetime
| >>> dt = datetime.datetime.strptime(datum, '%Y-%m-%d')
| >>> dt
| datetime.datetime(2008, 3, 14, 0, 0)
| >>> d = datetime.datetime.date(dt)
| >>> d
| datetime.date(2008, 3, 14)
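
As a rough sketch of how this could double as input validation: strptime raises ValueError for strings that do not form a real date, including ones a digits-only pattern would accept. The helper name parse_date and the output format below are assumptions made for illustration, not part of the original posts; the code is written for Python 3.

```python
import datetime

def parse_date(text):
    """Return a datetime.date for 'YYYY-MM-DD' input, or None if invalid."""
    try:
        return datetime.datetime.strptime(text, '%Y-%m-%d').date()
    except ValueError:
        return None

good = parse_date('2008-03-14')
bad = parse_date('2008-02-30')   # right shape, but no such calendar day

print(good)
print(bad)

# Reformatting for other uses; the target format here is only an example.
url_part = good.strftime('%Y/%m/%d')
print(url_part)
```

Once the value is a date object, strftime can produce whatever layout another system expects, so no manual splitting of the string is needed.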

HTH,
John
 
klaus

IMHO you got more than a bit lost. You seem to have stumbled on a
possibly unintended side effect of re.split.

What is your underlying goal?

If you want merely to split on '-', use datum.split('-').

If you want to verify the split results as matching patterns (4 digits,
2 digits, 2 digits), use something like this:

| >>> import re
| >>> datum = '2008-03-14'
| >>> pattern = r'^(\d\d\d\d)-(\d\d)-(\d\d)\Z'
You may notice two differences between my pattern and yours ...
| >>> mobj = re.match(pattern, datum)
| >>> mobj.groups()
| ('2008', '03', '14')

But what are you going to do with the result? If the resemblance between
'2008-03-14' and a date is not accidental, you may wish to consider
going straight from a string to a datetime or date object, e.g.

| >>> import datetime
| >>> dt = datetime.datetime.strptime(datum, '%Y-%m-%d')
| >>> dt
| datetime.datetime(2008, 3, 14, 0, 0)
| >>> d = datetime.datetime.date(dt)
| >>> d
| datetime.date(2008, 3, 14)

HTH,
John

Ok, sorry for my late reply. I got caught up in a fight with Easter bunnies
over some extraordinarily large, fruity and fertile eggs. Some creatures
take Easter just too seriously, and it is not even mating season! Can you
believe that?

:)

Anyway, the underlying goal was to verify user input and to split up the
date so that I could easily convert it to other formats, among them a URL
and a database query. And I have succeeded in that.

Thank you again for taking the time to explain, and to question.

KL.
 
