Converting DD MM YYYY into YYYY-MM-DD?

G

Gilles Ganault

Hello,

I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").

Here's an example:

SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Août 2009 - 09h20

I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.

What is the best way to do this in Python?

Thank you.
 
J

Jonathan Gardner

Hello,

        I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").

Here's an example:

SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Août 2009 - 09h20

I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.

What is the best way to do this in Python?

Thank you.
 
C

Che M

Hello,

        I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").

Here's an example:

SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Août 2009 - 09h20

I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.

What is the best way to do this in Python?

Thank you.

Likely this is not the best way, but I would do, for
the first one (and the same idea for the second):

def convert(date):
frenchdict = {'Mai':'May'} #etc...
day = mystring[:2]
month = frenchdict[ mystring[3:6] ]
year = mystring[7:11]
newdate = year+'-'+month+'-'+day
print 'newdate is ', newdate
 
J

Jonathan Gardner

        I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").

Here's an example:

SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Août 2009 - 09h20

I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.

What is the best way to do this in Python?

Unfortunately, there isn't any string to date parsers in the built-
ins. Not to worry, though, since writing your own is easy, especially
if you use regular expressions from the re module. I suggest using an
RE such as:

r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"

If you want to translate month names to month numbers, then you need
some sort of dict to do so. Unfortunately, there isn't a terrific
standard for this, so your best bet is to put it in some file
somewhere, or even hard-code it in your code. (Month names won't
change over the lifetime of your program, so it's reasonable to put
them in your code somewhere.)

month_names_to_numbers = {
'jan':1, ... }

Once you have the year, month, and date, formatting it is trivial with
the built-in formatter.

"%04d-%02d%02d %02d:%02d" % (year, month, date, hour, minute)

The variety of date formats out there have prevented a universal,
clean solution to this problem. Until we all start sticking to the
same conventions, we will always have to write code to translate dates
from one format to another.
 
R

Rami Chowdhury

Correct me if I'm wrong, but doesn't
http://docs.python.org/library/datetime.html#datetime.datetime.strptime do
this?
'2009-05-05 18:25'

If you're using a recent enough version of Python (2.5 and up) I'd imagine
that's the best way to do it?

Hello,

        I need to convert DD MM YYYY dates into the MySQL-friendly
YYYY-MM-DD, and translate the month name from literal French to its
numeric equivalent (eg. "Janvier" into "01").

Here's an example:

SELECT dateinscription, dateconnexion FROM membres LIMIT 1;
26 Mai 2007|17 Août 2009 - 09h20

I'd like to update the row into "2007-05-26" and "2009-08-17 09:20",
respectively.

What is the best way to do this in Python?

Thank you.

Likely this is not the best way, but I would do, for
the first one (and the same idea for the second):

def convert(date):
frenchdict = {'Mai':'May'} #etc...
day = mystring[:2]
month = frenchdict[ mystring[3:6] ]
year = mystring[7:11]
newdate = year+'-'+month+'-'+day
print 'newdate is ', newdate
 
J

Jonathan Gardner

Fortunately, Python 2.5 or later has the ‘datetime.strptime’ function..

Hate to weasel out of this one, but the language that strptime
provides is pretty limited. I don't find it useful except in the
trivial cases. Same goes for strftime. Also, both of these are very
Western European centric. Yes, Asian languages are supported but not
naturally.
 
J

Jonathan Gardner

locale is nice when you only have a single thread.

Webservers aren't single threaded. You can't serve up one page for one
locale and then another in another locale without seeing very, very
weird behavior.
 
R

Rami Chowdhury

----
My sample interactive session (locale.setlocale and all) was on a 32-bit Vista
install of Python 2.5, so it works on that...
 
G

Gilles Ganault

Thanks everyone for the help. This script is just a one-shot thingie
on my work host, not as a web script or anything professional.

Unfortunately, there isn't any string to date parsers in the built-
ins. Not to worry, though, since writing your own is easy, especially
if you use regular expressions from the re module. I suggest using an
RE such as:

r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)"

I've never seen regexes like this. I'm curious to know what those
mean:

r = Unicode?

(?P<date> = ? means that it shouldn't be greedy, what about P<date>?
 
G

Gilles Ganault

Luckily, you have access to the documentation to find out.

I never used groups before. Thanks for showing me.

At this point, the script is almost done, but the regex fails if the
month contains accented characters (eg. "Août", but fine if eg.
"Jan").

Adding a line to load the French locale doesn't help :-/

Any idea what I could do to keep the regex happy?

Thank you.

==============
import re
import apsw
import locale

#In case error due to accent in month name, but no soup 4 U
locale.setlocale(locale.LC_ALL, 'FR')

connection=apsw.Connection("test.sqlite")
cursor=connection.cursor()

re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)")

sql = 'SELECT id,dateinscription,dateconnexion FROM mytable'
rows=list(cursor.execute(sql))
for row in rows:
dateinscription = row[1]
dateconnexion = row[2]

#Prints OK
print dateinscription

m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:
print "No go"
==============
 
R

Rami Chowdhury

Could you let me know which platform this is on (Windows, *nix)? It may be a
locale encoding issue -- the locale.setlocale() function allows the second
argument to be a tuple of (locale_code, encoding), as below:

locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

Since this is for a one-shot (and presumably threading-agnostic) program, and
a fairly trivially formatted date-string, I would suggest using
datetime.strptime
(http://docs.python.org/library/datetime.html#datetime.datetime.strptime) and
not regular expressions (which IIRC have Issues with non-ASCII characters).


----
Rami Chowdhury
"Ninety percent of everything is crap." -- Sturgeon's Law
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Luckily, you have access to the documentation to find out.

I never used groups before. Thanks for showing me.

At this point, the script is almost done, but the regex fails if the
month contains accented characters (eg. "Août", but fine if eg.
"Jan").

Adding a line to load the French locale doesn't help :-/

Any idea what I could do to keep the regex happy?

Thank you.

==============
import re
import apsw
import locale

#In case error due to accent in month name, but no soup 4 U
locale.setlocale(locale.LC_ALL, 'FR')

connection=apsw.Connection("test.sqlite")
cursor=connection.cursor()

re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)")

sql = 'SELECT id,dateinscription,dateconnexion FROM mytable'
rows=list(cursor.execute(sql))
for row in rows:
dateinscription = row[1]
dateconnexion = row[2]

#Prints OK
print dateinscription

m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:
print "No go"
==============
 
G

Gilles Ganault

Could you let me know which platform this is on (Windows, *nix)? It may be a
locale encoding issue -- the locale.setlocale() function allows the second
argument to be a tuple of (locale_code, encoding), as below:

locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

It's on XP, and I'm using ActivePython 2.5.1.1.
http://www.activestate.com/activepython/

Python doesn't like the above:

#locale.Error: unsupported locale setting
locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

Maybe it was introduced in more recent versions of Python?
Since this is for a one-shot (and presumably threading-agnostic) program, and
a fairly trivially formatted date-string, I would suggest using
datetime.strptime
(http://docs.python.org/library/datetime.html#datetime.datetime.strptime) and
not regular expressions (which IIRC have Issues with non-ASCII characters).

If the regex library can only handle basic latin characters, I'll wait
until a script I'm running is done, and I'll upgrade to the 2.6.2.2 to
see how it goes.

Thank you.
 
R

Rami Chowdhury

Python doesn't like the above:
#locale.Error: unsupported locale setting
locale.setlocale(locale.LC_ALL, ('FR', 'UTF-8'))

Maybe it was introduced in more recent versions of Python?
Hmm, that's odd. According to the docs
(http://docs.python.org/library/locale.html#locale.setlocale) it's been that
way since 2.0, but I've just checked this on my Windows (Vista) machine and
you're right, it returns an error.

This worked for me on 32-bit Vista:

locale.setlocale(locale.LC_ALL, 'FR')

It uses Windows-1252 for the encoding, but that seems to deal with the
circonflexe in 'Août' just fine, so it should work for this purpose.
 
G

Gilles Ganault

I find it odd that the regex library can't handle European characters
:-/

Ha, found it! :)

http://www.regular-expressions.info/python.html

=========
# -*- coding: latin-1 -*-

import locale
import re

locale.setlocale(locale.LC_ALL, 'FR')

re_inscription =
re.compile(r"(?P<date>\d+)\s+(?P<month>\w+)\s+(?P<year>\d+)",re.LOCALE)

dateinscription = "11 Août 2008"

m = re_inscription.search(dateinscription)
if m:
day = m.group("date")
month = m.group("month")
year = m.group("year")
print "%s-%s-%s" % (year,month,day)
else:
print "Yuck"
=========

Thanks everyone!
 
G

Gilles Ganault

The principles of handling text in Python: Get it to internal Unicode
objects as soon as possible, handle it as Unicode for as long as
possible, and only encode it to some byte stream for output as late as
possible.

Thanks much for the tip. I'll keep that in mind when I have strings
with accents.
 
S

Stefan Behnel

Ben said:
Again, note that these recommendations hold for *any* text in Python,
with or without accents; once you accept that text is best handled in
Unicode, there's little sense in making an exception for the limited
subset that happens to be representable in ASCII.

If the QOTW wasn't meant for fun, I'd vote for this. This is very good advice.

Stefan
 
M

Mark Tolonen

Gilles Ganault said:
I find it odd that the regex library can't handle European characters

It can. Read the documentation about the re.LOCALE flag.

-Mark
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,262
Messages
2,571,048
Members
48,769
Latest member
Clifft

Latest Threads

Top