Problem with REs....

Alessandro · Jan 9, 2006

I have the string "3 HOURS, 22 MINUTES, and 28 SECONDS" and I need to
'split' it into its three parts: "3 HOURS", "22 MINUTES", "28 SECONDS".
This seems like a natural fit for REs... (needless to say, I am very much
a beginner with both REs and Python).
My question: why, when I run this code

>>> import re
>>> regex = re.compile(r"[0-9]+ (HOUR|MINUTE|SECOND)")
>>> print(regex.findall("22 MINUTE, 3 HOUR, AND 28 SECOND"))

do I get this output:

['MINUTE', 'HOUR', 'SECOND']

and not what I expected:

['22 MINUTE', '3 HOUR', '28 SECOND']

Regards, and many thanks...
Alessandro

Xavier Morel · Jan 9, 2006

This would probably have been slightly easier had you written it in English,
but basically the issue is the capturing group.

A capturing group is defined by the parentheses in the regular expression;
your group is "(HOUR|MINUTE|SECOND)". When a pattern contains a group,
findall returns only what the group matched, not the whole match.

You need to include the number as well, and you can use a non-capturing
group for the time unit (with (?: ) instead of ( )) to avoid cluttering
your captured groups.

>>> pattern = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")
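
To see the difference concretely, here is a small sketch comparing the pattern from the question with a non-capturing version. It is written for Python 3 (so print is a function), and the variable names are only illustrative. The last part shows an alternative that is not discussed in the thread: finditer with group(0), which gives the whole match even when the pattern keeps its capturing group.

```python
import re

text = "22 MINUTE, 3 HOUR, AND 28 SECOND"

# One capturing group: findall returns only what the group matched.
capturing = re.compile(r"[0-9]+ (HOUR|MINUTE|SECOND)")
units_only = capturing.findall(text)
print(units_only)

# No capturing group: findall returns each whole match.
non_capturing = re.compile(r"[0-9]+ (?:HOUR|MINUTE|SECOND)")
whole_matches = non_capturing.findall(text)
print(whole_matches)

# finditer yields match objects; group(0) is always the whole match,
# regardless of how many capturing groups the pattern contains.
via_finditer = [m.group(0) for m in capturing.finditer(text)]
print(via_finditer)
```

The first print shows just the unit names, while the other two show the number together with the unit.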

Other improvements:
* \d is a shortcut for "any digit"; for plain ASCII text it is equivalent
to [0-9], yet slightly clearer.
* You can use the re.I (or re.IGNORECASE) flag to match both lowercase and
uppercase unit names.
* You can easily handle an optional plural "s" by adding s? after the group.

Improved regex:

>>> pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
>>> pattern.findall("3 HOURS 22 MINUTES 28 SECONDS")
['3 HOURS', '22 MINUTES', '28 SECONDS']
>>> pattern.findall("1 HOUR 22 MINUTES 28 SECONDS")
['1 HOUR', '22 MINUTES', '28 SECONDS']
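
As a quick check, the improved pattern can also be run against the exact string from the question, which has commas and a lowercase "and" between the parts. This is a Python 3 sketch. The second pattern, with two capturing groups, is an illustrative variation that is not from the thread: it shows that findall returns one tuple per match when there is more than one group.

```python
import re

text = "3 HOURS, 22 MINUTES, and 28 SECONDS"

# Improved pattern from the reply: one capturing group around the whole item.
pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
parts = pattern.findall(text)
print(parts)

# Hypothetical variation: two capturing groups, so findall returns
# (number, unit) tuples instead of plain strings.
split_pattern = re.compile(r"(\d+) ((?:hour|minute|second)s?)", re.I)
pairs = split_pattern.findall(text)
print(pairs)
```

The commas and the word "and" are simply skipped, because findall only collects the non-overlapping stretches of text that match the pattern.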

If you want to learn more about regular expressions, I suggest browsing
http://regular-expressions.info/, which is a good source of information,
and using Kodos, which is quite a good Python regex debugger.

Alessandro · Jan 9, 2006

Thanks for the reply, it works!!!
The language? I selected the wrong newsgroup in my
newsreader!!!... sorry...

Thanks...

Alessandro...
