Problem with REs....


Alessandro

Problem with REs....
I have the string "3 HOURS, 22 MINUTES, and 28 SECONDS" and I need to
'split' it into its three parts "3 HOURS", "22 MINUTES", "28 SECONDS".
I'm trying to do this with REs... (it goes without saying that I'm very
new to both REs and Python)
My question: why, when I run this code
>>> import re
>>> regex=re.compile("[0-9]+ (HOUR|MINUTE|SECOND)")
>>> print(regex.findall("22 MINUTE, 3 HOUR, AND 28 SECOND"))
do I get this output:
['MINUTE', 'HOUR', 'SECOND']

and not, as I expected:
['22 MINUTE', '3 HOUR', '28 SECOND']

Regards and many thanks...
Alessandro
 

Xavier Morel

It would probably have been slightly easier had you written in English,
but basically the issue is the capturing group.

A capturing group is defined by parentheses in the regular expression;
here your group is "(HOUR|MINUTE|SECOND)". When a pattern contains
exactly one group, findall returns only the text matched by that group,
not the whole match.
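A quick sketch of how findall behaves with zero, one, or two capturing groups, using the string from the question. It assumes Python 3 (print() as a function rather than the Python 2 print statement):

```python
import re

text = "22 MINUTE, 3 HOUR, AND 28 SECOND"

# No capturing groups ((?:...) does not capture): findall returns whole matches.
no_groups = re.findall(r"[0-9]+ (?:HOUR|MINUTE|SECOND)", text)

# One capturing group: findall returns only what that group matched.
one_group = re.findall(r"[0-9]+ (HOUR|MINUTE|SECOND)", text)

# Two capturing groups: findall returns one tuple per match, one item per group.
two_groups = re.findall(r"([0-9]+) (HOUR|MINUTE|SECOND)", text)

print(no_groups)   # ['22 MINUTE', '3 HOUR', '28 SECOND']
print(one_group)   # ['MINUTE', 'HOUR', 'SECOND']
print(two_groups)  # [('22', 'MINUTE'), ('3', 'HOUR'), ('28', 'SECOND')]
```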

You need to include the number in the group as well, and you can use a
non-capturing group for the unit (written (?: ) instead of ( )) so that
it does not add an extra group to the results.
>>> pattern = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")
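If you would rather keep the question's pattern unchanged, another option (a sketch that is not from the thread, assuming Python 3) is re.finditer: group(0) of each match object is always the entire match, no matter how many capturing groups the pattern has.

```python
import re

# The pattern from the question, unchanged: it still has one capturing group.
regex = re.compile(r"[0-9]+ (HOUR|MINUTE|SECOND)")

text = "22 MINUTE, 3 HOUR, AND 28 SECOND"

# finditer yields match objects; group(0) is the whole matched text.
parts = [m.group(0) for m in regex.finditer(text)]
print(parts)  # ['22 MINUTE', '3 HOUR', '28 SECOND']
```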

Other improvements:
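* \d is a shortcut for "any digit", so it matches the same as [0-9]
for ASCII text yet is slightly clearer (in Python 3, \d in a str
pattern also matches other Unicode digits unless re.ASCII is set).
* You can use the re.I (or re.IGNORECASE) flag to match both lowercase
and uppercase unit names.
* You can easily handle an optional trailing "s" with s?.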

Improved regex:
>>> pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
>>> pattern.findall("3 HOURS 22 MINUTES 28 SECONDS")
['3 HOURS', '22 MINUTES', '28 SECONDS']
>>> pattern.findall("1 HOUR 22 MINUTES 28 SECONDS")
['1 HOUR', '22 MINUTES', '28 SECONDS']
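The improved pattern also works on the exact string from the question, commas and "and" included. The sketch below goes a step beyond the thread and assumes Python 3: it splits the pattern into two capturing groups so that findall returns (number, unit) pairs, in case the numbers are needed as integers, while finditer still gives the whole-match strings.

```python
import re

# The improved pattern, split into two capturing groups:
# the number, and the unit with an optional trailing "s".
pattern = re.compile(r"(\d+) ((?:hour|minute|second)s?)", re.I)

text = "3 HOURS, 22 MINUTES, and 28 SECONDS"

# Whole-match strings, as the question asked for.
parts = [m.group(0) for m in pattern.finditer(text)]

# (number, unit) pairs, with the number converted to int.
pairs = [(int(n), unit) for n, unit in pattern.findall(text)]

print(parts)  # ['3 HOURS', '22 MINUTES', '28 SECONDS']
print(pairs)  # [(3, 'HOURS'), (22, 'MINUTES'), (28, 'SECONDS')]
```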

If you want to learn more about regular expressions, I suggest you
browse http://regular-expressions.info/, which is a good source of
information, and try the Kodos software, which is quite a good Python
regex debugger.
 

Alessandro

Thanks for the reply, it works!!!
The language? I selected the wrong newsgroup in my
newsreader!!!... sorry...

Thanks...

Alessandro...
 
