regular expression for integer and decimal numbers

gary

I want to pick all integers and decimal numbers out of a string.
Would this be the most correct regular expression to use?

"\d+\.?\d*"
 
Andrew Durdin

gary said:
I want to pick all integers and decimal numbers out of a string.
Would this be the most correct regular expression to use?

"\d+\.?\d*"

That will work for numbers such as 0123 12.345 12. 0.5 -- but it
won't work for the following:
0x12AB .5 10e-3 -15 123L
If you want to handle some of those, then you'll need a more complicated regex.
If you want to accept numbers of the form ".5" but don't care about the
form "12." (it would be matched as just "12"), then a better regex would be
\d*\.?\d+
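
A quick way to compare the two patterns is to run both over a few of the sample numbers with re.findall. This is only a sketch: the sample string and the names ORIGINAL and LEADING_DOT are made up for illustration.

```python
import re

# Illustrative names; both patterns are the ones quoted in this thread.
ORIGINAL = re.compile(r"\d+\.?\d*")     # digits, optional dot, optional digits
LEADING_DOT = re.compile(r"\d*\.?\d+")  # optional digits, optional dot, digits

# Made-up sample built from the numbers mentioned above.
sample = "0123 12.345 12. 0.5 .5 -15"

original_hits = ORIGINAL.findall(sample)
leading_dot_hits = LEADING_DOT.findall(sample)

print(original_hits)
print(leading_dot_hits)
```

The first pattern keeps the trailing dot of "12." but reduces ".5" to "5". The second accepts ".5" and reads "12." as "12". Neither one picks up the minus sign in "-15".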
 
Andrew Dalke

Andrew Durdin said:
That will work for numbers such as 0123 12.345 12. 0.5 -- but it
won't work for the following:
0x12AB .5 10e-3 -15 123L

This will handle the normal floats including a leading + or -
and trailing exponent, all optional.

r"[+-]?((\d+(\.\d*)?)|\.\d+)([eE][+-]?[0-9]+)?"
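
One practical note when using this pattern: because it contains capturing groups, re.findall returns tuples of groups rather than the whole match. A minimal sketch that collects the full matches with finditer instead (FLOAT_RE and find_numbers are illustrative names, not part of any library):

```python
import re

# The pattern from the post above; the names here are illustrative only.
FLOAT_RE = re.compile(r"[+-]?((\d+(\.\d*)?)|\.\d+)([eE][+-]?[0-9]+)?")

def find_numbers(text):
    """Return the full text of every match, ignoring the capture groups."""
    return [m.group(0) for m in FLOAT_RE.finditer(text)]

print(find_numbers("0x12AB .5 10e-3 -15 123L"))
print(find_numbers("4-11-36%"))
```

Hex literals and the trailing L are still not recognised as such; only their decimal digit runs are picked up. Also note that a hyphen used as a separator, as in "4-11-36%", is read as a minus sign.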

Andrew
(e-mail address removed)
 
Peter Hansen

gary said:
I want to pick all integers and decimal numbers out of a string.
Would this be the most correct regular expression to use?

"\d+\.?\d*"

Examples, including the most extreme cases you want to handle,
are always a good idea.

-Peter
 
gary

Peter Hansen said:
Examples, including the most extreme cases you want to handle,
are always a good idea.

-Peter

Here is an example of what I will be dealing with:
"""
TOTAL FIRST DOWNS 19 21
By Rushing 11 6
By Passing 6 10
By Penalty 2 5
THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
TOTAL NET YARDS 379 271
Total Offensive Plays (inc. times thrown passing) 58 63
Average gain per offensive play 6.5 4.3
NET YARDS RUSHING 264 115
"""

I can only hope that they were nice and put a leading zero in front of
numbers less than 1.
 
Bengt Richter

gary said:
Here is an example of what I will be dealing with:
"""
TOTAL FIRST DOWNS 19 21
By Rushing 11 6
By Passing 6 10
By Penalty 2 5
THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
TOTAL NET YARDS 379 271
Total Offensive Plays (inc. times thrown passing) 58 63
Average gain per offensive play 6.5 4.3
NET YARDS RUSHING 264 115
"""

I can only hope that they were nice and put a leading zero in front of
numbers less than 1.

Are you sure you want to throw away all the info implicit in the structure of that data?
How about the columns? Will you get other input with more columns? Otherwise if your
numeric fields are as they appear, maybe just
>>> def numbers(s):   # name assumed; the def line did not survive in this post
...     for a in s.split():
...         if not a[0].isdigit(): continue   # skip words and tokens like "(inc."
...         if a.endswith('%'):               # e.g. 4-11-36%
...             for i in map(int, a[:-1].split('-')): yield i
...         elif '.' in a: yield float(a)
...         else: yield int(a)
...
>>> for n in numbers(
... """
... TOTAL FIRST DOWNS 19 21
... By Rushing 11 6
... By Passing 6 10
... By Penalty 2 5
... THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
... FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
... TOTAL NET YARDS 379 271
... Total Offensive Plays (inc. times thrown passing) 58 63
... Average gain per offensive play 6.5 4.3
... NET YARDS RUSHING 264 115
... """
... ): print(n, end=' ')
...
19 21 11 6 6 10 2 5 4 11 36 6 14 43 0 1 0 0 0 0 379 271 58 63 6.5 4.3 264 115

But I doubt that's what you really want ;-)
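
If the structure does matter, one option is to keep each label together with its values. The following is only a sketch under the assumption that every stat line ends in exactly two whitespace-separated value columns, as in the sample; parse_row, parse_report and SAMPLE are hypothetical names.

```python
def parse_row(line, columns=2):
    """Split a stat line into (label, values), or return None if it does not fit.

    Assumes the last `columns` whitespace-separated fields are the values.
    """
    parts = line.strip().rsplit(None, columns)
    if len(parts) != columns + 1:
        return None
    label, values = parts[0], parts[1:]
    # Require each value to start with a digit or a minus sign.
    if not all(v[0].isdigit() or v[0] == "-" for v in values):
        return None
    return label, values

def parse_report(text, columns=2):
    """Map each label to its list of raw value strings."""
    rows = {}
    for line in text.splitlines():
        parsed = parse_row(line, columns)
        if parsed is not None:
            label, values = parsed
            rows[label] = values
    return rows

SAMPLE = """
TOTAL FIRST DOWNS 19 21
THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
Average gain per offensive play 6.5 4.3
"""

report = parse_report(SAMPLE)
for label, values in report.items():
    print(label, values)
```

The values are left as strings here, so a field such as 4-11-36% can be split and converted separately once its meaning is known.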

Regards,
Bengt Richter
 
Peter Hansen

gary said:
Here is an example of what I will be dealing with:
"""
TOTAL FIRST DOWNS 19 21
By Rushing 11 6
By Passing 6 10
By Penalty 2 5
THIRD DOWN EFFICIENCY 4-11-36% 6-14-43%
FOURTH DOWN EFFICIENCY 0-1-0% 0-0-0%
TOTAL NET YARDS 379 271
Total Offensive Plays (inc. times thrown passing) 58 63
Average gain per offensive play 6.5 4.3
NET YARDS RUSHING 264 115
"""

I can only hope that they were nice and put a leading zero in front of
numbers less than 1.

Good example of the input. Now all you need to do is tell
us exactly what kind of output you would expect to come
from the routine which you seek. ;-)

-Peter
 
gary

Bengt Richter said:
Are you sure you want to throw away all the info implicit in the structure of that data?
How about the columns? Will you get other input with more columns?

There are several other instances in the files that I am extracting
data from where the numbers are not so nicely arranged in columns, so
I am really looking for something that could be used in all instances.
(http://www.nfl.com/gamecenter/gamebook/NFL_20020929_TEN@OAK)

I do however still need to convert everything from string to numbers.
I was thinking about using the following for that unless someone has a
better solution:
>>> def StrToNum(s):
...     try: return int(s)
...     except ValueError:
...         try: return float(s)
...         except ValueError: return s
...
>>> statlist = ['10', '6', '2002', 'tampa bay buccaneers',
... 'atlanta falcons', 'the georgia dome', '1', '03', 'pm', 'est', 'artificial',
... '0', '3', '7', '10', '0', '20', '3', '0', '3', '0', '0', '6', '15',
... '14', '5', '2', '9', '10', '1', '2', '4', '13', '31', '3', '14', '21',
... '1', '1', '100', '0', '1', '0', '327', '243', '59', '64', '5.5',
... '3.8', '74', '70', '26', '22', '2.8', '3.2', '2', '3', '2', '3',
... '253', '173', '2', '8', '4', '14', '261', '187', '31', '17', '1',
... '38', '17', '4', '7.7', '4.1', '5', '3', '0', '3', '2', '2', '5',
... '43.2', '5', '45.6', '0', '0', '0', '0', '0', '0', '31.2', '41.6',
... '50', '40', '0', '0', '3', '40', '0', '0', '5', '120', '4', '50', '1',
... '0', '6', '35', '6', '41', '1', '1', '0', '0', '2', '0', '0', '0',
... '1', '0', '1', '0', '2', '2', '0', '0', '2', '2', '0', '0', '2', '2',
... '2', '3', '0', '2', '0', '0', '2', '0', '0', '1', '0', '0', '0', '0',
... '0', '0', '20', '6', '29', '34', '30', '26', '3', '37', '9', '59',
... '9', '35', '6', '23', 0, 0, '11', '23', '5', '01', '5', '25', '8',
... '37', 0, 0, '26']
>>> [StrToNum(item) for item in statlist]
[10, 6, 2002, 'tampa bay buccaneers', 'atlanta falcons', 'the georgia
dome', 1, 3, 'pm', 'est', 'artificial', 0, 3, 7, 10, 0, 20, 3, 0, 3,
0, 0, 6, 15, 14, 5, 2, 9, 10, 1, 2, 4, 13, 31, 3, 14, 21, 1, 1, 100,
0, 1, 0, 327, 243, 59, 64, 5.5, 3.7999999999999998, 74, 70, 26, 22,
2.7999999999999998, 3.2000000000000002, 2, 3, 2, 3, 253, 173, 2, 8, 4,
14, 261, 187, 31, 17, 1, 38, 17, 4, 7.7000000000000002,
4.0999999999999996, 5, 3, 0, 3, 2, 2, 5, 43.200000000000003, 5,
45.600000000000001, 0, 0, 0, 0, 0, 0, 31.199999999999999,
41.600000000000001, 50, 40, 0, 0, 3, 40, 0, 0, 5, 120, 4, 50, 1, 0, 6,
35, 6, 41, 1, 1, 0, 0, 2, 0, 0, 0, 1, 0, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0,
2, 2, 2, 3, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 20, 6, 29, 34,
30, 26, 3, 37, 9, 59, 9, 35, 6, 23, 0, 0, 11, 23, 5, 1, 5, 25, 8, 37,
0, 0, 26]

Another thing was that I found a negative number, which kind of screws up
the regexes previously discussed. So I came up with a workaround
below:

... FGs - PATs Had Blocked 0-0 0-0
... Net Punting Average -6.3 33.3
... TOTAL RETURN YARDAGE (Not Including Kickoffs) 14 257
... No. and Yards Punt Returns 1-14 2-157
... """
['0', '0', '0', '0', '-6.3', '33.3', '14', '257', '1', '14', '2',
'157']
>>> [StrToNum(item) for item in teamstats]
[0, 0, 0, 0, -6.2999999999999998, 33.299999999999997, 14, 257, 1, 14,
2, 157]
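
The pattern used for this workaround is not shown in the post. As a guess at one way to get the same tokens, the sketch below treats "-" as a sign only when it starts a whitespace-delimited token, so "1-14" still splits into two numbers while "-6.3" stays negative. NUMBER_RE and to_number are hypothetical names, and the pattern is an assumption rather than the one actually used.

```python
import re

# Hypothetical pattern, not the one from the original post: "-" counts as a
# sign only when no non-whitespace character comes right before it.
NUMBER_RE = re.compile(r"(?:(?<!\S)-)?\d+(?:\.\d+)?")

def to_number(token):
    """Convert a matched token to int when possible, otherwise to float."""
    try:
        return int(token)
    except ValueError:
        return float(token)

text = """
FGs - PATs Had Blocked 0-0 0-0
Net Punting Average -6.3 33.3
TOTAL RETURN YARDAGE (Not Including Kickoffs) 14 257
No. and Yards Punt Returns 1-14 2-157
"""

tokens = NUMBER_RE.findall(text)
values = [to_number(t) for t in tokens]

print(tokens)
print(values)
```

Since the pattern uses only non-capturing groups, re.findall returns the whole matches directly.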

Gary
 
gary

Peter Hansen said:
Good example of the input. Now all you need to do is tell
us exactly what kind of output you would expect to come
from the routine which you seek. ;-)

-Peter

Well for that particular example something of the form...

Cleveland at Cincinnati +8

would be nice ;-)
 
Peter Hansen

gary said:
Well for that particular example something of the form...

Cleveland at Cincinnati +8

would be nice ;-)

I know nothing about American football except that it
isn't played with a puck, so I don't think I get the joke...

-Peter
 
