extracting a substring

b83503104

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
....
a53bc_359.txt

and I want to extract the numbers 531, 2285, ..., 359.

The one thing I know for sure is that these numbers are the ONLY part that
changes; all the other characters are always fixed.

I know I should use regular expressions, but I'm not familiar with
Python, so any quick pointers would help, such as which functions or idioms
to use. Thanks a lot!
 
Gary Herron

Try this:
>>> import re
>>> pattern = re.compile(r"a53bc_([0-9]+)\.txt")
>>>
>>> s = "a53bc_531.txt"
>>> match = pattern.match(s)
>>> if match:
...     print(int(match.group(1)))
... else:
...     print("No match")
...
531
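
Since the question mentions a whole bunch of strings, here is a minimal sketch of the same idea applied to a list. It is written for Python 3, and the helper name `extract_numbers` and the extra sample name `notes.txt` are made up for illustration; names that do not match are simply skipped.

```python
import re

# Same idea as the pattern above; the dot is escaped and at least one
# digit is required, so int() never sees an empty string.
PATTERN = re.compile(r"a53bc_([0-9]+)\.txt")

def extract_numbers(names):
    """Return the integer from each name that matches; skip the rest."""
    numbers = []
    for name in names:
        match = PATTERN.fullmatch(name)  # the whole name must fit the pattern
        if match:
            numbers.append(int(match.group(1)))
    return numbers

names = ["a53bc_531.txt", "a53bc_2285.txt", "a53bc_359.txt", "notes.txt"]
print(extract_numbers(names))  # [531, 2285, 359]
```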
Hope that helps,
Gary Herron
 
Felipe Almeida Lessa

On Tue, 2006-04-18 at 17:25 -0700, (e-mail address removed) wrote:
Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

Some ways:

1) Regular expressions, as you said:
from re import compile
find = compile("a53bc_([0-9]+)\\.txt").findall
find('a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt')
['531', '2285', '359']

2) Using ''.split:
[x.split('.')[0].split('_')[1] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']

3) Using indexes (be careful!):
[x[6:-4] for x in 'a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt'.splitlines()]
['531', '2285', '359']
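
To make the "be careful!" concrete, here is a rough sketch (Python 3) of how the three approaches behave on a name that does not follow the fixed layout. The function names and the odd file name `summary_v2.txt` are invented for this illustration only.

```python
import re

PATTERN = re.compile(r"a53bc_([0-9]+)\.txt")

def by_regex(name):
    # Returns None when the name does not fit the expected layout.
    match = PATTERN.fullmatch(name)
    return match.group(1) if match else None

def by_split(name):
    # Takes whatever sits between the first '_' and the first '.'.
    return name.split(".")[0].split("_")[1]

def by_slice(name):
    # Blindly drops 6 leading and 4 trailing characters.
    return name[6:-4]

good = "a53bc_2285.txt"
print(by_regex(good), by_split(good), by_slice(good))  # 2285 2285 2285

odd = "summary_v2.txt"  # made-up name that breaks the assumption
print(by_regex(odd))   # None
print(by_split(odd))   # v2
print(by_slice(odd))   # y_v2
```

Only the regexp notices that the input is wrong; the other two quietly return something that is not a number.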

Measuring speeds:

$ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt"' 'find(s)'
100000 loops, best of 3: 3.03 usec per loop

$ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100000 loops, best of 3: 7.64 usec per loop

$ python2.4 -m timeit -s 's = "a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"[:-1]' "[x[6:-4] for x in s.splitlines()]"
100000 loops, best of 3: 2.47 usec per loop


$ python2.4 -m timeit -s 'from re import compile; find = compile("a53bc_([1-9]*)\\.txt").findall; s = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n"*1000)[:-1]' 'find(s)'
1000 loops, best of 3: 1.95 msec per loop

$ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' "[x.split('.')[0].split('_')[1] for x in s.splitlines()]"
100 loops, best of 3: 6.51 msec per loop

$ python2.4 -m timeit -s 's = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]' "[x[6:-4] for x in s.splitlines()]"
1000 loops, best of 3: 1.53 msec per loop


Summary: slicing by index is less powerful than a regexp, but roughly 20% faster in these runs; the split version is the slowest of the three.
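
If you would rather rerun the comparison from a script than from the shell, something along these lines should work with the standard `timeit` module. It is only a sketch for Python 3: the function names are made up, the regexp uses `[0-9]+`, and the numbers you get will depend on your machine and Python version.

```python
import re
import timeit

# Same test data as the larger shell runs: 3000 lines, no trailing newline.
data = ("a53bc_531.txt\na53bc_2285.txt\na53bc_359.txt\n" * 1000)[:-1]
find = re.compile(r"a53bc_([0-9]+)\.txt").findall

def with_regex():
    return find(data)

def with_split():
    return [x.split(".")[0].split("_")[1] for x in data.splitlines()]

def with_slice():
    return [x[6:-4] for x in data.splitlines()]

CALLS = 20  # calls per timing run; arbitrary choice to keep the script quick
for func in (with_regex, with_split, with_slice):
    # Best of 3 runs, as the timeit command line reports it.
    best = min(timeit.repeat(func, number=CALLS, repeat=3))
    print(f"{func.__name__}: {best / CALLS * 1e3:.3f} msec per call")
```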

HTH,
 
Kent Johnson

Hi,
I have a bunch of strings like
a53bc_531.txt
a53bc_2285.txt
...
a53bc_359.txt

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

In that case a fixed slice will do what you want:

In [1]: s='a53bc_531.txt'

In [2]: s[6:-4]
Out[2]: '531'
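
For what it is worth, the 6 and the -4 are just the lengths of the fixed prefix and suffix. A small sketch (Python 3) that spells this out and refuses names that do not fit; `PREFIX`, `SUFFIX` and `number_part` are names I made up, based on the sample strings in the question.

```python
PREFIX = "a53bc_"  # fixed leading part, 6 characters
SUFFIX = ".txt"    # fixed trailing part, 4 characters

def number_part(name):
    """Return the text between PREFIX and SUFFIX, or raise ValueError."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError("unexpected file name: %r" % name)
    # Equivalent to name[6:-4] for the prefix and suffix above.
    return name[len(PREFIX):-len(SUFFIX)]

print(number_part("a53bc_531.txt"))  # 531
```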

Kent
 
rx

and I want to extract the numbers 531, 2285, ...,359.

One thing for sure is that these numbers are the ONLY part that is
changing; all the other characters are always fixed.

I'm not sure what you mean by "always fixed", but I guess it means that
you have n files with one fixed start and a changing ending, m files with
another fixed start and a changing ending, and so on.

import re
filenames = ['ac99_124.txt', 'ac99_344.txt', 'ac99_445.txt']
# Compile once, outside the loop; the dot is escaped so it matches a literal '.'
pattern = re.compile(r'[^_]*_(?P<number>[^.]*)\.txt')
numbers = []
for name in filenames:
    numbers.append(int(pattern.match(name).group('number')))

This sets numbers to [124, 344, 445].
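
If the guess about several different fixed starts is right, one possible extension is to capture the start as well and collect the numbers per prefix. This is only a sketch (Python 3); the function name `group_by_prefix` and the mixed sample list are assumptions for illustration.

```python
import re
from collections import defaultdict

# Named groups: everything before the first '_' is the prefix,
# the digits before '.txt' are the number.
PATTERN = re.compile(r"(?P<prefix>[^_]*)_(?P<number>[0-9]+)\.txt")

def group_by_prefix(filenames):
    """Map each prefix to the list of numbers found with it."""
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.fullmatch(name)
        if match:  # names that do not fit are ignored
            groups[match.group("prefix")].append(int(match.group("number")))
    return dict(groups)

print(group_by_prefix(["ac99_124.txt", "ac99_344.txt", "a53bc_531.txt"]))
# {'ac99': [124, 344], 'a53bc': [531]}
```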
 
