Extract the numeric and alphabetic part from an alphanumeric string

  • Thread starter Sandhya Prabhakaran

Sandhya Prabhakaran

Hi,

I have a string as str='123ACTGAAC'.

I need to extract the numeric part from the alphabetic part which I
did using
123

To get the alphabetic part, I could do
ACTGAAC
But when I give
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object

How do I blank out the initial numeric part so as to get just the
alphabetic part. The string is always in the same format.

Please help.

Regards,
Sandhya
 
Peter Brett

Sandhya Prabhakaran said:
Hi,

I have a string as str='123ACTGAAC'.

I need to extract the numeric part from the alphabetic part which I
did using
123

To get the alphabetic part, I could do
ACTGAAC
But when I give
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object

How do I blank out the initial numeric part so as to get just the
alphabetic part. The string is always in the same format.

Firstly, you really should read the Regular Expression HOWTO:

http://docs.python.org/howto/regex.html#regex-howto

Secondly, is this what you wanted to do?
'ACTGAAC'
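
The snippet that produced this output is missing from the post as archived. One way to get the same result, offered as a sketch rather than as the exact code that was posted, is to strip the leading digits with re.sub. The variable names below are illustrative only.

```python
import re

s = '123ACTGAAC'

# Remove the run of digits at the start of the string; everything
# after it is left untouched.
alpha = re.sub(r'^\d+', '', s)

print(alpha)
```

Anchoring the pattern with ^ means digits that appear later in the string are not removed.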

Regards,

Peter
 
Andreas Tawn

Hi,

I have a string as str='123ACTGAAC'.

I need to extract the numeric part from the alphabetic part which I
did using
123

To get the alphabetic part, I could do
ACTGAAC
But when I give
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object

How do I blank out the initial numeric part so as to get just the
alphabetic part. The string is always in the same format.

Please help.

Regards,
Sandhya

If the format's always the same, you could use slicing instead.
>>> s = '123ACTGAAC'
>>> s[:3]
'123'
>>> s[3:]
'ACTGAAC'

BTW, you should avoid using built-ins like str for variable names. Bad
things will happen.
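
A small sketch of one such bad thing, assuming Python 3 (the function name is made up for illustration): once str is rebound to a string, calling str() in the same scope fails.

```python
def shadowing_demo():
    # Rebinding the name hides the built-in str type in this scope.
    str = '123ACTGAAC'
    try:
        str(42)  # now tries to call a string object
    except TypeError as exc:
        return type(exc).__name__
    return None

print(shadowing_demo())
```

Picking a different name such as s or sample avoids the problem entirely.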

Cheers,

Drea
 
MRAB

Sandhya said:
Hi,

I have a string as str='123ACTGAAC'.

I need to extract the numeric part from the alphabetic part which I
did using
123
[snip]

I get:

['123']

which is a _list_ of the strings found.
 
Kushal Kumaran

Hi,

I have a string as str='123ACTGAAC'.

I need to extract the numeric part from the alphabetic part which I
did using
123

The docs for re.findall say that it returns a list of matches. So
'123' will be numer[0].

To get the alphabetic part, I could do
ACTGAAC
But when I give
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: expected a character buffer object

That's what would happen if you pass in a list instead of a string to replace.
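
A minimal sketch of the difference, assuming Python 3 (where the wording of the TypeError differs from the traceback quoted above) and using s rather than str as the variable name:

```python
import re

s = '123ACTGAAC'
numer = re.findall(r'\d+', s)  # a list of matches, not a string

# Passing the whole list to replace() raises TypeError.
try:
    s.replace(numer, '')
except TypeError as exc:
    error = exc

# Passing the first match (a string) works; the count of 1 limits the
# replacement to the first occurrence.
alpha = s.replace(numer[0], '', 1)

print(numer, type(error).__name__, alpha)
```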
 
Dennis Lee Bieber

Hi,

I have a string as str='123ACTGAAC'.

I need to extract the numeric part from the alphabetic part which I
did using
123

<snip>

Did you really cut&paste that from an interpreter window? I doubt
it...
>>> str = "123ACTGAAC"
>>> import re
>>> numer = re.findall(r"\d+", str)
>>> numer
['123']

Compare... YOU claim to have gotten an INTEGER (there are no quotes
around the output value). I get a one element LIST containing a STRING
value.
>>> numer[0]
'123'
>>> int(numer[0])
123

How do I blank out the initial numeric part so as to get just the
alphabetic part. The string is always in the same format.

And that format is?

Given just your example, one could interpret it to be: 3 digits
followed by 7 alphabetic characters. For that, I'd be using a simple

nmr = str[:3] #still in character representation
str = str[3:]

Or do you mean a variable width integer field followed by a variable
width alpha field?
>>> str2 = "4328ABcde"
>>> num2 = re.findall(r"\d+", str2)
>>> num2
['4328']
>>> str[len(numer[0]):]
'ACTGAAC'
>>> str2[len(num2[0]):]
'ABcde'
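
The same slice-by-length idea could be wrapped in a small helper. This is only a sketch: split_leading_digits is a hypothetical name, and it uses re.match in place of re.findall so that a string with no leading digits does not raise an IndexError.

```python
import re

def split_leading_digits(text):
    """Return (digits, rest); digits is '' if text has no leading digits."""
    # r'\d*' matches zero or more digits, so re.match never returns None here.
    digits = re.match(r'\d*', text).group()
    return digits, text[len(digits):]

print(split_leading_digits('123ACTGAAC'))
print(split_leading_digits('4328ABcde'))
```

The digits come back as a string; wrap the first element in int() if a number is needed.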
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
alex23

Sandhya Prabhakaran said:
I have a string as str='123ACTGAAC'.

You shouldn't use 'str' as a label like that, it prevents you from
using the str() function in the same body of code.

How do I blank out the initial numeric part so as to get just the
alphabetic part. The string is always in the same format.

('123', 'ACTGAAC')
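
The code behind this tuple is missing from the post as archived. A regular expression with two groups gives the same result; the pattern below is an assumption, not necessarily the one originally posted.

```python
import re

sample = '123ACTGAAC'

# Group 1 captures the leading digits, group 2 the letters that follow.
match = re.match(r'(\d+)([A-Za-z]+)$', sample)
parts = match.groups() if match else None

print(parts)
```

re.match returns None when the string does not fit the pattern, hence the guard before calling groups().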

If by 'always in the same format' you mean the positions of the
numbers & alphas, you could slightly abuse the struct module:
('123', 'ACTGAAC')
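
The struct call itself is missing from the post as archived. A rough sketch of the idea, assuming Python 3 (where struct works on bytes, so the string is encoded first) and a fixed layout of 3 digits followed by 7 letters:

```python
import struct

sample = '123ACTGAAC'

# '3s7s' means a 3-byte field followed by a 7-byte field. The input must
# be exactly 10 bytes long or struct.unpack raises struct.error.
raw = struct.unpack('3s7s', sample.encode('ascii'))
parts = tuple(field.decode('ascii') for field in raw)

print(parts)
```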

But seriously, you should use slicing:
>>> sample = '123ACTGAAC'
>>> sample[0:3], sample[3:]
('123', 'ACTGAAC')

You can also label the slices, which can be handy for self-documenting
your code:
>>> num = slice(3)
>>> alp = slice(3, 10)
>>> sample[num], sample[alp]
('123', 'ACTGAAC')
 
