lstrip problem - beginner question

mstagliamonte

Hi everyone,

I am a beginner in Python and trying to find my way through... :)

I am writing a script to get numbers from the headers of a text file.

If the header is something like:
h01 = ('>scaffold_1')
I just use:
h01.lstrip('>scaffold_')
and this returns '1'

But if the header is:
h02 = ('>contig-100_1')
and I use:
h02.lstrip('>contig-100_')
this returns: ''
...basically nothing. What surprises me is that if I do it this other way:
h02b = h02.lstrip('>contig-100')
I get h02b = ('_1')
and subsequently:
h02b.lstrip('_')
returns '1', which is what I wanted!

Why is this happening? What am I missing?

Thanks for your help and attention
Max
 
Fábio Santos

edit: h02 = ('>contig-100_1')

You don't have to use ('..') to declare a string. Just 'your string' will
do.

You can use str.split to split your string by a character.

(Not tested)

string_on_left, numbers = '>contig-100_01'.split('-')
left_number, right_number = numbers.split('_')
left_number, right_number = int(left_number), int(right_number)

Of course, you will want to replace the variable names.

If you have more advanced parsing needs, you will want to look at regular
expressions or blobs.
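As a rough sketch of the regular-expression route, assuming every header ends in a run of digits (the helper name `trailing_number` is hypothetical, not from the thread):

```python
import re


def trailing_number(header):
    """Return the integer at the end of a header such as '>contig-100_1'.

    Assumes the header ends in one or more digits (a trailing newline is
    tolerated, because '$' also matches just before it); returns None otherwise.
    """
    match = re.search(r'(\d+)$', header)
    return int(match.group(1)) if match else None


print(trailing_number('>scaffold_1'))     # 1
print(trailing_number('>contig-100_1'))   # 1
print(trailing_number('>contig-100_01'))  # 1
```

Because it only looks at the end of the string, it does not care whether the name part contains dashes, extra digits or underscores.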
 
MRAB

The methods 'lstrip', 'rstrip' and 'strip' don't strip a string, they
strip characters.

You should think of the argument as a set of characters to be removed.

This code:

h01.lstrip('>scaffold_')

will return the result of stripping the characters '>', '_', 'a', 'c',
'd', 'f', 'l', 'o' and 's' from the left-hand end of h01.
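One quick way to see that the argument acts as a set: passing just the distinct characters listed above, in any order, gives exactly the same result.

```python
h01 = '>scaffold_1'

# Same set of characters each time, so the result is identical.
print(h01.lstrip('>scaffold_'))  # 1
print(h01.lstrip('>_acdflos'))   # 1
print(h01.lstrip('sol>fdca_'))   # 1
```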

A simpler example:
'abc'

It strips the characters 'x' and 'y' from the string, not the string
'xy' as such.
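An illustrative call of this kind, on a made-up input string, also showing how rstrip and strip differ:

```python
s = 'xyyxabcxy'  # made-up input for illustration

# Leading 'x' and 'y' characters are removed, in any order and quantity.
print(s.lstrip('xy'))  # abcxy
# rstrip works from the right-hand end, strip from both ends.
print(s.rstrip('xy'))  # xyyxabc
print(s.strip('xy'))   # abc
```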

They are that way because they have been in Python for a long time,
long before sets and such like were added to the language.
 
mstagliamonte

Thanks, I will try it straight away. Still, I don't understand why the original command returns nothing!? Have you got any idea?
I am trying to understand a bit of the 'nuts and bolts' of what I am doing, and this result does not make any sense to me.

Regards
Max
 
Peter Otten

"abba".lstrip("ab")

does not remove the prefix "ab" from the string "abba". Instead it removes
chars from the beginning until it encounters one that is not in "ab". So

t = s.lstrip(chars_to_be_removed)

is roughly equivalent to

t = s
while len(t) > 0 and t[0] in chars_to_be_removed:
    t = t[1:]
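Wrapping that loop in a function (the name `lstrip_chars` is just for illustration, and it ignores the no-argument whitespace case) makes it easy to check against the built-in method on the headers from this thread:

```python
def lstrip_chars(s, chars_to_be_removed):
    """Rough pure-Python equivalent of s.lstrip(chars_to_be_removed)."""
    t = s
    # Drop the first character while it belongs to the characters to remove.
    while len(t) > 0 and t[0] in chars_to_be_removed:
        t = t[1:]
    return t


cases = [('>scaffold_1', '>scaffold_'),
         ('>contig-100_1', '>contig-100_'),
         ('>contig-100_1', '>contig-100'),
         ('abba', 'ab')]

for header, chars in cases:
    # Each line shows the hand-written result next to the built-in one.
    print(repr(lstrip_chars(header, chars)), repr(header.lstrip(chars)))
```

Both columns agree: '1', '' (everything stripped), '_1' and '' (for "abba").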

If you want to remove a prefix use

s = "abba"
prefix = "ab"
if s.startswith(prefix):
    s = s[len(prefix):]
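On Python 3.9 and later, the same thing is built in as `str.removeprefix` (with a matching `str.removesuffix`), which removes an exact prefix rather than a set of characters:

```python
header = '>contig-100_1'

# removeprefix (Python 3.9+) removes the exact prefix, and only if present.
print(header.removeprefix('>contig-100_'))  # 1
print(header.removeprefix('>scaffold_'))    # >contig-100_1 (unchanged)
print('abba'.removeprefix('ab'))            # ba
```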
 
mstagliamonte

Hey,

Great! Now I understand!
So, basically, it is also stripping the numbers after the '_' !!

Thank you, I know a bit more now!

Have a nice day everyone :)
Max
 
John Gordon

It's happening because the argument you pass to lstrip() isn't an exact
string to be removed; it's a set of individual characters, and characters
from that set are stripped from the left-hand end until one outside the set
is reached.

So, when you make this call:

h02.lstrip('>contig-100_')

You're telling Python to strip any of the characters in '>contig-100_' from
the start of the string, and since every character of '>contig-100_1' is in
that set, nothing remains.

The reason it "worked" on your first example was that the character '1'
doesn't occur in the argument '>scaffold_', so stripping stopped there.
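One way to confirm this is to compare the sets of characters involved: every character of '>contig-100_1' is in the strip argument, while '1' is missing from '>scaffold_'.

```python
h01 = '>scaffold_1'
h02 = '>contig-100_1'

# True: every character of h02 is in the strip argument, so nothing survives.
print(set(h02) <= set('>contig-100_'))  # True
# False: '1' is the only character of h01 outside '>scaffold_'.
print(set(h01) <= set('>scaffold_'))    # False
print(set(h01) - set('>scaffold_'))     # {'1'}
```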

If the underscore character is always the separating point in your headers,
a better way might be to use the split() method instead of lstrip().
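A minimal sketch of the split() approach, assuming the number always follows the last underscore; `rsplit` with a maxsplit of 1 splits only once, from the right, so dashes or earlier underscores in the name stay intact:

```python
headers = ['>scaffold_1', '>contig-100_1']

for header in headers:
    # Split once, starting from the right, on the last underscore.
    name, number = header.rsplit('_', 1)
    print(name, int(number))
```

This prints ">scaffold 1" and ">contig-100 1". A header with no underscore would raise a ValueError on the unpacking, so real input may need a check first.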
 
Mark Lawrence

On 04/06/2013 16:49, mstagliamonte wrote:

[strip the double line spaced nonsense]

Can you please check your email settings. It's bad enough being plagued
with double line spaced mail from google, having it come from yahoo is
just adding insult to injury, thanks :)

--
"Steve is going for the pink ball - and for those of you who are
watching in black and white, the pink is next to the green." Snooker
commentator 'Whispering' Ted Lowe.

Mark Lawrence
 
mstagliamonte

Thanks to everyone! I didn't expect so many replies in such a short time!

Regards,
Max
 
Dave Angel

Mark:
The OP is posting from googlegroups, just using a yahoo return address.
So you just have one buggy provider to hate, not two.

(e-mail address removed):

If you must use googlegroups, at least fix the double-posting and
double-spacing bugs it has. Start by reading:

http://wiki.python.org/moin/GoogleGroupsPython
 
