How to refer to part of a list when slicing is too slow?

  • Thread starter 人言落日是天涯，望极天涯不见家

人言落日是天涯，望极天涯不见家

I'm a Python newbie. It seems that the slice operation makes a copy.
For example:
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
>>> b = a[7:]
>>> b
[8, 9, 0]
>>> a.remove(9)
>>> a
[1, 2, 3, 4, 5, 6, 7, 8, 0]
>>> b
[8, 9, 0]

If the list has many elements, the slice operations will take a lot of
time.
For instance, I have a long string named S, larger than 100K, that I
want to parse piece by piece. First I process the first 100 bytes, then
I pass the remainder to the next parser function by passing S[100:] as
its argument. But this operation copies a large number of bytes. Is
there any way to pass a reference to the remainder of the string
instead of a copy?
 

人言落日是天涯，望极天涯不见家

Here is a sample to explain it more clearly:

s = " ..... - this is a large string data - ......."

def parser1(data):
    # do some parsing
    ...
    # pass the remainder to the next parser
    parser2(data[100:])

def parser2(data):
    # do some parsing
    ...
    # pass the remainder to the next parser
    parser3(data[100:])

def parser3(data):
    # do some parsing
    ...
    # pass the remainder to the next parser
    parser4(data[100:])

....
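One common way to avoid the copies in a chain like the one above is to pass the original string together with a start offset, so that each parser slices out only its own field. The following is a minimal sketch under assumptions: the `pos` parameter, the `FIELD_SIZE` constant, and the returned list are hypothetical and only illustrate the idea.

```python
# Hypothetical parsers: each one reads its own field starting at `pos` and
# hands the same string object, plus a new offset, to the next parser.
FIELD_SIZE = 100  # assumed field width, taken from the example above


def parser1(data, pos=0):
    field = data[pos:pos + FIELD_SIZE]  # copies only FIELD_SIZE characters
    # ... process field ...
    return [field] + parser2(data, pos + FIELD_SIZE)


def parser2(data, pos):
    field = data[pos:pos + FIELD_SIZE]
    # ... process field ...
    return [field]


s = "a" * 100 + "b" * 100 + "c" * 100
fields = parser1(s)  # the 300-character string itself is never copied
```

Each call copies only the 100 characters it works on, and the remainder is never materialised as a new string.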
 

Martin v. Löwis

You can use itertools.islice:

py> import itertools
py> a = [1,2,3,4,5,6,7,8,9,0]
py> b = itertools.islice(a, 7)
py> b
<itertools.islice object at 0xb7d9c34c>
py> next(b)
1
py> next(b)
2
py> next(b)
3
py> next(b)
4
py> next(b)
5
py> next(b)
6
py> next(b)
7
py> next(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
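Note that `islice(a, 7)` yields the first seven items, whereas the question asked for the counterpart of `a[7:]`. A small sketch of that variant, using the `start` and `stop` arguments of `itertools.islice` (the names `tail` and `tail_items` are only illustrative):

```python
import itertools

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

# Passing start=7 and stop=None yields the items from index 7 onward,
# like a[7:], but lazily and without building a new list.
tail = itertools.islice(a, 7, None)
tail_items = list(tail)
print(tail_items)  # [8, 9, 0]
```

The result is a one-shot iterator rather than a sequence: it cannot be indexed, and it still steps over the first seven items internally before it yields anything.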

HTH,
Martin
 

Marc 'BlackJack' Rintsch

Do you need the remainder within the parser functions? If not, you could
split the data into chunks of 100 bytes and pass an iterator from function
to function. Untested:

def iter_chunks(data, chunksize):
    # Yield successive chunks of `chunksize` items until the data runs out.
    offset = 0
    while True:
        result = data[offset:offset + chunksize]
        if not result:
            break
        yield result
        offset += chunksize


def parser1(data):
    chunk = next(data)
    # ...
    parser2(data)


def parser2(data):
    chunk = next(data)
    # ...
    parser3(data)

# ...

def main():
    # Read or create data.
    # ...
    parser1(iter_chunks(data, 100))
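If the parsers do need the remainder, another option is a `memoryview`, which can be sliced without copying the underlying buffer. This is a minimal sketch that assumes Python 3 and that the input is a `bytes` object (a `memoryview` cannot wrap a `str`); the sample data and variable names are made up for illustration.

```python
data = bytes(range(256)) * 400  # stand-in for a large input of about 100 KB

view = memoryview(data)  # wraps the buffer without copying it
rest = view[100:]        # slicing a memoryview does not copy either

# The slice still refers to the original bytes object.
same_buffer = rest.obj is data

# Copy only the 100 bytes that are actually being processed.
first_field = bytes(rest[:100])
```

Each parser can then receive `rest` in place of `data[100:]` and convert to `bytes` only the small piece it actually works on.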

Ciao,
Marc 'BlackJack' Rintsch
 
