Optimizing list processing


Steven D'Aprano

I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])
table = [i for x,i in table]

The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.

table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
for x, i in table:
    table[i] = x

For giant iterables (ten million items), this version is a big
improvement, about three times faster than the list comp version. Since
we're talking about the difference between 4 seconds and 12 seconds (plus
an additional 40-80 seconds of general slow-down as the computer pages
memory into and out of virtual memory), this is a good, solid
optimization.

Except that for more reasonably sized iterables, it's a pessimization.
With one million items, the ratio is the other way around: the list comp
version is 2-3 times faster than the in-place version. For smaller lists,
the ratio varies, but the list comp version is typically around twice as
fast. A good example of trading memory for time.

So, ideally I'd like to write my code like this:


table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
if len(table) < ?????:
    table = [i for x,i in table]
else:
    for x, i in table:
        table[i] = x

where ????? no doubt will depend on how much memory is available in one
contiguous chunk.

Is there any way to determine which branch I should run, apart from hard-
coding some arbitrary and constant cut-off value?
 

MRAB

I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])
table = [i for x,i in table]

The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.

table = [(x, i) for i,x in enumerate(iterable)]
table.sort()

This looks wrong to me:
for x, i in table:
    table[i] = x


Couldn't it replace an item it'll need later on?

Let me see if I can find an example where it would fail.

Start with:

[('b', 0), ('a', 1)]

Sort it and you get:

[('a', 1), ('b', 0)]

Run that code:

for x, i in table:
    table[i] = x

Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    for x, i in table:
ValueError: need more than 1 value to unpack

Why did it fail? Because the first pass through the loop overwrote the
second entry with 'a', which can't be unpacked on the next pass. The
table now looks like this:

[('a', 1), 'a']

The 2 methods give different results anyway: the first returns a list
of indexes, and the second returns a list of items from the iterable.
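
To make that concrete, here's a tiny made-up example showing the two
different outputs (three items, purely for illustration):

iterable = ['b', 'c', 'a']

table = sorted((x, i) for i, x in enumerate(iterable))
# table == [('a', 2), ('b', 0), ('c', 1)]

indexes = [i for x, i in table]   # the first method: [2, 0, 1]
items = [x for x, i in table]     # keeping items instead: ['a', 'b', 'c']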

[snip]
 

duncan smith

I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])
table = [i for x,i in table]

The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.

table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
for x, i in table:
    table[i] = x

For giant iterables (ten million items), this version is a big
improvement, about three times faster than the list comp version. Since
we're talking about the difference between 4 seconds and 12 seconds (plus
an additional 40-80 seconds of general slow-down as the computer pages
memory into and out of virtual memory), this is a good, solid
optimization.

Except that for more reasonably sized iterables, it's a pessimization.
With one million items, the ratio is the other way around: the list comp
version is 2-3 times faster than the in-place version. For smaller lists,
the ratio varies, but the list comp version is typically around twice as
fast. A good example of trading memory for time.

So, ideally I'd like to write my code like this:


table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
if len(table) < ?????:
    table = [i for x,i in table]
else:
    for x, i in table:
        table[i] = x

where ????? no doubt will depend on how much memory is available in one
contiguous chunk.

Is there any way to determine which branch I should run, apart from hard-
coding some arbitrary and constant cut-off value?


I had a slightly similar problem a while ago. I actually wanted to
process data from a large file in sorted order. In the end I read chunks
of data from the file, sorted them, then wrote each chunk of data to a
temporary file. Then I used heapq.merge to merge the data in the
temporary files. It vastly reduced memory consumption, and was 'quick
enough'. It was based on Guido's solution for sorting a million 32-bit
integers in 2MB of RAM
(http://neopythonic.blogspot.co.uk/2008/10/sorting-million-32-bit-integers-in-2mb.html).
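
A rough sketch of that chunk-sort-merge idea (only a sketch; the pickled
temporary files and the fixed chunk size here are assumptions rather
than the exact code I used):

import heapq
import pickle
import tempfile

def sorted_from_big_iterable(iterable, chunk_size=1000000):
    """Yield the items of `iterable` in sorted order using bounded memory."""
    chunk_files = []
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) >= chunk_size:
            chunk_files.append(dump_sorted_chunk(chunk))
            chunk = []
    if chunk:
        chunk_files.append(dump_sorted_chunk(chunk))
    # heapq.merge lazily merges the already-sorted chunks.
    return heapq.merge(*(load_chunk(f) for f in chunk_files))

def dump_sorted_chunk(chunk):
    # Sort one chunk in memory and spill it to a temporary file.
    f = tempfile.TemporaryFile()
    for item in sorted(chunk):
        pickle.dump(item, f)
    f.seek(0)
    return f

def load_chunk(f):
    # Stream the pickled items back from a temporary file.
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return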
Cheers.

Duncan
 

Steven D'Aprano

table = [(x, i) for i,x in enumerate(iterable)]
table.sort()

This looks wrong to me:
for x, i in table:
    table[i] = x


Yes, you're right, I over-simplified the example, and in doing so
introduced a bug. What I actually use is:

for i, x in enumerate(table):
    table[i] = x[1]

modulo any further bugs introduced while copying and pasting :)
 

MRAB

table = [(x, i) for i,x in enumerate(iterable)]
table.sort()

This looks wrong to me:
for x, i in table:
    table[i] = x


Yes, you're right, I over-simplified the example, and in doing so
introduced a bug. What I actually use is:

for i, x in enumerate(table):
    table[i] = x[1]

modulo any further bugs introduced while copying and pasting :)

How does this compare:

table = list(iterable)
indexes = list(range(len(table)))
indexes.sort(key=table.__getitem__)
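
On a small made-up input this gives the same index list as the
decorate-sort-undecorate version:

iterable = ['b', 'c', 'a']

table = list(iterable)
indexes = list(range(len(table)))
indexes.sort(key=table.__getitem__)

dsu = [i for x, i in sorted((x, i) for i, x in enumerate(iterable))]
assert indexes == dsu == [2, 0, 1]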
 

Terry Reedy

I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom.

It is a non-standard version thereof, as DSU usually means to decorate
with a key that gets discarded.

A couple of examples of input and expected output would have been good ;-).
The algorithm looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])

This makes two temporaries when only one is needed, and shows why we
added generator expressions.

table = sorted((x, i) for i,x in enumerate(iterable))
is equivalent to the "table = ...; table.sort()" lines below.

The following (untested) should be faster as it avoids tuple unpacking
and repacking.

from itertools import count
table = sorted(t for t in zip(iterable, count()))
table = [i for x,i in table]

Did your original un-simplified version use zip instead of enumerate? Table now
has the original indexes of the items in iterable, sorted by the items' values.
The use for this is not obvious.
The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.

With 1/3 saved by using a genex, it saves 1/2 of the remainder.
table = [(x, i) for i,x in enumerate(iterable)]
table.sorted()

(You meant table.sort().) This is an expansion of sorted(genex). It
might be slightly faster as list comp may be faster than list(equivalent
genex).
for x, i in table:
    table[i] = x


I cannot understand what you are aiming for here. Besides the fact that
this does not work, it keeps x values rather than i indexes as before.
table = [i for x,i in table]

done in place is

for j, (x,i) in enumerate(table):
    table[j] = i

I expect the list comp to be faster than in-place as long as the result
list can be allocated and held in memory without paging. (This of course
depends on system memory and other memory uses.) List iterators have a
__length_hint__ method giving the length of the underlying list, so the
list comp result list can potentially be allocated once and then filled
in by enumeration and replacement, but in C rather than Python code.
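
For what it's worth, that hint is easy to see from Python (a quick
CPython check, nothing more):

it = iter([10, 20, 30])
print(it.__length_hint__())   # 3 -- a hint list() and friends can use to presize
next(it)
print(it.__length_hint__())   # 2 -- the hint shrinks as the iterator is consumed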
 

Stefan Behnel

Terry Reedy, 12.12.2013 03:26:
from itertools import count
table = sorted(t for t in zip(iterable, count()))

This is a useless use of a generator expression. sorted(zip(...)) is enough
(and likely to be substantially faster also).

Stefan
 

Steven D'Aprano

It is a non-standard version thereof, as DSU usually means to decorate
with a key that gets discarded.

I do decorate with a key that gets discarded. The actual values that are
being sorted are the enumerated values 0, 1, 2, ... and the temporary key
values come from the iterable.

Confused yet? :)

A couple of examples of input and expected output would have been good
;-).

Only if I wanted you to understand the algorithm, which isn't really
important. My question is about swapping between in-place modifications
and external copying. If I wanted to be really mysterious (annoying), I
could have just given pseudo-code :)

Please don't focus on the algorithm I gave. Focus on the fact that I
could write it like this:

if some condition to do with the computer's available memory:
    make modifications in place
else:
    make a copy of the data containing the modifications


if only I knew what that condition was and how to test for it.

The algorithm looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])

This makes two temporaries when only one is needed, and shows why we
added generator expressions.

You may be right in this particular case; I don't know, I haven't tried
it, but there are cases where list comps are significantly faster than
generator expressions. For example, when passed to str.join, list comps
are (on my computer) consistently 15% faster:

[steve@ando ~]$ python3.3 -m timeit "''.join([chr(n) for n in range(1,
201)])"
10000 loops, best of 3: 100 usec per loop
[steve@ando ~]$ python3.3 -m timeit "''.join([chr(n) for n in range(1,
201)])"
10000 loops, best of 3: 94.1 usec per loop
[steve@ando ~]$ python3.3 -m timeit "''.join(chr(n) for n in range(1,
201))"
10000 loops, best of 3: 116 usec per loop
[steve@ando ~]$ python3.3 -m timeit "''.join(chr(n) for n in range(1,
201))"
10000 loops, best of 3: 109 usec per loop


table = sorted((x, i) for i,x in enumerate(iterable))

is equivalent to the "table = ...; table.sort()" lines below.

The following (untested) should be faster as it avoids tuple unpacking
and repacking.


I'm not terribly fussed about micro-optimizations here. I'm concerned
about *macro-optimizing* the case where creating two (or more) lists
forces the computer to page memory in and out of virtual memory. Saving a
few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about.

[...]
I expect the list comp be faster than in-place as long as the result
list can be allocated and held in memory without paging. (This of course
depends on system memory and other memory uses.)
Exactly!


List iterators have a
__length_hint__ method giving the length of the underlying list, so the
list comp result list can potentially be allocated once and then filled
in by enumeration and replacement, but in C rather than Python code.

I don't want to pre-allocate the list comp. I want to avoid having to
have two LARGE lists in memory at the same time, one containing the
decorated values, and one not. I know that this is a good optimization
for large enough lists. On my computer, ten million items is sufficient
to demonstrate the optimization, and with sufficient testing I could
determine a rough cut-off value, below which list comps are more
efficient and above which in-place modifications are better.

But I don't know how to generalize that cut-off value. If I buy extra
RAM, the cut-off value will shift. If I run it on another computer, it
will shift.



P.S. The algorithm I'm working on is a way of generating index and rank
tables. Not that it really matters -- what matters is determining whether
or not to shift from "make a copy of the list" to "modify the list in
place".
 

Chris Angelico

P.S. The algorithm I'm working on is a way of generating index and rank
tables. Not that it really matters -- what matters is determining whether
or not to shift from "make a copy of the list" to "modify the list in
place".

So you're currently looking at...

if len(table) < ?????:
    table = [i for x,i in table]
else:
    for x, i in table:
        table[i] = x


Can I throw a spanner [1] in the works with other suggestions to try timing?

table[:] = [i for x,i in table] # Does slice assignment get optimized?

SPLIT = 1048576 # Pick some useful cutoff
for part in range(0, len(table), SPLIT):
    table[part:part+SPLIT] = [i for x,i in table[part:part+SPLIT]]

If slice assignment is reasonably fast (which I suspect it is), the
one-liner should be practically identical to your small-table
one-liner. Then if the million-record splits can be done inside
memory, it ought to be possible to do this in comparable time, even if
the total table length is huge. The choice of SPLIT would then matter
a lot less than the cutoff that you're trying to find; you might have
been able to do it in half the number of sections, but that won't make
as big a difference as suddenly paging out. Ideally, what I'd like to
see is that a higher SPLIT improves performance slightly until it gets
too high, at which point you go bust and the dealer wins... but the
critical word there is "slightly", meaning that it wouldn't cost too
much for SPLIT to be lower than it needs to be. That's the criterion
for the experiment; do you have the data on which to try it?
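
Here's one way the experiment might be set up (only a sketch; the input
sizes and the SPLIT value are placeholders to adjust for the machine
being tested):

import time

def undecorate_copy(table):
    return [i for x, i in table]

def undecorate_inplace(table):
    for j, (x, i) in enumerate(table):
        table[j] = i
    return table

def undecorate_chunked(table, split=1048576):
    for part in range(0, len(table), split):
        table[part:part+split] = [i for x, i in table[part:part+split]]
    return table

def time_one(func, n):
    # Build a freshly decorated, sorted table of n items for each run.
    table = sorted((x, i) for i, x in enumerate(range(n, 0, -1)))
    start = time.perf_counter()
    func(table)
    return time.perf_counter() - start

for n in (10**5, 10**6, 10**7):   # push n up until paging kicks in
    for func in (undecorate_copy, undecorate_inplace, undecorate_chunked):
        print(n, func.__name__, time_one(func, n))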

[1] Monkey wrench, for the Yanks in the room

ChrisA
 

MRAB

P.S. The algorithm I'm working on is a way of generating index and rank
tables. Not that it really matters -- what matters is determining whether
or not to shift from "make a copy of the list" to "modify the list in
place".

So you're currently looking at...

if len(table) < ?????:
    table = [i for x,i in table]
else:
    for x, i in table:
        table[i] = x


Can I throw a spanner [1] in the works with other suggestions to try timing?

table[:] = [i for x,i in table] # Does slice assignment get optimized?

[snip]

If you're trying that, you could also try:

table[:] = (i for x,i in table)
 

Chris Angelico

table[:] = [i for x,i in table] # Does slice assignment get optimized?
[snip]

If you're trying that, you could also try:

table[:] = (i for x,i in table)

Right, a tweak which could be applied also to the split version. Thanks MRAB.

ChrisA
 

Peter Otten

Steven said:
I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom. The algorithm
looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])
table = [i for x,i in table]

The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.

table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
for x, i in table:
    table[i] = x

For giant iterables (ten million items), this version is a big
improvement, about three times faster than the list comp version. Since
we're talking about the difference between 4 seconds and 12 seconds (plus
an additional 40-80 seconds of general slow-down as the computer pages
memory into and out of virtual memory), this is a good, solid
optimization.

Except that for more reasonably sized iterables, it's a pessimization.
With one million items, the ratio is the other way around: the list comp
version is 2-3 times faster than the in-place version. For smaller lists,
the ratio varies, but the list comp version is typically around twice as
fast. A good example of trading memory for time.

So, ideally I'd like to write my code like this:


table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
if len(table) < ?????:
    table = [i for x,i in table]
else:
    for x, i in table:
        table[i] = x

where ????? no doubt will depend on how much memory is available in one
contiguous chunk.

Is there any way to determine which branch I should run, apart from hard-
coding some arbitrary and constant cut-off value?


How about using two lists?

keys = list(iterable)
values = range(len(keys))
values.sort(key=keys.__getitem__)
del keys

The intention is to save memory used for the 2-tuples; I don't know if they
pop up elsewhere.
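
In Python 3, where range() no longer returns a list with a sort method,
the same idea would look something like this sketch:

keys = list(iterable)
values = sorted(range(len(keys)), key=keys.__getitem__)
del keys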
 

Terry Reedy

Terry Reedy, 12.12.2013 03:26:

This is a useless use of a generator expression. sorted(zip(...)) is enough
(and likely to be substantially faster also).

Yes, definitely, and thank you. I did not squeeze enough.
 

Terry Reedy

Please don't focus on the algorithm I gave. Focus on the fact that I
could write it like this:

if some condition to do with the computer's available memory:
    make modifications in place
else:
    make a copy of the data containing the modifications

if only I knew what that condition was and how to test for it.

In other words, you want a magic formula that depends on just about
everything: the (static) memory in the machine, the other programs
running, the memory otherwise used by the current program, the number
items, the nature of the items, and possibly the memory fragmentation state.
You may be right in this particular case; I don't know, I haven't tried
it, but there are cases where list comps are significantly faster than
generator expressions.

Stefan noted that neither is needed if zip produces the items wanted.
I'm not terribly fussed about micro-optimizations here. I'm concerned
about *macro-optimizing* the case where creating two (or more) lists
forces the computer to page memory in and out of virtual memory. Saving a
few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about.

Why would you spend an hour or more of your time to save 5 seconds?
Theoretical interest or a practical problem with enough runs to turn
seconds into hours or days? A non-trivial practical answer will require
more info about the practical context, including the distribution of
machine memory sizes and of problem sizes.
I don't want to pre-allocate the list comp. I want to avoid having to
have two LARGE lists in memory at the same time, one containing the
decorated values, and one not. I know that this is a good optimization
for large enough lists. On my computer, ten million items is sufficient
to demonstrate the optimization, and with sufficient testing I could
determine a rough cut-off value, below which list comps are more
efficient and above which in-place modifications are better.

But I don't know how to generalize that cut-off value. If I buy extra
RAM, the cut-off value will shift. If I run it on another computer, it
will shift.

The simplest answer is to always avoid catastrophe by always modifying
the one list. For 'short' lists, the time difference will be relatively
small.

If the fraction of short lists is large enough to make the cumulative
time difference large enough, and the list object sizes are more or less
constant or at least bounded, a possible answer is to pick a minimum
memory size, get a machine with that size, and work out a list size
small enough to avoid thrashing.
 

Steven D'Aprano

In other words, you want a magic formula that depends on just about
everything: the (static) memory in the machine, the other programs
running, the memory otherwise used by the current program, the number
items, the nature of the items, and possibly the memory fragmentation
state.

Yes. Surely there's a way to check whether allocating a list of N items
will succeed without paging, without actually having to allocate a list
of N items?

Back in the 1980s, when I was a Mac user and occasional programmer, there
were memory manager routines which (if I recall correctly) would tell you
whether or not an allocation would succeed. Macs of that vintage
didn't have virtual memory, it is true, but the principle remains sound:
"is there a single contiguous block of N bytes available? if so, use a
list comprehension, if not, switch to an in-place algorithm". Until
Python, I had never used a programming language that didn't give you at
least a rough-and-ready way to determine how much memory was available so
you could make an informed guess as to whether an expensive operation
would succeed before actually attempting it.

I will be very disappointed (but hardly surprised) to learn that
functionality which was standard in 1984 computers is not possible in
2013.

But I don't *specifically* care about measuring memory size, only as a
proxy for whether or not I should switch algorithms. That's why this post
is called "Optimizing list processing" rather than "Finding out how much
memory is available". If there's another way to solve the same problem,
I'm happy to hear it.
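
For the record, one rough proxy available today is the third-party
psutil module. A minimal sketch of the idea, assuming psutil is
installed and using an arbitrary safety factor of two:

import sys
import psutil   # third-party; assumed installed for this sketch

def copy_would_fit(table, safety=2):
    """Rough guess: is there room for a second list the size of `table`
    without pushing the machine into swap?"""
    # Only the list's pointer array is duplicated by a copy; the items
    # themselves are shared.  `safety` is an arbitrary fudge factor.
    needed = sys.getsizeof(table) * safety
    return psutil.virtual_memory().available > needed

# Usage in the undecorate step (toy table just for illustration):
table = sorted((x, i) for i, x in enumerate(range(1000, 0, -1)))
if copy_would_fit(table):
    table = [i for x, i in table]           # fast path: build a new list
else:
    for j, (x, i) in enumerate(table):      # slow path: rewrite in place
        table[j] = i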


[...]
Why would you spend an hour or more of your time to save 5 seconds?

For a number of reasons:

- Trading off increased memory use for faster code is a good optimization
technique so long as it actually does result in faster code. I measured,
and discovered that there comes a point where it results in serious
performance degradation. Not just a little bit slower by 5% or 10%, but
nearly 300% slower. We're told not to optimize until we've measured. Well
I've measured. Swapping algorithms to in-place for large enough lists is
a BIG time saver.

- It's not just the five seconds that I save, but the 30 or 50 seconds of
general slowdown of *everything* on the computer as it frantically pages
blocks of memory. And not just on *this* computer, if I'm running the
code on a server it will affect all the other machines relying on that
server. That five seconds of time for a single process may affect all the
users of that machine for a minute or two before the system settles down
to normal performance again.

Paging is *deadly* for performance -- moving memory around is typically
O(N**2), and hard drive speeds are typically millions of times slower
than RAM speeds. Here is a graphical view of the latency of various media:

http://i.imgur.com/X1Hi1.gif


[...]
The simplest answer is to always avoid catastrophe by always modifying
the one list. For 'short' lists, the time difference will be relatively
small.

It seems perverse to avoid writing idiomatic, and usually faster, Python
code just because occasionally it performs badly. What you describe is a
last resort.
 

Chris Angelico

Back in the 1980s, when I was a Mac user and occasional programmer, there
were memory manager routines which (if I recall correctly) would tell you
whether or not an allocation would succeed. Macs of that vintage
didn't have virtual memory, it is true, but the principle remains sound:
"is there a single contiguous block of N bytes available? if so, use a
list comprehension, if not, switch to an in-place algorithm". Until
Python, I had never used a programming language that didn't give you at
least a rough-and-ready way to determine how much memory was available so
you could make an informed guess as to whether an expensive operation
would succeed before actually attempting it.

I will be very disappointed (but hardly surprised) to learn that
functionality which was standard in 1984 computers is not possible in
2013.

The situation is a bit different. Virtualization of memory makes it a
lot harder to figure out what can/can't be allocated without paging.
As long as you're okay with it being really rough, you could probably
get some info from the system. On my Linux box, /proc/meminfo tells
me:

MemTotal: 16452036 kB
MemFree: 1689636 kB
Buffers: 251240 kB
Cached: 6228724 kB

That suggests I can allocate 1.5GB + .25GB + 6.25GB = 8GB, roughly,
before hitting the swapper. (Buffers and Cached represent memory that
will be reclaimed to satisfy other requests, but is being used - most
of it caching stuff off the primary filesystem.) In very VERY round
figures, that would be how much it ought to be safe to allocate. Now,
how that compares with your list requirements is a separate question.
I've no idea how many list elements would fit in that 8GB, but it
probably won't be a billion (which would be the theoretical, one
64-bit pointer per element).

This might tie in nicely with the theory I suggested about splitting
the job. You could pick a SPLIT by figuring how much memory is free,
dividing that by your pointer size or other estimate of usage, halving
it because you need two lists, and then halving again for safety.
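
Something along those lines, reading /proc/meminfo directly (Linux-only,
as noted, and the nine-bytes-per-element figure is just the rough
estimate from above):

def estimated_split(bytes_per_element=9):
    """Very rough SPLIT estimate from /proc/meminfo (Linux only)."""
    fields = {}
    with open('/proc/meminfo') as f:
        for line in f:
            name, value = line.split(':')
            fields[name] = int(value.split()[0])   # values are reported in kB
    free_kb = fields['MemFree'] + fields['Buffers'] + fields['Cached']
    # Divide by the per-element estimate, halve for the two lists,
    # then halve again for safety -- the same arithmetic as above.
    return free_kb * 1024 // bytes_per_element // 2 // 2

print(estimated_split())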

On this same 64-bit system, sys.getsizeof(list(range(N))) for various
values of N seems to suggest that it approaches 9*N. So by the
guesstimate above, I come up with a plausible SPLIT of 226933 ==
(1689636+251240+6228724)//9//2//2, not forgetting that this is scaled
by one SI unit (so 226 million elements would be safe). Alas, that's
not true, because I just tried that, and it hit the swapper... even
though sys.getsizeof() returned the expected N*9+112 figure.

Which is because I'm an idiot. Of course. The reason it's more is
because I'm creating all those integers too... actually, I'm quite
impressed with that, that's costing almost nothing more.
>>> sys.getsizeof(list(range(100000000)))
900000112
>>> sys.getsizeof([None]*100000000)
800000064

Not quite sure what's going on there. I know those integers are taking
up more than 1 byte of storage. Guess getsizeof() isn't the best way
to calculate. Anyway, a bit of experimentation should tell you how
much you can do safely, and as long as there's a good margin of error,
this would be an extremely rough guide.

Now, getting that info in a cross-platform way... that's the problem!

ChrisA
 

Steven D'Aprano

Steven D'Aprano wrote: [...]
So, ideally I'd like to write my code like this:


table = [(x, i) for i,x in enumerate(iterable)]
table.sort()
if len(table) < ?????:
    table = [i for x,i in table]
else:
    for x, i in table:
        table[i] = x

where ????? no doubt will depend on how much memory is available in one
contiguous chunk.

Is there any way to determine which branch I should run, apart from
hard- coding some arbitrary and constant cut-off value?


How about using two lists?

keys = list(iterable)
values = range(len(keys))
values.sort(key=keys.__getitem__)
del keys

The intention is to save memory used for the 2-tuples; I don't know if
they pop up elsewhere.


They don't. On the other hand, you've now got two lists where before I
only had one.

In my post, I said that the code I showed was a simplification. What I
actually have is two branches. If the given iterable is a sequence type
(say, a list) I use something quite similar to your code above:

if isinstance(iterable, collections.Sequence):
    getitem = iterable.__getitem__
    if key is not None:
        # Compose the given key with getitem.
        f = lambda i: key(getitem(i))
    else:
        f = getitem
    table = range(len(iterable))  # Python 2.x version
    table.sort(key=f, reverse=reverse)


which avoids any temporary two-tuples, and creates only the one
additional list. I can't avoid making that list, since it's the result
required. I don't think that this branch of the code can be optimized in
any serious way; I can't see any additional fat to be trimmed off that.
If the input list is so large as to cause thrashing, the only solution is
to buy more memory.

The other branch handles iterators, and I have two versions. The
idiomatic version is faster for small data sets (small in this case
meaning "up to at least one million items on my computer") but ends up
building at least one extra temporary list which is thrown away:

else:
    table = [(x, i) for (i, x) in enumerate(iterable)]
    table.sort(key=key, reverse=reverse)
    table = [i for x, i in table]

I've experimented with slight variations, e.g. using sorted() and a
generator expression instead of a list comp, and the differences are
trivial. The important thing here is that there's a temporary list which
in some cases is *very large* but destined to be thrown away. Having two
very large lists (for large enough input sizes) causes thrashing.

To avoid that thrashing, I have a second version which builds only one
list and then modifies it in place. For very large input sizes (ten
million items) this is significantly faster:

else:
    table = [(x, i) for (i, x) in enumerate(iterable)]
    table.sort(key=key, reverse=reverse)
    for x, i in enumerate(table):
        table[x] = i[1]

but much slower, and less idiomatic, for small input sizes.

I don't know of any reasonable way to tell at runtime which of the two
algorithms I ought to take. Hard-coding an arbitrary value
("if len(table) > 5000000") is not the worst idea I've ever had, but I'm
hoping for something more deterministic which can be cheaply calculated
at runtime.
 

rusi

I don't know of any reasonable way to tell at runtime which of the two
algorithms I ought to take. Hard-coding an arbitrary value
("if len(table) > 5000000") is not the worst idea I've ever had, but I'm
hoping for something more deterministic which can be cheaply calculated
at runtime.

Have you seen guppy and heapy?

http://guppy-pe.sourceforge.net/

Even if you don't use it directly, it should have syscalls that you can steal.
 
