How does Python implement "long integer"?

Pedram

Hello,
I'm reading about the implementation of long ints in Python. I downloaded
the CPython source code and plan to read longobject.c, but where
should I start reading this file? I mean, which function is the
first?
Can anyone help?
Thanks

Pedram
 
Mark Dickinson

I don't really understand the question: what do you mean by 'first'?
It might help if you tell us what your aims are.

In any case, you probably also want to look at the
Include/longintrepr.h and Include/longobject.h files.

Mark
 
Pedram

Thanks for the reply.
Sorry, I can't explain it very clearly! I'm not English ;)
But I want to understand the implementation of the long int object in
Python: how does Python allocate memory for it, and how does it
implement operations on it?
I'm reading the source code (longobject.c and, as you said,
longintrepr.h and longobject.h), but if you can help me, I'd really
appreciate it.

Pedram
 
Mark Dickinson

> Thanks for the reply.
> Sorry, I can't explain it very clearly! I'm not English ;)

That's shocking. Everyone should be English. :)

> But I want to understand the implementation of the long int object in
> Python: how does Python allocate memory for it, and how does it
> implement operations on it?

I'd pick one operation (e.g., addition), and trace through the
relevant functions in longobject.c. Look at the long_as_number
table to see where to get started.

In the case of addition, that table shows that the nb_add slot is
given by long_add. long_add does any necessary type conversions
(CONVERT_BINOP) and then calls either x_sub or x_add to do the real
work. x_add calls _PyLong_New to allocate space for a new PyLongObject,
then does the usual digit-by-digit-with-carry addition. Finally, it
normalizes the result (removes any unnecessary leading zero digits)
and returns.
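
As a rough illustration of the digit-by-digit-with-carry step and the final normalization described above, here is a minimal sketch in C. It is not the actual x_add code: the names add_magnitudes, SHIFT, BASE and MASK are made up for this example, the caller supplies the output buffer (the real code allocates it with _PyLong_New), and it assumes 15-bit digits stored least significant digit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHIFT 15                        /* bits per digit (illustrative name) */
#define BASE  ((uint32_t)1 << SHIFT)    /* 2**15 */
#define MASK  (BASE - 1)                /* low 15 bits */

typedef uint16_t digit;

/* Add the magnitudes a[0..na) and b[0..nb), least significant digit first.
 * out must have room for max(na, nb) + 1 digits.
 * Returns the number of digits in the normalized result. */
static size_t add_magnitudes(const digit *a, size_t na,
                             const digit *b, size_t nb, digit *out)
{
    /* Ensure a is the longer operand. */
    if (na < nb) {
        const digit *tp = a; a = b; b = tp;
        size_t tn = na; na = nb; nb = tn;
    }

    uint32_t carry = 0;
    size_t i;
    for (i = 0; i < nb; i++) {          /* digits present in both operands */
        carry += (uint32_t)a[i] + b[i];
        out[i] = (digit)(carry & MASK);
        carry >>= SHIFT;
    }
    for (; i < na; i++) {               /* remaining digits of the longer one */
        carry += a[i];
        out[i] = (digit)(carry & MASK);
        carry >>= SHIFT;
    }
    out[i] = (digit)carry;              /* final carry, possibly zero */

    /* Normalize: drop leading (most significant) zero digits. */
    size_t n = na + 1;
    while (n > 0 && out[n - 1] == 0)
        n--;
    return n;
}
```

For example, adding the one-digit values 32767 and 1 produces the two digits {0, 1}, that is, 1 * 2**15 + 0.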

As far as memory allocation goes: almost all operations call
_PyLong_New at some point. (Except in py3k, where it's a bit more
complicated because small integers are cached.)

If you have more specific questions I'll have a go at answering them.

Mark
 
Pedram

> That's shocking. Everyone should be English. :)

Yes, I'm trying :)

> I'd pick one operation (e.g., addition), and trace through the
> relevant functions in longobject.c. Look at the long_as_number
> table to see where to get started.

Oh, I hadn't seen long_as_number before. I'm reading it now. That was
very helpful, thanks.

> If you have more specific questions I'll have a go at answering them.

Thank you a million.
I will put your name in the "Special thanks to" section of my
article (in font size 72) ;)

Pedram
 
Pablo Torres N.

> I don't really understand the question: what do you mean by 'first'?
> It might help if you tell us what your aims are.

I think he means the entry point; the problem is that libraries have many.
 
Pedram

Hello again,
This time I have a simple C question!
As you know, _PyLong_New returns the result of PyObject_NEW_VAR. I
found PyObject_NEW_VAR in the objimpl.h header file, but I can't
understand the last line :( Here's the code:

#define PyObject_NEW_VAR(type, typeobj, n) \
( (type *) PyObject_InitVar( \
(PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj),
(n)) ),\
(typeobj), (n)) )

I know the preprocessor will expand PyObject_NEW_VAR(type, typeobj, n)
to this everywhere in the code, but I can't understand the last line,
which is just 'typeobj' and 'n'! What do they do? Do they play any role
in the allocation process?
 
Aahz

Look in the code to find out what PyObject_InitVar() does -- and, more
importantly, what its signature is. The clue you're missing is the
trailing backslash on the third line, but that should not be required if
you're using an editor that shows you matching parentheses.
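
To make the nesting concrete, here is a small self-contained sketch with the same shape as the macro in question. Everything in it (VarBox, varbox_init, VARBOX_SIZE, VARBOX_NEW) is a hypothetical stand-in, not CPython code; the only thing borrowed from the real API is the idea that PyObject_InitVar takes the freshly allocated block as its first argument, followed by the type and the item count, and returns the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variable-size object, standing in for PyVarObject. */
typedef struct {
    int    kind;      /* stands in for the type pointer */
    size_t size;      /* number of items in the variable part */
    int    items[1];  /* variable part */
} VarBox;

/* Stands in for PyObject_InitVar: fills in the header of an already
 * allocated block and returns the same pointer. */
static VarBox *varbox_init(VarBox *box, int kind, size_t n)
{
    if (box == NULL)
        return NULL;
    box->kind = kind;
    box->size = n;
    return box;
}

/* Bytes needed for the header plus n items. */
#define VARBOX_SIZE(n) (offsetof(VarBox, items) + (n) * sizeof(int))

/* Same shape as the macro in the question: the allocation is only the
 * FIRST argument of the init call; (kind) and (n) on the last line are
 * the init call's second and third arguments. */
#define VARBOX_NEW(kind, n) \
    ( varbox_init( \
        (VarBox *) malloc(VARBOX_SIZE(n)), \
        (kind), (n)) )
```

Written on one line, VARBOX_NEW(7, 5) is simply varbox_init((VarBox *) malloc(...), 7, 5): the last line of the macro closes the argument list of the outer call rather than belonging to the allocation.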
 
Pedram

No, they wrapped the 3rd line!

Here is a screenshot of the code:
http://lh3.ggpht.com/_35nHfALLgC4/SlDVMEl6oOI/AAAAAAAAAKg/vPWA1gttvHM/s640/Screenshot.png

As you can see, PyObject_MALLOC has nothing to do with the typeobj and
n on line 4.
 
Pedram

OK, fine, I've read longobject.c at last! :)
I found that longobject is a structure like this:

struct _longobject {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    digit ob_digit[1];
};

And a digit is a 15-item array of C's unsigned short integers.
Am I right, or did I miss something? Is this structure the same in
all environments (Linux, Windows, mobile devices, etc.)?
 
Mark Dickinson

> OK, fine, I've read longobject.c at last! :)
> I found that longobject is a structure like this:
>
> struct _longobject {
>     struct _object *_ob_next;
>     struct _object *_ob_prev;

For current CPython, these two fields are only present in debug
builds; for a normal build they won't exist.

>     Py_ssize_t ob_refcnt;
>     struct _typeobject *ob_type;

You're missing an important field here (see the definition of
PyObject_VAR_HEAD):

Py_ssize_t ob_size; /* Number of items in variable part */

For the current implementation of Python longs, the absolute value of
this field gives the number of digits in the long; the sign gives the
sign of the long (0L is represented with zero digits).

>     digit ob_digit[1];

Right. This is an example of the so-called 'struct hack' in C; it
looks as though there's just a single digit, but what's intended here
is that there's an array of digits tacked onto the end of the struct;
for any given PyLongObject, the size of this array is determined at
runtime. (C99 allows you to write this as simply ob_digit[], but not
all compilers support this yet.)
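
A minimal sketch of the 'struct hack' allocation pattern, for illustration only. MiniLong and minilong_new are hypothetical names, not part of CPython, and the header is reduced to a single size field; the point is just that the block is allocated larger than sizeof(MiniLong) so that ob_digit can be indexed past its declared length of one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned short digit;

/* Hypothetical miniature of the layout discussed above. */
typedef struct {
    ptrdiff_t ob_size;      /* number of digits in the variable part */
    digit     ob_digit[1];  /* struct hack: really ob_size digits long */
} MiniLong;

/* Allocate a MiniLong with room for ndigits digits, all set to zero.
 * Returns NULL if the allocation fails. */
static MiniLong *minilong_new(size_t ndigits)
{
    /* sizeof(MiniLong) already includes space for one digit. */
    size_t extra = ndigits > 1 ? ndigits - 1 : 0;
    MiniLong *v = malloc(sizeof(MiniLong) + extra * sizeof(digit));
    if (v == NULL)
        return NULL;
    v->ob_size = (ptrdiff_t)ndigits;
    for (size_t i = 0; i < ndigits; i++)
        v->ob_digit[i] = 0;
    return v;
}
```

The whole object, header and digits, lives in one malloc'd block, so a single free() releases it.
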

> And a digit is a 15-item array of C's unsigned short integers.

No: a digit is a single unsigned short, which is used to store 15 bits
of the Python long. Python longs are stored in sign-magnitude format,
in base 2**15. So each of the base 2**15 'digits' is an integer in
the range [0, 32768). The unsigned short type is used to store those
digits.

Exception: for Python 2.7+ or Python 3.1+, on 64-bit machines, Python
longs are stored in base 2**30 instead of base 2**15, using a 32-bit
unsigned integer type in place of unsigned short.
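
To illustrate the sign-magnitude, base 2**15 representation, here is a small sketch that splits a C integer into 15-bit digits and returns a signed digit count in the way ob_size is described above. The function name to_digits and the SHIFT and MASK macros are invented for this example, and it assumes a 64-bit long long (so five digits are always enough).

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short digit;

#define SHIFT 15                      /* bits stored per digit */
#define MASK  ((1u << SHIFT) - 1)     /* 32767, the largest digit value */

/* Split value into base-2**15 digits, least significant first.
 * digits must have room for 5 entries (enough for a 64-bit magnitude).
 * Returns a signed digit count: negative for negative values,
 * zero for zero (which therefore has no digits at all). */
static int to_digits(long long value, digit *digits)
{
    /* Take the magnitude in unsigned arithmetic so that the most
     * negative value does not overflow. */
    unsigned long long mag = value < 0 ? 0ULL - (unsigned long long)value
                                       : (unsigned long long)value;
    int n = 0;
    while (mag != 0) {
        digits[n++] = (digit)(mag & MASK);
        mag >>= SHIFT;
    }
    return value < 0 ? -n : n;
}
```

With this scheme 32768 becomes the two digits {0, 1} with a count of 2, while -32767 becomes the single digit {32767} with a count of -1.
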

> Is this structure the same in
> all environments (Linux, Windows, mobile devices, etc.)?

I think it would be dangerous to rely on this struct staying constant,
even just for CPython. It's entirely possible that the representation
of Python longs could change in Python 2.8 or 3.2. You should use the
public, documented C-API whenever possible.

Mark
 
Pedram

Hello Mr. Dickinson. Glad to see you again :)

> For current CPython, these two fields are only present in debug
> builds; for a normal build they won't exist.

I couldn't understand the difference between them. What are a debug
build and a normal build? And do you mean that in a debug build
PyLongObject is a doubly-linked list, but in a normal build it is just an
array (or, if not, how is it stored in that case)?

> You're missing an important field here (see the definition of
> PyObject_VAR_HEAD):
>
>     Py_ssize_t ob_size; /* Number of items in variable part */
>
> For the current implementation of Python longs, the absolute value of
> this field gives the number of digits in the long; the sign gives the
> sign of the long (0L is represented with zero digits).

Oh, you're right. I missed that. Thanks :)

>     digit ob_digit[1];
>
> Right. This is an example of the so-called 'struct hack' in C; it
> looks as though there's just a single digit, but what's intended here
> is that there's an array of digits tacked onto the end of the struct;
> for any given PyLongObject, the size of this array is determined at
> runtime. (C99 allows you to write this as simply ob_digit[], but not
> all compilers support this yet.)

WOW! I didn't know anything about the 'struct hack'! I read about it,
and it's wonderful. Thanks for pointing it out. :)

Thank you a lot Mark :)
 
Eric Wong

Pedram said:
> I couldn't understand the difference between them. What are a debug
> build and a normal build? And do you mean that in a debug build
> PyLongObject is a doubly-linked list, but in a normal build it is just an
> array (or, if not, how is it stored in that case)?

We use the macro Py_TRACE_REFS to distinguish the code for a debug build
from the code for a normal build; that is to say, the code in the two
builds is actually *different*. In a debug build, not only PyLongObject
but all objects are linked into a doubly-linked list, which can make
debugging less painful. In a normal build, objects are separate! After
an object is created, it is never moved, so we can and should refer to
an object only by its address (pointer). There's no one big container,
like a list or an array, for objects.
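
A minimal sketch of the idea, under stated assumptions: TRACE_REFS, Obj, live_head, track, untrack and count_live are hypothetical names invented for this example and only mimic the role that Py_TRACE_REFS plays. The point is that a preprocessor switch adds two link fields to every object header, and those fields thread all live objects onto one list; without the switch, the fields and the list do not exist at all.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_REFS 1   /* hypothetical switch; remove for a "normal" build */

typedef struct obj {
#ifdef TRACE_REFS
    struct obj *next;  /* links to the other live objects, */
    struct obj *prev;  /* present in debug builds only     */
#endif
    long refcnt;
} Obj;

#ifdef TRACE_REFS
/* Sentinel node of a circular doubly-linked list of live objects. */
static Obj live_head = { &live_head, &live_head, 0 };

/* Insert o right after the sentinel. */
static void track(Obj *o)
{
    o->next = live_head.next;
    o->prev = &live_head;
    live_head.next->prev = o;
    live_head.next = o;
}

/* Unlink o from the list; the object itself is not moved or freed. */
static void untrack(Obj *o)
{
    o->prev->next = o->next;
    o->next->prev = o->prev;
}

/* Walk the list and count the objects currently linked in. */
static size_t count_live(void)
{
    size_t n = 0;
    for (Obj *o = live_head.next; o != &live_head; o = o->next)
        n++;
    return n;
}
#endif
```

Note that the list only links objects where they already are; each object still sits at a fixed address and is referred to by pointer in both kinds of build.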
 
Mark Dickinson

> I couldn't understand the difference between them. What are a debug
> build and a normal build? And do you mean that in a debug build
> PyLongObject is a doubly-linked list, but in a normal build it is just an
> array (or, if not, how is it stored in that case)?

No: a PyLongObject is stored the same way (ob_size giving sign and
number of digits, ob_digit giving the digits themselves) whether or
not a debug build is in use.

A debug build does various things (extra checks, extra information) to
make it easier to track down problems. On Unix-like systems, you can
get a debug build by configuring with the --with-pydebug flag.

The _ob_next and _ob_prev fields have nothing particularly to do with
Python longs; for a debug build, these two fields are added to *all*
Python objects, and provide a doubly-linked list that links all 'live'
Python objects together. I'm not really sure what, if anything, the
extra information is used for within Python---it might be used by some
external tools, I guess.

Have you looked at the C-API documentation?

http://docs.python.org/c-api/index.html

_ob_next and _ob_prev are described here:

http://docs.python.org/c-api/typeobj.html#_ob_next

(These docs are for Python 2.6; I'm not sure what version you're
working with.)

Mark
 
Pedram

It seems there's an island named Python!
Thanks for the links; I'm reading them now.
 
