python coding contest

P

Peter Hansen

Has anyone studied if farmers like him are in general healthier ?

Yes, they have stronger arms, but some greater incidence of back
problems as well. Other than that they're basically just as healthy as
the rest of us who feed our pigs the normal way.

-Peter
 
C

Claudio Grondi

Claudio said:
It's a funny story :))

, but in my eyes not appropriate in given context, because my prior goal
is to understand some more about Python internals, i.e. what is and if
it is at all a differerence between the internal representation of a
string and a long integer.

I know, I should probably look into the C source of Python, but I
suppose it could be too hard for me to find the appropriate piece of
code, so I welcome any hints.

If I knew the internal representation of string and long integer objects
and were able to read/write to memory and point an identifier at a given
memory address, a conversion between long integer and string types were
probably nothing else as changing some bytes and repointing an
identifier (assuming that string and long integer values are in memory
the same data if they represent the same value). That it would save CPU
time is secondary here, but with increasing costs of energy making the
number on my electrical power bill higher each year due to higher power
consumption with increasing number of programs I run (it makes a
difference of 50 Watt between an algorithm keeping the CPU 100% busy and
an algorithm using only 1% of it) it is not necessarily paranoia driving
one to consider potential savings of CPU time.
In this context the example of the bigdec class comes to my mind, where
usage of another algorithm made it possible to cut down power
consumption and time of printing a decimal form of the largest known
prime number from 7 hours of a 100% busy CPU down to 7 seconds!

Claudio

From stringobject.h :
typedef struct {
PyObject_VAR_HEAD
long ob_shash;
int ob_sstate;
char ob_sval[1];
/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
* ob_sstate != 0 iff the string object is in stringobject.c's
* 'interned' dictionary; in this case the two references
* from 'interned' to this object are *not counted* in ob_refcnt.
*/
} PyStringObject;

From longobject.h :
typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */
From longintrepr.h :
typedef unsigned short digit;
struct _longobject {
PyObject_VAR_HEAD
digit ob_digit[1];
};

From this I mean to see, that the string data header is longer
containing an additional long and an int compared to long integer data
header. The long integer seem to be an array of unsigned short values
and the string an array of characters. My MSDN help tells me, that:
"Type short int (or simply short) is an integral type that is larger
than or equal to the size of type char, and shorter than or equal to the
size of type int." In Microsoft Visual C++ short is 2 bytes long, and
because I am on an Intel processor the sequence of bytes will differ
from what I will get when creating a binary string out of a long integer
due to swapping of byte order. Am I right here?

My conclusion is, that beeing on e.g. Motorola 68000 based processors
the actual data behind the string and the long integer types will be the
same, but beeing on an Intel 386 series processor based systems the
actual data stored in memory behind a string and long integer will differ.

So finally the idea of converting binary strings forth and back to long
integers without looping over the data content does for sure not work on
Intel processor based systems, but may work on processors which don't
need swapped bytes for integer operations ( I found this page
http://www.cs.princeton.edu/~kazad/resources/cs/endian.htm gives a good
overview about the problems with byte swapping, is there any better
reference related to this subject online? ).

As Python tries to be multiplatform, I don't understand why it stores
long integers as an array of shorts and not as an array of chars?
Historical reasons? Speed reasons? Any other reasons?

Wouldn't it be more intuitive to have in Python a data type beeing an
object with binary data which can be easily 'casted' to both string and
long integer depending how it should be processed or used?
Do you see any advantage to have such data type, or am I quite alone
with this idea?

Claudio
 
M

Michael Spencer

Claudio said:
...I analysed the outcome of it and have
come to the conclusion, that there were two major factors which
contributed to squeezing of code:

(1). usage of available variants for coding of the same thing
(2). sqeezing the size of used numeric and string literals
....not only squeezing the size of the literals, but the combined size of the
compressed data and the code to expand it. In this respect it turned out to be
a surprisingly rewarding challenge, and a nice reinforcement of the Pythonic
mantra of seeking performance gains by optimizing algorithms.

Michael
 
C

Claudio Grondi

Michael said:
....not only squeezing the size of the literals, but the combined size
of the compressed data and the code to expand it. In this respect it
turned out to be a surprisingly rewarding challenge, and a nice
reinforcement of the Pythonic mantra of seeking performance gains by
optimizing algorithms.

Michael
You are right. I have overseen that aspect which makes the contest a
kind of Python scripting language related compression challenge.

I haven't done it yet, but I consider it interesting to check out how
compression algorithms are doing on the three lines 7 segment LCD
mimicking text representations of a decimal number generated by
seven_seg() compared to size of input string plus the size of the
module. I suppose, as the input string is not of minimal possible size
compared to the information it carries, compression schemes should be
able to beat the 'compression' by seven_seg().

Interesting in this context for me is also, that if I understand it
right, according to Kolmogorov complexity law, it will be never possible
to state if the achieved solution is the shortest possible as it will be
also not possible for any other provided (shorter) solution.

Claudio
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top