Python obfuscation

petantik

Are there any commercial or other obfuscators for Python source code
or byte code, and what are their relative advantages and
disadvantages? I ask because there is some byte code protection
available for Java and .NET, although from what I've read these do not
seem to be comprehensive protection schemes.







http://petantik.blogsome.com - Telling it like it is
 
Fredrik Lundh

petantik said:
Are there any commercial or other obfuscators for Python source code
or byte code, and what are their relative advantages and
disadvantages? I ask because there is some byte code protection
available for Java and .NET, although from what I've read these do not
seem to be comprehensive protection schemes.

hmm. is google down today?

http://www.lysator.liu.se/~astrand/projects/pyobfuscate/

pyobfuscate is a source code obfuscator: It makes Python source code
hard to read for humans, while still being executable for the Python
interpreter.

</F>
 
Steve Holden

petantik said:
Are there any commercial or other obfuscators for Python source code
or byte code, and what are their relative advantages and
disadvantages? I ask because there is some byte code protection
available for Java and .NET, although from what I've read these do not
seem to be comprehensive protection schemes.

Before adding complex protection mechanisms to your code you first need
some code worth protecting, which is to say it should have some novel
features or represent a lot of work that offers useful integrated
functionality for a task or a skill area.

Most inquiries of this nature appear to fall at that first hurdle.

There are things you can do, but I'm always keenly aware that very few
users of a program have both the skills and the inclination to rip off
the code even when the source is distributed as part of the product.
Personally I've never bothered with obfuscation, and prefer to rely on
copyright when I deliver code to customers.

regards
Steve
 
bonono

How effective can it be when Python is designed to make writing this
kind of code hard (hopefully impossible)? The most effective approach
would be renaming functions and maybe variables, but if the functions
are kept short, they would at most look like Haskell ;-)
 
The Eternal Squire

Perhaps this could be a PEP:

1) Add a system path for decryption keys.
2) Add a system path for optional decryptors supplied by user
(to satisfy US Export Control)
3) When importing a module, try the normal import routine first. On an
import error, run each available decryptor with each available key on
the module and retry; if none succeeds, raise the import error.

With PGP encryption one could encrypt the pyc's with the private key
and sell a public key to the end user.
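
The import-time part of this proposal can be approximated with an
import hook, without a PEP. The following is only a sketch under loud
assumptions: `DecryptingFinder`, `xor_bytes`, the module name
`secret_mod` and both keys are made up for illustration, the encrypted
modules live in a dict rather than on a system path, and XOR stands in
for a real cipher such as PGP (it offers no security at all).

```python
import importlib.abc
import importlib.util
import sys


def xor_bytes(data, key):
    """Toy stand-in for a real cipher. XOR with a short key is NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class DecryptingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader: tries every known key on an encrypted module."""

    def __init__(self, encrypted_modules, keys):
        self.encrypted_modules = encrypted_modules  # module name -> ciphertext
        self.keys = keys

    def find_spec(self, fullname, path, target=None):
        if fullname in self.encrypted_modules:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal import machinery handle everything else

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        blob = self.encrypted_modules[module.__name__]
        for key in self.keys:
            try:
                source = xor_bytes(blob, key).decode("utf-8")
                code = compile(source, module.__name__, "exec")
            except (UnicodeDecodeError, SyntaxError, ValueError):
                continue  # wrong key: try the next one
            exec(code, module.__dict__)
            return
        raise ImportError(f"no available key decrypts {module.__name__!r}")


# Usage: "ship" one encrypted module and import it with two candidate keys.
plain_source = b"def answer():\n    return 42\n"
good_key = b"s3cret"
finder = DecryptingFinder(
    {"secret_mod": xor_bytes(plain_source, good_key)},
    keys=[b"wrong", good_key],
)
sys.meta_path.append(finder)

import secret_mod

print(secret_mod.answer())
```

After the import, the decrypted code sits in `secret_mod` like any other
module, so whoever holds a working key can inspect it from Python.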

The Eternal Squire
 
Yu-Xi Lim

Steve said:
Before adding complex protection mechanisms to your code you first need
some code worth protecting, which is to say it should have some novel
features or represent a lot of work that offers useful integrated
functionality for a task or a skill area.

Most inquiries of this nature appear to fall at that first hurdle.

There are things you can do, but I'm always keenly aware that very few
users of a program have both the skills and the inclination to rip off
the code even when the source is distributed as part of the product.
Personally I've never bothered with obfuscation, and prefer to rely on
copyright when I deliver code to customers.

As you said, if you have some novel features, you will need obfuscation.
Copyright doesn't protect the process and patents may take a while. In
the meanwhile, good obfuscation is reasonable protection, imho.

But I think you failed to note that it may not be a novel feature or
useful functionality. In fact, it might be the opposite: a function the
users want removed. A typical example would be a shareware registration
or nag screen. When the users have to start paying, they might then feel
inclined to "rip off the code", or in this case, rip out the code.
 
Mike Meyer

How effective can it be when Python is designed to make writing this
kind of code hard (hopefully impossible)? The most effective approach
would be renaming functions and maybe variables, but if the functions
are kept short, they would at most look like Haskell ;-)

I haven't looked at the obfuscator, so I have *no idea* how it works. The
following is how I'd do it.

Step one: globally replace all names in all Python modules with names
that are composed of long strings of l, 1, 0 and O. Fixing
cross-module references should be fun. Don't just make them random -
make them all start with the same sequence, and end with the same
sequence, having differences only in the middle.

Step two: repeat this process for the contents of binary modules, not
neglecting __builtins__. In this case, you probably can't remove the
old names, but you can add new things to the module, and make sure you
only reference those.

I'm not sure how to go about fixing things that are referenced by name
in binary modules. Maybe you'll have to leave those names in the
modules. But you can make sure that all references in Python source use
the new, binary-like names.
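
Step one can be sketched for a single module with the standard `ast`
module. This is a minimal illustration, not how any existing obfuscator
works: `lookalike`, `defined_names`, `Renamer` and `obfuscate` are
hypothetical helpers, it needs Python 3.9+ for `ast.unparse`, and it
only handles plain functions and variables. Attribute access, keyword
arguments, imports and cross-module references are left untouched and
would break.

```python
import ast


def lookalike(index):
    """Name built from l, 1, 0 and O; siblings differ only in the middle."""
    middle = format(index, "012b").replace("0", "l")  # only 'l' and '1'
    return "O0l1" + middle + "1l0O"


def defined_names(tree):
    """Collect names bound by def, by arguments, or by assignment."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


class Renamer(ast.NodeTransformer):
    """Apply a fixed old-name -> new-name mapping to the tree."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also rename arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscate(source):
    tree = ast.parse(source)
    mapping = {old: lookalike(i) for i, old in enumerate(sorted(defined_names(tree)))}
    return ast.unparse(Renamer(mapping).visit(tree)), mapping


SOURCE = """
def area(width, height):
    result = width * height
    return result

total = area(3, 4)
"""

obfuscated, mapping = obfuscate(SOURCE)
print(obfuscated)

# The renamed program still runs and computes the same value.
namespace = {}
exec(obfuscated, namespace)
print(namespace[mapping["total"]])
```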

<mike
 
petantik

Yu-Xi Lim said:
As you said, if you have some novel features, you will need obfuscation.
Copyright doesn't protect the process and patents may take a while. In
the meanwhile, good obfuscation is reasonable protection, imho.

But I think you failed to note that it may not be a novel feature or
useful functionality. In fact, it might be the opposite: a function the
users want removed. A typical example would be a shareware registration
or nag screen. When the users have to start paying, they might then feel
inclined to "rip off the code", or in this case, rip out the code.

This is what I am talking about. If you look at programs written in C,
or others that compile into native binaries, there are many protection
schemes which are mainly used not to protect some novel process but to
ensure that their commercial software remains marketable.

People who download cracks/serial numbers rarely care about copyright.
So when Python is used in more commercial software, some sort of
high-grade obfuscation may be needed. Such protection schemes (packers)
sometimes also embed compression, so that the code is decompressed 'on
the fly', reducing file sizes.





http://petantik.blogsome.com - A Lucid Look at Reality
 
sjdevnull

Mike said:
Step one: globally replace all names in all Python modules with names
that are composed of long strings of l, 1, 0 and O. Fixing
cross-module references should be fun. Don't just make them random -
make them all start with the same sequence, and end with the same
sequence, having differences only in the middle.

Eliminating the original variable names may be useful in obfuscation,
but this doesn't seem to buy much over just replacing with random
strings; it's trivial to do a similar replacement to go from "10Oll10"
strings to "firstVariable", "secondVariable", etc strings.
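
A rough sketch of that reverse replacement, assuming the obfuscated
names are built only from the characters l, 1, 0 and O. The pattern and
the `readable_names` helper are made up for this illustration; a regular
expression would also rewrite matching text inside string literals, so a
real tool would work on tokens instead.

```python
import re

# Assumption: obfuscated identifiers start with 'l' or 'O' and contain
# only look-alike characters.
LOOKALIKE = re.compile(r"\b[lO][l1O0]{5,}\b")


def readable_names(source):
    """Replace each distinct look-alike name with name1, name2, ..."""
    mapping = {}

    def replace(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"name{len(mapping) + 1}"
        return mapping[name]

    return LOOKALIKE.sub(replace, source), mapping


scrambled = "def O0l1l1O(O0l1ll1O):\n    return O0l1ll1O * 2\n"
restored, mapping = readable_names(scrambled)
print(restored)
```

The original names are gone for good, but the structure of the code is
as easy to follow as before.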
 
Anand S Bisen

I don't know much! But if somebody asked me this question, my answer
would be to convert some of the meat of my programs to C/C++ and then
expose those novel ideas to Python using SWIG. And for another level of
protection, maybe use one of these obfuscators on the remaining Python
source. What do you think?

Anand S Bisen
 
Grant Edwards

I don't know much! But if somebody asked me this question, my
answer would be to convert some of the meat of my programs to
C/C++ and then expose those novel ideas to Python using SWIG.
And for another level of protection, maybe use one of these
obfuscators on the remaining Python source.
What do you think?

Um... sounds like an excellent way to burn hours while
introducing bugs and security problems?
 
Carl Friedrich Bolz

hi!

How effective can it be when Python is designed to make writing this
kind of code hard (hopefully impossible)? The most effective approach
would be renaming functions and maybe variables, but if the functions
are kept short, they would at most look like Haskell ;-)

There just cannot be a Python obfuscator that works for a general Python
program. The problem is that, on the one hand, regular strings can be
used to look up values in namespaces (e.g. with getattr) and, on the
other hand, the lookup of names can be controlled (e.g. with __getattr__
and friends). Therefore any string can potentially contain a name that
would have to be changed to keep the code working after obfuscation. For
example, how would you automatically obfuscate the following code:


class HelloWorld(object):
    def hello(self):
        return "world"

    def world(self):
        return "!"

if __name__ == '__main__':
    h = HelloWorld()
    s = "hello"
    while 1:
        f = getattr(h, s, None)
        print s,
        if f is None:
            break
        s = f()

While this is surely a contrived case that intentionally mixes names and
strings that are used for something in the application there are also
quite often legitimate use cases for this sort of behaviour. Duck typing
is basically based on this.
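
To make the failure concrete, here is a Python 3 rendering of the same
idea next to a hand-renamed copy. The `follow` helper and the renamed
class `O0l1` are inventions for this illustration; the renaming is done
by hand to mimic what a name-only obfuscator would produce, leaving
string contents alone.

```python
class HelloWorld:
    def hello(self):
        return "world"

    def world(self):
        return "!"


# Hand-written stand-in for obfuscator output: method names are renamed,
# but the strings they return still hold the old names.
class O0l1:
    def O0l1l(self):
        return "world"

    def O0l11(self):
        return "!"


def follow(obj, start):
    """Follow the chain of method names until a lookup fails."""
    seen = []
    name = start
    while True:
        seen.append(name)
        method = getattr(obj, name, None)
        if method is None:
            return seen
        name = method()


original_chain = follow(HelloWorld(), "hello")
renamed_chain = follow(O0l1(), "hello")
print(original_chain)
print(renamed_chain)
```

The renamed class raises no error; the lookup of "hello" simply fails
and the program quietly behaves differently.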

Cheers,

Carl Friedrich Bolz
 
Alex Martelli

Anand S Bisen said:
I don't know much! But if somebody asked me this question, my answer
would be to convert some of the meat of my programs to C/C++ and then
expose those novel ideas to Python using SWIG. And for another level of
protection, maybe use one of these obfuscators on the remaining Python
source. What do you think?

I think that's feeble protection. If you have valuable code, and
distribute it, people WILL crack it -- just check the warez sites for
experimental proof... EVERYTHING that people are really interested in
DOES get cracked, no matter what tricky machine-code the "protections"
are coded in.

There's ONE way to have uncrackable code -- don't distribute it, but
rather put it up on the net on a well-secured machine under your
control, available as (say) a webservice (subscription-only, pay per
use, or whatever business model you want). You can distribute all the
parts of your app that aren't worth protecting as a "fat client" app (in
Python or whatever) and keep those which ARE worth protecting on the
server that YOU control (and make sure it's very, VERY safe, of course);
and you may write the precious parts in Python, too, no problem.
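
A minimal sketch of that split, using only the standard library and
running both halves in one process for demonstration. Everything named
here is an assumption for illustration: `precious_algorithm` stands in
for the code worth protecting, `call_service` is the distributed client
part, and the JSON payload shape is invented. There is no
authentication, billing or TLS, all of which a real service would need.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def precious_algorithm(x):
    """Server-side only: this function is never shipped to the customer."""
    return x * x + 1


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"result": precious_algorithm(payload["x"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call_service(url, x):
    """Client side: knows the URL and the payload format, nothing else."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"x": x}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Ignore any proxy settings so the local demo talks to localhost directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["result"]


# Demo: start the "server" on a free local port, then call it as a client.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
answer = call_service(f"http://127.0.0.1:{server.server_port}/", 6)
server.shutdown()
server.server_close()
print(answer)
```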

This is (a minor) one of the many reasons that make webservices the way
of the future (hey, even *MSFT* noticed that recently, it seems...).
There are many other advantages, especially if you keep the clients
thin. The only issue is, your apps will require network connectivity to
execute... but these days, with airlines and train lines busy adding
wi-fi, and towns busily blanketing themselves with free wi-fi, etc, etc,
that's less and less likely to be a big problem...


Alex
 
The Eternal Squire

Two things:

1) The decrypted modules should only reside in RAM, never in virtual
memory. Those RAM locations should be rendered inaccessible to Python
code.

2) Only sell to an honest customer willing to be locked into
nondisclosure agreements. This goes back to the maxim of good
salesmanship: Know Your Customer.

By definition, a lock keeps honest people out. The object of a lock is
to make it too expensive for all but the most dishonest, desperate, or
nihilistic to get into the house, because they can always smash a
window or a door open.

IMHO, I have never encountered a dishonest developer or business owner
who at the same time possessed anything remotely resembling a rational
business model. A person who cannot afford to get tools honestly is
seldom able to accomplish anything significant or constructive from a
business point of view with tools obtained dishonestly.

Consider EDA software like Cadence, Matlab, or BEACON that is guarded
by network license servers. The temptation is very strong for an
individual to rip it off, but then consider all the user technical
support and bug fixes that go into the package. Most relatively honest
people see a strong lock and get the message not to try. The others
may rip off a locked package, but then the package becomes worthless,
not because it doesn't work, but because the thief has to work
completely outside the knowledge base that an honest copy has
access to.

I have heard of the warez culture, but it seems to be nihilistic in the
extreme. I don't search for warez, I don't touch warez, and I do not
recommend anyone else to do so, because using it is simply bad business
practice and will get one ostracised by the very people one wants to
sell to. But at the end of the day it seems to serve as an
unauthorized marketing and sales channel to whet the appetites for
people to try the real thing.

The Eternal Squire
 
yepp

The Eternal Squire said:
1) The decrypted modules should only reside in RAM, never in virtual
memory. Those RAM locations should be rendered inaccessible to Python
code.

I'm starting to understand why FOSS developers are said to be productive
above the average: they don't have to mess their brains with stuff like
that.
snip

IMHO, I have never encountered a dishonest developer or business owner
who at the same time possessed anything remotely resembling a rational
business model.

Ah, what was the name of that company in ... mh, was it Redmond?

Once you get the model of free and open source software, you can't help
but shake your head at people who obfuscate and treat their users as
enemies. Intellectual property suffers in most cases from a significant
lack of the intellectual part.
 
Steven D'Aprano

As you said, if you have some novel features, you will need obfuscation.
Copyright doesn't protect the process and patents may take a while. In
the meanwhile, good obfuscation is reasonable protection, imho.

But I think you failed to note that it may not be a novel feature or
useful functionality. In fact, it might be the opposite: a function the
users want removed. A typical example would be a shareware registration
or nag screen. When the users have to start paying, they might then feel
inclined to "rip off the code", or in this case, rip out the code.


Which leads to the important counter-question. Since there is a Python
obfuscator, is there a Python un-obfuscator? I am aware that not all
obfuscations can be reversed, but some can.
 
Steven D'Aprano

I'm starting to understand why FOSS developers are said to be productive
above the average: they don't have to mess their brains with stuff like
that.

That's not *quite* true. There are FOSS programs that actually do care
about security. For instance, if you are encrypting data, you don't want
the memory containing the plaintext to be swapped to your swap
partition, where raw disk tools can recover it.

But as a general rule, you're right. If you, the developer, don't have to
think of your users as the enemy, you'd be amazed the amount of make-work
you don't have to do.
 
skip

Steven> But as a general rule, you're right. If you, the developer,
Steven> don't have to think of your users as the enemy, you'd be amazed
Steven> the amount of make-work you don't have to do.

+1 QOTW.

Skip
 
petantik

Alex said:
I think that's feeble protection. If you have valuable code, and
distribute it, people WILL crack it -- just check the warez sites for
experimental proof... EVERYTHING that people are really interested in
DOES get cracked, no matter what tricky machine-code the "protections"
are coded in.

There's ONE way to have uncrackable code -- don't distribute it, but
rather put it up on the net on a well-secured machine under your
control, available as (say) a webservice (subscription-only, pay per
use, or whatever business model you want). You can distribute all the
parts of your app that aren't worth protecting as a "fat client" app (in
Python or whatever) and keep those which ARE worth protecting on the
server that YOU control (and make sure it's very, VERY safe, of course);
and you may write the precious parts in Python, too, no problem.

This is (a minor) one of the many reasons that make webservices the way
of the future (hey, even *MSFT* noticed that recently, it seems...).
There are many other advantages, especially if you keep the clients
thin. The only issue is, your apps will require network connectivity to
execute... but these days, with airlines and train lines busy adding
wi-fi, and towns busily blanketing themselves with free wi-fi, etc, etc,
that's less and less likely to be a big problem...


Alex

I think that is not workable, because it is easy to say that the
internet is available everywhere.

It is not available in developing countries or in rural areas, so the
people who live or work there will never benefit from a webservice-type
protection scheme. And what if the network in your area goes down?
Bye bye, app that I *really* need for tomorrow. Reliability is
important, but so is protecting your code in an effective manner.

I do believe that you are right about those who crack software for
kicks or money. If you look around your local marketplace, I'm sure
there is plenty of 'discounted' commercial software and games on sale.
Of course the big software companies might say 'trusted computing will
save us', but I for one will never truly trust it.

Perhaps a comprehensive protection for interpreted languages can never
be built because of their high level nature?
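
As a small illustration of how much a compiled Python module still
carries, the sketch below compiles a made-up license check and reads the
names and constants back out of the code object. The function
`license_ok` and the key string are invented for the example; the
`marshal` round trip only stands in for the serialized code object that
a .pyc file carries after its header.

```python
import marshal
import types

# Hypothetical licensed module; only its compiled form is "shipped".
SOURCE = '''
def license_ok(key):
    return key == "SECRET-1234"
'''

shipped_bytes = marshal.dumps(compile(SOURCE, "<licensed>", "exec"))

# What a curious user can do with the shipped bytes alone.
module_code = marshal.loads(shipped_bytes)
function_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)

recovered = {
    "function name": function_code.co_name,
    "local names": function_code.co_varnames,
    "constants": function_code.co_consts,
}
for label, value in recovered.items():
    print(label, "->", value)
```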
 
Ben Sizer

Alex said:
If you have valuable code, and
distribute it, people WILL crack it -- just check the warez sites for
experimental proof... EVERYTHING that people are really interested in
DOES get cracked, no matter what tricky machine-code the "protections"
are coded in.

That is very black and white thinking. It may be true that everything
gets cracked, but there are different degrees to which it might harm
your business model. On top of that, some users may be reluctant to
install binary cracks from obviously disreputable sources. Who knows
what spyware or viruses you could catch? Compare that to the simplicity
and safety of someone posting instructions to "open secure.py in
notepad, and change the 'if license_found:' line to 'if 1:'", for
example. No risk and even less effort than applying a patch.

If someone wants to break into your house, they will get in. But it's
still worth taking some precautions (locks, alarms, whatever) to reduce
the probability.

There's ONE way to have uncrackable code -- don't distribute it, but
rather put it up on the net on a well-secured machine under your
control, available as (say) a webservice (subscription-only, pay per
use, or whatever business model you want).

This is all well and good when:
- web access is free (it's not if you're on dialup, or on a portable
device/phone)
- web access is fast enough (it's not if you're working with certain
types of real-time games or multimedia)
- web access is convenient (it's not if you're behind a restrictive
firewall, or your country/area is poorly connected)

For example, I'd like to write a game in Python. I'd like to give the
game away free and charge for extra content. In C++ I can make it
difficult for users to share content with others who haven't paid for
it, with cryptographic hashes and the like. No, not impossible, but
difficult enough to deter most people. In Python it's much harder, when
the end user can open up the relevant file and quickly remove the
license check. No doubt this is another of the reasons why Python isn't
catching on quickly for game development, sadly.

(I'm not saying this is a deficiency of Python as such. It's just a
comment on the situation.)

This is (a minor) one of the many reasons that make webservices the way
of the future (hey, even *MSFT* noticed that recently, it seems...).

But they are not suitable for all applications, and probably never will
be.
 
