Python obfuscation

The Eternal Squire

Come on! I was only just trying to accommodate the OP with a plausible
method to fit his business model, based on techniques passed on to me
by my various teachers at school and my senseis at workplaces.
Please don't judge me for attempting to pass on experience. It's his
choice.

While I'd like to figure out myself a nice software package to write
and market and earn a good living now that I've walked away from the
rat race, I can also see myself having humanity as my client (the FOSS
model).

The Eternal Squire
 
Bill Mill

But they are not suitable for all applications, and probably never will
be.

Your only solution, then, is to write unpopular code. Because, as Alex
said, it will otherwise be broken into. Let's look at two very popular
pieces of code: Half-Life 2 and Windows XP. How are they secured?
Previous versions of these software products used sophisticated
client-side programming to try and be secure, but the security was
nonexistent. Users share keys and cracks with each other.

Now, both of these programs require verification (phone and/or web) to
be used. The only truly secure method of assuring that they're not
used in ways you don't intend is to require the user to contact you to
use it, and that's a deal with the devil. One you might need to make
if security is that important to you, as Microsoft and Valve have
decided it is, but it's a deal with the devil nonetheless.

Peace
Bill Mill
bill.mill at gmail.com
 
Mike Meyer

petantik said:
Perhaps a comprehensive protection for interpreted languages can never
be built because of their high level nature?

Nah. Compiling/interpreting is an implementation detail, and
orthogonal to the issue of "high level". There are compilers for high
level languages, and interpreters for low level languages. At the
lowest level, a machine emulator is an interpreter for machine code,
which is the lowest level most programmers deal with (at least I think
it is....).

If you really wanted "compiler-like" security for Python, you could
write a Python compiler. There have been posts about a compiler that
generated C++ recently, though it's still under development, and I
haven't followed it closely. You might also consider retargeting one
of the existing Python compilers to your architecture of choice, or to
another language. You might also consider translating Python to a
language with similar capabilities for which a compiler exists, like
Common LISP.

Of course, once you've got machine code, it doesn't matter how high
level the source was. That may make getting the source back harder,
but people who are cracking your program don't want to do that - they
just want to find the place where the security happens, and either
figure out the input that will make it happy, or invert the behavior
after a test.

<mike
 
Steven D'Aprano

Perhaps a comprehensive protection for interpreted languages can never
be built because of their high level nature?

Dude, a comprehensive protection for *any* software can never be built
because of the fundamental nature of computers. Trying to stop bytes from
being copyable is like trying to stop water from being wet, and once
copied, all copies are identical and therefore indistinguishable.

It isn't a matter of protecting software or data. It is a question of how
hard do you want to make it for people to copy/crack? That itself has
costs, costs of time, space, complexity, bugs, lost opportunities,
customer dissatisfaction, and even legality.

Sony has just found that out: having been caught installing root-kits on
people's computers, they are now being sued.
 
Timothy Smith

Reliability is
important but so is protecting your code in an effective manner

There is no way to prevent people from disassembling your code, compiled
or otherwise. Once you give them the program they can easily take it apart.
No ifs, no buts; do NOT rely on binaries for security.

the big software companies might say 'trusted computing will save us'
but I for one will never truly trust it.

Trusted computing is about your computer not trusting YOU. The computer
you pay for will decide, based on some other company's whims, what you
are and are not allowed to do.

Perhaps a comprehensive protection for interpreted languages can never
be built because of their high level nature?

I repeat: there is no such thing as protected code. I've seen people
deconstruct EXEs written in C.
 
Mike Meyer

Ben Sizer said:
For example, I'd like to write a game in Python. I'd like to give the
game away free and charge for extra content. In C++ I can make it
difficult for users to share content with others who haven't paid for
it, with cryptographic hashes and the like. No, not impossible, but
difficult enough to deter most people. In Python it's much harder, when
the end user can open up the relevant file and quickly remove the
license check. No doubt this is another of the reasons why Python isn't
catching on quickly for game development, sadly.

What makes you think this is the case? There are ways to distribute
Python modules so that the user can't just open them in a text
editor. There are also ways to get cryptographic security for
distributed modules. Yes, if you use the same methods you use in C++,
it's "much harder". But by the same token, if you tried to use the
methods you'd use in a Python program in C++, you'd find that the C++
version was "much harder".

Of course, as Alex pointed out, all of these are just keeping honest
people honest. The crooks have all the advantages in this game, so you
really can't expect to win.

Not that I'm convinced that putting everything on a "secure server" is
proof against getting your code stolen. Last time I was involved with
security people, it was commonly acknowledged that there were two types
of security people: those who knew when their systems were last broken
into, and those who didn't. Source - and other things - gets stolen
from "secure servers" on a regular basis, and those machines don't
have to provide some way for the potential thieves to execute the
code. But at least with this model, some of the advantages are on your
side, so you stand a fighting chance.

<mike
 
Carsten Haese

Dude, a comprehensive protection for *any* software can never be built
because of the fundamental nature of computers. Trying to stop bytes from
being copyable is like trying to stop water from being wet, and once
copied, all copies are identical and therefore indistinguishable.

+1 QOTW!
 
Yu-Xi Lim

Bill said:
Your only solution, then, is to write unpopular code. Because, as Alex
said, it will otherwise be broken into. Let's look at two very popular
pieces of code: Half-Life 2 and Windows XP. How are they secured?
Previous versions of these software products used sophisticated
client-side programming to try and be secure, but the security was
nonexistent. Users share keys and cracks with each other.

and

Mike said:
> What makes you think this is the case? There are ways to distribute
> Python modules so that the user can't just open them in a text
> editor. There are also ways to get cryptographic security for
> distributed modules. Yes, if you use the same methods you use in C++,
> it's "much harder". But by the same token, if you tried to use the
> methods you'd use in a Python program in C++, you'd find that the C++
> version was "much harder".
>
> Of course, as Alex pointed out, all of these are just keeping honest
> people honest. The crooks have all the advantages in this game, so you
> really can't expect to win.


Funny you should mention Half-Life 2. I actually went out and bought
Half-Life 2 from the store instead of waiting for a crack to be released
(the unique scheme they used meant that crackers will take a little
longer than usual). I really wanted to play this game (i.e., it's very
popular) and couldn't wait.

My brother is bugged by Civilization IV's copy protection. A couple of
days ago, after consulting me on what other options he could try, he
finally said in frustration, "Maybe I should go buy the game."

This is a personal anecdote, but I'm sure it applies to at least some
people. Obviously I'm not an honest person. But I'm not so against
spending money on software that I won't buy it if there's a pretty good
copy protection system on it. The "keeping honest people honest"
argument is simplistic and as Ben said, "black and white thinking".

Ben's analogy of the house is not a perfect example, but it's still a
fair one. You know that if some one really wants to break into your
house, he will get in, regardless of your sophisticated laser trip wire
system, ex-SAS guards, and genetically-engineered guard dogs. But as
long as the cost of protection is less than the cost of the item you're
protecting (multiplied by the relevant probabilities, factoring
recurring costs, etc), it's worthwhile to spend money on protection. If
that fails, then you will of course fall back on the law, but you still
try to prevent it from happening in the first place.

I do believe that code obfuscation and copy protection measures work, to
a limited extent. Few software companies believe that their copy
protection will be uncrackable (though their marketing droids may say
otherwise), but are most willing to invest in it to at least temporarily
stave off the piracy.

Distribution of python modules as compiled bytecode is a limited form of
obfuscation. Some believe it's enough. But if there's a free obfuscator
out there that can increase the difficulty of reverse engineering, why
not use that too? Costs you nothing, and may get you a customer or two
more before some one manages to crack that.
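
As a rough illustration of what "distributing as compiled bytecode" can look like with the standard library, here is a minimal sketch. The helper names and the demo module are hypothetical; `py_compile.compile` and `importlib.util.spec_from_file_location` are standard-library calls. The `.pyc` is written next to where the source was, because a bytecode file left only in `__pycache__` is not imported when the source is missing.

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def build_bytecode_only(source_text, module_name, out_dir):
    """Compile source_text and leave only a .pyc file in out_dir."""
    out_dir = pathlib.Path(out_dir)
    src = out_dir / f"{module_name}.py"
    src.write_text(source_text)
    # Write the bytecode beside the source instead of into __pycache__.
    pyc = py_compile.compile(
        str(src), cfile=str(out_dir / f"{module_name}.pyc"), doraise=True
    )
    src.unlink()  # ship only the bytecode
    return pathlib.Path(pyc)


def load_from_pyc(module_name, pyc_path):
    """Import a module directly from a sourceless .pyc file."""
    spec = importlib.util.spec_from_file_location(module_name, str(pyc_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pyc = build_bytecode_only("def price():\n    return 42\n", "demo_mod", tmp)
        print(load_from_pyc("demo_mod", pyc).price())
```

This only removes the readable `.py` file; the bytecode format is tied to the interpreter version and can still be disassembled.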

Obfuscation has it's place. It's not the final solution for software
protection (and there probably isn't one), but it is one more lock you
can use to deter or delay thieves. You can't expect to win against
determined thieves, but you can remove as many advantages that they have.

> Now, both of these programs require verification (phone and/or web) to
> be used. The only truly secure method of assuring that they're not
> used in ways you don't intend is to require the user to contact you to
> use it, and that's a deal with the devil. One you might need to make
> if security is that important to you, as Microsoft and Valve have
> decided it is, but it's a deal with the devil nonetheless.

This seems to be opposite to what you said in the previous paragraph.
Contacting and verifying with the company every time you use the
software is obviously not "the only truly secure method", since there
are cracks and keys floating around. It is also not quite as evil as it
may seem, since authorization is only required on initial use (and
online gaming).
 
Mike Meyer

Yu-Xi Lim said:
This is a personal anecdote, but I'm sure it applies to at least some
people. Obviously I'm not an honest person. But I'm not so against
spending money on software that I won't buy it if there's a pretty
good copy protection system on it. The "keeping honest people honest"
argument is simplistic and as Ben said, "black and white thinking".

And how much software is out there that you actually want so badly
that you'll buy it rather than wait until it's cracked? Does it make up
a significant portion of the software you use? If not, then your
example shows only that the difference between "keeping honest people
honest" and reality is insignificant.
Ben's analogy of the house is not a perfect example, but it's still a
fair one. You know that if some one really wants to break into your
house, he will get in, regardless of your sophisticated laser trip
wire system, ex-SAS guards, and genetically-engineered guard dogs. But
as long as the cost of protection is less than the cost of the item
you're protecting (multiplied by the relevant probabilities, factoring
recurring costs, etc), it's worthwhile to spend money on
protection. If that fails, then you will of course fall back on the
law, but you still try to prevent it from happening in the first place.

Sounds like you just said that manufacturers should improve their
protection until they aren't making any profit on the product. That's
silly. The goal isn't to maximize protection, it's to maximize
profit. That means it only makes sense to spend money on better
protection if the cost of the protection is less than the expected
profit from adding it. The cost of the item you're protecting is
irrelevant. The cost of adding copy protection is *noticeably* more
than the cost of the copy protection bits. A recent, heavily
publicized case where Sony added copy protection to a product cost
them sales, and from what I've heard, even legal fees.
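
The decision rule above can be written down as simple arithmetic. This is only a sketch: the function name, the parameter names and every figure in the usage are hypothetical placeholders, not data from any real product.

```python
def protection_pays_off(extra_sales_profit, protection_cost,
                        lost_sales_profit=0.0, other_costs=0.0):
    """Return (worth_it, net) for adding a protection scheme.

    extra_sales_profit: profit from sales you expect to gain because of it.
    protection_cost:    cost of the protection "bits" themselves.
    lost_sales_profit:  profit lost from customers the scheme drives away.
    other_costs:        support, release-process overhead, legal fees, etc.
    """
    total_cost = protection_cost + lost_sales_profit + other_costs
    net = extra_sales_profit - total_cost
    return net > 0, net


if __name__ == "__main__":
    # Looking only at the cost of the protection bits (made-up numbers):
    print(protection_pays_off(10_000, 4_000))
    # Counting lost sales and other costs as well (made-up numbers):
    print(protection_pays_off(10_000, 4_000,
                              lost_sales_profit=5_000, other_costs=2_000))
```

Note that the price of the protected item does not appear anywhere in the calculation; only the expected change in profit does.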
I do believe that code obfuscation and copy protection measures work,
to a limited extent. Few software companies believe that their copy
protection will be uncrackable (though their marketing droids may say
otherwise), but are most willing to invest in it to at least
temporarily stave off the piracy.

Anything at all acts in the "keeping honest people honest"
capacity. It also delays the inevitable cracking - which is all you
can do. The only thing spending more on it does is lengthen the
delay. Hard data on how many sales that extra delay is responsible for
is, by its very nature, impossible to come by. You've provided
anecdotal evidence that copy protection can improve sales. I've
provided anecdotal evidence that adding copy protection cost sales.
Distribution of python modules as compiled bytecode is a limited form
of obfuscation. Some believe it's enough. But if there's a free
obfuscator out there than can increase the difficulty of reverse
engineering, why not use that too? Costs you nothing, and may get you
a customer or two more before some one manages to crack that.

Um, if you think adding steps to the release process costs you
nothing, you don't understand the release process. If you've got a way
to obfuscate the code that doesn't require extra steps in the release
or development process, I'd love to hear about it.

<mike
 
Alex Martelli

petantik said:
I think that is not workable because it is easy to say that the internet
is available everywhere.

This implies that, if it were difficult to say it, then the scheme WOULD
be workable... which I doubt is what you mean, of course;-)

It is not available in developing countries or in rural areas and so

Things are getting better all the time in these respects - and they will
keep getting better, quite apart from "web apps", because access to
information is MUCH more precious than mere computation.
these people who live/work there will never benefit from a webservice
type protection scheme,

It's debatable whether the customer BENEFITS from having their ability
to run an application RESTRICTED. It appears that the trend (in
developing countries even more than in rich ones) is towards using open
source, anyway.
and what if the network in your area goes down?
bye bye app that I *really* need for tomorrow. Reliability is

But the risk of your specific MACHINE going down is much higher than
that of the NET going down in all of its forms at once! If I rely on a
web app, and need to use it tonight to have something ready tomorrow,
then if my machine goes down (or I suffer a power brown-out in my area,
an occurrence that is frequent in many developing countries, and not
unheard of in developed ones), then I stand a chance to rush elsewhere,
to a library, town hall, internet cafe, or ANY other location where I
may be able to grab a machine, ANY machine, connect to the net, identify
and authenticate myself, and keep using that crucial web app. If said
app is well designed and mature, it will have autosaved most of my work
up to the point of my machine's crash (or the area brown-out, etc), too.

The importance of reliability speaks in FAVOUR of keeping important
stuff on the internet, rather than on unreliable, crash-prone local
machines (...and when's the last time you did a full backup of all of
your work with all proper precautions...? For most users, "never" --
for users of web apps hosted on well-maintained sites, on the other
hand, backups ARE taken care of, professionally and properly!).

important but so is protecting your code in an effective manner

There is no effective manner of protecting your code, except running it
only on well-secured machines you control yourself. If you distribute
your code, in ANY form, and it's at all interesting to people with no
interest in respecting the law, then, it WILL be cracked (and if users
choose to respect the law, then you need no "protecting").

I do believe that you are right about those that crack software for
kicks or money. If you look around at your local marketplace I'm sure
there are many 'discounted' commercial softwares/games sold. of course
the big software companies might say 'trusted computing will save us'
but I for one will never truly trust it.

Perhaps a comprehensive protection for interpreted languages can never
be built because of their high level nature?

Many, perhaps most, of those cracked commercial programs have NOT been
written in "interpreted languages" (whatever that means), but in
assembly code, C, C++... so your last paragraph is easily shown to be an
irrelevant aside -- it's not an issue of what language the code is in.


Alex
 
Alex Martelli

Yu-Xi Lim said:
My brother is bugged by Civilization IV's copy protection. A couple of
days ago, after consulting me on what other options he could try, he
finally said in frustration, "Maybe I should go buy the game."

It's interesting, in this context, that Civilization IV is mostly
written in Python (interfaced to some C++ via Boost.Python).

It took me 12 seconds with a search engine to determine that CivIV's
protection uses "SafeDisc 4.60" and 30 more seconds to research that
issue enough to convince myself that there's enough information out
there that I could develop a crack for the thing (if I was interested in
so doing), quite apart from any consideration of the languages and
libraries used to develop it -- and I'm not even a particularly good
cracker, nor am I wired into any "underground channels", just looking at
information easily and openly available out on the web and in the index
of a major search engine.

Obfuscation has it's place.

What I think of this thesis is on a par with what I think of this way of
spelling the possessive adjective "its" (and equally unprintable in
polite company). If I could choose to eradicate only one of these two
from the world, I'd opt for the spelling -- the widespread and totally
unfounded belief in the worth of obfuscation is also damaging, but less
so, since it only steals some time and energy from developers who (if
they share this belief) can't be all that good anyway;-).


Alex
 
Magnus Lycka

The Eternal Squire said:
Two things: ....
2) Only sell to an honest customer willing to be locked into
nondisclosure agreements. This goes back to the maxim of good
salesmanship: Know Your Customer.

If you have this, you don't need the obfuscation.
 
Magnus Lycka

petantik said:
....
I think that is not workable because it is easy to say that the internet
is available everywhere.

It is not available in developing countries...

Erh, the internet is certainly spreading to most of the
world, and there is an abundance of cracked and pirated
software in the poorer countries in the world, so the
obfuscation part has certainly proven not to work there.
 
Ben Sizer

Mike said:
Sounds like you just said that manufacturers should improve their
protection until they aren't making any profit on the product. That's
silly. The goal isn't to maximize protection, it's to maximize
profit. That means it only makes sense to spend money on better
protection if the cost of the protection is less than the expected
profit from adding it.

I agree with what you're saying, but it seems like you're arguing
against what was said rather than what was intended. Without wishing to
put words into anybody's mouth, I'm pretty sure what Yu-Xi Lim meant
was just that even imperfect protection is worthwhile if you estimate
that it will benefit you more than it will cost you. This is in
contrast to the opinion that any protection is useless because someone
will break it if they want to.
A recent, heavily
publicized case where Sony added copy protection to a product cost
them sales, and from what I've heard, even legal fees.

I think that's a poor example - the cost hasn't come from the mere act
of adding protection, but the method in which that protection operates.
I don't think anybody here - certainly not me - is talking about
infecting a user's system to protect our property, or taking any other
intrusive steps. I'd just like to make it non-trivial to make or use
additional copies.
 
Ben Sizer

Mike said:
There are ways to distribute
Python modules so that the user can't just open them in a text
editor. There are also ways to get cryptographic security for
distributed modules.

I know distributing as bytecode helps, but I was under the impression
that the disassemblers worked pretty well. With the dynamic nature of
the language I expect that all the variable names are largely left
intact. You win some, you lose some, I guess.
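
To make the point about names concrete, here is a small sketch using the standard `dis` module and the attributes of a code object. The `check_license` function and its key string are hypothetical examples invented for the demonstration.

```python
import dis

# Hypothetical module source; only its compiled form is inspected below.
SOURCE = '''
def check_license(user_key):
    expected_key = "hypothetical-key"
    return user_key == expected_key
'''

module_code = compile(SOURCE, "<demo>", "exec")

# The function body is stored as a code object among the module's constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_varnames"))

if __name__ == "__main__":
    print(func_code.co_name)      # the function's name
    print(func_code.co_varnames)  # argument and local variable names
    print(func_code.co_consts)    # literal constants, including the string
    dis.dis(func_code)            # full bytecode listing
```

The function name, the local variable names and the string literal are all recoverable from the compiled code without access to the source text.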

As for cryptographic security, could you provide a link or reference
for this? I am quite interested for obvious reasons. I'd be concerned
that there's a weak link in there at the decoding stage, however.

I have considered distributing my program as open source but with
encrypted data. Unfortunately anyone can just read the source to
determine the decryption method and password. Maybe I could put that
into an extension module, but that just moves the weak link along the
chain.
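
The weak link described above can be shown with a toy example. The scrambling routine below is a deliberately simple stand-in (an XOR against a SHA-256 based keystream), not real cryptography, and the password and asset contents are hypothetical.

```python
import hashlib
from itertools import count

# Hypothetical password, shipped in plain sight with the readable source.
PASSWORD = b"hypothetical-password"


def _keystream(password, length):
    """Derive `length` pseudo-random bytes from the password."""
    out = bytearray()
    for i in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(password + i.to_bytes(8, "big")).digest()
    return bytes(out[:length])


def xor_crypt(data, password=PASSWORD):
    """Toy symmetric scrambling: applying it twice restores the input."""
    stream = _keystream(password, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


# What the author ships: scrambled data plus this very file.
encrypted_asset = xor_crypt(b"level 2 map data")

# What any user who reads the file can do: make the same call.
recovered = xor_crypt(encrypted_asset)

if __name__ == "__main__":
    print(recovered)
```

Swapping in a stronger cipher would not change the outcome, since both the method and the password are distributed along with the data.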
Yes, if you use the same methods you use in C++,
it's "much harder". But by the same token, if you tried to use the
methods you'd use in a Python program in C++, you'd find that the C++
version was "much harder".

Well, I'm not sure what you mean here. A compiled C++ program is much
harder to extract information from than a compiled Python program.
That's without applying any special 'methods' on top of the normal
distribution process.
Of course, as Alex pointed out, all of these are just keeping honest
people honest. The crooks have all the advantages in this game, so you
really can't expect to win.

No, certainly not. But if you can mitigate your losses easily enough -
without infringing upon anyone else's rights, I must add - then why not
do so.
 
Mike Meyer

Ben Sizer said:
I think that's a poor example - the cost hasn't come from the mere act
of adding protection, but the method in which that protection operates.

That was sort of the point - that the effect on the bottom line of
adding copy protection is usually worse than just the cost of the
software, and can be much worse. This is a particularly egregious
example, but that just makes it an egregious example, not a poor one.
I don't think anybody here - certainly not me - is talking about
infecting a user's system to protect our property, or taking any other
intrusive steps. I'd just like to make it non-trivial to make or use
additional copies.

I've returned software that wouldn't run from a backup copy. Would I
return your software? If yes, have you factored the loss of sales to
people like me into your profit calculations?

<mike
 
petantik

Ben said:
I know distributing as bytecode helps, but I was under the impression
that the disassemblers worked pretty well. With the dynamic nature of
the language I expect that all the variable names are largely left
intact. You win some, you lose some, I guess.

As for cryptographic security, could you provide a link or reference
for this? I am quite interested for obvious reasons. I'd be concerned
that there's a weak link in there at the decoding stage, however.

I have considered distributing my program as open source but with
encrypted data. Unfortunately anyone can just read the source to
determine the decryption method and password. Maybe I could put that
into an extension module, but that just moves the weak link along the
chain.


Well, I'm not sure what you mean here. A compiled C++ program is much
harder to extract information from than a compiled Python program.
That's without applying any special 'methods' on top of the normal
distribution process.


No, certainly not. But if you can mitigate your losses easily enough -
without infringing upon anyone else's rights, I must add - then why not
do so.

The economics of software distribution must certainly come into it:
you need a cost/benefit analysis of whether it's worth the effort to
protect your code from would-be crackers.

The problem with code protection methodology in general is that once
it's cracked, everyone has access to the code for, maybe, all software
using the particular protection scheme.

The argument that most people buy software rather than get a pirated
version depends on the country that they are in, e.g. China's piracy
problem, where shops sell pirated software with no retribution by the
state. Remember, China is about to be the world's largest economic
superpower.

The above problem illustrates why code needs to be protected in an
effective way, by law and by code protection schemes.

With Python there is no comfort factor in knowing that your code is
being protected, well, not that I can see, compared with protection
schemes for compiled code which are used by many commercial software
companies.

Of course, we know that there can never be a 100% way to protect code
that some pirate won't overcome, but it still stops the casual user or
beginner 'crackers' from stealing the code and digging into your
profit margin.

BTW, I'm no expert on copy protection mechanisms, but the question I
raised originally, I believe, is valid and should be discussed.









http://petantik.blogsome.com - A Lucid Look at Reality
 
Mike Meyer

Ben Sizer said:
As for cryptographic security, could you provide a link or reference
for this? I am quite interested for obvious reasons. I'd be concerned
that there's a weak link in there at the decoding stage, however.

How about some ideas: Store your code in a zip file, and add it to the
search path. That immediately takes you out of the "just open the file
with a text editor" mode. For cryptographic security, use the ihooks
module to make "import" detect and decode encrypted modules before
actually importing them. Or digitally sign the modules, and check the
signature at import time. All of these are dead simple in Python.
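
A minimal sketch of two of these ideas follows, under stated assumptions. The `ihooks` module belonged to Python 2 and does not exist in Python 3, so this sketch relies on the built-in zip import support and on `importlib` instead. The tag check uses an HMAC from the standard library as a stand-in for a real digital signature, which would need a public-key library; the key, helper names and demo modules are all hypothetical.

```python
import hashlib
import hmac
import importlib
import pathlib
import sys
import tempfile
import types
import zipfile

# Hypothetical shared key; a real scheme would verify a public-key signature.
SIGNING_KEY = b"hypothetical-signing-key"


def make_zip_bundle(zip_path, modules):
    """Write {module_name: source_text} into a zip archive."""
    with zipfile.ZipFile(zip_path, "w") as bundle:
        for name, source in modules.items():
            bundle.writestr(f"{name}.py", source)


def import_from_zip(zip_path, name):
    """Import module `name` from a zip archive placed on the search path."""
    sys.path.insert(0, zip_path)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(zip_path)


def sign(source_bytes, key=SIGNING_KEY):
    """Return a keyed SHA-256 tag for the module source."""
    return hmac.new(key, source_bytes, hashlib.sha256).hexdigest()


def load_verified(name, source_bytes, signature, key=SIGNING_KEY):
    """Execute source_bytes as module `name` only if the tag matches."""
    if not hmac.compare_digest(sign(source_bytes, key), signature):
        raise ImportError(f"signature mismatch for {name}")
    module = types.ModuleType(name)
    exec(compile(source_bytes, f"<verified {name}>", "exec"), module.__dict__)
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = str(pathlib.Path(tmp) / "bundle.zip")
        make_zip_bundle(zip_path, {"zipped_demo": "ANSWER = 42\n"})
        print(import_from_zip(zip_path, "zipped_demo").ANSWER)

    source = b"VALUE = 7\n"
    print(load_verified("signed_demo", source, sign(source)).VALUE)
```

The zip bundle only takes the code out of "open it in a text editor" territory, and the verification key sits in the distributed program, so neither step stops a determined user.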
I have considered distributing my program as open source but with
encrypted data. Unfortunately anyone can just read the source to
determine the decryption method and password. Maybe I could put that
into an extension module, but that just moves the weak link along the
chain.

This isn't a Python problem, it's a problem with what you're doing. Try
Alex's solution, and put the data on a network server that goes
through whatever authentication you want it to.
Well, I'm not sure what you mean here. A compiled C++ program is much
harder to extract information from than a compiled Python program.

It is? Is the Python disassembler so much advanced over the state of
the art of binary disassemblers, then? Or maybe it's the Python
decompilers that are so advanced? As far as I can tell, the only real
difference between Python bytecodes and x86 (for instance) binaries is
that Python bytecodes keep the variable names around so it can do
run-time lookups. That's not that big a difference.

As for what I meant - Python has ihooks and imp, that make it simple
to customize import behavior. Doing those kinds of things with C++
code requires building the tools to do that kind of thing from
scratch.
No, certainly not. But if you can mitigate your losses easily enough -
without infringing upon anyone else's rights, I must add - then why not
do so.

Elsewhere in the thread, you said:
I'd just like to make it non-trivial to make or use additional copies.

How do you do that without infringing my fair use rights?

<mike
 
Steven D'Aprano

How do you do that without infringing my fair use rights?

And that is the million dollar question.

So-called "intellectual property" is a government-granted monopoly which
is not based on any principle of ownership. Ideas are not something you
can own in any real sense (as opposed to the legal fiction), ideas are
something that you can *have* -- but having had an idea, you can't
naturally prevent others from having the same idea independently, or
making use of your idea if you tell them about it -- and should you tell
them your idea so that now they have it as well, that does not diminish
the fact that you also have that idea.

Given the absolute lack of real evidence that strong "intellectual
property" laws are good for either innovation or the economy, and given
the absolute artificiality of treating ideas as if they were scarce goods,
I don't understand why the artificial monopoly rights of copyright holders
are allowed to trump the natural rights of copyright users.
 
Steven D'Aprano

The argument that most people buy software rather than get a pirated
version depends on the country that they are in, e.g. China's piracy
problem, where shops sell pirated software with no retribution by the
state. Remember, China is about to be the world's largest economic
superpower.

The above problem illustrates why code needs to be protected in an
effective way, by law and by code protection schemes.

I'm sorry, what problem? You haven't actually stated a problem -- in fact,
you have just given a perfect example of why the so-called "problem" is
not a problem at all. Let us see:

Historically, the UK had no concept of intellectual property rights until
very recently, and even when it was introduced, it was very limited until
the late 20th century.

Likewise for continental Europe.

Nevertheless, the UK and Europe became economic superpowers.

The USA, like China and Russia today, was a pirate nation for the first
century or two of its existence. American publishers simply reprinted
English books without paying royalties until well into the 20th century.
Hollywood got its start by fleeing the east coast to California, where
Thomas Edison's patents on motion picture technology were not
enforced.

The USA has become an economic superpower.

China has little effective protection for artificial monopoly rights over
ideas. China is becoming an economic superpower.

So where is the problem?

Ah, now I understand it. Having become rich and powerful by ignoring
so-called intellectual property, the UK, Europe and especially the USA are
desperate to ensure that the developing world does not also become rich
and powerful. One way of doing so is to force a system of artificial
government-granted monopolies, together with all the proven economic
inefficiencies of such monopolies, on the developing world.
 
