How to protect proprietary Python code? (bytecode obfuscation? what's better?)

Discussion in 'Python' started by seberino@spawar.navy.mil, Apr 17, 2006.

  1. Guest

    How can a proprietary software developer protect their Python code?
    People often ask me about obfuscating Python bytecode. They don't want
    people to easily decompile their proprietary Python app.

    I suppose another idea is to rewrite the entire Python app in C, if
    compiled C code is harder to decompile.

    Any ideas?
     
    , Apr 17, 2006
    #1

  2. Terry Reedy Guest

    Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    <> wrote in message
    news:...
    > How can a proprietary software developer protect their Python code?
    > People often ask me about obfuscating Python bytecode. They don't want
    > people to easily decompile their proprietary Python app.
    >
    > I suppose another idea is to rewrite entire Python app in C if compiled
    > C code
    > is harder to decompile.
    >
    > Any ideas?


    Go to Google's newsgroup archive for comp.lang.python (accessible via
    google.com) and search for some of the numerous past threads on this issue,
    which give several ideas and viewpoints. There may or may not also be
    something in the Python FAQ or Wiki at python.org.
     
    Terry Reedy, Apr 17, 2006
    #2

  3. gangesmaster Guest

    well, you can do something silly: create a c file into which you embed
    your code, ie.,

    #include<python.h>

    char code[] = "print 'hello moshe'";

    void main(...)
    {
    Py_ExecString(code);
    }

    then you can compile the C file into an object file, and use regular
    obfuscators/anti-debuggers. of course people who really want to get the
    source will be able to do so, but it will take more time. and isn't that
    the big idea of using obfuscation?

    but anyway, it's stupid. why be a dick? those who *really* want to get
    to the source will be able to, no matter what you use. after all, the
    code is executing on their CPU, and if the CPU can execute it, so
    can really enthused men. and those who don't want to use your product,
    don't care anyway if you provide the source or not. so share.


    -tomer
     
    gangesmaster, Apr 17, 2006
    #3
  4. Serge Orlov Guest

    wrote:
    > How can a proprietary software developer protect their Python code?
    > People often ask me about obfuscating Python bytecode. They don't want
    > people to easily decompile their proprietary Python app.
    >
    > I suppose another idea is to rewrite entire Python app in C if compiled
    > C code
    > is harder to decompile.
    >
    > Any ideas?


    Shuffle the opcode values in random order, recompile Python, recompile
    the stdlib, recompile py2exe (or whatever you use for bundling). It will
    keep an attacker busy for several hours.
     
    Serge Orlov, Apr 17, 2006
    #4
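A rough sketch of the first step Serge describes, building a shuffled opcode table. It assumes a modern Python 3 `opcode` module; `shuffled_opmap` and the seed are made-up names for illustration. It only generates the mapping. Actually using it would mean patching the interpreter's opcode definitions and rebuilding the interpreter, the stdlib and the bundler, none of which is shown. Opcodes are shuffled separately below and above `opcode.HAVE_ARGUMENT`, on the assumption that the interpreter relies on that boundary.

```python
import opcode
import random


def shuffled_opmap(seed):
    """Return a {name: new_value} table with the opcode values permuted.

    Values below and at/above opcode.HAVE_ARGUMENT are shuffled separately,
    so every opcode stays on its original side of that boundary.
    """
    rng = random.Random(seed)
    no_arg = sorted(n for n, v in opcode.opmap.items()
                    if v < opcode.HAVE_ARGUMENT)
    with_arg = sorted(n for n, v in opcode.opmap.items()
                      if v >= opcode.HAVE_ARGUMENT)

    table = {}
    for names in (no_arg, with_arg):
        values = [opcode.opmap[n] for n in names]
        rng.shuffle(values)  # permute the values within this group only
        table.update(zip(names, values))
    return table


if __name__ == "__main__":
    # Show a few remapped opcodes: name, stock value -> shuffled value.
    for name, value in sorted(shuffled_opmap(seed=2006).items())[:5]:
        print(name, opcode.opmap[name], "->", value)
```

Anyone who gets hold of one known function compiled under both tables can recover the permutation by comparing the two bytecode strings, which fits the "several hours" estimate.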
  5. gangesmaster <> wrote:
    ...
    > but anyway, it's stupid. why be a dick? those who *really* want to get
    > to the source will be able to, no matter what you use. after all, the
    > code is executing on their CPU, and if the CPU can execute it, so
    > can really enthused men. and those who don't want to use your product,
    > don't care anyway if you provide the source or not. so share.


    Alternatively, if you have secrets that are REALLY worth protecting,
    keep a tiny part of your app, embedding all worthwhile secrets, on YOUR
    well-secured server -- expose it as a webservice, or whatever, so the
    "fat client" (most of the app) can get at it. This truly gives you
    complete control: you don't care any more if anybody decompiles the part
    you distribute (which may be 90% or 99% of the app), indeed you can
    publish the webservice's specs or some API to encourage more and more
    people to write to it, and make your money by whatever business model
    you prefer (subscription, one-off sale, pay-per-use, your choice!). If
    you keep your client thin rather than fat, the advantages increase (your
    app can be used much more widely, etc), but you may need substantial
    amounts of servers and other resources to support widespread use.

    When I started proposing this approach, years and years ago, the fact
    that your app can work only when connected to the net might have been
    considered a real problem in many cases: but today, connectivity is SO
    pervasive that all sorts of apps require such connectivity anyway --
    e.g., look at Google Earth for a "fat client", Google Maps for a "thin"
    one accessing a subset of roughly the same data but running (the client
    side) inside a browser (with more limited functionality, to be sure).


    Alex
     
    Alex Martelli, Apr 18, 2006
    #5
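A minimal sketch of the split Alex describes, assuming a plain JSON-over-HTTP service built only from the standard library. Everything named here (`secret_score`, `ScoreHandler`, `start_server`, `remote_score`, the `/score` path) is hypothetical, and a real deployment would add authentication, TLS and error handling. The point is only that the distributed client contains the call, not the algorithm.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def secret_score(values):
    """Stand-in for the proprietary algorithm; it never leaves the server."""
    return sum(v * v for v in values) % 97


class ScoreHandler(BaseHTTPRequestHandler):
    """Server side: accepts a JSON body and returns only the result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": secret_score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server(port=0):
    """Run the service on localhost in a background thread (port 0 = any)."""
    server = HTTPServer(("127.0.0.1", port), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_score(port, values):
    """Client side: this is all that would ship to customers."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/score", body=json.dumps({"values": values}),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())["score"]
    finally:
        conn.close()
```

Decompiling `remote_score` reveals the wire format and nothing else, which is why publishing the API costs nothing in this model.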
  6. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    > #include<python.h>
    >
    > char code[] = "print 'hello moshe'";
    >
    > void main(...)
    > {
    > Py_ExecString(code);
    > }


    I don't get this: with Python 2.4 there is no function called
    Py_ExecString in any of the header files. I found something that might
    do the job, PyRun_SimpleString(), in pythonrun.h, but couldn't get it
    to work either. So what is really the way to execute Python code in a
    string from a C program?
     
    Daniel Nogradi, Apr 18, 2006
    #6
  7. gangesmaster Guest

    okay, i got the name wrong. i wasn't trying to provide production-level
    code, just a snippet. the function you want is
    PyRun_SimpleString( const char *command)

    #include <python.h>

    char secret_code[] = "print 'moshe'";

    int main()
    {
    return PyRun_SimpleString(secret_code);
    }

    and you need to link with python24.lib or whatever the object file is
    for your platform.



    -tomer
     
    gangesmaster, Apr 18, 2006
    #7
  8. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    > #include <python.h>
    >
    > char secret_code[] = "print 'moshe'";
    >
    > int main()
    > {
    > return PyRun_SimpleString(secret_code);
    > }
    >
    > and you need to link with python24.lib or whatever the object file is
    > for your platform.


    Are you sure? On a linux platform I tried linking with libpython2.4.so
    (I assume this is the correct object file) but it segfaults in
    PyImport_GetModuleDict( ).
     
    Daniel Nogradi, Apr 18, 2006
    #8
  9. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    "Daniel Nogradi" wrote:

    >> char secret_code[] = "print 'moshe'";
    >>
    >> int main()
    >> {
    >> return PyRun_SimpleString(secret_code);
    >> }
    >>
    >> and you need to link with python24.lib or whatever the object file is
    >> for your platform.

    >
    > Are you sure? On a linux platform I tried linking with libpython2.4.so
    > (I assume this is the correct object file) but it segfaults in
    > PyImport_GetModuleDict( ).


    I still don't understand why you think that embedding the *source code* in a variable
    named "secret" will do a better job than just putting the byte code in some non-obvious
    packaging, but if you insist on embedding the code, reading the documentation might
    help:

    http://docs.python.org/ext/embedding.html
    "At the very least, you have to call the function Py_Initialize()"

    http://docs.python.org/ext/high-level-embedding.html
    (minimal PyRun_SimpleString example)

    </F>
     
    Fredrik Lundh, Apr 18, 2006
    #9
  10. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    > >> char secret_code[] = "print 'moshe'";
    > >>
    > >> int main()
    > >> {
    > >> return PyRun_SimpleString(secret_code);
    > >> }
    > >>
    > >> and you need to link with python24.lib or whatever the object file is
    > >> for your platform.

    > >
    > > Are you sure? On a linux platform I tried linking with libpython2.4.so
    > > (I assume this is the correct object file) but it segfaults in
    > > PyImport_GetModuleDict( ).

    >
    > I still don't understand why you think that embedding the *source code* in a
    > variable
    > named "secret" will do a better job than just putting the byte code in some
    > non-obvious
    > packaging, but if you insist on embedding the code, reading the
    > documentation might
    > help:
    >
    > http://docs.python.org/ext/embedding.html
    > "At the very least, you have to call the function Py_Initialize()"
    >
    > http://docs.python.org/ext/high-level-embedding.html
    > (minimal PyRun_SimpleString example)


    Well, I was not the original poster in this thread; I just picked up
    the idea of executing Python code that is assigned to a string from
    within C and tried to do it with no particular goal, that's all. And
    thanks a lot for the links, the docs are pretty clear, I should have
    checked them before....
     
    Daniel Nogradi, Apr 18, 2006
    #10
  11. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    wrote:
    > How can a proprietary software developer protect their Python code?
    > People often ask me about obfuscating Python bytecode. They don't want
    > people to easily decompile their proprietary Python app.


    Do they ask the same thing for Java or .NET apps ?-)

    > I suppose another idea is to rewrite entire Python app in C if compiled
    > C code
    > is harder to decompile.


    Do you really think "native" code is harder to reverse-engineer than
    Python's byte-code ?

    > Any ideas?


    I'm afraid that the only *proven* way to protect code from
    reverse-engineering is to not distribute it *at all*.


    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    bruno at modulix, Apr 18, 2006
    #11
  12. "bruno at modulix" <> wrote in message
    news:4444c777$0$9453$...

    > Do they ask the same thing for Java or .NET apps ?-)


    If you Google for "bytecode obfuscation", you'll find a large number
    of products already exist for Java and .Net
     
    Richard Brodie, Apr 18, 2006
    #12
  13. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    Richard Brodie wrote:

    >> Do they ask the same thing for Java or .NET apps ?-)

    >
    > If you Google for "bytecode obfuscation", you'll find a large number
    > of products already exist for Java and .Net


    and if you google for "python obfuscator", you'll find tools for python. including
    tools that use "psychologically inspired techniques to produce extra confusion in
    human readers" (probably by inserting small snippets of Perl here and there...).

    </F>
     
    Fredrik Lundh, Apr 18, 2006
    #13
  14. Ben Sizer Guest

    bruno at modulix wrote:
    > wrote:
    > > I suppose another idea is to rewrite entire Python app in C if compiled
    > > C code
    > > is harder to decompile.

    >
    > Do you really think "native" code is harder to reverse-engineer than
    > Python's byte-code ?


    Yes, until there's a native code equivalent of "import dis" that
    telepathically contacts the original programmer to obtain variable
    names that aren't in the executable.

    --
    Ben Sizer
     
    Ben Sizer, Apr 19, 2006
    #14
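Ben's point about `import dis` can be seen in a few lines. This is a small illustration using a made-up function (`price_with_tax` is not from the thread): the compiled code object keeps the argument and local variable names, and the disassembly prints them back.

```python
import dis


def price_with_tax(net_price, tax_rate):
    # Toy function; only the names matter for this demonstration.
    gross_price = net_price * (1 + tax_rate)
    return gross_price


# Argument and local names are stored on the code object itself.
recovered_names = price_with_tax.__code__.co_varnames

# The disassembly listing refers to those same names.
listing = dis.Bytecode(price_with_tax).dis()

if __name__ == "__main__":
    print(recovered_names)
    print(listing)
```

A stripped native binary has nothing comparable to `co_varnames`, which is the asymmetry being joked about.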
  15. Re: How protect proprietary Python code? (bytecode obfuscation?,what better?)

    Ben Sizer wrote:
    > bruno at modulix wrote:
    >
    >> wrote:
    >>
    >>>I suppose another idea is to rewrite entire Python app in C if compiled
    >>>C code
    >>>is harder to decompile.

    >>
    >>Do you really think "native" code is harder to reverse-engineer than
    >>Python's byte-code ?

    >
    >
    > Yes, until there's a native code equivalent of "import dis" that
    > telepathically contacts the original programmer to obtain variable
    > names that aren't in the executable.


    Lol !-)

    Ok, granted. Let's rephrase it:
    "do you really think that native code is harder *enough* to
    reverse-engineer ?"

    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    bruno at modulix, Apr 19, 2006
    #15
  16. Ben Sizer Guest

    bruno at modulix wrote:
    > Let's rephrase it:
    > "do you really think that native code is harder *enough* to
    > reverse-engineer ?"


    I don't know. In terms of copy protection, popular off-the-shelf
    software is going to get cracked whether it's written in Python or x86
    ASM, that much is true. But in terms of perhaps protecting innovative
    algorithms from competitors, or something similar, compilation into
    native code does a great job of hiding your work. Not a perfect job,
    but a good enough job.

    I know some people talk a lot about using web services to keep the
    proprietary data behind a secure server, but there is a large number of
    applications where this is not practical - eg. image/audio processing,
    computer games, artificial intelligence, or several other applications
    with heavy real-time or cpu-intensive requirements, or embedded systems
    that don't have web access.

    Perhaps the inclusion of ctypes will make it more practical to migrate
    any sensitive code into native code libraries.

    --
    Ben Sizer
     
    Ben Sizer, Apr 20, 2006
    #16
  17. Ben Sizer <> wrote:

    > bruno at modulix wrote:
    > > Let's rephrase it:
    > > "do you really think that native code is harder *enough* to
    > > reverse-engineer ?"

    >
    > I don't know. In terms of copy protection, popular off-the-shelf
    > software is going to get cracked whether it's written in Python or x86
    > ASM, that much is true. But in terms of perhaps protecting innovative
    > algorithms from competitors, or something similar, compilation into
    > native code does a great job of hiding your work. Not a perfect job,
    > but a good enough job.


    If they're truly worth protecting, they're worth reverse engineering.

    Remember, the competition includes excellent programmers working in
    countries where a salary of $10 an hour is a luxury and IP law
    enforcement is non-existent, so the cost to reverse-engineer is not as
    high as you might think.


    > I know some people talk a lot about using web services to keep the
    > proprietary data behind a secure server, but there is a large number of


    Ah yes, that would be me;-). Except that I don't limit my advice to
    proprietary DATA -- it also applies to CODE worth keeping secret.

    > applications where this is not practical - eg. image/audio processing,
    > computer games, artificial intelligence, or several other applications
    > with heavy real-time or cpu-intensive requirements, or embedded systems
    > that don't have web access.


    Fewer and fewer systems "intrinsically lack" net access. For example,
    good (costly) computer games more and more need net access to be played
    in the best way (multiplayer etc).

    "CPU intensive" is a weird reason to want to avoid keeping in a well
    protected environment any code that's really worth money -- if it IS
    worth that much you're no doubt charging enough for it to afford
    supplying the CPU power to your customers (whatever your business model,
    say pay-per-use or subscription levels with different maxima, etc etc).

    >
    > Perhaps the inclusion of ctypes will make it more practical to migrate
    > any sensitive code into native code libraries.


    Naah, ctypes shines when you access *pre-existing* dynamic libraries; if
    you're building those libraries yourself, it makes more sense to make
    them immediately usable from Python, e.g. via Pyrex, or SWIG, or SIP, or
    the C API, etc, etc. And if your secrets are truly valuable, none of
    those will really help keep them safe.

    If your secrets are worth diddlysquat, and the only reason to "protect"
    them is (e.g.) to keep some PHB happy (relying on the fact that he or
    she has no clue as to reality anyway), then go ahead -- use a Caesar
    cipher (as a just-arrested Mafia "capo di tutti i capi" appears to have
    done -- Italian police easily broke it, enabling them to arrest several
    other mafiosi!), or native code, or any other ineffectual approach. But
    if your wallet (or jailtime;-) is really on the line, do realize that
    they ARE ineffectual.


    Alex
     
    Alex Martelli, Apr 20, 2006
    #17
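For context on the ctypes remark, here is a small sketch of the case where Alex says it shines: calling a function in a library that already exists. It assumes a POSIX system (Linux or macOS) where the C library can be loaded through `ctypes`; `c_strlen` is a made-up wrapper name.

```python
import ctypes
import ctypes.util

# Load the system C library (assumption: POSIX, where this lookup works).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(text)


if __name__ == "__main__":
    print(c_strlen(b"moshe"))
```

No compiler or wrapper generator is involved, which is convenient for existing libraries but adds no protection: the shared library is still an ordinary native binary on the user's disk.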
  18. Ben Sizer Guest

    Alex Martelli wrote:
    > Ben Sizer <> wrote:
    >
    > > I don't know. In terms of copy protection, popular off-the-shelf
    > > software is going to get cracked whether it's written in Python or x86
    > > ASM, that much is true. But in terms of perhaps protecting innovative
    > > algorithms from competitors, or something similar, compilation into
    > > native code does a great job of hiding your work. Not a perfect job,
    > > but a good enough job.

    >
    > If they're truly worth protecting, they're worth reverse engineering.


    It's a sliding scale though. You don't need to be able to stop
    everybody to make it worthwhile.

    > Remember, the competition includes excellent programmers working in
    > countries where $10 an hour's salary is luxury and IP law enforcements
    > non-existent, so the cost to reveng is not as high as you might think.


    Whether $10 is a lot or a little is not as important as whether that
    $10 could be better spent. It's easy to drill down far enough to break
    copy protection but nowhere near as easy to derive a high level
    algorithm from the assembly language. So in the latter case, a little
    protection goes a long way.

    > > I know some people talk a lot about using web services to keep the
    > > proprietary data behind a secure server, but there is a large number of

    >
    > Ah yes, that would be me;-). Except that I don't limit my advice to
    > proprietary DATA -- it also applies to CODE worth keeping secret.


    Code is data, data is code. :) I meant it to refer to all information
    stored that way.

    > > applications where this is not practical - eg. image/audio processing,
    > > computer games, artificial intelligence, or several other applications
    > > with heavy real-time or cpu-intensive requirements, or embedded systems
    > > that don't have web access.

    >
    > Fewer and fewer systems "intrinsically lack" net access. For example,
    > good (costly) computer games more and more need net access to be played
    > in the best way (multiplayer etc).


    Sure, but there are still many, many programs that don't fit those
    criteria. Nor are people generally happy about being compelled to use
    online services to 'activate' their games.

    > "CPU intensive" is a weird reason to want to avoid keeping in a well
    > protected environment any code that's really worth money -- if it IS
    > worth that much you're no doubt charging enough for it to afford
    > supplying the CPU power to your customers (whatever your business model,
    > say pay-per-use or subscription levels with different maxima, etc etc).


    Maybe I wasn't making myself clear - I just meant that you can't be
    doing round-trips to a web server for per-pixel calculations.

    --
    Ben Sizer
     
    Ben Sizer, Apr 20, 2006
    #18
  19. AdrianC

    embedding

    you know, embedding the Python code as a string inside a C app is very easy to crack... just open the binary with "vi" (or any text editor that doesn't mind some binary here and there :) )

    you could embed the .pyc file instead, but that can still be reversed easily :)
     
    AdrianC, Sep 19, 2010
    #19
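Both halves of AdrianC's remark can be checked from Python itself. This is a sketch that uses `marshal`, the serialization used inside .pyc files, and a made-up secret string. It shows that string constants sit verbatim in the serialized bytecode, and that the blob loads straight back into a runnable code object.

```python
import marshal

# Made-up "secret" source, as it might be embedded in a C program.
source = "secret_key = 'hunter2-not-so-secret'\n"

# Compile to a code object and serialize it, as a .pyc file body would be.
code = compile(source, "<embedded>", "exec")
blob = marshal.dumps(code)

# The string constant is readable directly in the raw bytes.
found_in_bytes = b"hunter2-not-so-secret" in blob

# And the blob can be turned back into executable code by anyone.
restored = marshal.loads(blob)
namespace = {}
exec(restored, namespace)

if __name__ == "__main__":
    print(found_in_bytes, namespace["secret_key"])
```

Marshalled data is only guaranteed to load on the same Python version that produced it, but that is rarely an obstacle for someone who already has the bundled interpreter.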