How to protect Python source from modification


Frank Millman

Hi all

I am writing a multi-user accounting/business system. Data is stored in
a database (PostgreSQL on Linux, SQL Server on Windows). I have written
a Python program to run on the client, which uses wxPython as a gui,
and connects to the database via TCP/IP.

The client program contains all the authentication and business logic.
It has dawned on me that anyone can bypass this by modifying the
program. As it is written in Python, with source available, this would
be quite easy. My target market extends well up into the mid-range, but
I do not think that any CFO would contemplate using a program that is
so open to manipulation.

The only truly secure solution I can think of would involve a radical
reorganisation of my program, so I am writing to see if anyone has a
simpler suggestion. Here is the idea.

1. Write a socket server program that runs on the server. The socket
server is the only program that connects to the database. The client
program connects to the server, which authenticates the client against
the database and then listens for requests from the client.

2. Devise my own protocol for communication between client and server.
For selects, the client sends a request, the server checks permissions,
then retrieves the data from the database and passes it to the client.
For updates, the client passes up the data to be updated, the server
checks it against its business logic, and then updates the database.
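
As a rough sketch of points 1 and 2 (not the actual program), the server-side part might look something like the following. Everything named here is an assumption made for illustration: the `PERMISSIONS` table, the operation names, the `customers` table and the credit-limit rule are invented, and an in-memory SQLite database stands in for PostgreSQL or SQL Server.

```python
import sqlite3

# Hypothetical permission table: which users may run which named operations.
PERMISSIONS = {
    "alice": {"select_customers", "update_customer"},
    "bob": {"select_customers"},
}


def make_db():
    """Create an in-memory stand-in for the real database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit INTEGER)"
    )
    db.execute("INSERT INTO customers VALUES (1, 'Acme', 1000)")
    db.commit()
    return db


def handle_request(db, user, operation, args):
    """Run one client request on the server after checking permissions and rules."""
    if operation not in PERMISSIONS.get(user, set()):
        return ("error", "permission denied")
    if operation == "select_customers":
        rows = db.execute(
            "SELECT id, name, credit_limit FROM customers ORDER BY id"
        ).fetchall()
        return ("ok", rows)
    if operation == "update_customer":
        cust_id, credit_limit = args
        # Business rule enforced on the server (a made-up example rule).
        if not isinstance(credit_limit, int) or credit_limit < 0:
            return ("error", "credit limit must be a non-negative integer")
        db.execute(
            "UPDATE customers SET credit_limit = ? WHERE id = ?",
            (credit_limit, cust_id),
        )
        db.commit()
        return ("ok", None)
    return ("error", "unknown operation")
```

The client never sends SQL in this arrangement; it only names an operation and supplies arguments, so a modified client can ask for nothing the server does not already allow.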

There is the question of where state should be maintained. If on the
server, I would have to keep all the client/server connections open,
and maintain the state of all the sessions, which would put quite a
load on the server. If on the client, I would have to reorganise my
thinking even more, but this would have an advantage - I will
eventually want to write a browser interface, and this would be much
closer in concept, so the two approaches would be quite similar.

This raises the question of whether I should even bother with a gui
client, or bite the bullet and only have a browser based front end.
Judging from recent comments about new technologies such as Ajax, a lot
of the disadvantages have been overcome, so maybe this is the way to
go.

It would be a shame to scrap all the effort I have put into my
wxPython-based front end. On the other hand, it would be pointless to
continue with an approach that is never going to give me what I want.
Any advice which helps to clarify my thinking will be much appreciated.

Thanks

Frank Millman
 

Peter Hansen

Frank said:
I am writing a multi-user accounting/business system. Data is stored in
a database (PostgreSQL on Linux, SQL Server on Windows). I have written
a Python program to run on the client, which uses wxPython as a gui,
and connects to the database via TCP/IP.

The client program contains all the authentication and business logic.
It has dawned on me that anyone can bypass this by modifying the
program. As it is written in Python, with source available, this would
be quite easy. My target market extends well up into the mid-range, but
I do not think that any CFO would contemplate using a program that is
so open to manipulation.

The only truly secure solution I can think of would involve a radical
reorganisation of my program

Please define what "truly secure" means to you.

I think you'll find that the only "truly secure" solution is to install
the critical authentication and business logic stuff that you want to
protect on a server to which the user does not have physical access.

People wanting to protect critical algorithms often conclude that they
need to have a "black box" server which cannot be physically opened by
the user.

Or do you think this issue is in some way unique to Python? You might
not realize that the only difference from a security point of view
between shipping such a program written in Python and one written in,
say, C++, is that "modifying the program" is somewhat more tedious with
C++. That's no better than security by obscurity; maybe it should be
called "security by adiposity". ;-)

But the real answer does depend a lot on *exactly* what kind of security
you want (or, ultimately, what it turns out you really need, once you've
clarified your thinking based on the feedback you do get here). Issues
like: are you more concerned about detecting changes, or in preventing
them in the first place? (the latter is much harder); what is the nature
of software that competes with yours? (is it really any more secure, or
only apparently so? maybe this is just a marketing issue); and is there
any intellectual property that you are trying to protect here, or are
you just interested in avoiding casual disruption of normal operation?

-Peter
 

Gerhard Häring

Frank said:
Hi all

I am writing a multi-user accounting/business system. Data is stored in
a database (PostgreSQL on Linux, SQL Server on Windows). I have written
a Python program to run on the client, which uses wxPython as a gui,
and connects to the database via TCP/IP.

The client program contains all the authentication and business logic.
It has dawned on me that anyone can bypass this by modifying the
program. As it is written in Python, with source available, this would
be quite easy. My target market extends well up into the mid-range, but
I do not think that any CFO would contemplate using a program that is
so open to manipulation. [...]

My suggestion is to use py2exe or cx_Freeze to package your application.
It's then not as trivial to modify. By the way, you don't need to ship the
.py source code files; it's enough to ship only the .pyc bytecode files.

Using py2exe it's not even obvious that your application is written in
Python at all.

It's not a silver bullet, but at least it makes recompiling/modifying
your app no easier than it is with Java (and/or .NET I suppose).

That being said, even if you continue with the GUI approach, it may
still be a good idea to factor out all the business logic in a separate
module so you can eventually switch to a web application or a three-tier
model without too much effort.

Also, there's no need at all to put in countless hours implementing your
own network protocol. If you really want to separate client and app
server, then why not use something as simple as PyRO, or even XML/RPC?
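
To give an idea of how little code the XML/RPC route needs, here is a minimal sketch using the standard library's `xmlrpc.server` module. The `check_order` function, the `CREDIT_LIMITS` data and the port number are assumptions invented for this example, not part of the program being discussed.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical business data kept on the server, out of reach of the client.
CREDIT_LIMITS = {"ACME": 1000, "GLOBEX": 250}


def check_order(customer, amount):
    """Return True if the order amount is within the customer's credit limit."""
    limit = CREDIT_LIMITS.get(customer)
    if limit is None:
        return False
    # Reject anything that is not a plain positive integer.
    if not isinstance(amount, int) or isinstance(amount, bool):
        return False
    return 0 < amount <= limit


def make_server(host="127.0.0.1", port=8000):
    """Build (but do not start) an XML-RPC server exposing check_order."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(check_order)
    return server
```

Calling `make_server().serve_forever()` would start it, and a client could then call `xmlrpc.client.ServerProxy("http://127.0.0.1:8000").check_order("ACME", 500)` without any hand-written wire format.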

HTH,

-- Gerhard
 

Frank Millman

Gerhard said:
Frank said:
Hi all

I am writing a multi-user accounting/business system. Data is stored in
a database (PostgreSQL on Linux, SQL Server on Windows). I have written
a Python program to run on the client, which uses wxPython as a gui,
and connects to the database via TCP/IP.

The client program contains all the authentication and business logic.
It has dawned on me that anyone can bypass this by modifying the
program. As it is written in Python, with source available, this would
be quite easy. My target market extends well up into the mid-range, but
I do not think that any CFO would contemplate using a program that is
so open to manipulation. [...]

My suggestion is to use py2exe or cx_Freeze to package your application.
It's then not as trivial to modify it. Btw. you don't need to ship the
.py source code files, it's enough to ship only .pyc bytecode files.

Using py2exe it's not even obvious that your application is written in
Python at all.

It's not a silver bullet, but at least it makes recompiling/modifying
your app no easier than it is with Java (and/or .NET I suppose).

My problem is that, if someone has access to the network and to a
Python interpreter, they can get hold of a copy of my program and use
it to knock up their own client program that makes a connection to the
database. They can then execute any arbitrary SQL command.

Gerhard said:
That being said, even if you continue with the GUI approach, it may
still be a good idea to factor out all the business logic in a separate
module so you can eventually switch to a web application or a three-tier
model without too much effort.

Agreed

Also, there's no need at all to put in countless hours implementing your
own network protocol. If you really want to separate client and app
server, then why not use something as simple as PyRO, or even XML/RPC?

Perhaps 'protocol' is the wrong word. I already have a simple socket
server program running. If I explain how I do it, perhaps you can
indicate whether PyRO or XML/RPC would make my life easier.

The server program is currently programmed to accept a number of
message types from the client program. Each message's data string
starts with a numeric prefix, which indicates the type of message,
followed by a pickled tuple of arguments. The server program reads the
string, extracts the numeric prefix, and passes the rest of the string
to the appropriate function using a subthread.

For example, I keep track of who is currently logged in. On startup,
the client connects to my server and sends a '1' followed by their
userid and other information. The server receives this and passes the
data to a 'login' function, which uses a Python dictionary to store the
information. If the server detects that the user is already logged in,
it sends back an error code and the client program displays a message
and terminates. Otherwise it sends back an 'ok' code, and the client
can continue. When the client logs off, it sends a '2' followed by
their userid, which the server receives and passes to a 'logoff'
function, which deletes the entry from the dictionary.

The system of numeric prefixes and associated data string making up a
message is what I mean by a protocol.
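
A minimal sketch of the scheme just described, with two loudly stated departures from it: the arguments are encoded as JSON rather than pickled (unpickling data from an untrusted client can execute arbitrary code), and a space is assumed as the separator between the numeric prefix and the payload. The function names, the reply strings and the `LOGGED_IN` dictionary are illustrative only, and the subthread is left out.

```python
import json

LOGGED_IN = {}  # userid -> other login information


def login(userid, info):
    """Message type 1: record the user, refusing a second login."""
    if userid in LOGGED_IN:
        return "error: already logged in"
    LOGGED_IN[userid] = info
    return "ok"


def logoff(userid):
    """Message type 2: forget the user."""
    LOGGED_IN.pop(userid, None)
    return "ok"


# Numeric prefix -> handler function.
HANDLERS = {1: login, 2: logoff}


def encode_message(msg_type, *args):
    """Client side: numeric prefix, a space, then the JSON-encoded arguments."""
    return str(msg_type).encode("ascii") + b" " + json.dumps(args).encode("utf-8")


def dispatch(message):
    """Server side: split off the prefix and pass the arguments to the handler."""
    prefix, _, payload = message.partition(b" ")
    try:
        handler = HANDLERS[int(prefix)]
        args = json.loads(payload)
    except (KeyError, ValueError):
        return "error: bad message"
    if not isinstance(args, list):
        return "error: bad message"
    try:
        return handler(*args)
    except TypeError:
        return "error: bad arguments"
```

For example, `dispatch(encode_message(1, "frank", "terminal 3"))` returns `"ok"` the first time and the already-logged-in error the second time.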

Thanks

Frank
 

bruno modulix

Frank said:
Hi all

I am writing a multi-user accounting/business system. Data is stored in
a database (PostgreSQL on Linux, SQL Server on Windows). I have written
a Python program to run on the client, which uses wxPython as a gui,
and connects to the database via TCP/IP.

The client program contains all the authentication and business logic.
It has dawned on me that anyone can bypass this by modifying the
program.

If your program relies on an RDBMS, then it's the RDBMS's job to enforce
security rules.

Frank said:
As it is written in Python, with source available, this would
be quite easy.

Then there's probably something wrong with the way you manage security.

NB: splitting business logic from the GUI is still a good idea anyway.
 

Frank Millman

Peter said:
Please define what "truly secure" means to you.

Fair question. I am not expecting 'truly' to mean 100% - I know that is
impossible. I will try to explain.

Here are some assumptions -
1. A system administrator is responsible for the system.
2. There is a single userid and password for connecting to the
database. This must be stored somewhere so that the client program can
read it to generate the appropriate connection string. The users do not
need to know this userid and password.
3. Each user has their own userid and password, which is stored in the
database in a 'users' table. I use this in my program for
authentication when a user tries to connect.
4. The client program can be run from anywhere on the network that has
access to the program and to a Python interpreter.

[snip]
But the real answer does depend a lot on *exactly* what kind of security
you want (or, ultimately, what it turns out you really need, once you've
clarified your thinking based on the feedback you do get here). Issues
like: are you more concerned about detecting changes, or in preventing
them in the first place? (the latter is much harder); what is the nature
of software that competes with yours? (is it really any more secure, or
only apparently so? maybe this is just a marketing issue); and is there
any intellectual property that you are trying to protect here, or are
you just interested in avoiding casual disruption of normal operation?

I am not concerned about anyone reading my code - in fact I am looking
forward to releasing the source and getting some feedback.

My concern is this. I have all this fancy authentication and business
logic in my program. If someone wants to bypass this and get direct
access to the database, it seems trivially easy. All they have to do is
read my source, find out where I get the connection string from, write
their own program to make a connection to the database, and execute any
SQL command they want.

If I move all the authentication and business logic to a program which
runs on the server, it is up to the system administrator to ensure that
only authorised people have read/write/execute privileges on that
program. Clients will have no privileges, not even execute. They will
have their own client program, which has to connect to my server
program, and communicate with it in predefined ways. I *think* that in
this way I can ensure that they cannot do anything outside the bounds
of what I allow them.

The only problem is that this is very different from the way my program
works at present, so it will be quite a bit of work to re-engineer it.
If someone can suggest a simpler solution obviously I would prefer it.
But if the consensus is that I am thinking along the right lines, I
will roll up my sleeves and get stuck in.

I hope this explains my thinking a bit better.

Thanks

Frank
 

bruno modulix

Frank said:
Fair question. I am not expecting 'truly' to mean 100% - I know that is
impossible. I will try to explain.

Here are some assumptions -
1. A system administrator is responsible for the system.
2. There is a single userid and password for connecting to the
database. This must be stored somewhere so that the client program can
read it to generate the appropriate connection string. The users do not
need to know this userid and password.
3. Each user has their own userid and password,
which is stored in the
database in a 'users' table. I use this in my program for
authentication when a user tries to connect.

Why not simply use the security system of your RDBMS? If you set up
appropriate privileges in the RDBMS, you won't have to store any
userid/password in the program, and no user will be able to bypass
anything, even if connecting directly (like with a CLI DB client) to the
RDBMS.
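
As an illustration only, privileges of this kind could be described in one place and turned into `GRANT` statements for the administrator to run. The role names, table names and the `PRIVILEGES` layout below are hypothetical; the generated statements use the common `GRANT ... ON ... TO ...` form, and the exact syntax should be checked against the RDBMS in use.

```python
import re

# Hypothetical layout: role -> {table: [privileges]}.
PRIVILEGES = {
    "clerk": {"invoices": ["SELECT", "INSERT"]},
    "auditor": {"invoices": ["SELECT"], "ledger": ["SELECT"]},
}

_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")
_ALLOWED = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def grant_statements(privileges):
    """Return a sorted list of GRANT statements for the given privilege layout."""
    statements = []
    for role in sorted(privileges):
        for table in sorted(privileges[role]):
            wanted = privileges[role][table]
            # Refuse anything that is not a plain identifier or known privilege,
            # since these strings are pasted directly into SQL text.
            if not (_IDENTIFIER.match(role) and _IDENTIFIER.match(table)):
                raise ValueError("unsafe identifier")
            if not set(wanted) <= _ALLOWED:
                raise ValueError("unknown privilege")
            statements.append(
                "GRANT %s ON %s TO %s;" % (", ".join(wanted), table, role)
            )
    return statements
```

With the sample layout, a user connecting as `auditor` through any client, including a command-line one, could read `invoices` and `ledger` but change nothing.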

[snip]

(snip more)

I am not concerned about anyone reading my code - in fact I am looking
forward to releasing the source and getting some feedback.

My concern is this. I have all this fancy authentication and business
logic in my program. If someone wants to bypass this and get direct
access to the database, it seems trivially easy. All they have to do is
read my source, find out where I get the connection string from, write
their own program to make a connection to the database, and execute any
SQL command they want.

That's why an RDBMS has an authentication and security system. This
doesn't mean your program doesn't have or cannot add its own security
management, but it should be based on the RDBMS one.
 

Dennis Lee Bieber

Frank said:
My problem is that, if someone has access to the network and to a
Python interpreter, they can get hold of a copy of my program and use
it to knock up their own client program that makes a connection to the
database. They can then execute any arbitrary SQL command.

If your DBMS is directly accessible on the net, you're vulnerable
even without Python. Especially if you have "authentication" logic being
done at the client end. There is nothing to prevent someone using a
compatible query browser or command-line utility to make connection
attempts to the server, followed by classical username/password cracking
stuff.

Frank said:
The server program is currently programmed to accept a number of
message types from the client program. Each message's data string
starts with a numeric prefix, which indicates the type of message,
followed by a pickled tuple of arguments. The server program reads the
string, extracts the numeric prefix, and passes the rest of the string
to the appropriate function using a subthread.

Ah, okay -- you /do/ already have something running in the middle.

Frank said:
For example, I keep track of who is currently logged in. On startup,
the client connects to my server and sends a '1' followed by their
userid and other information. The server receives this and passes the
data to a 'login' function, which uses a Python dictionary to store the
information. If the server detects that the user is already logged in,
it sends back an error code and the client program displays a message
and terminates. Otherwise it sends back an 'ok' code, and the client
can continue. When the client logs off, it sends a '2' followed by
their userid, which the server receives and passes to a 'logoff'
function, which deletes the entry from the dictionary.

Obscuring the Python stuff will only be a minor delay factor in
breaking that -- someone really serious could probably stick in a packet
sniffer and record a transaction sequence, eventually reverse mapping
back to the types of operations each code represents.

Database security? First step would be to USE the DBMS privilege
system to limit operations to only those SQL statements, tables, and
data columns that are needed for your client program; since you appear
to be using user/password information already, each such user could have
different privileges, limiting some to retrieval only, for example. As
for your "server", I'd probably start a thread for each connected user,
so that thread handles all communication. Your description sounds more
like a rudimentary proxy adding in a counting scheme, but not really
isolating separate client connections.
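
A minimal sketch of the thread-per-connected-user idea, using the standard library's `socketserver.ThreadingTCPServer`. The line-based `LOGIN`/`WHOAMI` commands, the port number and the `session` dictionary are assumptions made up for the example; a real `LOGIN` would of course have to check a password.

```python
import socketserver


def process_line(session, line):
    """Handle one request line; `session` holds state for a single connection."""
    command, _, argument = line.partition(" ")
    if command == "LOGIN" and argument:
        session["user"] = argument
        return "ok"
    if command == "WHOAMI":
        return session["user"] or "error: not logged in"
    return "error: unknown command"


class ClientHandler(socketserver.StreamRequestHandler):
    """ThreadingTCPServer runs one instance per connection, each in its own thread."""

    def handle(self):
        session = {"user": None}  # private to this connection
        for raw in self.rfile:
            line = raw.decode("utf-8", "replace").strip()
            reply = process_line(session, line)
            self.wfile.write((reply + "\n").encode("utf-8"))


def make_server(host="127.0.0.1", port=9000):
    """Build (but do not start) a server that gives each client its own thread."""
    server = socketserver.ThreadingTCPServer((host, port), ClientHandler)
    server.daemon_threads = True
    return server
```

Because each handler owns its `session`, one client's login state cannot leak into another's; `make_server().serve_forever()` would start the server.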
 

Frank Millman

bruno said:
If your program relies on a RDBMS, then it's the RDBMS job to enforce
security rules.

Two possible responses to this -

1. You are right (90% probability)

2. I have certain requirements which can not easily be expressed in the
RDBMS, so it is easier to use the application to enforce certain rules
(10% probability)

Unfortunately I am stuck with number 2 at present.

bruno said:
Then there's probably something wrong with the way you manage security.

Probably - I am learning the hard way <g>

bruno said:
NB: splitting business logic from the GUI is still a good idea anyway.

I do have it fairly well split, but it all ends up being processed on
the client, which I think is the root of my problem.

Thanks

Frank
 

Bugs

As a side question Frank, how was your experience using wxPython for
your GUI?
Any regrets choosing wxPython over another toolkit?
Was it very buggy?
How was it to work with in general?
Any other real-world wxPython feedback you have is appreciated.

Frank said:
I am writing a multi-user accounting/business system. Data is stored in
a database (PostgreSQL on Linux, SQL Server on Windows). I have written
a Python program to run on the client, which uses wxPython as a gui,
and connects to the database via TCP/IP.
<snip>
 

Steve M

This is a heck of a can of worms. I've been thinking about these sorts
of things for a while now. I can't write out broad, well-structured
advice at the moment, but here are some things that come to mind.

1. Based on your description, don't trust the client. Therefore,
"security", whatever that amounts to, basically has to happen on the
server. The server should be designed with the expectation that any
input is possible, from slightly tweaked variants of the normal
messages to a robotic client that spews the most horrible ill-formed
junk frequently and in large volumes. It is the server's job to decide
what it should do. For example, consider a website that has a form for
users to fill out. The form has javascript, which executes on the
client, that helps to validate the data by refusing to submit the form
unless the user has filled in required fields, etc. This is client-side
validation (analogous to authentication). It is trivial for an attacker
to force the form to submit without filling in required fields. Now if
the server didn't bother to do its own validation but just inserted a
new record into the database with whatever came in from the form
submission, on the assumption that the client-side validation was
sufficient, this would constitute a serious flaw. (If you wonder then
why bother putting in client-side validation at all - two reasons are
that it enhances the user experience and that it reduces the average
load on the server.)

2. If you're moving security and business logic to the server you have
to decide how to implement that. It is possible to rely solely on the
RDBMS e.g., PostgreSQL. This has many consequences for deployment as
well as development. For example, if you need to restrict actions based
on user, you will have a different PgSQL user for every business user,
and who is allowed to modify what will be a matter of PgSQL
configuration. PgSQL is mature, robust and well developed, so you
can rely on things to work as you tell them to. On the other hand, you
(and your clients?) must be very knowledgeable about the database
system to control your application. You have to be able to describe
permissions in terms of the database. They have to be able to add new
users to PgSQL for every new business user, and be able to adjust
permissions if those change. You have to write code in the RDBMS
procedural language which, well, I don't know a lot about, but I'm
not too thrilled about the idea. Far more appealing is to write code in
Python. Lots of other stuff.

Imagine in contrast that user authentication is done in Python. In this
scenario, you can have just a single PgSQL user for the application
that has all access, and the Python always uses that database user but
decides internally whether a given action is permitted based on the
business user. Of course in this case you have to come up with your own
security model which I'd imagine isn't trivial. You could also improve
security by combining the approaches, e.g. have 3 database users for 3
different business "roles" with different database permissions, and
then in Python you can decide which role applies to a business user and
use the corresponding database user to send commands to the database.
That could help to mitigate the risks of a flaw in the Python code.

3. You should therefore have a layer of Python that runs on the server
and mediates between client and database. Here you can put
authentication, validation and other security. You can also put all
business logic. It receives all input with the utmost suspicion and
only if everything is in order will it query the database and send
information to the client. There is little or no UI stuff in this
layer. To this end, you should check out Dabo at www.dabodev.com. This
is an exciting Python project that I haven't used much but am really
looking forward to when I have the chance, and as it becomes more
developed. My impression is that it is useable right now. They
basically provide a framework for a lot of stuff you seem to have done
by hand, and it can give you some great ideas about how to structure
your program. You may even decide to port it to Dabo.
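
The combined approach from point 2 (a few database users, one per business role, with Python choosing between them) can be sketched roughly as follows. The role names, database account names and both dictionaries are hypothetical, and the database accounts are assumed to have already been given only the privileges each role needs.

```python
# Hypothetical database accounts, one per business role.
DB_USER_FOR_ROLE = {
    "viewer": "app_readonly",
    "bookkeeper": "app_readwrite",
    "manager": "app_admin",
}

# Hypothetical assignment of business users to roles; in practice this would
# come from the application's own table of users.
ROLE_FOR_USER = {"alice": "manager", "bob": "viewer"}


def db_user_for(business_user):
    """Pick the database account to use on behalf of an authenticated business user."""
    role = ROLE_FOR_USER.get(business_user)
    if role is None:
        raise PermissionError("unknown business user: %r" % (business_user,))
    return DB_USER_FOR_ROLE[role]
```

If a bug in the Python layer let `bob` request an update, the database would still refuse it, because the connection made for him uses the read-only account.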
 

Michael Ekstrand

Frank said:
If I move all the authentication and business logic to a program which
runs on the server, it is up to the system administrator to ensure that
only authorised people have read/write/execute privileges on that
program. Clients will have no privileges, not even execute. They will
have their own client program, which has to connect to my server
program, and communicate with it in predefined ways. I *think* that in
this way I can ensure that they cannot do anything outside the bounds
of what I allow them.

I think you have no choice but to do this. Even if you package up the
program in an unmodifiable form, a competent user with a packet sniffer
or even standard OS utilities can determine where you are connecting
and bypass your security/logic. Only if the logic is implemented at a
point beyond the user's reach can you be assured of logic integrity.

-Michael
 

Bruno Desthuilliers

Frank said:
Two possible responses to this -

1. You are right (90% probability)

2. I have certain requirements which can not easily be expressed in the
RDBMS, so it is easier to use the application to enforce certain rules
(10% probability)

Easier, but with a somewhat annoying side effect... Do you really mean
"easier", or do you mean "impossible"?

Frank said:
Unfortunately I am stuck with number 2 at present.

:-/

Frank said:
Probably - I am learning the hard way <g>

As most of us do :-/

Having jumped directly from 2-tier fat client apps to web apps, I
really have no experience with adding a third tier to a fat client app,
but AFAICT, Python seems to have a lot to offer here.

BTW, sorry if my answer seemed a bit rude, I didn't mean to be that critical.
 

Steven D'Aprano

Frank said:
The client program contains all the authentication and business logic.
It has dawned on me that anyone can bypass this by modifying the
program. As it is written in Python, with source available, this would
be quite easy. My target market extends well up into the mid-range, but
I do not think that any CFO would contemplate using a program that is
so open to manipulation.

Ha ha ha ha! Oh, you're a funny man! How many CFOs contemplate using
Windows, Internet Explorer, SQL Server, and all the other Microsoft
technologies that are "so open to manipulation" by spyware, viruses
and other malware?

What you do is don't tell them that they can modify the source code. They
won't think of it. And if they do, well, that isn't your problem. That's
an internal problem for their IT department, precisely as it would be if
they gave full read/write permission to everyone in the company instead of
restricting permissions to those who need them.
 

Steven D'Aprano

Frank said:
My problem is that, if someone has access to the network and to a
Python interpreter, they can get hold of a copy of my program and use
it to knock up their own client program that makes a connection to the
database. They can then execute any arbitrary SQL command.

Why is that your problem, instead of the company's problem? It is their
database server, yes? If they want to connect to it and execute arbitrary
SQL commands on their own database, (1) who are you to tell them they
can't? and (2) they hardly need your program to do it.
 

Frank Millman

Steven said:
Why is that your problem, instead of the company's problem? It is their
database server, yes? If they want to connect to it and execute arbitrary
SQL commands on their own database, (1) who are you to tell them they
can't? and (2) they hardly need your program to do it.

If they choose to give the userid and password to an individual, they
are obviously giving him permission to execute any command.

On the other hand, they can reasonably expect to set up users without
giving them direct access to the database, in which case I think they
would be upset if the users found this restriction easy to bypass.

Frank
 

Robert Kern

Frank said:
If they choose to give the userid and password to an individual, they
are obviously giving him permission to execute any command.

On the other hand, they can reasonably expect to set up users without
giving them direct access to the database, in which case I think they
would be upset if the users found this restriction easy to bypass.

Certainly, but that access control *shouldn't happen in the client*
whether the source is visible or not.

--
Robert Kern

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 

Bryan Olson

Steve M wrote:
[...]
> 1. Based on your description, don't trust the client. Therefore,
> "security", whatever that amounts to, basically has to happen on the
> server.

That's the right answer. Trying to enforce security within your
software running on the client machine does not work. Forget the
advice about shipping .py's or packaging the client in some way.
> The server should be designed with the expectation that any
> input is possible, from slightly tweaked variants of the normal
> messages to a robotic client that spews the most horrible ill-formed
> junk frequently and in large volumes. It is the server's job to decide
> what it should do. For example, consider a website that has a form for
> users to fill out. The form has javascript, which executes on the
> client, that helps to validate the data by refusing to submit the form
> unless the user has filled in required fields, etc. This is client-side
> validation (analogous to authentication). It is trivial for an attacker
> to force the form to submit without filling in required fields. Now if
> the server didn't bother to do its own validation but just inserted a
> new record into the database with whatever came in from the form
> submission, on the assumption that the client-side validation was
> sufficient, this would constitute a serious flaw. (If you wonder then
> why bother putting in client-side validation at all - two reasons are
> that it enhances the user experience and that it reduces the average
> load on the server.)

Good advice.
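
The form example above can be sketched as a server-side check that assumes nothing about what the client already validated. The field names and the specific rules here are invented for illustration.

```python
def validate_new_customer(form):
    """Server-side check of a submitted form; returns a list of error messages."""
    if not isinstance(form, dict):
        return ["form must be a mapping of field names to values"]
    errors = []
    name = form.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("name is required")
    elif len(name) > 100:
        errors.append("name is too long")
    email = form.get("email")
    if not isinstance(email, str) or email.count("@") != 1:
        errors.append("email must contain exactly one '@'")
    # Anything the form was never supposed to contain is treated as an error.
    unexpected = set(form) - {"name", "email"}
    if unexpected:
        errors.append(
            "unexpected fields: " + ", ".join(sorted(str(k) for k in unexpected))
        )
    return errors
```

Only when the returned list is empty would the server go on to insert a record; the client-side checks remain purely a convenience for the user.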
 

Frank Millman

Dennis said:
If your DBMS is directly accessible on the net, you're vulnerable
even without Python. Especially if you have "authentication" logic being
done at the client end. There is nothing to prevent someone using a
compatible query browser or command-line utility to make connection
attempts to the server, followed by classical username/password cracking
stuff.

Right - this is the conclusion I have come to.

Dennis said:
Ah, okay -- you /do/ already have something running in the middle.

It is more on the side than in the middle at present - the client
connects to my server program, but also connects directly to the
database. My proposed change is to put it really in the middle - the
client connects to my server, and my server connects to the database.

Dennis said:
Obscuring the Python stuff will only be a minor delay factor in
breaking that -- someone really serious could probably stick in a packet
sniffer and record a transaction sequence, eventually reverse mapping
back to the types of operations each code represents.

Would using SSL be a solution? This is on my to do list.

Dennis said:
Database security? First step would be to USE the DBMS privilege
system to limit operations to only those SQL statements, tables, and
data columns that are needed for your client program; since you appear
to be using user/password information already, each such user could have
different privileges, limiting some to retrieval only, for example. As
for your "server", I'd probably start a thread for each connected user,
so that thread handles all communication. Your description sounds more
like a rudimentary proxy adding in a counting scheme, but not really
isolating separate client connections.

A number of replies have indicated that I should be using the DBMS
itself to manage security. I think there are some benefits to managing
it via the application. Admittedly I did not think it through when I
started, but now that I have a reasonable security model working, I
would not want to give it up. Here are some examples of what I can do.

1. Users and groups can be maintained by anyone using my app (with the
correct permissions, of course). You do not have to go through the
database administrator with all the complication or red tape that could
arise.

2. I am a great believer in 'field-by-field' validation when doing data
entry, instead of filling in the entire form, submitting it, and then
being informed of all the errors. I can inform a user straight away if
they try to do something they are not entitled to.

3. I can cater for the situation where a user may not have permission
to do something, but they can call a supervisor who can override this.
I have seen solutions which involve prompting for a password, but this
has to be programmed in at every place where it might be required. I
allow the supervisor to enter their userid and password, and my program
reads in their permissions, which become the active ones until
cancelled. I create a flashing red border around the window to warn
them not to forget.
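
The supervisor override in point 3 could be modelled along these lines. This is only a sketch under stated assumptions: the `Session` class, its method names and the `authenticate` callable (which is assumed to return a set of permissions, or `None` when the userid or password is wrong) are all hypothetical.

```python
class Session:
    """Tracks the permissions currently in force for one logged-in user."""

    def __init__(self, userid, permissions):
        self.userid = userid
        self._own = frozenset(permissions)
        self._override = None  # (supervisor_id, permissions) while active

    @property
    def override_active(self):
        """True while a supervisor's permissions are in force (show the warning border)."""
        return self._override is not None

    def begin_override(self, supervisor_id, password, authenticate):
        """Make the supervisor's permissions the active ones, if the login is valid."""
        permissions = authenticate(supervisor_id, password)
        if permissions is None:
            return False
        self._override = (supervisor_id, frozenset(permissions))
        return True

    def cancel_override(self):
        """Return to the user's own permissions."""
        self._override = None

    def may(self, action):
        """Check an action against whichever permissions are active right now."""
        active = self._override[1] if self._override else self._own
        return action in active
```

Because every check goes through `may`, the override does not have to be programmed in at each place where it might be needed.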

There is some support for this approach given by MS SQL Server, which
has the concept of an Application Role. 'Application roles contain no
users, though you can still assign permissions to application roles.
They are designed to let an application take over the job of
authenticating users from SQL Server.'

Thanks for the interesting comments.

Frank
 
