any compiler that takes encrypted input

sh.vipin

If you are willing to accept "add 10" to the character values as
your encryption technique, you don't need very complex encryption,
so I'm rather puzzled by this.

If weak encryption is OK, have you considered source code obfuscation?
This is scrambled code that is hard to reverse engineer.  Surely
much harder than subtracting 10 from each character.
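To make the point about "subtracting 10" concrete, here is a minimal sketch, assuming the scheme really is a fixed offset added to every byte. The names shift_bytes and guess_offset are made up for illustration. It shows that nobody needs to be told the offset: trying all 256 values and looking for a string that any C file is likely to contain is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift every byte of buf by offset (modulo 256).
   A negative offset undoes the matching positive one. */
void shift_bytes(unsigned char *buf, size_t len, int offset)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + offset);
}

/* Try all 256 offsets on (at most the first 4095 bytes of) enc.
   Return the first offset whose decryption contains needle,
   for example "#include", or -1 if none does. */
int guess_offset(const unsigned char *enc, size_t len, const char *needle)
{
    unsigned char tmp[4096];

    assert(enc != NULL && needle != NULL);
    if (len >= sizeof tmp)
        len = sizeof tmp - 1;

    for (int off = 0; off < 256; off++) {
        memcpy(tmp, enc, len);
        shift_bytes(tmp, len, -off);
        tmp[len] = '\0';                 /* make it a C string for strstr */
        if (strstr((const char *)tmp, needle) != NULL)
            return off;
    }
    return -1;
}
```

Calling guess_offset on a file produced with shift_bytes(buf, len, 10) and the needle "#include" recovers the offset without any knowledge of the key.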

See http://www.semanticdesigns.com/Products/Obfuscators/CObfuscator.html
for a C source code obfuscator that works with GCC.

-- IDB

You can bring obfuscated code into a debugger and easily find where
the license check is: it is the place where the exit happens. The user
can easily comment that out. Obfuscated code can still be reverse
engineered to comment out the license check, or to identify common
libraries and then write his own libraries, and so on.
 
sh.vipin

If you insist that decrypted data never hits the file system, it
wouldn't even be enough to splice your decryption into gcc's input
routines.  Various passes of gcc create temporary files with
intermediate representations of the code, the most obvious being the
output of the preprocessor, which is almost as good as the original
source.  Using the -pipe option might help if your system supports it,
but I wouldn't count on it.

Oh, OK. I am not sure here, but do you mean that gcc by default dumps
an intermediate preprocessed file? I assumed only gcc -E does that.
But yes, I agree that even machine code can be understood. What
matters to me is making the code as difficult as possible for the end
user to understand.

But if the user can run this toolchain, he can also hack the given
decrypting copy of gcc to get it to output the source after it's
decrypted.  You can't win.

But the user won't know the key required to decrypt the file.
 
sh.vipin

Sorry to snip the earlier lines, but I am reposting the query
with more explanation of the actual problem.


I am writing a commercial tool [A] which works as a simulator for
digital circuits. Tool [A] takes a design <d> as input and writes
equivalent C code for the design <d>. This C code is then compiled
and the binary is run as a simulator. The C code {c} is specific to
each design. We give [A] to the customer, and he uses [A] at his end
to generate {c} for different designs.

So the flow is:
<d> --> [A] ==> {c}
{c} --> [gcc] ==> a.out
./a.out /* works as a simulator*/

Here are the problems:
- We can check for a license only inside [A], not in {c}, because the
user can easily comment the check out. The consequence is that a
customer might buy just ONE license for [A] and use it to generate
{c1,c2,c3....} for <d1,d2,d3.....>.
- Another problem is that some copyrighted/patented code is exposed
to the user. Moreover, all the proprietary libraries will be exposed
to the user. To avoid this, we never want to expose the generated
C code.

To make it DIFFICULT for the user (and our competitors too) to find
out what {c} is generated for <d>, I just want to generate an encrypted
{c} file. Some compiler should then be able to take this encrypted
{enC} and generate object code directly, so that the user gets
a.out but not {c}.

Considering that, my idea is to have the flow as below.

<d> ---> [A] ==> {enC} /* encrypted C, a simple offset 10 */
{enC} --> [wrapGcc] ==> a.obj
a.obj --> [gcc] ==> a.out


wrapGcc is another C program which does the following:
- takes the encrypted file name as a command-line argument
- finds the decryption key <offset> required for decrypting
- invokes gcc internally using the system() function:
system("gcc -c --decrypt <offset> {enC}");

~ Since gcc is invoked internally, the end user will never get to
know the <offset> required for decrypting.
~ Since <offset> is never known to the user, he can't use the -E
option of gcc to dump the preprocessed file; all he gets is the .obj file.
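The wrapper steps above can be sketched without a modified compiler. The --decrypt flag in the system() call is not an existing gcc option, so this sketch instead has the wrapper do the decryption itself and stream the plaintext to the compiler's standard input. It assumes a POSIX system (popen is POSIX, not ISO C) and the add-an-offset scheme; the name decrypt_to_command is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose */
#include <assert.h>
#include <stdio.h>

/* Read encrypted bytes from enc, subtract offset from each one, and
   stream the result to the standard input of cmd. This function itself
   never writes the plaintext to a file.
   Returns 0 if all bytes were written and cmd exited with status 0,
   -1 otherwise. */
int decrypt_to_command(FILE *enc, int offset, const char *cmd)
{
    assert(enc != NULL && cmd != NULL);

    FILE *out = popen(cmd, "w");
    if (out == NULL)
        return -1;

    int ch;
    int ok = 1;
    while ((ch = fgetc(enc)) != EOF) {
        /* The conversion to unsigned char wraps modulo 256. */
        if (fputc((unsigned char)(ch - offset), out) == EOF) {
            ok = 0;
            break;
        }
    }

    int status = pclose(out);  /* waits for the command to finish */
    return (ok && status == 0) ? 0 : -1;
}
```

A wrapper could call it as decrypt_to_command(fp, 10, "gcc -x c -c -o a.o -"), since gcc reads source from standard input when the file name is - and the language is given with -x c. Note that the offset still lives inside the wrapper binary, and anything that is run in place of gcc (for example a different program found first on PATH) receives the plaintext.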



I agree that even the .obj can be hacked/understood, and for that
matter even the machine code. But my purpose is to make it VERY
DIFFICULT, if not impossible.


In reply to some of the queries:
- Yes, I tried posting this in the gcc and g++ help forums too, but
didn't get much help.
- An offset is a very basic method, but far better than obfuscation.
Obfuscated code can be brought into a debugger and hacked easily. Yes,
I might need to learn better cryptographic algorithms, but at the
moment it is more important for me to devise the flow.
- If the compiler does preprocessing in memory and doesn't
generate an intermediate file, then this flow is certainly not futile.
Moreover, a flag could be added to tell the compiler not to
generate the intermediate temporary preprocessed file.


Overall, I got some nice feedback here; thanks to you all for it.
But I guess support for such a flow doesn't exist in existing
compilers. It would probably not be such a bad thing to have.
 
Eric Sosman

Here are the problems:
- We can check for a license only inside [A], not in {c}, because the
user can easily comment the check out. The consequence is that a
customer might buy just ONE license for [A] and use it to generate
{c1,c2,c3....} for <d1,d2,d3.....>.

So your licensing model is one license per circuit?
Or to put it another way, a customer who buys one [A]
license gets to run [A] just once for one <d>, and has
to buy another license if he made a mistake in his <d>
description and needs to correct it and re-run?

If that's the model, I think you'd better set things
up as a client-server scheme. Keep [A] on your server as
a private, guarded secret. The customer sends you <d>
along with appropriate credentials, and your server sends
back a compiled executable with debugging information
stripped out. Even that's not foolproof, since a stripped
executable is merely difficult to disassemble, not impossible.
Still better would be for the customer to send <d> and your
server sends back the simulation results.

Of course, a customer who's trying to analyze his own
highly-secret circuit design may balk at revealing that
design to you. But if so, the problem solves itself in
another way: Your customers won't snoop your secrets
because you'll have no customers! Problem solved.

I agree that even the .obj can be hacked/understood, and for that
matter even the machine code. But my purpose is to make it VERY
DIFFICULT, if not impossible.

The approach you're taking may at best make the code
recovery INCONVENIENT. VERY DIFFICULT is more than you can
expect from this scheme.
 
sh.vipin

Here are the problems:
- We can check for a license only inside [A], not in {c}, because the
user can easily comment the check out. The consequence is that a
customer might buy just ONE license for [A] and use it to generate
{c1,c2,c3....} for <d1,d2,d3.....>.

     So your licensing model is one license per circuit?
Or to put it another way, a customer who buys one [A]
license gets to run [A] just once for one <d>, and has
to buy another license if he made a mistake in his <d>
description and needs to correct it and re-run?

     If that's the model, I think you'd better set things
up as a client-server scheme.  Keep [A] on your server as
a private, guarded secret.  The customer sends you <d>
along with appropriate credentials, and your server sends
back a compiled executable with debugging information
stripped out.  Even that's not foolproof, since a stripped
executable is merely difficult to disassemble, not impossible.
Still better would be for the customer to send <d> and your
server sends back the simulation results.

No, that's too limited a way of thinking about it. Another model can be
that, whether you generate {c} or {c}, you exhaust one license per
machine.
You can generate 100 {c} for 100 different <d>, but every time you run
a compiled {c} you exhaust one license. It's up to you to decide
how many licenses you want to use. But we certainly don't want the
user to take one license, generate 100 {c}, and then run them without
a license check. What if the user buys a license for one month and
later doesn't renew it, but can still run the compiled {c} because it
doesn't check for a license?

Anyway, I think this is off topic to discuss here in a C discussion
group.

     Of course, a customer who's trying to analyze his own
highly-secret circuit design may balk at revealing that
design to you.  But if so, the problem solves itself in
another way: Your customers won't snoop your secrets
because you'll have no customers!  Problem solved.


     The approach you're taking may at best make the code
recovery INCONVENIENT.  VERY DIFFICULT is more than you can
expect from this scheme.

Agreed, INCONVENIENT might be a better term here. But that is a
subjective term, and relative too. What is inconvenient to one can be
difficult to someone else.
Overall, this is also off topic to discuss here. I expected
some better technical feedback.

thanks
 
Bartc

For protecting A we used to use dongles. Maybe they still exist in some
form.

If the intermediate C output of your product is sensitive, then now I agree
a simple mod to a compiler might work. Except gcc is not simple to mod. If
this is Windows platform, suggest possibly contacting developer of lccwin32
(google for it), who might modify that for you, for a fee.

There are other ideas to have different intermediate representation, but
they require a lot more effort possibly outside your expertise in A.

I think Eric suggested a web solution, but as a customer I'd prefer my
working to be local (but then, I'm old fashioned).
 
Chris Dollin

Richard said:
I repeated it because I felt you had glossed over it as inconsequential
- that any pre/post encryption cross over causes leaks and makes the
rest of the conversation pretty much a moot point.

Your very mention of "decrypted output" is the security hole.

My point was only that it was false that there needed to be

Only one decrypted output is needed if we're using a wrapper
script. I did not address, did not intend to address, and thought
it was obvious that I was not attempting to address, any issue
about whether having even one decrypted output was wise or not.
It seems you thought otherwise, but if so I believe you were mistaken.
 
Richard Bos

Is there any C compiler that accepts encrypted source files? That is,
something which takes a decryption key and a source file as input
and generates the object code after decrypting the source file
internally.

In all sincerity:

This is a fucking stupid idea. It doesn't work on so many levels that it
really isn't even worth debunking.

Richard
 
sh.vipin

For protecting A we used to use dongles. Maybe they still exist in some
form.

For this we use server-based licensing. Basically, every machine on the
network gets a license from a server before using the tool, but any
machine on the network can use it; the license is not machine specific.
Dongles are machine specific and need to be carried around
physically.

If the intermediate C output of your product is sensitive, then now I agree
a simple mod to a compiler might work. Except gcc is not simple to mod. If
this is Windows platform, suggest possibly contacting developer of lccwin32
(google for it), who might modify that for you, for a fee.

Yes, that is a possibility.

There are other ideas to have different intermediate representation, but
they require a lot more effort possibly outside your expertise in A.

I would really like to know. One idea that came up was to directly
write .obj/.o files, but that requires integrating a compiler into the
code of [A] or giving the file as an input stream to [gcc]. I didn't
see any option for that in gcc either.
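On the input-stream idea: gcc does read a source file from standard input when the file name is - and the language is given explicitly with -x c. Below is a minimal sketch of how [A] could hand generated C straight to the compiler without writing {c} itself. It assumes a POSIX system (popen is POSIX, not ISO C), and the name compile_from_memory is hypothetical. Whether the compiler writes temporary files of its own is a separate question that this does not settle.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write an in-memory C source string to the standard input of cmd.
   Returns 0 if the whole string was written and cmd exited with
   status 0, -1 otherwise. */
int compile_from_memory(const char *source, const char *cmd)
{
    assert(source != NULL && cmd != NULL);

    FILE *pipe = popen(cmd, "w");
    if (pipe == NULL)
        return -1;

    size_t len = strlen(source);
    size_t written = fwrite(source, 1, len, pipe);

    int status = pclose(pipe);  /* flushes, then waits for cmd */
    return (written == len && status == 0) ? 0 : -1;
}
```

For example, compile_from_memory(generated, "gcc -x c -c -o sim.o -") asks gcc to compile the string held in generated into sim.o; the output name sim.o is only an illustration.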
 
