Two programs with same logic

Discussion in 'C Programming' started by karthikbalaguru, Dec 20, 2009.

  1. Hi,
    Two set of codes can be compared by using
    beyond compare or other equivalent software
    to determine the areas of differences and the
    areas of similarities. But, there are scenarios
    in which the same logic would be used by 2 set of
    programs/softwares but that the variable names
    might be different. So, how to determine this
    without executing the program ? Is there any tool
    that will help in identifying the existence of similar
    logics in two different programs ? Any ideas ?

    Thx in advans,
    Karthik Balaguru
     
    karthikbalaguru, Dec 20, 2009
    #1
    1. Advertising

  2. karthikbalaguru

    Stefan Ram Guest

    karthikbalaguru <> writes:
    >in which the same logic would be used by 2 set of


    Determining whether two programs behave the same for
    every input should in general be equivalent to the

    http://en.wikipedia.org/wiki/Halting_problem

    .

    For special conditions, it should still be possible:
    For example, when variables are only renamed, one
    can replace both name sequences by generated names.

    (I have removed other groups from the »Newsgroups:«
    header.)
     
    Stefan Ram, Dec 20, 2009
    #2
    1. Advertising

  3. -berlin.de (Stefan Ram) writes:
    > karthikbalaguru <> writes:
    >>in which the same logic would be used by 2 set of

    >
    > Determining whether two programs behave the same for
    > every input should in general be equivalent to the
    >
    > http://en.wikipedia.org/wiki/Halting_problem
    >
    > .
    >
    > For special conditions, it should still be possible:
    > For example, when variables are only renamed, one
    > can replace both name sequences by generated names.
    >
    > (I have removed other groups from the »Newsgroups:«
    > header.)


    The original article was cross-posted to
    comp.os.linux.development.apps, comp.programming,
    comp.unix.programmer, and comp.lang.c. Why did you remove everything
    but comp.lang.c? The question isn't specific to C or to any other
    language; surely comp.programming would be more appropriate.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Dec 20, 2009
    #3
  4. karthikbalaguru

    Stefan Ram Guest

    Keith Thompson <> writes:
    >The question isn't specific to C or to any other language;
    >surely comp.programming would be more appropriate.


    Since it was posted to comp.lang.c, I assumed that the
    OP was referring to a C program. This assumption might
    have been wrong, but I had it at the time I was reading
    his OP.

    >Why did you remove everything but comp.lang.c?


    Since I had this assumption, I deemed the other groups
    to be not appropriate. (I might have been wrong with
    this assessment.)

    It is my own posting style not to crosspost to multiple
    groups without sufficient reason. This was another reason
    for me to trim the Newgroup header.
     
    Stefan Ram, Dec 20, 2009
    #4
  5. On 20 Dec 2009 at 15:50, Stefan Ram wrote:
    > karthikbalaguru <> writes:
    >>in which the same logic would be used by 2 set of

    >
    > Determining whether two programs behave the same for every input
    > should in general be equivalent to the
    > http://en.wikipedia.org/wiki/Halting_problem


    A gold medal there for exceeding even the customary level of uselessness
    in a clc response.

    It's obvious that the OP doesn't care about whether two programs behave
    the same for every input. He's interested in the question:
    heuristically, given two programs, are they "very similar"?

    Notice that there are plenty of practical reasons for wanting to answer
    this question:
    * if the programs are given as source code, an instructor wants to check
    for copying-with-superficial-changes; or a lawyer wants to check for a
    GPL infringement
    * if the programs are binary, anti-virus vendors might be interested in
    this

    Compare that with the halting problem, which has no practical
    application and is of purely theoretical interest.
     
    Antoninus Twink, Dec 20, 2009
    #5
  6. karthikbalaguru

    spinoza1111 Guest

    On Dec 20, 11:42 pm, karthikbalaguru <>
    wrote:
    > Hi,
    > Two set of codes can be compared by using
    > beyond compare or other equivalent software
    > to determine the areas of differences and the
    > areas of similarities. But, there are scenarios
    > in which the same logic would be used by 2 set of
    > programs/softwares but that the variable names
    > might be different. So, how to determine this
    > without executing the program ? Is there any tool
    > that will help in identifying the existence of similar
    > logics in two different programs ? Any ideas ?
    >
    > Thx in advans,
    > Karthik Balaguru


    As others have pointed out, the general problem is equivalent to the
    Halting problem and is not solvable by an automated tool.

    However, there are useful levels of similarity, where "similarity" is
    a generalized identity:

    (1) Similarity after compiler lexical analysis: "printf( \"Hello world\
    \n\" );" as a fragment is similar in this way to "printf(\"Hello world\
    \n\");"

    (2) Similarity after parsing: "int a; a = 1+1;" is similar in this
    sense to "int b; b = 1+1;"

    (3) Similarity after code motion, elimination and optimization that
    preserves intent: "int a; int b; a = 0; b = 0;" is similar to "int b;
    int a; b = 0; a = 0;". "if (b || -1) c();" is similar in this sense to
    "c();"

    Factoring compilers better and providing an open architecture would
    allow us to use compilers for far more than merely compiling. Instead
    of shipping compilers, we should ship components. But, it's much more
    amusing here to engage in the politics of personal destruction.
     
    spinoza1111, Dec 22, 2009
    #6
  7. karthikbalaguru

    Joe Beanfish Guest

    karthikbalaguru wrote:
    > Hi,
    > Two set of codes can be compared by using
    > beyond compare or other equivalent software
    > to determine the areas of differences and the
    > areas of similarities. But, there are scenarios
    > in which the same logic would be used by 2 set of
    > programs/softwares but that the variable names
    > might be different. So, how to determine this
    > without executing the program ? Is there any tool
    > that will help in identifying the existence of similar
    > logics in two different programs ? Any ideas ?


    As others say, it would be difficult at best to find dups. But a
    couple approaches that could help. One might be to run both codes
    through an obfuscator which would replace the programmer's meaningful
    variable names with generated names. Then compare. But there's still
    a lot of room for programming style to mess you up.

    A second approach would be to compile both to assembler and compare
    the assembler code. That should remove a fair amount of programmer
    style problems but not all.
     
    Joe Beanfish, Dec 22, 2009
    #7
  8. karthikbalaguru

    Kaz Kylheku Guest

    On 2009-12-20, karthikbalaguru <> wrote:
    > Hi,
    > Two set of codes can be compared by using
    > beyond compare or other equivalent software
    > to determine the areas of differences and the
    > areas of similarities. But, there are scenarios
    > in which the same logic would be used by 2 set of
    > programs/softwares but that the variable names
    > might be different. So, how to determine this
    > without executing the program ? Is there any tool
    > that will help in identifying the existence of similar
    > logics in two different programs ? Any ideas ?


    The structural similarity you are hinting at can be easily found if you parse
    the code into a nice abstract syntax tree representation. Then it's just a
    matter of writing an equality function which compares two pieces of code.

    This is easiest to understand and illustrate through the Lisp language.

    The comparison is similar to a routine that compares two trees, except that it
    has to understand all ``special forms'' in the language, and their semantics.
    That is to say, the comparison has to behave as a ``code walker'', with respect
    to two trees at the same time. It has to recognize all special syntactic
    constructs, so that it can apply the right kind of comparison.

    It also must be capable of performing variable renaming.

    So for instance we might want these two to be found equivalent:

    (let ((x 1)) (+ x x))

    (let ((y 1)) (+ y y))

    For thaat we have to recognize that we have two matching constructs
    (both are ``let''). They both bind the same number of variables,
    using the same initialization values. Next, we have to normalize
    the bodies. We can do that by renaming the variables on each
    side: e.g. both x and y get renamed to t0 (temporary variable #0).
    Each time we rename a variable we generate a new temporary.
    When we rename x to t0 and substitute, we end up with:

    (let ((t0 1)) (+ t0 t0))

    And when we do the same substitution on the variable y on the second form we
    of course also end up with:

    (let ((t0 1)) (+ t0 t0))

    So now having done this substituion over the let, we can recursively process
    the body of each let and continue comparing: we now end up comparing (+ t0 t0)
    to (+ t0 t0). We have a positive match on the same symbol +, and we know that
    it's a function call. The two functions calls have the same number of
    arguments, which we can compare one by one, finding an exact match in
    ech position: the expression t0 trivially matches the expression t0.
    Thus having reached bottom, we conclude that the two constructs are equivalent
    (modulo variable naming).

    This applies to the comparison of C programs too because for the C language,
    you can pick this kind of tree representation with symbols and atoms.
    Parse the code to that, and write a comparison which applies the right
    kind of rules.

    The question is the semantic depth that you want in the comparison.

    Should the expression !(!P || !Q) match P && Q, by the application of
    De Morgan's law?

    Does your equivalence function fold constant expressions, so that
    2 + 2 matches 1 + 3?

    There is a huge range of things you can do in between the most naive
    static comparison (in which even differences in variable names do matter)
    to running the program and trying to identify that it does the same things
    to all inputs (running into the halting problem).


    Now about implementing this. Turns out, the annoying part of parsing C to
    structure is already done. For instance, there is this Japanese gentleman's
    project called SC:

    http://super.para.media.kyoto-u.ac.jp/~tasuku/sc/index-e.html

    With this software, you would be able to just concentrate on writing the
    equivalence function.
     
    Kaz Kylheku, Dec 22, 2009
    #8
  9. On Dec 23, 12:05 am, Joe Beanfish <> wrote:
    > karthikbalaguru wrote:
    > > Hi,
    > > Two set of codes can be compared by using
    > > beyond compare or other equivalent software
    > > to determine the areas of differences and the
    > > areas of similarities. But, there are scenarios
    > > in which the same logic would be used by 2 set of
    > > programs/softwares but that the variable names
    > > might be different. So, how to determine this
    > > without executing the program ? Is there any tool
    > > that will help in identifying the existence of similar
    > > logics in two different programs ? Any ideas ?

    >
    > As others say, it would be difficult at best to find dups. But a
    > couple approaches that could help. One might be to run both codes
    > through an obfuscator which would replace the programmer's meaningful
    > variable names with generated names. Then compare. But there's still
    > a lot of room for programming style to mess you up.
    >


    Agreed !

    > A second approach would be to compile both to assembler and compare
    > the assembler code. That should remove a fair amount of programmer
    > style problems but not all.


    Interesting approach ! I think, this is better than the earlier
    approach. But, Need to know the kind of programmer style
    problems that will not get covered by this second approach.
    Any ideas ?

    Thx in advans,
    Karthik Balaguru
     
    karthikbalaguru, Dec 27, 2009
    #9
  10. On Dec 23, 12:53 am, Kaz Kylheku <> wrote:
    > On 2009-12-20, karthikbalaguru <> wrote:
    >
    > > Hi,
    > > Two set of codes can be compared by using
    > > beyond compare or other equivalent software
    > > to determine the areas of differences and the
    > > areas of similarities. But, there are scenarios
    > > in which the same logic would be used by 2 set of
    > > programs/softwares but that the variable names
    > > might be different. So, how to determine this
    > > without executing the program ? Is there any tool
    > > that will help in identifying the existence of similar
    > > logics in two different programs ? Any ideas ?

    >
    > The structural similarity you are hinting at can be easily found if you parse
    > the code into a nice abstract syntax tree representation. Then it's just a
    > matter of writing an equality function which compares two pieces of code.
    >
    > This is easiest to understand and illustrate through the Lisp language.
    >
    > The comparison is similar to a routine that compares two trees, except that it
    > has to understand all ``special forms'' in the language, and their semantics.
    > That is to say, the comparison has to behave as a ``code walker'', with respect
    > to two trees at the same time. It has to recognize all special syntactic
    > constructs, so that it can apply the right kind of comparison.
    >
    > It also must be capable of performing variable renaming.
    >
    > So for instance we might want these two to be found equivalent:
    >
    >   (let ((x 1)) (+ x x))
    >
    >   (let ((y 1)) (+ y y))
    >
    > For thaat we have to recognize that we have two matching constructs
    > (both are ``let''). They both bind the same number of variables,
    > using the same initialization values. Next, we have to normalize
    > the bodies. We can do that by renaming the variables on each
    > side: e.g. both x and y get renamed to t0 (temporary variable #0).
    > Each time we rename a variable we generate a new temporary.
    > When we rename x to t0 and substitute, we end up with:
    >
    >   (let ((t0 1)) (+ t0 t0))
    >
    > And when we do the same substitution on the variable y on the second form we
    > of course also end up with:
    >
    >   (let ((t0 1)) (+ t0 t0))
    >
    > So now having done this substituion over the let, we can recursively process
    > the body of each let and continue comparing: we now end up comparing (+ t0 t0)
    > to (+ t0 t0). We have a positive match on the same symbol +, and we know that
    > it's a function call.  The two functions calls have the same number of
    > arguments, which we can compare one by one, finding an exact match in
    > ech position: the expression t0 trivially matches the expression t0.
    > Thus having reached bottom, we conclude that the two constructs are equivalent
    > (modulo variable naming).
    >
    > This applies to the comparison of C programs too because for the C language,
    > you can pick this kind of tree representation with symbols and atoms.
    > Parse the code to that, and write a comparison which applies the right
    > kind of rules.
    >


    Interesting. Agreed !

    > The question is the semantic depth that you want in the comparison.
    >


    True !

    > Should the expression  !(!P || !Q) match P && Q, by the application of
    > De Morgan's law?
    >
    > Does your equivalence function fold constant expressions, so that
    > 2 + 2 matches 1 + 3?
    >
    > There is a huge range of things you can do in between the most naive
    > static comparison (in which even differences in variable names do matter)
    > to running the program and trying to identify that it does the same things
    > to all inputs (running into the halting problem).


    Yeah. True !

    > Now about implementing this. Turns out, the annoying part of parsing C to
    > structure is already done. For instance, there is this Japanese gentleman's
    > project called SC:
    >
    > http://super.para.media.kyoto-u.ac.jp/~tasuku/sc/index-e.html
    >
    > With this software, you would be able to just concentrate on writing the
    > equivalence function.


    Okay, I will look into this !

    Thx,
    Karthik Balaguru
     
    karthikbalaguru, Dec 27, 2009
    #10
  11. karthikbalaguru <> writes:

    > On Dec 23, 12:05 am, Joe Beanfish <> wrote:
    >> karthikbalaguru wrote:
    >> > Hi,
    >> > Two set of codes can be compared by using
    >> > beyond compare or other equivalent software
    >> > to determine the areas of differences and the
    >> > areas of similarities. But, there are scenarios
    >> > in which the same logic would be used by 2 set of
    >> > programs/softwares but that the variable names
    >> > might be different. So, how to determine this
    >> > without executing the program ? Is there any tool
    >> > that will help in identifying the existence of similar
    >> > logics in two different programs ? Any ideas ?

    >>
    >> As others say, it would be difficult at best to find dups. But a
    >> couple approaches that could help. One might be to run both codes
    >> through an obfuscator which would replace the programmer's meaningful
    >> variable names with generated names. Then compare. But there's still
    >> a lot of room for programming style to mess you up.
    >>

    >
    > Agreed !
    >
    >> A second approach would be to compile both to assembler and compare
    >> the assembler code. That should remove a fair amount of programmer
    >> style problems but not all.

    >
    > Interesting approach ! I think, this is better than the earlier
    > approach. But, Need to know the kind of programmer style
    > problems that will not get covered by this second approach.
    > Any ideas ?


    It's not just a question or programming style. You're at the semantics
    of the programs. The question is to compare the semantics of two
    programs. So you first have to represent the semantics of a program.
    One way to do that would be to have a formal semantic definition of
    the programming language, and to translate the programs into that
    formalism, and then compare thetheir formal semantics.
    Unfortunately, not a lot of languages have a formal semantic
    definition, and for most of the programming languages which have been
    developed without such in mind, when one is developed for them, if it
    is complete, then it is rather complex, giving formal semantic forms
    that are not really more obvious than the original program (for
    example, in the case of C). In conclusion, for these kind of
    programming language, the best way to express their semantics in a
    formal way is to consider their compilation to some simplier
    processor code, and then compare these codes.

    Now of course when you compile these two programs:

    (defun s1 (v)
    (loop for e across v sum e))

    (defun s2 (v)
    (do ((s 0)
    (i (1- (length v)) (1- i)))
    ((< i 0) s)
    (incf s (aref v i))))


    you get two different listings, but you have to find a way to conclude
    that they do exactly the same thing:

    C/LISP[215]> (disassemble 's1)

    Disassembly of function S1
    (CONST 0) = 0
    1 required argument
    0 optional arguments
    No rest parameter
    No keyword parameters
    19 byte-code instructions:
    0 (CONST&PUSH 0) ; 0
    1 (NIL&PUSH)
    2 (CONST&PUSH 0) ; 0
    3 (JMP L19)
    5 L5
    5 (LOAD&PUSH 4)
    6 (LOAD&PUSH 3)
    7 (CALLSR&STORE 1 1 1) ; AREF
    11 (LOAD&PUSH 0)
    12 (LOAD&PUSH 2)
    13 (CALLSR&STORE 2 53 0) ; +
    17 (LOAD&INC&STORE 2)
    19 L19
    19 (LOAD&PUSH 2)
    20 (LOAD&PUSH 5)
    21 (CALLS2&PUSH 72) ; LENGTH
    23 (CALLSR&JMPIFNOT 1 50 L5) ; >=
    27 (POP)
    28 (SKIP&RET 4)
    NIL
    C/LISP[216]> (disassemble 's2)

    Disassembly of function S2
    (CONST 0) = 0
    1 required argument
    0 optional arguments
    No rest parameter
    No keyword parameters
    17 byte-code instructions:
    0 (CONST&PUSH 0) ; 0
    1 (LOAD&PUSH 2)
    2 (CALLS2&PUSH 72) ; LENGTH
    4 (CALLS2&PUSH 152) ; 1-
    6 (JMP L20)
    8 L8
    8 (LOAD&PUSH 1)
    9 (LOAD&PUSH 4)
    10 (LOAD&PUSH 2)
    11 (CALLSR&PUSH 1 1) ; AREF
    14 (CALLSR&STORE 2 53 1) ; +
    18 (LOAD&DEC&STORE 0)
    20 L20
    20 (LOAD&PUSH 0)
    21 (CALLS2&JMPIFNOT 148 L8) ; MINUSP
    24 (LOAD 1)
    25 (SKIP&RET 4)
    NIL


    The semantics of assembly is not more obvious than the semantics of
    high level programming languages. The reason why you will work with
    assembly is not that (otherwise we would keep programming in
    assembly!). The point of assembly, is that the instructions are
    simplier: their semantics is simplier. Therefore it is easier for
    programs to work with it. You want to compare semantics, then it's
    simplier to compare the semantics of simplier instructions. However,
    you still have a lot of work to do, even if each step of this work
    will be simplier. But it is possible to infer mechanically from both
    these assembly codes that each element of the vector is read, and
    summed, and that the sum is returned by the function. Therefore that
    these two functions are equivalent semantically.


    --
    __Pascal Bourguignon__ http://www.informatimago.com/
     
    Pascal J. Bourguignon, Dec 27, 2009
    #11
  12. In article <>,
    Pascal J. Bourguignon <> was apparently confused
    about which newsgroup he was in when he wrote:
    ....
    >Now of course when you compile these two programs:
    >
    >(defun s1 (v)
    > (loop for e across v sum e))
    >
    >(defun s2 (v)
    > (do ((s 0)
    > (i (1- (length v)) (1- i)))
    > ((< i 0) s)
    > (incf s (aref v i))))


    (Channelling our good friend Kiki, who is so good at pointing out the
    obvious)

    Lisp and C are two different languages.
     
    Kenny McCormack, Dec 27, 2009
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    12
    Views:
    1,680
    Dave Thompson
    Jan 10, 2005
  2. Replies:
    3
    Views:
    335
    Dave Moore
    Feb 1, 2005
  3. Coca
    Replies:
    7
    Views:
    762
    Aidan Grey
    Aug 24, 2004
  4. Replies:
    18
    Views:
    642
    Dave Thompson
    Jan 10, 2005
  5. spike
    Replies:
    8
    Views:
    1,517
    Steve Holden
    Feb 9, 2010
Loading...

Share This Page