Copy / Paste in software development


jacob navia

I think nobody here can deny that copy and paste is an established
method of software development.

You know: you have some code that works, and you want to modify it. You
copy it and paste the code somewhere else, then you modify it leaving
the running code in its place until you switch to the new version.

True, many people are against this way of developing software. The
correct (in the abstract) process would be to find the common parts
of the code and isolate the changes as far as possible, maintaining
a common code line.

Copy/paste is considered harmful.

But... I was surprised when I read this article in PLOS: [1]

<quote>
One of the primary agents of genome evolution is gene duplication.
Duplicated genes provide the raw material for the generation of novel
genes and biological functions, which in turn allow the evolution of
organismal complexity and new species. James Sikela and colleagues set
out to compare gene duplications between humans and four of our closest
primate relatives to find the genetic roots of our evolutionary split
from the other great apes. Collecting the DNA of humans, chimpanzees,
bonobos, gorillas, and orangutans from blood and experimental cell
lines, the researchers used microarray analysis to identify variations
in the number of copies of individual genes among the different species.
They analyzed nearly 30,000 human genes and compared their copy numbers
in the genomes of humans and the four great apes.

Overall, Sikela and colleagues found more than 1,000 genes with
lineage-specific changes in copy number, representing 3.4% of the genes
tested. All the great ape species showed more increases than decreases
in gene copy numbers, but relative to the evolutionary age of each
lineage, humans showed the highest number of genes with increased copy
numbers, at 134. Many of these duplicated human genes are implicated in
brain structure and function.

The gene changes identified in the study, the authors conclude, likely
represent most of the major lineage-specific gene expansions (or losses)
that have taken place since orangutans split from the other great apes,
some 15 million years ago. (Humans diverged from their closest cousins,
the chimp and bonobo, roughly 5 million to 7 million years ago.) And
because some of these gene changes were unique to each of the species
examined, they will likely account for some of the physiological and
morphological characteristics that are unique to each species.

<end quote>

Apparently the programmer (or programming team) 15 million years ago was
in a hurry. And copy/paste, as everyone here knows, is not an accepted
method but... it works, and that is all that counts.

What were they tinkering with?

"... humans showed the highest number of genes with increased copy
numbers, at 134. Many of these duplicated human genes are implicated in
brain structure and function"

All those millions of years later, the descendants of those apes, still
running the same code, start to wonder...

WHAT HAPPENED?
 

Bo Persson

jacob said:
I think nobody here can deny that copy and paste is an established
method of software development.

You know: you have some code that works, and you want to modify it.
You copy it and paste the code somewhere else, then you modify it
leaving the running code in its place until you switch to the new
version.
True, many people are against this fashion of developing software.
The correct (in the abstract) process should be of finding out the
common parts of the code and isolate the changes as far as it is
possible, maintaining a common line.

The copy/paste is considered harmful.

But... I was surprised when I read this article in PLOS: [1]

<quote>
One of the primary agents of genome evolution is gene duplication.
Duplicated genes provide the raw material for the generation of
novel genes and biological functions, which in turn allow the
evolution of organismal complexity and new species. James Sikela
and colleagues set out to compare gene duplications between humans
and four of our closest primate relatives to find the genetic roots
of our evolutionary split from the other great apes. Collecting the
DNA of humans, chimpanzees, bonobos, gorillas, and orangutans from
blood and experimental cell lines, the researchers used microarray
analysis to identify variations in the number of copies of
individual genes among the different species. They analyzed nearly
30,000 human genes and compared their copy numbers in the genomes
of humans and the four great apes.
Overall, Sikela and colleagues found more than 1,000 genes with
lineage-specific changes in copy number, representing 3.4% of the
genes tested. All the great ape species showed more increases than
decreases in gene copy numbers, but relative to the evolutionary
age of each lineage, humans showed the highest number of genes with
increased copy numbers, at 134. Many of these duplicated human
genes are implicated in brain structure and function.

The gene changes identified in the study, the authors conclude,
likely represent most of the major lineage-specific gene expansions
(or losses) that have taken place since orangutans split from the
other great apes, some 15 million years ago. (Humans diverged from
their closest cousins, the chimp and bonobo, roughly 5 million to 7
million years ago.) And because some of these gene changes were
unique to each of the species examined, they will likely account
for some of the physiological and morphological characteristics
that are unique to each species.
<end quote>

Apparently the programmer (or programmer team) 15 million years ago
were in a hurry. And copy / paste, as everyone here knows, is not
an accepted method but... it works, and that is all that counts.

What were they tinkering with?

"... humans showed the highest number of genes with increased copy
numbers, at 134. Many of these duplicated human genes are
implicated in brain structure and function"

All those millions of years later, the descendants of those apes,
still running the same code, start to wonder...

WHAT HAPPENED?

Copy/Paste might work well when you actually intend to create new
functionality.

Note that some of the "software" we are copied from has since been
scrapped. Could this be because some of the improvements and obvious
bug fixes were never back-ported to the original code?


Bo Persson
 

James Kanze

Actually, it isn't, and it never has been an acceptable method.
The procedure in well run shops is called refactoring, and it
doesn't involve copy/paste.
[snip]
True, many people are against this fashion of developing
software. The correct (in the abstract) process should be of
finding out the common parts of the code and isolate the
changes as far as it is possible, maintaining a common line.
The copy/paste is considered harmful.
I have personally investigated this, and built tools to detect
clones in code, including in C and C++.
See http://www.semdesigns.com/Company/Publications/ICSM98.pdf
"Harmful" is the wrong way to think about this, in the same
sense that "Gotos are Harmful" is the wrong way to think about
Gotos. (Without Gotos, you can't reasonably implement a
high-performance finite state machine, and everybody agrees
FSAs are good, right?)

Two pieces of bullshit in one paragraph. I've yet to see any
competent programmer use goto, and I've yet to see any competent
programmer use copy/paste. (And I've implemented more than a
few finite state machines.)
 

jacob navia

James said:
Two pieces of bullshit in one paragraph. I've yet to see any
competent programmer use goto, and I've yet to see any competent
programmer use copy/paste. (And I've implemented more than a
few finite state machines.)

Great.

If you read the original message, however, it seems that the programmer
of your genetic code (the code you are running) used copy/paste
extensively to replicate entire portions of an ape genome around 5-7
million years ago, modifying them later.

That was the point of my message.

You would treat the unknown tinkerer as "incompetent"?

Maybe cut/paste does work :)
 

Alf P. Steinbach

* Ira Baxter:
So, I have an FSA with 3 states:

A: if A1 then action1; goto B
(otherwise) goto C

B: if B1 then action2; goto A
if B2 then action3; goto B
(otherwise) goto C

C: if C1 then ...
...

How would you propose that control pass from the otherwise
clauses in both A and B to C, without the goto, and no other overhead?

The goto is a consequence of having separate basic blocks
whose control flow merges, and a linear addressing
space (whether in source lines or machine memory),
and only being able to place *one* of the
basic blocks in front of the other. The other
one has to go somewhere else, and transfer
control to the shared successor.

The Bohm-Jacopini theorem from back in the early
70s says you can always build structured code,
if you don't mind adding flag variables, which I
count as "extra overhead". You may argue that isn't
expensive; I'd argue adding the flags make it not high-performance.
Some applications really care. Yours may not,
but that isn't the point.

Your argument is incredibly silly. Taking it as a given that it's desirable to
keep the sordid mess of a spaghetti state machine (it's not), you're arguing
that it should in turn be implemented as spaghetti goto-based code in order to
save /one/ integral variable. It's borderline lunacy.

Cheers & hth.,

- Alf
 

Kai-Uwe Bux

Alf said:
* Ira Baxter: [snip]
So, I have an FSA with 3 states:

A: if A1 then action1; goto B
(otherwise) goto C

B: if B1 then action2; goto A
if B2 then action3; goto B
(otherwise) goto C

C: if C1 then ...
...

How would you propose that control pass from the otherwise
clauses in both A and B to C, without the goto, and no other overhead?

The goto is a consequence of having separate basic blocks
whose control flow merges, and a linear addressing
space (whether in source lines or machine memory),
and only being able to place *one* of the
basic blocks in front of the other. The other
one has to go somewhere else, and transfer
control to the shared successor.

The Bohm-Jacopini theorem from back in the early
70s says you can always build structured code,
if you don't mind adding flag variables, which I
count as "extra overhead". You may argue that isn't
expensive; I'd argue adding the flags make it not high-performance.
Some applications really care. Yours may not,
but that isn't the point.

Your argument is incredibly silly. Taking it as a given that it's
desirable to keep the sordid mess of of a spaghetti state machine (it's
not),

Whether it is a good idea to keep the mess depends very much on context. I
often find myself in the position that some of the best algorithms
described in the literature are given in a somewhat messy way. However,
that mess has the _huge_ advantage of having been shown correct. In those
cases, I consider it a good strategy to make my code mimic the paper as
closely as possible, so that it is easy to check that my code is nothing
but a transcription. Were I to depart from that pattern, I might have to
argue correctness again (which can be highly non-trivial). For instance, a
central piece for generating Poisson distributed random variables in my
library is the following piece of spaghetti code:

...
fhuge u;
fhuge g;
result_type k;
fhuge e;
fhuge t;
fhuge dummy;
if ( this->m < 10.0 ) {
  return ( this->grow_table( uni_d( urng ) ) );
} else {
 step_N :
  g = nor_d( urng, normal_param( this->m, this->s ) );
  k = floor( g );
  if ( g < 0.0 ) {
    goto step_P;
  }
 step_I :
  if ( fhuge(k) >= this->L ) {
    return ( k );
  }
 step_S :
  u = uni_d( urng );
  dummy = this->m - fhuge(k);
  if ( this->d * u >= dummy * dummy * dummy ) {
    return ( k );
  }
 step_P :
  if ( g >= 0.0 ) {
    this->procedure_F( k );
  } else {
    goto step_E;
  }
 step_Q :
  if ( this->fy * ( 1.0L - u )
       <=
       this->py * exp( this->px - this->fx ) ) {
    return ( k );
  }
 step_E :
  e = exp_d( urng );
  u = uni_d( urng, uniform_param( -1.0L, 1.0L ) );
  t = 1.8L + ( u > 0 ? e : -e );
  if ( t <= -0.6744L ) {
    goto step_E;
  }
  k = floor( this->m + this->s * t );
  this->procedure_F( k );
 step_H :
  if ( this->c * abs(u)
       >
       this->py * exp( this->px + e )
       -
       this->fy * exp( this->fx + e ) ) {
    goto step_E;
  }
  return ( k );
}
...

Nobody is supposed to understand what is going on. However, if you have a
copy of the article

J.H. Ahrens, U. Dieter: "Computer Generation of Poisson
Deviates from Modified Normal Distributions", ACM Transactions
on Mathematical Software 8 (1982) 163-179

you will be able to verify that it follows the document closely. In
particular, names for case labels and variables are chosen with that in
mind. (There is of course a tricky part with this: There can be a typo in
the paper. In this particular case, it is in procedure_F :) Of course, the
file containing the spaghetti code gives the reference (and explains and
corrects the typo).


So, if you happen to have a proof of correctness for the messy FSA, it can
be a good business decision to keep it. Otherwise, you might end up
wasting resources.

you're arguing that it should in turn be implemented as spaghetti
goto-based code in order to save /one/ integral variable. It's borderline
lunacy.

Well, _if_ (in the rare cases where appropriate) you keep the messy FSA,
then I think you have to deal with control flow jumping all over the place
whether you put in an integral variable or not. I don't see any advantage
to turning a

goto vertex_5;

into

next_vertex = 5;


Best

Kai-Uwe Bux
 

Alf P. Steinbach

* Kai-Uwe Bux:
Alf said:
* Ira Baxter: [snip]
So, I have an FSA with 3 states:

A: if A1 then action1; goto B
(otherwise) goto C

B: if B1 then action2; goto A
if B2 then action3; goto B
(otherwise) goto C

C: if C1 then ...
...

How would you propose that control pass from the otherwise
clauses in both A and B to C, without the goto, and no other overhead?

The goto is a consequence of having separate basic blocks
whose control flow merges, and a linear addressing
space (whether in source lines or machine memory),
and only being able to place *one* of the
basic blocks in front of the other. The other
one has to go somewhere else, and transfer
control to the shared successor.

The Bohm-Jacopini theorem from back in the early
70s says you can always build structured code,
if you don't mind adding flag variables, which I
count as "extra overhead". You may argue that isn't
expensive; I'd argue adding the flags make it not high-performance.
Some applications really care. Yours may not,
but that isn't the point.
Your argument is incredibly silly. Taking it as a given that it's
desirable to keep the sordid mess of of a spaghetti state machine (it's
not),

Whether it is a good idea to keep the mess, depends very much on context. I
find myself often in the position that some of the best algorithms
described in the literature are given in a somewhat messy way.

Then they're decidedly not, in general, the best: if the authors were any good,
they'd not make a mess of it, so any particular one would be best just by chance.

However,
that mess has the _huge_ advantage of being shown correct. In those cases,
I consider it a good strategy to make my code mimmick the paper as close as
possible so that it is easy to check that my code is nothing but a
transcription. Was I to depart from that pattern, I might have to argue
correctness again (which can be highly non-trivial). For instance, a
central piece for generaing Poisson distributed random variables in my
library is the following piece of spaghetti code:

...
fhuge u;
fhuge g;
result_type k;
fhuge e;
fhuge t;
fhuge dummy;
if ( this->m < 10.0 ) {
return( this->grow_table( uni_d( urng ) ) );
} else {
step_N :
g = nor_d( urng, normal_param( this->m, this->s ) );
k = floor(g);
if ( g < 0.0 ) {
goto step_P;
}
step_I :
if ( fhuge(k) >= this->L ) {
return ( k );
}
step_S:
u = uni_d( urng );
dummy = this->m - fhuge(k);
if ( this->d * u >= dummy * dummy * dummy ) {
return ( k );
}
step_P :
if ( g >= 0.0 ) {
this->procedure_F( k );
} else {
goto step_E;
}
step_Q :
if ( this->fy * ( 1.0L - u )
<=
this->py * exp( this->px - this->fx ) ) {
return ( k );
}
step_E :
e = exp_d( urng );
u = uni_d( urng, uniform_param( -1.0L, 1.0L ) );
t = 1.8L + ( u > 0 ? e : -e );
if ( t <= -0.6744L ) {
goto step_E;
}
k = floor( this->m + this->s * t );
this->procedure_F( k );
step_H :
if ( this->c * abs(u)
>
this->py * exp( this->px + e )
-
this->fy * exp( this->fx + e ) ) {
goto step_E;
}
return ( k );
}
...

Nobody is supposed to understand what is going on. However, if you have a
copy of the article

J.H. Ahrens, U. Dieter: "Computer Generation of Poisson
Deviates from Modified Normal Distributions", ACM Transactions
on Mathematical Software 8 (1982) 163-179

you will be able to verify that it follows the document closely.

Hm. In effect you're just hoping for the best, taking some spaghetti code based
on spaghetti mathematics and hoping the authors have done it right at both
levels. Of course it *may happen* that the above is correct with respect to some
specification, but hey.

And no, I really don't care how well-respected or not the authors are.

If the king of Norway visits and spews on my carpet I kick him and yell at him,
king or whatever.

In
particular, names for case labels and variables are chosen with that in
mind. (There is of course a tricky part with this: There can be a typo in
the paper. In this particular case, it is in procedure_F :)

Ah!

So, you had to correct their work.

I'm not surprised! :)

Of course, the
file containing the spaghetti code gives the reference (and explains and
corrects the typo).

So, if you happen to have a proof of correctness for the messy FSA, it can
be a good business decision to keep it. Otherwise, you might ending up
wasting resources.



Well, _if_ (in the rare cases where appropriate) you keep the messy FSA,
then I think you have to deal with control flow jumping all over the place
whether you put in an integral variable or not. I don't see any advantage
to turning a

goto vertex_5;

into

next_vertex = 5;

You mean "current_vertex".

It really helps debugging and tracing, and it really helps separating those
cases into individual routines, which is a good idea when you're the one
creating and implementing the state machine.

For your case of sort of "inheriting" the mess from a pair of messymathicians
I'd just search for some other less messy Poisson distribution generator (e.g.
there is a simple one, just a few lines of Smalltalk, in the book nearest to me
right now).


Cheers & hth.,

- Alf
 

Kai-Uwe Bux

Alf said:
* Kai-Uwe Bux:
Alf said:
* Ira Baxter: [snip]
So, I have an FSA with 3 states:

A: if A1 then action1; goto B
(otherwise) goto C

B: if B1 then action2; goto A
if B2 then action3; goto B
(otherwise) goto C

C: if C1 then ...
...

How would you propose that control pass from the otherwise
clauses in both A and B to C, without the goto, and no other overhead?

The goto is a consequence of having separate basic blocks
whose control flow merges, and a linear addressing
space (whether in source lines or machine memory),
and only being able to place *one* of the
basic blocks in front of the other. The other
one has to go somewhere else, and transfer
control to the shared successor.

The Bohm-Jacopini theorem from back in the early
70s says you can always build structured code,
if you don't mind adding flag variables, which I
count as "extra overhead". You may argue that isn't
expensive; I'd argue adding the flags make it not high-performance.
Some applications really care. Yours may not,
but that isn't the point.
Your argument is incredibly silly. Taking it as a given that it's
desirable to keep the sordid mess of of a spaghetti state machine (it's
not),

Whether it is a good idea to keep the mess, depends very much on context.
I find myself often in the position that some of the best algorithms
described in the literature are given in a somewhat messy way.

Then they're decidedly not, in general, the best: if the authors were any
good, they'd not make a mess of it, so any particular one would be best
just by chance.

You confuse two issues. The paper is not a mess. The algorithm is only (a
small) part of the paper. Most of the paper deals with explaining the
various steps, showing what they are doing and why that is the right thing
to do, and arguing the resulting runtime complexity. That a C++ transcript
of the algorithm is incomprehensible without the explanations that
accompany the presentation in the paper is of no relevance to the merits
of the paper or the algorithm.

Hm. In effect you're just hoping for the best, taking some spaghetti code
based on spaghetti mathematics and hoping the authors have done it right
at both levels. Of course it *may happen* that the above is correct with
respect to some specification, but hey.

You misinterpret the argument. There is no need for hope since one can check
the paper.

The case is simply that when I write a piece of C++ code for the purpose of
generating Poisson variables, I want to make it (a) efficient and (b)
verifiable. For the second part, it is a great time saver to push off the
work to people who have already argued correctness. However, then I have to
write the code so that it is easy to check that the code matches the
algorithm they talk about. That influences how I write the code.

And no, I really don't care how well-respected or not the authors are.

Neither do I. But I do care about the efficiency of my work. I could not
possibly come up with that good an algorithm myself. If the best available
math is not to your liking, you can settle for an algorithm of lesser
quality that fits your coding guidelines. But I think that says more about
the coding guidelines than about the algorithm.

If the king of Norway visits and spews on my carpet I kick him and yell at
him, king or whatever.

So what? The mathematics in the paper is good. The algorithm is expressed
clearly! It just doesn't fit some coding guidelines. But hey, mathematics
is not programming. The most maintainable way to code an algorithm is not
necessarily the same as the best way to present it and argue its
correctness.

Ah!

So, you had to correct their work.

I'm not surprised! :)

I said, it was a typo. The paper contains the correct formula but in the
algorithm, a _sign_ changed. Reading the paper with understanding(!) is, of
course, imperative. Again, however, reading with understanding is much
easier than creating another paper myself.

You mean "current_vertex".

It really helps debugging and tracing, and it really helps separating
those cases into individual routines, which is a good idea when you're the
one creating and implementing the state machine.

Since I don't have first hand experience with FSAs, I take your word for it.
For your case of sort of "inheriting" the mess from a pair of
messymathicians I'd just search for some other less messy Poisson
distribution generator (e.g. there is a simple one, just a few lines of
Smalltalk, in the book nearest to me right now).

Generating Poisson variables with constant expected runtime independent of
parameter values is non-trivial. I did some research on this one before I
settled on this particular method. There are algorithms less involved, but
they become very inefficient for large parameters.

When writing library code, you cannot profile the application. In those
cases, sacrificing performance for no good reason is poor form. The various
algorithms in the literature are there for a reason. I do not see why I
should pick a suboptimal one just because its expression fits my current
coding style better.
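
The "less involved" algorithms mentioned above can be illustrated by the
textbook multiplication method (popularized by Knuth): a few verifiable
lines, but with an expected iteration count of mean + 1, which is exactly
what makes it inefficient for large parameters. The name poisson_knuth
and the use of std::mt19937 are my own sketch, not anything from
Kai-Uwe's library:

```cpp
#include <cmath>
#include <random>

// Multiplication method: multiply uniforms until the product drops
// below exp(-mean). The expected number of loop iterations is
// mean + 1, so runtime grows linearly with the parameter -- unlike
// the constant-expected-time Ahrens-Dieter method discussed above.
template <class Urng>
long poisson_knuth(double mean, Urng& urng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const double limit = std::exp(-mean);
    double p = 1.0;
    long k = -1;
    do {
        ++k;
        p *= uni(urng);
    } while (p > limit);
    return k;
}
```

(For large means, exp(-mean) also underflows to zero, another reason the
simple method only suits small parameters.)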


Best

Kai-Uwe Bux
 

James Kanze

On Feb 14, 8:09 pm, "Ira Baxter" <[email protected]> wrote:
So, I have an FSA with 3 states:
A: if A1 then action1; goto B
(otherwise) goto C
B: if B1 then action2; goto A
if B2 then action3; goto B
(otherwise) goto C
C: if C1 then ...
...
How would you propose that control pass from the otherwise
clauses in both A and B to C, without the goto, and no other
overhead?

The classical solution is to use a switch. You need a variable
to maintain the state anyway; otherwise, it's impossible to
understand what is going on, and given that, a switch is the
natural structure.
The goto is a consequence of having separate basic blocks
whose control flow merges, and a linear addressing space
(whether in source lines or machine memory), and only being
able to place *one* of the basic blocks in front of the other.
The other one has to go somewhere else, and transfer control
to the shared successor.

The goto is the consequence of not caring about whether the code
is maintainable or not.
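
The classical switch structure for the three-state machine quoted in
this thread might be sketched as follows; the predicates a1, b1, b2 and
the commented action slots are placeholders for whatever the real
machine tests and does:

```cpp
// A sketch of the switch-based form of the A/B/C machine from the
// thread. One step consumes the current state plus the predicate
// values and returns the successor state; a driver loop would simply
// call step() until reaching State::Done.
enum class State { A, B, C, Done };

State step(State s, bool a1, bool b1, bool b2) {
    switch (s) {
    case State::A:
        if (a1) { /* action1 */ return State::B; }
        return State::C;
    case State::B:
        if (b1) { /* action2 */ return State::A; }
        if (b2) { /* action3 */ return State::B; }
        return State::C;
    case State::C:
        /* if C1 then ... */
        return State::Done;
    default:
        return State::Done;
    }
}
```

The explicit state variable is exactly the "overhead" objected to
earlier; in exchange, the current state is always visible in a debugger.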
 

James Kanze

* Ira Baxter:

[...]
Your argument is incredibly silly. Taking it as a given that
it's desirable to keep the sordid mess of of a spaghetti state
machine (it's not), you're arguing that it should in turn be
implemented as spaghetti goto-based code in order to save
/one/ integral variable. It's borderline lunacy.

I agree with your comments, but sometimes, the spaghetti state
machine is part of the external specification. Back in the old
days (and maybe still---I just haven't worked with this sort of
thing for about 25 years), controller chips were often specified
in terms of state machines (and for the more complicated ones,
they were true spaghetti); today, a number of protocols, or
parts of them, at least, are also specified as state machines
(TCP connection, for example). In such cases, you're probably
better off following the specification literally.

In C++, in addition to the classical switch statement, there is
also the state pattern. Robert Martin also has a lot to say
about the pattern, and had (has?) a code generator for the
boiler-plate parts of it. (In practice, any time the state
diagram gets a bit complicated, you're probably better off using
automatically generated code.)
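
A minimal sketch of the State pattern mentioned here, assuming invented
state names (Idle, Running) and an integer event code, none of which
come from the posts:

```cpp
#include <memory>

// Each state is a class; a transition returns the successor state
// object. Adding a state means adding a class, not editing a switch.
struct StateBase {
    virtual ~StateBase() = default;
    virtual std::unique_ptr<StateBase> on_event(int ev) = 0;
};

struct Running;

struct Idle : StateBase {
    std::unique_ptr<StateBase> on_event(int ev) override;
};

struct Running : StateBase {
    std::unique_ptr<StateBase> on_event(int ev) override;
};

std::unique_ptr<StateBase> Idle::on_event(int ev) {
    if (ev == 1) return std::make_unique<Running>();  // start event
    return std::make_unique<Idle>();
}

std::unique_ptr<StateBase> Running::on_event(int ev) {
    if (ev == 0) return std::make_unique<Idle>();     // stop event
    return std::make_unique<Running>();
}
```

As noted above, the per-state objects cost more memory than a bare
switch, which can matter on small embedded targets.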
 

James Kanze

[...]
Well, _if_ (in the rare cases where appropriate) you keep the
messy FSA, then I think you have to deal with control flow
jumping all over the place whether you put in an integral
variable or not. I don't see any advantage to turning a
goto vertex_5;

next_vertex = 5;

Until you try to debug it. Being able to see what state you're
in just by reading a variable has a lot of advantages. It also
makes it clear that you are dealing with a state machine. If I
were writing the code by hand, I'd go one step further, and
insist that the state variable have class type, and all state
changes go through a member function, so that I could log them.
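
That idea, a class-type state variable whose every change goes through
one member function, might be sketched as follows; the State values and
the log format are invented for illustration:

```cpp
#include <iostream>

// Wrapping the state in a class funnels every transition through
// set(), giving one place to log, assert on, or breakpoint state
// changes.
enum class State { A, B, C };

class StateVar {
public:
    explicit StateVar(State s) : current_(s) {}
    State get() const { return current_; }
    void set(State next) {
        // the single point where all state changes can be observed
        std::clog << "state " << static_cast<int>(current_)
                  << " -> " << static_cast<int>(next) << '\n';
        current_ = next;
    }
private:
    State current_;
};
```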

Of course, if I had to deal with a complicated state machine
today, I'd probably use some variant of the State pattern,
rather than a switch, at least in most environments. (An
implementation of the state pattern may require more memory than
just a switch---enough more to make a difference on really small
embedded processors.)

The one exception with regards to the goto might be if I were
using automatically generated code. As long as no one has to
understand or maintain the code, there's no problem with goto.
(This is basically the same argument you gave with regards to
transcribing a published algorithm. I sort of agree with your
argument there, too, as long as you stick with a literal
transcription.)
 

James Kanze

* Kai-Uwe Bux:

[...]
Then they're decidedly not, in general, the best: if the
authors were any good, they'd not make a mess of it, so any
particular one would be best just by chance.

I think, Alf, that that's an anachronism. Today, any good
author will make a highly structured presentation; it's a lot
simpler to reason about the code, and prove it correct, if it's
highly structured. Forty or more years ago, the importance of
this wasn't realized, however, and a lot of the classical
algorithms look horrible by today's standards. If it's an
algorithm that's heavily used and written about, there will be
later articles that do present it in a structured manner, but I
suspect that there are a lot of specialized algorithms for which
the last published article was also the first one to present the
solution and prove it.

[...]
Hm. In effect you're just hoping for the best, taking some
spaghetti code based on spaghetti mathematics and hoping the
authors have done it right at both levels. Of course it *may
happen* that the above is correct with respect to some
specification, but hey.

The article he cited appeared in a peer-reviewed journal. That
means that the code in it has been reviewed by a number of very
competent reviewers; certainly more, and more competent, than
you'd get in a code review in your company. And were there an
error, you can bet that it would have appeared in the letters to
the editors in a later edition of the journal. (Presumably,
that's how Kai-Uwe knew about the typo in the original article.)
Open Source isn't a recipe for anything in itself, but Open
Source where you're practically guaranteed that the code will be
read and analysed by the best minds in the business does.
And no, I really don't care how well-respected or not the
authors are.

It's not just the authors. It's the entire process.

And yes, errors do still slip through. But they don't normally
last long---anything older than about 20 years is probably safe.
And you have to compare it with the errors that might slip
through if you reworked it; the code review process in your
company will almost certainly not be to the same level as that
in a peer-reviewed journal.

The weakest link in this sort of thing is, in fact, ensuring
that your transcription is accurate; that you haven't
accidentally changed something converting the pseudo-code into
C++. And the spaghetti nature of the original doesn't simplify
that.
 

Guest

James Kanze wrote:

I think Paul Hsiu implemented state machines with gotos...
Great.

If you read the original message however, it seems that the programmer
of your genetic code (the code you are running) has used extensively
copy/paste to replicate entire portions of an ape genome around 5-7
million years ago, to modify them later.

1. the genome is not a program
2. the genome was not designed
3. the genome does not have a designer

Natural selection is a very wasteful form of "design".
If you want to build your compiler, I doubt you can wait a few
million years between releases!
That was the point of my message.

You would treat the unknown tinkerer as "incompetent"?

there is no "unknown tinkerer"
Maybe cut/paste does work :)

Yes, natural selection operates on duplicated genes, but applying
the analogy to software is a stretch.
 

jacob navia

1. the genome is not a program

Well, the genome consists of executable code and many other things
whose function we have no idea of ("junk" DNA).

The executable code comes in chunks called "genes", which contain
instructions for the interpreter. They start with a "start" sequence
and end with a "stop" instruction. Between those, the instructions are
written as words of three bases each (codons).

Before execution the code is copied into a buffer and taken out of the
cell nucleus to the interpreter. The buffer is called "messenger RNA",
and is single stranded, as opposed to the double-stranded DNA.

The interpreter is a machine called the "ribosome", which interprets
each codon (a sequence of 3 bases) as an amino acid to be built into a
protein, enzyme, or whatever. This machine has two active sites: one
where the mRNA arrives, and another where the nascent amino-acid chain
goes out.

It is the ribosome that interprets (among others) the "start" and "stop"
sequences.
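
As a toy illustration of the interpreter described above, the
translation step can be sketched in a few lines. The codon table here
is deliberately tiny (the real genetic code has 64 entries), and the
sketch glosses over real biology (e.g. AUG doubles as the methionine
codon); it is illustrative only:

```cpp
#include <map>
#include <string>

// Scan an mRNA string for the start codon AUG, then translate each
// successive 3-base codon into a one-letter amino-acid symbol until
// a stop codon (UAA, UAG, or UGA) is reached.
std::string translate(const std::string& mrna) {
    static const std::map<std::string, char> code = {
        {"AUG", 'M'},  // start codon, also methionine
        {"UGG", 'W'},  // tryptophan, the example in the text
        {"UUU", 'F'},  // phenylalanine
    };
    std::string protein;
    std::size_t start = mrna.find("AUG");       // the "start" sequence
    if (start == std::string::npos) return protein;
    for (std::size_t i = start; i + 3 <= mrna.size(); i += 3) {
        const std::string codon = mrna.substr(i, 3);
        if (codon == "UAA" || codon == "UAG" || codon == "UGA")
            break;                              // the "stop" instruction
        auto it = code.find(codon);
        protein += (it != code.end()) ? it->second : '?';
    }
    return protein;
}
```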
2. the genome was not designed
3. the genome does not have a designer

Well, those two statements are the same. Of course I do not know who
made those copy/paste operations 15 million years ago. I can't tell you
anything about it (him, her, or whatever!). What is surprising is that
an operation that I have done so many times in my programming life
appears in another, completely different software context... I can't
help but be amazed at this "coincidence".

Note that I do NOT believe in any religious god, and I have been an
agnostic all my life.
natural selection is a very wasteful form of "design".
If you want to build your compiler I doubt you can wait a few
million years between releases!

Natural selection works by survival of the fittest. Genes that produce
an advantage in the context where a species is living will be in
organisms that produce more offspring, becoming dominant in the long run.
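
That dynamic is easy to caricature in a few lines of C (the variants,
offspring rates, and starting counts below are invented purely for
illustration):

```c
#include <assert.h>

/* Two gene variants: carriers of A leave 3 offspring per generation,
   carriers of B leave 2. Even though A starts at 1% of the
   population, that small edge makes it dominant within a few dozen
   generations. No designer required, just differential reproduction. */
static double freq_of_A_after(int generations)
{
    double a = 1.0, b = 99.0;           /* A starts rare: 1 in 100 */
    for (int g = 0; g < generations; g++) {
        a *= 3.0;                       /* A-carriers' offspring */
        b *= 2.0;                       /* B-carriers' offspring */
    }
    return a / (a + b);
}
```

After 25 generations in this toy model, A makes up more than 99% of the
population.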

How natural selection would explain a copy and paste operation is a
mystery (to me). I think that here we have another process at work.

there is no "unknown tinkerer"

Well, how do you explain this proliferation of copy/paste operations in
the genes that control our brain 6 million years ago?
yes natural selection operates on duplicated genes but this
is stretching analogy to apply it to software

Life *IS* software.
1) It has a one dimensional stored code, i.e. a linear sequence of
instructions that is interpreted by a machine (ribosome) producing
living matter.

2) It has a copy mechanism that copies the code in different contexts:
2-1) Code is copied before being executed and the interpreter uses a
copy of the code (the buffer mRNA)
2-2) Code is copied when reproducing either by plain copy (asexual
reproduction) or copy with a merge operation (sexual reproduction).

3) It has a text editor for repairing errors in the copies of the stored
code. This editor is what makes "nick_keighley" be still living by
eliminating many copy errors that would have killed him ages ago.

4) The interpreter understands the commands written in the code in a
symbolic form. There is not a physical connection between the 3 codons
UGG and the tryptophan amino-acid. It is just a *convention* (code) for
tryptophan. Even more evident is the code UAA (or UAG) that means just
"stop", i.e. end of the executable code.

This code has been running mostly unchanged in all organisms on this
planet, you included, for some 4,000 million years (more or less).
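
That "convention" in point 4 is exactly how a programmer would write it
down: a lookup table. A minimal C sketch (only a handful of the 64 codons
are listed; the mappings shown are the standard genetic code, everything
else is my illustration):

```c
#include <assert.h>
#include <string.h>

/* A fragment of the genetic code written as what it is: an arbitrary
   lookup table, like an opcode table. Nothing in chemistry forces
   "UGG" to mean tryptophan; the mapping is pure convention. */
static const char *translate_codon(const char *codon)
{
    static const struct { const char *codon, *amino; } table[] = {
        { "UGG", "Trp"  },  /* tryptophan */
        { "AUG", "Met"  },  /* methionine; doubles as the start signal */
        { "UAA", "STOP" },
        { "UAG", "STOP" },
        { "UGA", "STOP" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(codon, table[i].codon) == 0)
            return table[i].amino;
    return "?";                 /* codon not in this toy table */
}
```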
 
B

Ben Morse

The executable code comes in chunks called "genes", that contain
instructions for the interpreter. They start with a "start" sequence,
and end with a "stop" instruction. Within those instructions there are
sequences of words composed of 3 base pairs.

Yet, there is no 'function call' codon. This might explain the
proliferation of what you call 'copy/paste'.

Even someone who was actively and consciously engineering a DNA
sequence would still have to copy subsequences if they wanted to
create a new protein that shared substructure with another protein.
There isn't another way to do it with DNA.

However, we -do- have a different way of doing it with code, and there
are numerous benefits to it. A better way of looking at duplicated
subsequences in DNA would be as functions that are inlined when you
compile to machine code. Of course, when the machine doesn't support
a call stack abstraction, inlining is all you can do.
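
In C terms, the contrast looks something like this (a deliberately trivial
sketch; the names are made up):

```c
#include <assert.h>

/* With a call abstraction, shared structure is written once
   and referenced from each use site... */
static int square(int x) { return x * x; }

static int sum_of_squares_with_calls(int a, int b)
{
    return square(a) + square(b);
}

/* ...without one -- the situation in DNA -- the shared body must be
   duplicated textually at every use site. That is inlining, and it
   is also the software equivalent of gene duplication. */
static int sum_of_squares_inlined(int a, int b)
{
    return (a * a) + (b * b);
}
```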

That said, cute idea! Good luck on your spiritual journey.
 
J

James Kanze

If you read the original message however, it seems that the
programmer of your genetic code (the code you are running) has
used extensively copy/paste to replicate entire portions of an
ape genome around 5-7 million years ago, to modify them later.
That was the point of my message.
You would treat the unknown tinkerer as "incompetent"?
Maybe cut/paste does work :)

If you've got millions of years to get the program working, can
afford to throw out 99% or more of the programs you write, and
don't mind having a lot of excess baggage only relevant to
earlier versions, maybe. Most companies I've worked for would
not consider generating random variations, testing them, and
then only keeping the ones that worked, an acceptable
development methodology. (Although... it sounds an awful lot
like `test-driven design'. But I don't think that even its
proponents are arguing for random modifications---and most of
them seem to insist strongly on refactoring, eliminating
redundancies.)
 
G

Guest

jacob said:
I think nobody here can deny that copy and paste is an established
method of software development.
You know: you have some code that works, and you want to modify it.
You copy it and paste the code somewhere else, then you modify it
leaving the running code in its place until you switch to the new
version.
True, many people are against this fashion of developing software.
The correct (in the abstract) process should be of finding out the
common parts of the code and isolate the changes as far as it is
possible, maintaining a common line.
The copy/paste is considered harmful.
But... I was surprised when I read this article in PLOS: [1]
<quote>
One of the primary agents of genome evolution is gene duplication.
Duplicated genes provide the raw material for the generation of
novel genes and biological functions, which in turn allow the
evolution of organismal complexity and new species. James Sikela
and colleagues set out to compare gene duplications between humans
and four of our closest primate relatives to find the genetic roots
of our evolutionary split from the other great apes. Collecting the
DNA of humans, chimpanzees, bonobos, gorillas, and orangutans from
blood and experimental cell lines, the researchers used microarray
analysis to identify variations in the number of copies of
individual genes among the different species. They analyzed nearly
30,000 human genes and compared their copy numbers in the genomes
of humans and the four great apes.
Overall, Sikela and colleagues found more than 1,000 genes with
lineage-specific changes in copy number, representing 3.4% of the
genes tested. All the great ape species showed more increases than
decreases in gene copy numbers, but relative to the evolutionary
age of each lineage, humans showed the highest number of genes with
increased copy numbers, at 134. Many of these duplicated human
genes are implicated in brain structure and function.
The gene changes identified in the study, the authors conclude,
likely represent most of the major lineage-specific gene expansions
(or losses) that have taken place since orangutans split from the
other great apes, some 15 million years ago. (Humans diverged from
their closest cousins, the chimp and bonobo, roughly 5 million to 7
million years ago.) And because some of these gene changes were
unique to each of the species examined, they will likely account
for some of the physiological and morphological characteristics
that are unique to each species.
<end quote>
Apparently the programmer (or programmer team) 15 million years ago
were in a hurry. And copy / paste, as everyone here knows, is not
an accepted method but... it works, and that is all that counts.
What were they tinkering with?
"... humans showed the highest number of genes with increased copy
numbers, at 134. Many of these duplicated human genes are
implicated in brain structure and function"
All those millions of years later, the descendants of those apes,
still running the same code, start to wonder...
WHAT HAPPENED?

Copy/Paste might work well when you actually intend to create new
functionality.

Note that some of the "software" we are copies of has since been
scrapped. Could this be because some of the improvements and obvious
bug fixes were never back-ported to the original code?

they never merge different branches. Must be using ClearCase.
 
G

Guest

(e-mail address removed) wrote:

Well, the genome consists of executable code and many other things
whose function we do not know. ("Junk" DNA)

ok, we are going to have to agree to differ here. DNA is not machine
code, this is an analogy. The genome does not "consist of executable
code".

The executable code comes in chunks called "genes", which contain
instructions for the interpreter. They start with a "start" sequence
and end with a "stop" instruction. Between those markers, the
instructions are written as words composed of 3 base pairs each.

Before execution the code is copied into a buffer, and taken out of the
cell nucleus into the interpreter. The buffer is called "messenger RNA",
and is single stranded, as opposed to the double stranded DNA.

The interpreter is a machine called the "ribosome", which interprets each
codon (a sequence of 3 bases) as an amino-acid to be built into a
protein, enzyme, or whatever. This machine has two active sites: one
where the mRNA arrives, and the other where the nascent amino-acid chain
goes out.

It is the ribosome that interprets (among others) the "start" and "stop"
sequences.

I was vaguely aware of this. And I *still* don't think
the genome is a program. At best it is a recipe.


Well, those statements are the same.

perhaps I should have said "therefore the genome does not have a
designer". Some people stretch "design" to mean "thrown together
in any way whatsoever". Hence pebbles on the beach are designed
by the sea. I (and apparently you) consider design to be a conscious
process.
Of course I do not know who made
those copy/paste operations 15 million years ago.

I'm objecting to these assumed conclusions. It has yet
to be demonstrated that there is any "who" at all. So better
to phrase it "we do not know the details of the process(es) that
duplicated parts of the genome..."
I can't tell you
anything about it (him, her, or whatever!).

assuming conclusion...
What is surprising is that
an operation that I have done so many times in my programmer life
appears in another, completely different software context...

"what is surprising/interesting is that an operation I have
carried out so many times [...] has an analogous form in another
domain"
I can't but be amazed at this "coincidence".

I suppose there are only a limited number of operations
that can be carried out on representations of information.

Note: Natural Selection is *dumb*. No branch merging, no subroutines.
If software developers wrote code like this they'd be shot.
Note that I do NOT believe in any religious god, and I have been an
agnostic all my life.

An agnostic is someone who isn't sure if he's an atheist.
I'm a meta-agnostic as I'm not sure I'm an agnostic. :)

Natural selection works by survival of the fittest.
partially

Genes that produce
an advantage in the context where a species is living will be in
organisms that produce more offspring, becoming dominant in the long run.

How natural selection would explain a copy and paste operation is a
mystery (to me). I think that here we have another process at work.

the so-called copy-paste is essentially an error in the "copying"
machinery. Just as individual codons don't always get copied correctly,
sometimes the copy-this machinery glitches.

NS operates on the diversity it has available. The diversity
is produced by mutation. This can be caused by radiation or chemicals,
or simply bad luck in the machinery. Quantum jiggling and so on.

Well, how do you explain this proliferation of copy/paste operations in
the genes that control our brain 6 million years ago?

seriously? Copying errors. You should really move this
to a more biological group. talk.origins used to be quite good
but it's troll/looney infested these days. They used to
have a good web site.

Life *IS* software.
no

1) It has a one dimensional stored code, i.e. a linear sequence of
instructions that is interpreted by a machine (ribosome) producing
living matter.

2) It has a copy mechanism that copies the code in different contexts:
    2-1) Code is copied before being executed and the interpreter uses a
copy of the code (the buffer mRNA)
    2-2) Code is copied when reproducing either by plain copy (asexual
reproduction) or copy with a merge operation (sexual reproduction).

3) It has a text editor for repairing errors in the copies of the stored
code. This editor is what makes "nick_keighley" be still living by
eliminating many copy errors that would have killed him ages ago.

more like an error correcting code, Hamming distances and all that
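
The metric behind that remark is easy to write down in C (a minimal
sketch; it compares the strings position by position and assumes equal
lengths):

```c
#include <assert.h>
#include <stddef.h>

/* Hamming distance between two equal-length strings: the number of
   positions at which they differ. Error-correcting codes work by
   keeping valid code words far apart in exactly this metric, so a
   small number of copying errors can be detected and repaired. */
static int hamming(const char *a, const char *b)
{
    int d = 0;
    for (size_t i = 0; a[i] != '\0' && b[i] != '\0'; i++)
        if (a[i] != b[i])
            d++;
    return d;
}
```

For example, "GATTACA" and "GACTATA" differ at two positions, so their
Hamming distance is 2.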

4) The interpreter understands the commands written in the code in a
symbolic form. There is not a physical connection between the 3 codons
UGG and the tryptophan amino-acid. It is just a *convention* (code) for
tryptophan. Even more evident is the code UAA (or UAG) that means just
"stop", i.e. end of the executable code.

This code has been running mostly unchanged in all organisms on this
planet, you included, for some 4,000 million years (more or less).

more like 3 billion, I think. I'm still not going to accept that life
is a program. This seems like a case of "to the man with a hammer,
everything is a nail". To a programmer, everything is code.

Your analogy breaks down in that there isn't a simple
codon -> protein map

it's worse than the worst assembler nightmare. Codons overlap,
codons can be read in either direction etc.

Once beyond protein it gets much messier. The same protein
does different things in different cells.

The "DNA code" is "annotated" with methyl groups. This is why
my hair cells don't make blood proteins.

And some things are highly determined by the environment
(this includes the womb). So that the cloned tabby cat (CC)
was not identically marked to the cat it was cloned from
(which leads me to ask "why bother then?")
 
G

Guest

If you've got millions of years to get the program working, can
afford to throw out 99% or more of the programs you write, and
don't mind having a lot of excess baggage only relevant to
earlier versions, maybe.

Windows Vista?

:)
 
