software engineering, program construction

ccc31807

I've been writing software for about ten years, mostly Perl but also
Java, C, Python, some Lisp, JavaScript, and the assorted stuff we all
pick up (SQL, HTML, XML, etc.). I've never worked on a big project; my
scripts typically run between several hundred and several thousand
LOC. I've written in a number of different styles, evolving over the
years. After reviewing some of the work I've done in the past couple
of years, rewriting a lot and revising a lot (due to changing data
requirements), I've noticed that I use a particular style.

In the past, the technology I've used seems to influence the style.
For example, at one time I was writing in C, and my Perl code
consisted of modules that acted the same as header files. When I was
writing some Lisp, my Perl code conformed a lot more to a functional
style.

Now, I don't know what I do. I've copied the guts of a program below,
and would like comments from those who might have a lot more
experience than I do.

Essentially, what I do is declare all my variables as global
variables, write the subroutines that manipulate the data, and call
the subroutines one after another. The problem domain consists of data
munging, slicing and dicing data files and preparing reports of one
kind or another.

Thoughts?

Thanks, CC.
---------------------sample of script-----------------------------------
#declare necessary variables
my ($global1, $global2, $global3, $global4, $global5, $global6);
my (@global1, @global2, @global3);
my (%global1, %global2, %global3, %global4);

#run subroutines
&get_student_info;
&get_letter_info;
&check_site_exists;
&test_hashes;
&make_new_dir;
&create_letters;

#email to sites
my $answer = 'n';
print "Do you want to email the letters to the sites? [y|n] ";
$answer = <STDIN>;
&email_letters if $answer =~ /y/i;
exit(0);

#construct user defined functions
sub get_student_info { ...}
sub get_letter_info {... }
#etc.
 
Danny Woods

ccc31807 said:
Essentially, what I do is declare all my variables as global
variables, write the subroutines that manipulate the data, and call
the subroutines one after another. The problem domain consists of data
munging, slicing and dicing data files and preparing reports of one
kind or another.

Thoughts?

This kind of stuff is fine if it's just a script that performs a
specific task, but if it's something that you're likely to have to
revisit or modify, you'll benefit from reducing the amount of global
state you've got and instead handing required state to functions and
returning the transformed data to feed into other functions. Since
you've done some Lisp, you'll be familiar with trying to keep functions
free of side effects, which is a great thing for testing since no
function depends upon external state. Leave the side effects (like I/O)
to tiny functions which are simple and easy to reason about.
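To make that concrete, here is a minimal sketch of the shape Danny describes, with invented sub and field names: the transformation sub touches no globals, and the only side effect lives in a two-line printer.

```perl
use strict;
use warnings;

# Pure transformation: data in, new data out. No globals touched,
# so the sub can be tested in complete isolation.
sub add_full_names {
    my ($students) = @_;    # arrayref of hashrefs
    return [ map { +{ %$_, full_name => "$_->{first} $_->{last}" } } @$students ];
}

# Thin I/O wrapper: the only place with a side effect.
sub print_names {
    my ($students) = @_;
    print "$_->{full_name}\n" for @$students;
}

my $students = [ { first => 'Ada', last => 'Lovelace' } ];
print_names( add_full_names($students) );   # Ada Lovelace
```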

I'm generally against cutting a script up for the sake of making it
modular unless I actually believe that I'm going to use that module
somewhere else. If you *do* choose that route, stick the functions in a
package in a .pm file and export the interesting functions with Exporter
('perldoc Exporter' for more information). If you want to go all object
oriented (which doesn't appear to be necessary given the size of the
script), and have a non-ancient version of Perl, have a look at Moose
rather than going down the vanilla 'perltoot' path.
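For reference, the Exporter route is only a few lines of boilerplate. This is a sketch with an invented module name and a dummy sub body; in real use the package would live in its own StudentLetters.pm file and the caller would simply say `use StudentLetters qw(get_student_info);`.

```perl
# Hypothetical module; would normally live in StudentLetters.pm
package StudentLetters;
use strict;
use warnings;
use Exporter 'import';                  # supplies an import() method
our @EXPORT_OK = qw(get_student_info);  # exported only on request

sub get_student_info {
    my ($student_id) = @_;
    return { id => $student_id };       # dummy body for the sketch
}
# (the real .pm file would end with a true value:  1;)

# Back in the calling script. Since this sketch is a single file,
# we call import() directly instead of 'use StudentLetters ...'.
package main;
StudentLetters->import('get_student_info');
my $info = get_student_info(42);
print "student $info->{id}\n";          # student 42
```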

Taking the additional complexity of Exporter or Moose into account,
however, and the size of the script, I'd probably just stick with the
functional refactorings.

Cheers,
Danny.
 
Jürgen Exner

ccc31807 said:
In the past, the technology I've used seems to influence the style.
For example, at one time I was writing in C, and my Perl code
consisted of modules that acted the same as header files. When I was
writing some Lisp, my Perl code conformed a lot more to a functional
style.

That is pretty typical. Most programming languages encourage a
particular programming style and if you are using that style for a while
then you will program in that style in whatever language you are using.
Now, I don't know what I do. I've copied the guts of a program below,
and would like comments from those who might have a lot more
experience than I do.

Your question is not about software engineering but about good practices
and common sense for basic programming. Software engineering deals with
how to design and modularize complex software systems and how to design
interfaces between those software components.
The difference is similar to that between a plumber installing a new
bathroom in a home and an engineer planning and building the water
supply for a city block or a whole city. There are rules and best
practices for the bathroom which the plumber should follow, but that
work really has little to do with engineering.
Essentially, what I do is declare all my variables as global
variables,

Bad idea. Don't use globals unless you have a good reason to do so. One
principle of programming is to keep data as local as reasonable, such
that code outside of your local area cannot accidentally or deliberately
step on that data. This applies even to single function calls.
write the subroutines that manipulate the data, and call
the subroutines one after another. The problem domain consists of data
munging, slicing and dicing data files and preparing reports of one
kind or another.
---------------------sample of script-----------------------------------
#declare necessary variables
my ($global1, $global2, $global3, $global4, $global5, $global6);
my (@global1, @global2, @global3);
my (%global1, %global2, %global3, %global4);

In addition to the comment above about the use of global variables, it
is very much frowned upon to use non-descriptive names like that. Large
IT organizations even have very stringent rules for how to compose
variable names (sometimes excessively so), but they always contain a
descriptive part.
#run subroutines
&get_student_info;
&get_letter_info;
&check_site_exists;
&test_hashes;
&make_new_dir;
&create_letters;

Do you know what the '&' does? Do you really, really need that
functionality? If the answer is no to either question, then for crying
out loud drop that ampersand. Or are you stuck with Perl 4 for some odd
reason?

It appears as if your functions don't take arguments and don't return
results, either, but communicate only by side effects on global
variables. That is very poor coding style because it violates the
principle of locality.
#email to sites
my $answer = 'n';
print "Do you want to email the letters to the sites? [y|n] ";
$answer = <STDIN>;
&email_letters if $answer =~ /y/i;

Do you need the current @_ in 'email_letters'? If not, then why are you
passing it to the sub?
exit(0);

#construct user defined functions
sub get_student_info { ...}
sub get_letter_info {... }
#etc.

Whether you define your subroutines at the beginning or at the end of
your code is mostly a matter of personal preference. But you should most
definitely use parameters and return values to pass the necessary data
between sub and caller.

jue
 
David Filmer

Thoughts?

I HIGHLY recommend the O'Reilly book, _Perl_Best_Practices_, by Dr.
Damian Conway. Learn it, live it, love it.

FWIW, I have never used a global variable in any production program.
I always tightly scope my code (even if the variables have the same
name). For example:

my $dbh = [whatever you do to get a database handle];
my $student_id = shift or die "No student ID\n"; # Not Damian-approved, FWIW
my %student = %{ get_student_info($student_id, $dbh) };

print "The e-mail address for student $student_id is $student{'email'}\n";

...

sub get_student_info {
my( $student_id, $dbh ) = @_;
my $sql = qq{
SELECT firstname, lastname, email
FROM student_table
WHERE student_id = $student_id
};
return $dbh->selectrow_hashref( $sql );
}

Now the sub is purely generic - you can move it to a module and call
it from any program.

Oh, and I HIGHLY recommend the O'Reilly book, _Perl_Best_Practices_,
by Damian Conway.

And, did I mention _Perl_Best_Practices_, by Damian Conway?
 
ccc31807

This kind of stuff is fine if it's just a script that performs a
specific task, but if it's something that you're likely to have to
revisit or modify, you'll benefit from reducing the amount of global
state you've got and instead handing required state to functions and
returning the transformed data to feed into other functions. Since
you've done some Lisp, you'll be familiar with trying to keep functions
free of side effects, which is a great thing for testing since no
function depends upon external state.  Leave the side effects (like I/O)
to tiny functions which are simple and easy to reason about.

Many of these scripts produce standard reports that essentially are
static over long periods of time. The reason I gravitated toward
variables global to the script is because I had trouble visualizing
them when I scattered them. With all the declarations in one place, I
can see the variable names and types.

I agree with you about Lisp, but honestly, writing Perl in a
functional style was more trouble than it was worth, given the limited
scope of these kinds of scripts. FWIW, I like the fact that with Lisp
you can play with your functions at the top level and save them when
you have what you want. However, it's a different style of programming
with different kinds of tasks.
I'm generally against cutting a script up for the sake of making it
modular unless I actually believe that I'm going to use that module
somewhere else.  If you *do* choose that route, stick the functions in a
package in a .pm file and export the interesting functions with Exporter
('perldoc Exporter' for more information).  If you want to go all object
oriented (which doesn't appear to be necessary given the size of the
script), and have a non-ancient version of Perl, have a look at Moose
rather than going down the vanilla 'perltoot' path.

I agree with your statement about modules. When I develop web apps, I
do indeed modularize the functions, typically writing HTML.pm, SQL.pm,
and CONSOLE.pm for the HTML, SQL, and program-specific logic.

I've never written any OO Perl, although I've studied both Conway's
'OO Perl' and Schwartz's 'Learning Perl Objects, References &
Modules'. If I were going to write a large OO app, I'd use Java
(because Perl's lack of enforced disciplines makes it too easy to
ignore SWE practices).
Taking the additional complexity of Exporter or Moose into account,
however, and the size of the script, I'd probably just stick with the
functional refactorings.

That's one of the points of my post. Typically, I very much disfavor
cutting and pasting, but with the 'modular' subroutines, I find myself
cutting and pasting previously written subroutines between scripts. My
conscience bothers me a little bit when I do this, but for the little
bit of programming I do it's not hard to just cut and paste and it
really does lead to appropriate refactoring.

CC.
 
Danny Woods

ccc31807 said:
I agree with you about Lisp, but honestly, writing Perl in a
functional style was more trouble than it was worth, given the limited
scope of these kinds of scripts. FWIW, I like the fact that with Lisp
you can play with your functions at the top level and save them when
you have what you want. However, it's a different style of programming
with different kinds of tasks.

I'm inclined to disagree here: Perl is a great language for functional
programming, as Mark Jason Dominus attests in his excellent book, Higher
Order Perl (legitimate free PDF at http://hop.perl.plover.com/#free).
Of course, you're entirely at liberty to disagree! Some languages
(those without closures and first-class functions) make it difficult to
program functionally, but the benefits (to me) outweigh the mental
gymnastics required to think in a functional manner.
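A quick, hand-rolled taste of what Danny means (nothing here is from HOP itself, just the closures and first-class functions Perl has had all along):

```perl
use strict;
use warnings;

# make_counter returns a closure: a first-class function that
# carries its own private $count with it.
sub make_counter {
    my $count = shift // 0;
    return sub { return $count++ };
}

my $next = make_counter(10);
print $next->(), "\n";   # 10
print $next->(), "\n";   # 11

# Higher-order style: compose transformations with map/grep.
my @evens_squared = map { $_ * $_ } grep { $_ % 2 == 0 } 1 .. 6;
print "@evens_squared\n";  # 4 16 36
```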
That's one of the points of my post. Typically, I very much disfavor
cutting and pasting, but with the 'modular' subroutines, I find myself
cutting and pasting previously written subroutines between scripts. My
conscience bothers me a little bit when I do this, but for the little
bit of programming I do it's not hard to just cut and paste and it
really does lead to appropriate refactoring.

Lots of big businesses don't like refactoring (every change to code,
however innocent, has the potential for breakage, and business-types
don't like it when the rationale is code purity). That said, nothing
stops you from taking code that you realise you're about to cut and
paste into your script and instead paste it into a module for future
use: if you're going to re-use it once, chances are you'll think about
it again.

Cheers,
Danny.
 
DouglasG.Wilson

   sub get_student_info {
      my( $student_id, $dbh ) = @_;
      my $sql = qq{
         SELECT    firstname, lastname, email
         FROM      student_table
         WHERE     student_id = $student_id
      };
      return $dbh->selectrow_hashref( $sql );
   }

I'd recommend using placeholders in that SQL (insert comment about
little Bobby Tables and security), and possibly using prepare_cached
and/or the Memoize module if that function is called a lot.
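A hedged sketch of Doug's suggestion, keeping David's invented table and column names; DBI's `selectrow_hashref` accepts bind values after the attribute slot, so the interpolated variable becomes a placeholder:

```perl
use strict;
use warnings;

sub get_student_info {
    my ($student_id, $dbh) = @_;
    my $sql = q{
        SELECT firstname, lastname, email
        FROM   student_table
        WHERE  student_id = ?
    };
    # The ? placeholder makes the driver quote the value itself,
    # so a hostile $student_id can't rewrite the query.
    return $dbh->selectrow_hashref($sql, undef, $student_id);
}
```

If the sub is called in a tight loop, `$dbh->prepare_cached($sql)` plus `execute`/`fetchrow_hashref` saves re-parsing the statement, which is the other half of Doug's suggestion.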

HTH,
-Doug
 
Jürgen Exner

ccc31807 said:
[...] The reason I gravitated toward
variables global to the script is because I had trouble visualizing
them when I scattered them. With all the declarations in one place, I
can see the variable names and types.

Wrong way of thinking. Don't think in terms of variables. Instead think
in terms of information/data flow between functions.

If function f() computes a data item x, and function g() needs
information from this data item, then f() needs to return this data item
and g() needs to receive it:

g(f(....), ....);
or
my $thisresult = f(...);
g($thisresult);
or
f(..., $thisresult);
g($thisresult);
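Concretely, with invented subs standing in for f() and g():

```perl
use strict;
use warnings;

# f(): computes a data item and returns it
sub load_students {
    # stand-in for the real file parsing
    return [
        { name => 'Ada',   site => 'Main' },
        { name => 'Grace', site => 'Main' },
    ];
}

# g(): receives the data item it needs as a parameter
sub count_by_site {
    my ($students) = @_;
    my %count;
    $count{ $_->{site} }++ for @$students;
    return \%count;
}

# the g(f(...)) data flow described above
my $by_site = count_by_site( load_students() );
print "$_: $by_site->{$_}\n" for sort keys %$by_site;   # Main: 2
```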

jue
 
Jürgen Exner

Danny Woods said:
I'm inclined to disagree here: Perl is a great language for functional
programming, as Mark Jason Dominus attests in his excellent book, Higher
Order Perl (legitimate free PDF at http://hop.perl.plover.com/#free).
Of course, you're entirely at liberty to disagree! Some languages
(those without closures and first-class functions) make it difficult to
program functionally, but the benefits (to me) outweigh the mental
gymnastics required to think in a functional manner.

While I agree I think the OP is nowhere near using HOFs, closures, or
functions as first-class objects. He is still struggling with the
basics.

jue
 
ccc31807

If function f() computes a data item x, and function g() needs
information from this data item, then f() needs to return this data item
and g() needs to receive it:

        g(f(....), ....);
or
        my $thisresult = f(...);
        g($thisresult);
or      
        f(..., $thisresult);
        g($thisresult);

jue

Or, maybe...

my %information_hash;
%build_hash;
%test_hash;
&use_hash;

.... where %information_hash is a data structure that contains tens of
thousands of records four layers deep, like this:
$information_hash{$level}{$site}{$term}

.... and

sub use_hash
{
    foreach my $level (keys %information_hash)
    {
        foreach my $site (keys %{$information_hash{$level}})
        {
            foreach my $term (keys %{$information_hash{$level}{$site}})
            {
                print "Dear $information_hash{$level}{$site}{$term}{'name'} ...";
            }
        }
    }
}

Frankly, it seems a lot easier to use one global hash than to either
pass a copy to a function or pass a reference to a function.
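For comparison, a sketch of the same loop taking the hash by reference (names invented): the only cost over the global version is one backslash at the call site and an arrow on the first subscript, and no copy of the records is ever made.

```perl
use strict;
use warnings;

sub use_hash {
    my ($info) = @_;   # $info is a reference; the records are not copied
    for my $level ( sort keys %$info ) {
        for my $site ( sort keys %{ $info->{$level} } ) {
            for my $term ( sort keys %{ $info->{$level}{$site} } ) {
                print "Dear $info->{$level}{$site}{$term}{name} ...\n";
            }
        }
    }
}

my %information_hash = (
    UG => { Main => { Fall => { name => 'Ada' } } },
);
use_hash(\%information_hash);   # one backslash, zero copies
```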

Yesterday, I completed a task that used an input file of approx 300,000
records, analyzed the data, created 258 charts (as GIFs) and printed
the charts to a PDF document for distribution. In this case, I created
several subroutines to shake and bake the data, and used just one
global hash throughout. Is this so wrong?

CC.
 
ccc31807

If you need the keys as well then a

    while (my ($level, $level_hash) = each %information_hash) {

loop might be more appropriate.

I am using the keys to do other things, so yes, I need the keys, but
thanks for your suggestion. I find myself doing this a lot, so I'm
open to making it easier.
This is only because you've never been bitten by using globals when you
shouldn't have; probably because you've only ever written relatively
small programs, and never come back to a program six months later to add
a new feature.

Okay, let's consider an evolving programming style. Suppose you wrote
a very short script that looks like this:

my %hash;
#step_one
open IN, '<', 'in.dat';
while (<IN>)
{
    chomp;
    my ($val1, $val2, $val3 ...) = split /,/;
    $hash{$val1} = {name => $val2, id => $val3 ...};
}
close IN;
#step_two
open OUT, '>', 'out.csv';
foreach my $key (sort keys %hash)
{
    print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
}
close OUT;
exit(0);

Now, suppose you rewrote it like this:

my %hash;
step_one();
step_two();
exit(0);

sub step_one
{
    open IN, '<', 'in.dat';
    while (<IN>)
    {
        chomp;
        my ($val1, $val2, $val3 ...) = split /,/;
        $hash{$val1} = {name => $val2, id => $val3 ...};
    }
    close IN;
}

sub step_two
{
    open OUT, '>', 'out.csv';
    foreach my $key (sort keys %hash)
    {
        print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
    }
    close OUT;
}

Ben, I could make the case that the second version is clearer and
easier to maintain than the first version, even though the second
version breaks the rules and the first version doesn't. What's the
REAL difference between the two versions? And why should the
decomposition of code into subroutines NECESSARILY require a
functional style and variable localization?
If it works, then no, it isn't 'wrong'. It is bad style, though. If you
had written (say) the chart-creating code as a module with functions
that took parameters, then when you need another set of charts tomorrow
you could reuse it. As it is you have to copy/paste and modify it for
your new set of global data structures.

You are 100 percent correct. I don't know if I will ever run this
script again. If I do, I'll certainly revise it (as I wrote it like
version one above).
That may be practical when your programs are only ever run once, but
quickly becomes less so when you have many programs in long-term use
with almost-but-not-quite the same subroutine in: when you find a bug,
how are you going to find all the places you've copy/pasted it to
correct it?

Again, I agree totally. However, I'm a lot more interested in the
architecture of a script than the other issues that have been
mentioned. With this particular issue, I try my best to follow the DRY
practice, and the second or third time I write the same thing, I often
will place it in a function and call it from there.

CC.
 
Uri Guttman

c> I am using the keys to do other things, so yes, I need the keys, but
c> thanks for your suggestion. I find myself doing this a lot, so I'm
c> open to making it easier.

c> Okay, let's consider an evolving programming style. Suppose you wrote
c> a very short script that looks like this:

c> my %hash;
c> step_one();
c> step_two();

you pass no args to those subs. they are using the file global %hash

c> Ben, I could make the case that the second version is clearer and
c> easier to maintain that the first version, even though the second
c> version breaks the rules and the first version doesn't. What's the
c> REAL difference between the two versions? And why should the
c> decomposition of code into subroutines NECESSARILY require a
c> functional style and variable localization?

you didn't listen to the rules. it isn't about just globals or passing
args. it is WHEN and HOW do you choose to do either. a single global
hash is FINE in some cases as are a few top level file lexicals. doing
it ALL the time with every variable is bad. you need to learn the
balance of when to choose globals. the issue is blindly using globals
all over the place and using too many of them vs judicious use of
globals. you just about can't write any decent sized program without
file level globals so it isn't a hard and fast rule. the goal is to keep
the number of globals to a nice and easy to understand/maintain
minimum. sometimes that minimum can be zero.

c> Again, I agree totally. However, I'm a lot more interested in the
c> architecture of a script than the other issues that have been
c> mentioned. With this particular issue, I try my best to follow the DRY
c> practice, and the second or third time I write the same thing, I often
c> will place it in a function and call it from there.

it is easier to get into the habit of writing subs for all logical
sections. loading/parsing a file is a logical section. processing that
data is a logical section, etc. then you can pass in file names for
args, or the ref to the hash for an arg, etc. one way to avoid file
lexicals (not that i do this all the time) is to use a top level driver
sub

main() ;
exit ;

sub main {

my $file = shift @ARGV || 'default_name' ;
my $parsed_data = parse_file( $file ) ;
my $results = process_data( $parsed_data ) ;
output_report( $results ) ;
}

etc.

isolation is the goal. now no one can mess with those structures by
accident or even by ill will. they will be garbage collected when the
sub main exits which can be a good thing too in some cases. the logical
steps are clear and easy to follow. it is easy to add more steps or
modify each step. the subs could be reused if needed with data coming
from other places as they aren't hardwired to the file level
lexicals. the advantages of that style of code are major and the losses
for using too many globals are also big. there is a reason this style
has been developed, taught and espoused for years. it isn't a random
event. small programs develop into large ones all the time. bad habits
in small programs don't get changed when the scale of the program
grows. bad habits will kill you in larger programs so it is best to
practice good habits at all program scales, small and large.

uri
 
Jürgen Exner

ccc31807 said:
Or, maybe...

my %information_hash;
%build_hash;
%test_hash;
&use_hash;

Most definitely not. For one, the second and third lines will give you
syntax errors.

And even if you meant to write
my %information_hash;
&build_hash;
&test_hash;
&use_hash;
then
1: why on earth are you passing @_ to those functions?
2: why aren't you passing the hash to those functions instead:
build_hash(\%information_hash);
test_hash(\%information_hash);
use_hash(\%information_hash);
Then it would be obvious what data those functions are processing.
Otherwise you don't know.
... where %information_hash is a data structure that contains tens of
thousands of records four layers deep, like this:
$information_hash{$level}{$site}{$term}

... and

sub use_hash

Just do
my %information_hash = %{$_[0]};
and the rest of your code remains unchanged except that now you are not
operating on a global variable.
{
    foreach my $level (keys %information_hash)
    {
        foreach my $site (keys %{$information_hash{$level}})
        {
            foreach my $term (keys %{$information_hash{$level}{$site}})
            {
                print "Dear $information_hash{$level}{$site}{$term}{'name'} ...";
            }
        }
    }
}

Frankly, it seems a lot easier to use one global hash than to either
pass a copy to a function or pass a reference to a function.

As long as you are just installing a new shower head you can do that.
Once you start designing the plumbing for a high rise or a city block it
will bite you in your extended rear. Better get used to good practices
early. Unlearning bad habits is very hard.
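One caveat on the `my %information_hash = %{$_[0]}` shortcut suggested above: it makes a shallow copy of the top level, so new top-level keys written inside the sub vanish with the copy (while deeper levels are still shared). Keeping the reference avoids both the copy and that surprise; a small demonstration with invented sub names:

```perl
use strict;
use warnings;

sub tag_via_copy {
    my %info = %{ $_[0] };   # shallow copy of the top level
    $info{checked} = 1;      # modifies the copy; the caller never sees it
}

sub tag_via_ref {
    my ($info) = @_;         # same reference the caller holds
    $info->{checked} = 1;    # the caller sees this
}

my %h;
tag_via_copy(\%h);
print exists $h{checked} ? "copy leaked\n" : "copy: caller unchanged\n";
tag_via_ref(\%h);
print exists $h{checked} ? "ref: caller updated\n" : "ref lost\n";
```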
Yesterday, I completed a task that used an input file of approx 300,000
records, analyzed the data, created 258 charts (as GIFs) and printed
the charts to a PDF document for distribution. In this case, I created
several subroutines to shake and bake the data, and used just one
global hash throughout. Is this so wrong?

In general: yes. If a student of mine did that we would have a very
serious talk about very basic programming principles.

jue
 
Jürgen Exner

ccc31807 said:
I am using the keys to do other things, so yes, I need the keys, but
thanks for your suggestion. I find myself doing this a lot, so I'm
open to making it easier.


Okay, let's consider an evolving programming style. Suppose you wrote
a very short script that looks like this:

my %hash;
#step_one
open IN, '<', 'in.dat';
while (<IN>)
{
    chomp;
    my ($val1, $val2, $val3 ...) = split /,/;
    $hash{$val1} = {name => $val2, id => $val3 ...};
}
close IN;
#step_two
open OUT, '>', 'out.csv';
foreach my $key (sort keys %hash)
{
    print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
}
close OUT;
exit(0);

Now, suppose you rewrote it like this:

my %hash;
step_one();
step_two();
exit(0);

sub step_one
{
    open IN, '<', 'in.dat';
    while (<IN>)
    {
        chomp;
        my ($val1, $val2, $val3 ...) = split /,/;
        $hash{$val1} = {name => $val2, id => $val3 ...};
    }
    close IN;
}

sub step_two
{
    open OUT, '>', 'out.csv';
    foreach my $key (sort keys %hash)
    {
        print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
    }
    close OUT;
}

I wouldn't. I would write this as

my %data; #no point in naming a hash hash
%data = get_data();
sort_data(%data);
print_data(%data);
Maybe with references as appropriate.
Then
- I know what data items those functions are working on
- I know which data items those functions are _NOT_ working on (if it is
not in the parameter list then they don't touch them)
- and I can use the same functions to process a second or third or
fourth set of data, which maybe has a different input format and
therefore requires a different get_other_data() sub, but my internal
representation is the same, such that I can reuse the sort_data() and
print_data() functions.
Ben, I could make the case that the second version is clearer and
easier to maintain than the first version,

I wouldn't even say that.

You are 100 percent correct. I don't know if I will ever run this
script again. If I do, I'll certainly revise it (as I wrote it like
version one above).

It is very hard to unlearn bad habits and even harder to refactor poorly
written code.

jue
 
Dr.Ruud

David said:
FWIW, I have never used a global variable in any production program.

I often have one (and only one) called "%default".

In that hash there are all kinds of defaults, like for optional
parameters, the date-time at the start, etc.
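Presumably something along these lines (contents invented for the sketch):

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# The one sanctioned global: every default collected in a single place.
my %default = (
    output_dir => '/tmp/reports',
    separator  => ',',
    started_at => strftime('%Y-%m-%d %H:%M:%S', localtime),
);

sub write_report {
    my (%opt) = @_;
    # a caller-supplied value wins; otherwise fall back to the default
    my $dir = $opt{output_dir} // $default{output_dir};
    print "writing report to $dir\n";
    return $dir;
}

write_report();                          # uses %default
write_report(output_dir => '/tmp/june'); # explicit override wins
```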
 
ccc31807

I'm not trying to be a devil's advocate, but simply to learn. I posted
the question because I didn't know the answer, and I've learned some
things from the responses.
I wouldn't. I would write this as

Okay, you declare the variable.
        my %data; #no point in naming a hash hash

Then, you initialize the variable by calling a function.
        %data = get_data();

Here is where we differ. Suppose %data is a very large hash. What do
you gain by passing a copy to the function? And if you pass a
reference to the function, you have to dereference it in the function.
To me, it just seems easier to modify the top-level variable in the
function and protect yourself by giving the function a descriptive
name.
        sort_data(%data);

Same comment as above. Why create an extra copy of the data structure,
and why worry about dereferencing a reference?
        print_data(%data);
Maybe with references as appropriate.
Then
- I know what data items those functions are working on
- I know which data items those functions are _NOT_ working on (if it is
not in the parameter list then they don't touch them)
- and i can use the same functions to process a second or third or
fourth set of data, which maybe has a different input format and
therefore requires a different get_other_data() sub, but my internal
representation is the same such that I can reuse the sort_data() and
print_data() functions.

As Uri pointed out, you have to be smart rather than consistent. If I
have repeated code, then I put it into a function and pass an argument
to the function. That way, I localize the logic and create a modular
structure.
It is very hard to unlearn bad habits and even harder to refactor poorly
written code.

This is true. However, sometimes the circumstances can determine
whether a particular habit is good or bad. There are 'global' bad
habits (like smoking, drinking, and cheating on your wife) and then
there are habits that are bad only because of the specific
circumstance.

CC.
 
ccc31807

Points taken, and thanks for demonstrating the code. This has been
helpful to me. (I'm not stupid, merely ignorant.)

CC.
I might write that program something like this:

    #!/usr/bin/perl
    my $data = read_data("in.dat");
    write_csv($data, "out.csv");
    sub read_data {
        my ($file) = @_;
        open my $IN, "<", $file
            or die "can't read '$file': $!";
        my %hash;
        while (<$IN>) {
            chomp;
            my ($val1, $name, $id) = split /,/;
            $hash{$val1} = {name => $name, id => $id};
        }
        return \%hash;
    }
    sub write_csv {
        my ($data, $file) = @_;
        open my $OUT, ">", $file
            or die "can't write to '$file': $!";
        for my $key (sort keys %$data) {
            my $person = $data->{$key};
            print $OUT qq("$person->{name}","$person->{id}"\n);
        }
        close $OUT or die "writing to '$file' failed: $!";
    }
The point here is that there's no *point* decomposing your code into
subs unless you're going to make those subs self-contained. You aren't
gaining anything.
 
Jürgen Exner

ccc31807 said:
Here is where we differ. Suppose %data is a very large hash. What do
you gain by passing a copy to the function? And if you pass a
reference to the function, you have to dereference it in the function.
To me, it just seems easier to modify the top-level variable in the
function and protect yourself by giving the function a descriptive
name.

Same comment as above. Why create an extra copy of the data structure,
and why worry about dereferencing a reference?

No, you don't get it. The difference is that I can see what data items
the function is consuming and producing by simply looking at the
function call. I do not have to dig in some obscure documentation that
is not in sync with the actual code, I do not have to maintain comments
"Reads global variable X and Y, modifies Y and Z" which we all know are
never correct anyway, I do not have to inspect the code of the function
to know what data items it is using.

Instead the documentation of the input and output of that function is
right there in front of my eyes at each and every function call in form
of parameters. It's simply part of writing self-documenting code.
As Uri pointed out, you have to be smart rather than consistent.

And I agree with his comments. But those justified uses of global
variables are rare. Your globals do not belong in that category.
If I
have repeated code, then I put it into a function and pass an argument
to the function. That way, I localize the logic and create a modular
structure.

At that point the benefits of parameters become obvious to you, too, but
they start way earlier. You just haven't seen the light yet.

jue
 
ccc31807

No, you don't get it. The difference is that I can see what data items
the function is consuming and producing by simply looking at the
function call. I do not have to dig in some obscure documentation that
is not in sync with the actual code, I do not have to maintain comments
"Reads global variable X and Y, modifies Y and Z" which we all know are
never correct anyway, I do not have to inspect the code of the function
to know what data items it is using.

You didn't answer the question, or maybe I didn't make the question
clear enough.

If you have a large data structure that you need to both modify and
read (at different times), why make a copy of the data structure to
pass as an argument to the function only to return the copy to
overwrite the original? In other words, why do this:
my %data;
%data = get_data(%data);
%data = modify_data(%data);

or this:
my %data;
get_data(\%data);
modify_data(\%data);

when you can just as clearly do this:
my %data;
&get_data;
&modify_data;

You do not have to dig in obscure documentation because you can see
clearly what the function is doing by the descriptive name. Also, I
guess I want to stress that this is a specific script for a specific
purpose, a special purpose tool as it were, and not a general purpose
script that applies to a general problem. FOR THIS LIMITED PURPOSE I
just don't see the point of passing an argument either by reference or
by value.

I agree that for more substantial programs for a more general purpose
that the principles expressed (about variable localization, passing
identifiable arguments, returning specific values, etc.) are best
practices.

CC.
 
Uri Guttman

c> You didn't answer the question, or maybe I didn't make the question
c> clear enough.

the latter.

c> If you have a large data structure that you need to both modify and
c> read (at different times), why make a copy of the data structure to
c> pass as an argument to the function only to return the copy to
c> overwrite the original? In other words, why do this:
c> my %data;
c> %data = get_data(%data);
c> %data = modify_data(%data);

you don't do that. you pass in a ref to the hash. no copies are made and
you still isolate the code logic so it doesn't directly access any
globals. having globals (really static data in this case) isn't bad but
accessing them in a global way is bad and a very bad habit you need to
unlearn.

c> or this:
c> my %data;
c> get_data(\%data);
c> modify_data(\%data);

c> when you can just as clearly do this:
c> my %data;
c> &get_data;
c> &modify_data;

STOP USING & for sub calls. this is true regardless of the globals!!

and this is a case where isolation wins over your incorrect perception
of clarity. if you wanted to move the subs elsewhere or reuse them you
need the better api of passing in the hash ref. your code is hardwired
to only use those variables and only be in that file with the
globals. do you see the difference? you will now argue that this code
will only live here. that is bogus as in other cases it won't stay there
forever. then you have to rewrite all the code. learn the better api
design now and practice it.

c> You do not have to dig in obscure documentation because you can see
c> clearly what the function is doing by the descriptive name. Also, I
c> guess I want to stress that this is a specific script for a specific
c> purpose, a special purpose tool as it were, and not a general purpose
c> script that applies to a general problem. FOR THIS LIMITED PURPOSE I
c> just don't see the point of passing an argument either by reference or
c> by value.

no. you don't get it. descriptive names can lie. you can't move the code
as i said above. that is worse than your claim of 'clarity' which is
actually false.

c> I agree that for more substantial programs for a more general purpose
c> that the principles expressed (about variable localization, passing
c> identifiable arguments, returning specific values, etc.) are best
c> practices.

that is true for all sizes of programs. you haven't demonstrated the
ability to code in that style and keep defending (poorly at that) why
your global style is good or even better. it isn't good or better in any
circumstances. passing in the refs is much cleaner, more maintainable,
more extendable, easier to move, easier to reuse, etc. there isn't a
loss in the bunch there.

uri
 
