software engineering, program construction

Discussion in 'Perl Misc' started by ccc31807, Oct 28, 2009.

  1. ccc31807

    ccc31807 Guest

    I've been writing software for about ten years, mostly Perl but also
    Java, C, Python, some Lisp, Javascript, and the assorted stuff we all
    pick up (SQL, HTML, XML, etc.) I've never worked on a big project, my
    scripts typically running between several hundred and several thousand
    LOC. I've written in a number of different styles, evolving over the
    years. After reviewing some of the work I've done in the past couple
    of years, rewriting a lot and revising a lot (due to changing data
    requirements), I've noticed that I use a particular style.

    In the past, the technology I've used seems to influence the style.
    For example, at one time I was writing in C, and my Perl code
    consisted of modules that acted the same as header files. When I was
    writing some Lisp, my Perl code conformed a lot more to a functional
    style.

    Now, I don't know what I do. I've copied the guts of a program below,
    and would like comments from those who might have a lot more
    experience than I do.

    Essentially, what I do is declare all my variables as global
    variables, write the subroutines that manipulate the data, and call
    the subroutines one after another. The problem domain consists of data
    munging, slicing and dicing data files and preparing reports of one
    kind or another.

    Thoughts?

    Thanks, CC.
    ---------------------sample of script-----------------------------------
    #declare necessary variables
    my ($global1, $global2, $global3, $global4, $global5, $global6);
    my (@global1, @global2, @global3);
    my (%global1, %global2, %global3, %global4);

    #run subroutines
    &get_student_info;
    &get_letter_info;
    &check_site_exists;
    &test_hashes;
    &make_new_dir;
    &create_letters;

    #email to sites
    my $answer = 'n';
    print "Do you want to email the letters to the sites? [y|n] ";
    $answer = <STDIN>;
    &email_letters if $answer =~ /y/i;
    exit(0);

    #construct user defined functions
    sub get_student_info { ...}
    sub get_letter_info {... }
    #etc .
    ccc31807, Oct 28, 2009
    #1

  2. ccc31807

    Danny Woods Guest

    ccc31807 <> writes:

    > Essentially, what I do is declare all my variables as global
    > variables, write the subroutines that manipulate the data, and call
    > the subroutines one after another. The problem domain consists of data
    > munging, slicing and dicing data files and preparing reports of one
    > kind or another.
    >
    > Thoughts?


    This kind of stuff is fine if it's just a script that performs a
    specific task, but if it's something that you're likely to have to
    revisit or modify, you'll benefit from reducing the amount of global
    state you've got and instead handing required state to functions and
    returning the transformed data to feed into other functions. Since
    you've done some Lisp, you'll be familiar with trying to keep functions
    free of side effects, which is a great thing for testing since no
    function depends upon external state. Leave the side effects (like I/O)
    to tiny functions which are simple and easy to reason about.
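    A minimal sketch of that shape (the sub names and data are made up for
    illustration, not taken from the original script): the transformation
    takes its input as an argument and returns its result, while the I/O
    lives in a tiny function of its own.

    ```perl
    use strict;
    use warnings;

    # Pure transformation: data in, data out, no globals touched.
    sub count_by_site {
        my ($records) = @_;              # arrayref of hashrefs
        my %count;
        $count{ $_->{site} }++ for @$records;
        return \%count;                  # hand the result to the next stage
    }

    # The side effect (printing) is isolated in its own small sub.
    sub print_counts {
        my ($counts) = @_;
        print "$_: $counts->{$_}\n" for sort keys %$counts;
    }

    my $records = [ { site => 'A' }, { site => 'B' }, { site => 'A' } ];
    print_counts( count_by_site($records) );
    ```

    Because count_by_site depends only on its argument, it can be tested in
    isolation without setting up any script-level state.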

    I'm generally against cutting a script up for the sake of making it
    modular unless I actually believe that I'm going to use that module
    somewhere else. If you *do* choose that route, stick the functions in a
    package in a .pm file and export the interesting functions with Exporter
    ('perldoc Exporter' for more information). If you want to go all object
    oriented (which doesn't appear to be necessary given the size of the
    script), and have a non-ancient version of Perl, have a look at Moose
    rather than going down the vanilla 'perltoot' path.

    Taking the additional complexity of Exporter or Moose into account,
    however, and the size of the script, I'd probably just stick with the
    functional refactorings.

    Cheers,
    Danny.
    Danny Woods, Oct 28, 2009
    #2

  3. ccc31807 <> wrote:
    >In the past, the technology I've used seems to influence the style.
    >For example, at one time I was writing in C, and my Perl code
    >consisted of modules that acted the same as header files. When I was
    >writing some Lisp, my Perl code conformed a lot more to a functional
    >style.


    That is pretty typical. Most programming languages encourage a
    particular programming style and if you are using that style for a while
    then you will program in that style in whatever language you are using.

    >Now, I don't know what I do. I've copied the guts of a program below,
    >and would like comments from those who might have a lot more
    >experience than I do.


    Your question is not about software engineering but about good practices
    and common sense for basic programming. Software engineering deals with
    how to design and modularize complex software systems and how to design
    interfaces between those software components.
    The difference is similar to a plumber installing a new bathroom in a
    home and an engineer planning and building the water supply for a city
    block or a whole city. There are rules and best practices for the
    bathroom which he should follow, but it really has little to do with
    engineering.

    >Essentially, what I do is declare all my variables as global
    >variables,


    Bad idea. Don't use globals unless you have a good reason to do so. One
    principle of programming is to keep data as local as reasonable such
    that code outside of your local area cannot accidently or deliberately
    step on that data. This applies even to single function calls.

    >write the subroutines that manipulate the data, and call
    >the subroutines one after another. The problem domain consists of data
    >munging, slicing and dicing data files and preparing reports of one
    >kind or another.


    >---------------------sample of script-----------------------------------
    >#declare necessary variables
    >my ($global1, $global2, $global3, $global4, $global5, $global6);
    >my (@global1, @global2, @global3);
    >my (%global1, %global2, %global3, %global4);


    In addition to the comment above about the use of global variables, it
    is very much frowned upon to use non-descriptive names like that. Large
    IT organizations even have very stringent rules about how to compose
    variable names (sometimes excessively so), but those rules always
    include a descriptive part.

    >#run subroutines
    >&get_student_info;
    >&get_letter_info;
    >&check_site_exists;
    >&test_hashes;
    >&make_new_dir;
    >&create_letters;


    Do you know what the '&' does? Do you really, really need that
    functionality? If the answer to either question is no, then for crying
    out loud drop that ampersand. Or are you stuck with Perl 4 for some odd
    reason?

    It appears as if your functions don't take arguments and don't return
    results, either, but communicate only by side effects on global
    variables. That is very poor coding style because it violates the
    principle of locality.

    >#email to sites
    >my $answer = 'n';
    >print "Do you want to email the letters to the sites? [y|n] ";
    >$answer = <STDIN>;
    >&email_letters if $answer =~ /y/i;


    Do you need the current @_ in 'email_letters'? If not, then why are you
    passing it to the sub?

    >exit(0);
    >
    >#construct user defined functions
    >sub get_student_info { ...}
    >sub get_letter_info {... }
    >#etc .


    Whether you define your subroutines at the beginning or at the end of
    your code is mostly a matter of personal preference. But you should most
    definitely use parameters and return values to pass the necessary data
    between sub and caller.
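    Applied to the original script, that might look like the sketch below
    (the two sub names come from the post, but the arguments, file name,
    and bodies are hypothetical):

    ```perl
    use strict;
    use warnings;

    # Each step receives what it needs and returns its result,
    # instead of reading and writing shared globals.
    sub get_student_info {
        my ($file) = @_;
        my %students;
        # ... parse $file into %students here ...
        return \%students;
    }

    sub create_letters {
        my ($students, $dir) = @_;
        my @letters;
        # ... build one letter per student, writing into $dir ...
        return \@letters;
    }

    my $students = get_student_info('students.dat');
    my $letters  = create_letters($students, 'letters/');
    ```

    Now it is visible at the call site which data each step consumes and
    produces, and neither sub can touch anything it was not handed.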

    jue
    Jürgen Exner, Oct 28, 2009
    #3
  4. ccc31807

    David Filmer Guest

    On Oct 28, 7:00 am, ccc31807 <> wrote:
    > software engineering, program construction
    > Thoughts?


    I HIGHLY recommend the O'Reilly book, _Perl_Best_Practices_, by Dr.
    Damian Conway. Learn it, live it, love it.

    FWIW, I have never used a global variable in any production program.
    I always tightly scope my code (even if the variables have the same
    name). For example:

    my $dbh = [whatever you do to get a database handle];
    my $student_id = shift or die "No student ID\n"; # Not Damian-approved, FWIW
    my %student = %{ get_student_info($student_id, $dbh) };

    print "The e-mail address for student $student_id is $student{'email'}\n";

    ...

    sub get_student_info {
        my( $student_id, $dbh ) = @_;
        my $sql = qq{
            SELECT firstname, lastname, email
            FROM   student_table
            WHERE  student_id = $student_id
        };
        return $dbh->selectrow_hashref( $sql );
    }

    Now the sub is purely generic - you can move it to a module and call
    it from any program.

    Oh, and I HIGHLY recommend the O'Reilly book, _Perl_Best_Practices_,
    by Damian Conway.

    And, did I mention _Perl_Best_Practices_, by Damian Conway?
    David Filmer, Oct 28, 2009
    #4
  5. ccc31807

    ccc31807 Guest

    On Oct 28, 11:22 am, Danny Woods <> wrote:
    > This kind of stuff is fine if it's just a script that performs a
    > specific task, but if it's something that you're likely to have to
    > revisit or modify, you'll benefit from reducing the amount of global
    > state you've got and instead handing required state to functions and
    > returning the transformed data to feed into other functions. Since
    > you've done some Lisp, you'll be familiar with trying to keep functions
    > free of side effects, which is a great thing for testing since no
    > function depends upon external state.  Leave the side effects (like I/O)
    > to tiny functions which are simple and easy to reason about.


    Many of these scripts produce standard reports that essentially are
    static over long periods of time. The reason I gravitated toward
    variables global to the script is because I had trouble visualizing
    them when I scattered them. With all the declarations in one place, I
    can see the variable names and types.

    I agree with you about Lisp, but honestly, writing Perl in a
    functional style was more trouble than it was worth, given the limited
    scope of these kinds of scripts. FWIW, I like the fact that with Lisp
    you can play with your functions on the top level and save them when
    you have what you want. However, it's a different style of programming
    with different kinds of tasks.

    > I'm generally against cutting a script up for the sake of making it
    > modular unless I actually believe that I'm going to use that module
    > somewhere else.  If you *do* choose that route, stick the functions in a
    > package in a .pm file and export the interesting functions with Exporter
    > ('perldoc Exporter' for more information).  If you want to go all object
    > oriented (which doesn't appear to be necessary given the size of the
    > script), and have a non-ancient version of Perl, have a look at Moose
    > rather than going down the vanilla 'perltoot' path.


    I agree with your statement about modules. When I develop web apps, I
    do indeed modularize the functions, typically writing HTML.pm, SQL.pm,
    and CONSOLE.pm for the HTML, SQL, and program specific logic.

    I've never written any OO Perl, although I've studied both Conway's
    'OO Perl' and Schwartz's 'Learning PORM'. If I were going to write a
    large OO app, I'd use Java (because Perl's lack of enforced
    disciplines makes it too easy to ignore SWE practices.)

    > Taking the additional complexity of Exporter or Moose into account,
    > however, and the size of the script, I'd probably just stick with the
    > functional refactorings.


    That's one of the points of my post. Typically, I very much disfavor
    cutting and pasting, but with the 'modular' subroutines, I find myself
    cutting and pasting previously written subroutines between scripts. My
    conscience bothers me a little bit when I do this, but for the little
    bit of programming I do it's not hard to just cut and paste and it
    really does lead to appropriate refactoring.

    CC.
    ccc31807, Oct 28, 2009
    #5
  6. ccc31807

    Danny Woods Guest

    ccc31807 <> writes:

    > I agree with you about Lisp, but honestly, writing Perl in a
    > functional style was more trouble than it was worth, given the limited
    > scope of these kinds of scripts. FWIW, I like the fact that with Lisp
    > you can play with your functions on the top level and save them when
    > you have what you want. However, it's a different style of programming
    > with different kinds of tasks.


    I'm inclined to disagree here: Perl is a great language for functional
    programming, as Mark Jason Dominus attests in his excellent book, Higher
    Order Perl (legitimate free PDF at http://hop.perl.plover.com/#free).
    Of course, you're entirely at liberty to disagree! Some languages
    (those without closures and first-class functions) make it difficult to
    program functionally, but the benefits (to me) outweigh the mental
    gymnastics required to think in a functional manner.
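    As a small illustration of the closures and first-class functions being
    discussed (my own toy example, not taken from Higher Order Perl):

    ```perl
    use strict;
    use warnings;

    # make_counter returns a closure: a first-class function value that
    # carries its own private copy of $n with it.
    sub make_counter {
        my $n = shift // 0;
        return sub { return $n++ };
    }

    my $c = make_counter(10);
    print $c->(), "\n";   # 10
    print $c->(), "\n";   # 11

    # A higher-order function: takes a function and a list,
    # and returns the list transformed by that function.
    sub apply_all {
        my ($f, @xs) = @_;
        return map { $f->($_) } @xs;
    }

    my @doubled = apply_all( sub { $_[0] * 2 }, 1, 2, 3 );   # (2, 4, 6)
    ```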

    > That's one of the points of my post. Typically, I very much disfavor
    > cutting and pasting, but with the 'modular' subroutines, I find myself
    > cutting and pasting previously written subroutines between scripts. My
    > conscience bothers me a little bit when I do this, but for the little
    > bit of programming I do it's not hard to just cut and paste and it
    > really does lead to appropriate refactoring.


    Lots of big businesses don't like refactoring (every change to code,
    however innocent, has the potential for breakage, and business-types
    don't like it when the rationale is code purity). That said, nothing
    stops you from taking code that you realise you're about to cut and
    paste into your script and instead paste it into a module for future
    use: if you're going to re-use it once, chances are you'll think about
    it again.

    Cheers,
    Danny.
    Danny Woods, Oct 28, 2009
    #6
  7. ccc31807

    Guest

    On Oct 28, 8:52 am, David Filmer <> wrote:
    >
    >    sub get_student_info {
    >       my( $student_id, $dbh ) = @_;
    >       my $sql = qq{
    >          SELECT    firstname, lastname, email
    >          FROM      student_table
    >          WHERE     student_id = $student_id
    >       };
    >       return $dbh->selectrow_hashref( $sql );
    >    }


    I'd recommend using placeholders in that SQL (insert comment about
    little Bobby Tables and security), and possibly using prepare_cached
    and/or the Memoize module if that function is called a lot.
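    A hedged sketch of the placeholder version Doug is suggesting (table
    and column names follow David's example; `prepare_cached` and the `?`
    placeholder are standard DBI):

    ```perl
    use strict;
    use warnings;
    use DBI;

    # The student ID is passed as a bind value, so DBI quotes it safely
    # and little Bobby Tables stays out of the query. prepare_cached
    # reuses the compiled statement on repeated calls.
    sub get_student_info {
        my ($student_id, $dbh) = @_;
        my $sth = $dbh->prepare_cached(q{
            SELECT firstname, lastname, email
            FROM   student_table
            WHERE  student_id = ?
        });
        return $dbh->selectrow_hashref($sth, undef, $student_id);
    }
    ```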

    HTH,
    -Doug
    , Oct 28, 2009
    #7
  8. ccc31807 <> wrote:
    >[...] The reason I gravitated toward
    >variables global to the script is because I had trouble visualizing
    >them when I scattered them. With all the declarations in one place, I
    >can see the variable names and types.


    Wrong way of thinking. Don't think in terms of variables. Instead think
    in terms of information/data flow between functions.

    If function f() computes a data item x, and function g() needs
    information from this data item, then f() needs to return this data item
    and g() needs to receive it:

    g(f(....), ....);
    or
    my $thisresult = f(...);
    g($thisresult);
    or
    f(..., $thisresult);
    g($thisresult);

    jue
    Jürgen Exner, Oct 28, 2009
    #8
  9. Danny Woods <> wrote:
    >I'm inclined to disagree here: Perl is a great language for functional
    >programming, as Mark Jason Dominus attests in his excellent book, Higher
    >Order Perl (legitimate free PDF at http://hop.perl.plover.com/#free).
    >Of course, you're entirely at liberty to disagree! Some languages
    >(those without closures and first-class functions) make it difficult to
    >program functionally, but the benefits (to me) outweigh the mental
    >gymnastics required to think in a functional manner.


    While I agree I think the OP is nowhere near using HOFs, closures, or
    functions as first-class objects. He is still struggling with the
    basics.

    jue
    Jürgen Exner, Oct 28, 2009
    #9
  10. ccc31807

    ccc31807 Guest

    On Oct 28, 1:27 pm, Jürgen Exner <> wrote:
    > If function f() computes a data item x, and function g() needs
    > information from this data item, then f() needs to return this data item
    > and g() needs to receive it:
    >
    >         g(f(....), ....);
    > or
    >         my $thisresult = f(...);
    >         g($thisresult);
    > or      
    >         f(..., $thisresult);
    >         g($thisresult);
    >
    > jue


    Or, maybe...

    my %information_hash;
    %build_hash;
    %test_hash;
    &use_hash;

    .... where %information_hash is a data structure that contains tens of
    thousands of records four layers deep, like this:
    $information_hash{$level}{$site}{$term}

    .... and

    sub use_hash
    {
        foreach my $level (keys %information_hash)
        {
            foreach my $site (keys %{$information_hash{$level}})
            {
                foreach my $term (keys %{$information_hash{$level}{$site}})
                {
                    print "Dear $information_hash{$level}{$site}{$term}{'name'} ...";
                }
            }
        }
    }

    Frankly, it seems a lot easier to use one global hash than to either
    pass a copy to a function or pass a reference to a function.

    Yesterday, I completed a task that used an input file of approx 300,000
    records, analyzed the data, created 258 charts (as gifs) and printed
    the charts to a PDF document for distribution. In this case, I created
    several subroutines to shake and bake the data, and used just one
    global hash throughout. Is this so wrong?

    CC.
    ccc31807, Oct 28, 2009
    #10
  11. ccc31807

    ccc31807 Guest

    On Oct 28, 3:27 pm, Ben Morrow <> wrote:
    > ? If you need the keys as well then a
    >
    >     while (my ($level, $level_hash) = each %information_hash) {
    >
    > loop might be more appropriate.


    I am using the keys to do other things, so yes, I need the keys, but
    thanks for your suggestion. I find myself doing this a lot, so I'm
    open to making it easier.

    > This is only because you've never been bitten by using globals when you
    > shouldn't have; probably because you've only ever written relatively
    > small programs, and never come back to a program six months later to add
    > a new feature.


    Okay, let's consider an evolving programming style. Suppose you wrote
    a very short script that looks like this:

    my %hash;
    #step_one
    open IN, '<', 'in.dat';
    while (<IN>)
    {
        chomp;
        my ($val1, $val2, $val3 ...) = split /,/;
        $hash{$val1} = {name => $val2, id => $val3 ...};
    }
    close IN;
    #step_two
    open OUT, '>', 'out.csv';
    foreach my $key (sort keys %hash)
    {
        print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
    }
    close OUT;
    exit(0);

    Now, suppose you rewrote it like this:

    my %hash;
    step_one();
    step_two();
    exit(0);

    sub step_one
    {
        open IN, '<', 'in.dat';
        while (<IN>)
        {
            chomp;
            my ($val1, $val2, $val3 ...) = split /,/;
            $hash{$val1} = {name => $val2, id => $val3 ...};
        }
        close IN;
    }

    sub step_two
    {
        open OUT, '>', 'out.csv';
        foreach my $key (sort keys %hash)
        {
            print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
        }
        close OUT;
    }

    Ben, I could make the case that the second version is clearer and
    easier to maintain than the first version, even though the second
    version breaks the rules and the first version doesn't. What's the
    REAL difference between the two versions? And why should the
    decomposition of code into subroutines NECESSARILY require a
    functional style and variable localization?

    > If it works, then no, it isn't 'wrong'. It is bad style, though. If you
    > had written (say) the chart-creating code as a module with functions
    > that took parameters, then when you need another set of charts tomorrow
    > you could reuse it. As it is you have to copy/paste and modify it for
    > your new set of global data structures.


    You are 100 percent correct. I don't know if I will ever run this
    script again. If I do, I'll certainly revise it (as I wrote it like
    version one above).

    > That may be practical when your programs are only ever run once, but
    > quickly becomes less so when you have many programs in long-term use
    > with almost-but-not-quite the same subroutine in: when you find a bug,
    > how are you going to find all the places you've copy/pasted it to
    > correct it?


    Again, I agree totally. However, I'm a lot more interested in the
    architecture of a script than the other issues that have been
    mentioned. With this particular issue, I try my best to follow the DRY
    practice, and the second or third time I write the same thing, I often
    will place it in a function and call it from there.

    CC.
    ccc31807, Oct 28, 2009
    #11
  12. ccc31807

    Uri Guttman Guest

    >>>>> "c" == ccc31807 <> writes:

    c> On Oct 28, 3:27 pm, Ben Morrow <> wrote:
    >> ? If you need the keys as well then a
    >>
    >>     while (my ($level, $level_hash) = each %information_hash) {
    >>
    >> loop might be more appropriate.


    c> I am using the keys to do other things, so yes, I need the keys, but
    c> thanks for your suggestion. I find myself doing this a lot, so I'm
    c> open to making it easier.

    >> This is only because you've never been bitten by using globals when you
    >> shouldn't have; probably because you've only ever written relatively
    >> small programs, and never come back to a program six months later to add
    >> a new feature.


    c> Okay, let's consider an evolving programming style. Suppose you wrote
    c> a very short script that looks like this:

    c> my %hash;
    c> step_one();
    c> step_two();

    you pass no args to those subs. they are using the file global %hash

    c> Ben, I could make the case that the second version is clearer and
    c> easier to maintain that the first version, even though the second
    c> version breaks the rules and the first version doesn't. What's the
    c> REAL difference between the two versions? And why should the
    c> decomposition of code into subroutines NECESSARILY require a
    c> functional style and variable localization?

    you didn't listen to the rules. it isn't about just globals or passing
    args. it is WHEN and HOW do you choose to do either. a single global
    hash is FINE in some cases as are a few top level file lexicals. doing
    it ALL the time with every variable is bad. you need to learn the
    balance of when to choose globals. the issue is blindly using globals
    all over the place and using too many of them vs judicious use of
    globals. you just about can't write any decent sized program without
    file level globals so it isn't a hard and fast rule. the goal is to keep
    the number of globals to a nice and easy to understand/maintain
    minimum. sometimes that minimum can be zero.

    >> That may be practical when your programs are only ever run once, but
    >> quickly becomes less so when you have many programs in long-term use
    >> with almost-but-not-quite the same subroutine in: when you find a bug,
    >> how are you going to find all the places you've copy/pasted it to
    >> correct it?


    c> Again, I agree totally. However, I'm a lot more interested in the
    c> architecture of a script than the other issues that have been
    c> mentioned. With this particular issue, I try my best to follow the DRY
    c> practice, and the second or third time I write the same thing, I often
    c> will place it in a function and call it from there.

    it is easier to get into the habit of writing subs for all logical
    sections. loading/parsing a file is a logical section. processing that
    data is a logical section, etc. then you can pass in file names for
    args, or the ref to the hash for an arg, etc. one way to avoid file
    lexicals (not that i do this all the time) is to use a top level driver
    sub

    main() ;
    exit ;

    sub main {

        my $file        = shift @ARGV || 'default_name' ;
        my $parsed_data = parse_file( $file ) ;
        my $results     = process_data( $parsed_data ) ;
        output_report( $results ) ;
    }

    etc.

    isolation is the goal. now no one can mess with those structures by
    accident or even by ill will. they will be garbage collected when the
    sub main exits which can be a good thing too in some cases. the logical
    steps are clear and easy to follow. it is easy to add more steps or
    modify each step. the subs could be reused if needed with data coming
    from other places as they aren't hardwired to the file level
    lexicals. the advantages of that style of code are major and the losses
    for using too many globals are also big. there is a reason this style
    has been developed, taught and espoused for years. it isn't a random
    event. small programs develop into large ones all the time. bad habits
    in small programs don't get changed when the scale of the program
    grows. bad habits will kill you in larger programs so it is best to
    practice good habits at all program scales, small and large.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Oct 28, 2009
    #12
  13. ccc31807 <> wrote:
    >On Oct 28, 1:27 pm, Jürgen Exner <> wrote:
    >> If function f() computes a data item x, and function g() needs
    >> information from this data item, then f() needs to return this data item
    >> and g() needs to receive it:
    >>
    >>         g(f(....), ....);
    >> or
    >>         my $thisresult = f(...);
    >>         g($thisresult);
    >> or      
    >>         f(..., $thisresult);
    >>         g($thisresult);
    >>
    >> jue

    >
    >Or, maybe...
    >
    >my %information_hash;
    >%build_hash;
    >%test_hash;
    >&use_hash;


    Most definitely not. For one, the second and third lines will give you
    syntax errors.

    And even if you meant to write
    my %information_hash;
    &build_hash;
    &test_hash;
    &use_hash;
    then
    1: why on earth are you passing @_ to those functions?
    2: why aren't you passing the hash to those functions instead:
        build_hash(\%information_hash);
        test_hash(\%information_hash);
        use_hash(\%information_hash);
    Then it would be obvious what data those functions are processing.
    Otherwise you don't know.

    >... where %information_hash is a data structure that contains tens of
    >thousands of records four layers deep, like this:
    >$information_hash{$level}{$site}{$term}
    >
    >... and
    >
    >sub use_hash


    Just do
    my %information_hash = %{$_[0]};
    and the rest of your code remains unchanged except that now you are not
    operating on a global variable.

    >{
    >    foreach my $level (keys %information_hash)
    >    {
    >        foreach my $site (keys %{$information_hash{$level}})
    >        {
    >            foreach my $term (keys %{$information_hash{$level}{$site}})
    >            {
    >                print "Dear $information_hash{$level}{$site}{$term}{'name'} ...";
    >            }
    >        }
    >    }
    >}
    >
    >Frankly, it seems a lot easier to use one global hash than to either
    >pass a copy to a function or pass a reference to a function.


    As long as you are just installing a new shower head you can do that.
    Once you start designing the plumbing for a high rise or a city block it
    will bite you in your extended rear. Better get used to good practices
    early. Unlearning bad habits is very hard.

    >Yesterday, I completed a task that used an input file of approx 300,000
    >records, analyzed the data, created 258 charts (as gifs) and printed
    >the charts to a PDF document for distribution. In this case, I created
    >several subroutines to shake and bake the data, and used just one
    >global hash throughout. Is this so wrong?


    In general: yes. If a student of mine did that we would have a very
    serious talk about very basic programming principles.
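    Copying the whole structure, as `my %information_hash = %{$_[0]};`
    does, works but duplicates tens of thousands of records. A reference
    avoids the copy entirely while keeping the sub just as isolated; a
    hypothetical sketch (names and sample data invented here):

    ```perl
    use strict;
    use warnings;

    # The big hash is passed by reference: no copy is made, and the sub
    # still cannot touch anything it wasn't handed. Returns the number
    # of letters printed.
    sub use_hash {
        my ($info) = @_;                # hashref, not a flattened copy
        my $letters = 0;
        for my $level (sort keys %$info) {
            for my $site (sort keys %{ $info->{$level} }) {
                for my $term (sort keys %{ $info->{$level}{$site} }) {
                    print "Dear $info->{$level}{$site}{$term}{name} ...\n";
                    $letters++;
                }
            }
        }
        return $letters;
    }

    my %information_hash = (
        UG => { main => { fall   => { name => 'Alice' },
                          spring => { name => 'Bob'   } } },
    );
    use_hash(\%information_hash);
    ```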

    jue
    Jürgen Exner, Oct 28, 2009
    #13
  14. ccc31807 <> wrote:
    >On Oct 28, 3:27 pm, Ben Morrow <> wrote:
    >> ? If you need the keys as well then a
    >>
    >>     while (my ($level, $level_hash) = each %information_hash) {
    >>
    >> loop might be more appropriate.

    >
    >I am using the keys to do other things, so yes, I need the keys, but
    >thanks for your suggestion. I find myself doing this a lot, so I'm
    >open to making it easier.
    >
    >> This is only because you've never been bitten by using globals when you
    >> shouldn't have; probably because you've only ever written relatively
    >> small programs, and never come back to a program six months later to add
    >> a new feature.

    >
    >Okay, let's consider an evolving programming style. Suppose you wrote
    >a very short script that looks like this:
    >
    >my %hash;
    >#step_one
    >open IN, '<', 'in.dat';
    >while (<IN>)
    >{
    > chomp;
    > my ($val1, $val2, $val3 ...) = split /,/;
    >   $hash{$val1} = {name => $val2, id => $val3 ...};
    >}
    >close IN;
    >#step_two
    >open OUT, '>', 'out.csv';
    >foreach my $key (sort keys %hash)
    >{
    >   print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
    >}
    >close OUT;
    >exit(0);
    >
    >Now, suppose you rewrote it like this:
    >
    >my %hash;
    >step_one();
    >step_two();
    >exit(0);
    >sub step_one
    >{
    > open IN, '<', 'in.dat';
    > while (<IN>)
    > {
    > chomp;
    > my ($val1, $val2, $val3 ...) = split /,/;
    >   $hash{$val1} = {name => $val2, id => $val3 ...};
    > }
    > close IN;
    >}
    >sub step_two
    >{
    > open OUT, '>', 'out.csv';
    > foreach my $key (sort keys %hash)
    > {
    >   print OUT qq("$hash{$key}{name}","$hash{$key}{id}"\n);
    > }
    > close OUT;
    >}
    >exit(0);


    I wouldn't. I would write this as

    my %data; # no point in naming a hash "hash"
    %data = get_data();
    sort_data(%data);
    print_data(%data);

    Maybe with references as appropriate.

    Then
    - I know what data items those functions are working on
    - I know which data items those functions are _NOT_ working on (if it is
      not in the parameter list then they don't touch them)
    - and I can use the same functions to process a second or third or
      fourth set of data, which maybe has a different input format and
      therefore requires a different get_other_data() sub, but my internal
      representation is the same such that I can reuse the sort_data() and
      print_data() functions.

    >Ben, I could make the case that the second version is clearer and
    >easier to maintain than the first version,


    I wouldn't even say that.


    >You are 100 percent correct. I don't know if I will ever run this
    >script again. If I do, I'll certainly revise it (as I wrote it like
    >version one above).


    It is very hard to unlearn bad habits and even harder to refactor poorly
    written code.

    jue
    Jürgen Exner, Oct 28, 2009
    #14
  15. ccc31807

    Dr.Ruud Guest

    David Filmer wrote:

    > FWIW, I have never used a global variable in any production program.


    I often have one (and only one) called "%default".

    In that hash there are all kinds of defaults, like for optional
    parameters, the date-time at the start, etc.
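    One way that single %default hash can work in practice (the keys and
    the join_fields sub are invented here for illustration):

    ```perl
    use strict;
    use warnings;

    # The one global: defaults for optional parameters, captured at startup.
    my %default = (
        start_time => time(),
        separator  => ',',
        verbose    => 0,
    );

    # Callers may override any option; anything unspecified falls back
    # to the value in %default.
    sub join_fields {
        my ($fields, %opt) = @_;
        my $sep = exists $opt{separator} ? $opt{separator}
                                         : $default{separator};
        return join $sep, @$fields;
    }

    print join_fields([qw(a b c)]), "\n";
    print join_fields([qw(a b c)], separator => ';'), "\n";
    ```
    
    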

    --
    Ruud
    Dr.Ruud, Oct 29, 2009
    #15
  16. ccc31807

    ccc31807 Guest

    On Oct 28, 5:03 pm, Jürgen Exner <> wrote:

    I'm not trying to be a devil's advocate, but simply to learn. I posted
    the question because I didn't know the answer, and I've learned some
    things from the responses.

    > I wouldn't. I would write this as


    Okay, you declare the variable.
    >         my %data; #no point in naming a hash hash


    Then, you initialize the variable by calling a function.
    >         %data = get_data();


    Here is where we differ. Suppose %data is a very large hash. What do
    you gain by passing a copy to the function? And if you pass a
    reference to the function, you have to dereference it in the function.
    To me, it just seems easier to modify the top level variable in the
    function and protect yourself by giving the function a descriptive
    name.
    >         sort_data(%data);


    Same comment as above. Why create an extra copy of the data structure,
    and why worry about dereferencing a reference?
    >         print_data(%data);


    > Maybe with references as appropriate.
    > Then - I know what data items those functions are working on
    > - I know which data items those functions are _NOT_ working on (if it is
    > not in the parameter list then they don't touch them)
    > - and i can use the same functions to process a second or third or
    > fourth set of data, which maybe has a different input format and
    > therefore requires a different get_other_data() sub, but my internal
    > representation is the same such that I can reuse the sort_data() and
    > print_data() functions.


    As Uri pointed out, you have to be smart rather than consistent. If I
    have repeated code, then I put it into a function and pass an argument
    to the function. That way, I localize the logic and create a modular
    structure.

    > It is very hard to unlearn bad habits and even harder to refactor poorly
    > written code.


    This is true. However, sometimes the circumstances can determine
    whether a particular habit is good or bad. There are 'global' bad
    habits (like smoking, drinking, and cheating on your wife) and then
    there are habits that are bad only because of the specific
    circumstance.

    CC.
    ccc31807, Oct 29, 2009
    #16
  17. ccc31807

    ccc31807 Guest

    On Oct 28, 5:34 pm, Ben Morrow <> wrote:

    Points taken, and thanks for demonstrating the code. This has been
    helpful to me. (I'm not stupid, merely ignorant.)

    CC.

    > I might write that program something like this:
    >
    >     #!/usr/bin/perl
    >     my $data = read_data("in.dat");
    >     write_csv($data, "out.csv");
    >     sub read_data {
    >         my ($file) = @_;
    >         open my $IN, "<", $file
    >             or die "can't read '$file': $!";
    >         my %hash;
    >         while (<$IN>) {
    >             chomp;
    >             my ($val1, $name, $id) = split /,/;
    >             $hash{$val1} = {name => $name, id => $id};
    >         }
    >         return \%hash;
    >     }
    >     sub write_csv {
    >         my ($data, $file) = @_;
    >         open my $OUT, ">", $file
    >             or die "can't write to '$file': $!";
    >         for my $key (sort keys %$data) {
    >             my $person = $data->{$key};
    >             print $OUT qq("$person->{name}","$person->{id}"\n);
    >         }
    >         close $OUT or die "writing to '$file' failed: $!";
    >     }


    > The point here is that there's no *point* decomposing your code into
    > subs unless you're going to make those subs self-contained. You aren't
    > gaining anything.


    > Hmm, you're still thinking about 'this script'. You need to be thinking
    > 'What will I be doing tomorrow, next week, next year? Can I take some of
    > this and make it reusable, to save myself some work tomorrow?'.


    > What about the second or third time you write a given function? Do you
    > place it in a module and import it from there? Once you start doing that
    > regularly, you'll start to see why relying on global state just makes
    > things harder in the long run.
    >
    > Ben
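    Moving a reusable sub into a module, as Ben suggests, might look like
    this single-file sketch (the MyReports package and csv_line sub are
    hypothetical names; in real use the package would live in its own .pm
    file):

    ```perl
    use strict;
    use warnings;

    # In real use this package would live in MyReports.pm and be loaded
    # with "use MyReports qw(csv_line);" — it is inlined here so the
    # sketch is self-contained.
    package MyReports;
    use Exporter 'import';
    our @EXPORT_OK = qw(csv_line);

    # Quote and join fields for a CSV row; no global state involved,
    # so the sub can be called from any script that imports it.
    sub csv_line {
        my (@fields) = @_;
        return join ',', map { qq("$_") } @fields;
    }

    package main;
    MyReports->import('csv_line');

    print csv_line('Able', 100), "\n";
    ```
    
    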
    ccc31807, Oct 29, 2009
    #17
  18. ccc31807 <> wrote:
    >On Oct 28, 5:03 pm, Jürgen Exner <> wrote:
    >Then, you initialize the variable by calling a function.
    >>         %data = get_data();

    >
    >Here is where we differ. Suppose %data is a very large hash. What do
    >you gain by passing a copy to the function? And if you pass a
    >reference to the function, you have to dereference it in the function.
    >To me, it just seems easier to modify the top level variable in the
    >function and protect yourself by giving the function a descriptive
    >name.
    >>         sort_data(%data);

    >
    >Same comment as above. Why create an extra copy of the data structure,
    >and why worry about dereferencing a reference?


    No, you don't get it. The difference is that I can see what data items
    the function is consuming and producing by simply looking at the
    function call. I do not have to dig in some obscure documentation that
    is not in sync with the actual code, I do not have to maintain comments
    "Reads global variable X and Y, modifies Y and Z" which we all know are
    never correct anyway, I do not have to inspect the code of the function
    to know what data items it is using.

    Instead the documentation of the input and output of that function is
    right there in front of my eyes at each and every function call in form
    of parameters. It's simply part of writing self-documenting code.

    >As Uri pointed out, you have to be smart rather than consistent.


    And I agree with his comments. But those justified uses of global
    variable are rare. Your globals do not belong in that category.

    >If I
    >have repeated code, then I put it into a function and pass an argument
    >to the function. That way, I localize the logic and create a modular
    >structure.


    At that point the benefits of parameters become obvious to you, too, but
    they start way earlier. You just haven't seen the light yet.

    jue
    Jürgen Exner, Oct 29, 2009
    #18
  19. ccc31807

    ccc31807 Guest

    On Oct 29, 10:10 am, Jürgen Exner <> wrote:
    > No, you don't get it. The difference is that I can see what data items
    > the function is consuming and producing by simply looking at the
    > function call. I do not have to dig in some obscure documentation that
    > is not in sync with the actual code, I do not have to maintain comments
    > "Reads global variable X and Y, modifies Y and Z" which we all know are
    > never correct anyway, I do not have to inspect the code of the function
    > to know what data items it is using.


    You didn't answer the question, or maybe I didn't make the question
    clear enough.

    If you have a large data structure that you need to both modify and
    read (at different times), why make a copy of the data structure to
    pass as an argument to the function only to return the copy to
    overwrite the original? In other words, why do this:
    my %data;
    %data = get_data(%data);
    %data = modify_data(%data);

    or this:
    my %data;
    get_data(\%data);
    modify_data(\%data);

    when you can just as clearly do this:
    my %data;
    &get_data;
    &modify_data;

    You do not have to dig in obscure documentation because you can see
    clearly what the function is doing by the descriptive name. Also, I
    guess I want to stress that this is a specific script for a specific
    purpose, a special purpose tool as it were, and not a general purpose
    script that applies to a general problem. FOR THIS LIMITED PURPOSE I
    just don't see the point of passing an argument either by reference or
    by value.

    I agree that for more substantial programs for a more general purpose
    that the principles expressed (about variable localization, passing
    identifiable arguments, returning specific values, etc.) are best
    practices.

    CC.
    ccc31807, Oct 29, 2009
    #19
  20. ccc31807

    Uri Guttman Guest

    >>>>> "c" == ccc31807 <> writes:

    c> You didn't answer the question, or maybe I didn't make the question
    c> clear enough.

    the latter.

    c> If you have a large data structure that you need to both modify and
    c> read (at different times), why make a copy of the data structure to
    c> pass as an argument to the function only to return the copy to
    c> overwrite the original? In other words, why do this:
    c> my %data;
    c> %data = get_data(%data);
    c> %data = modify_data(%data);

    you don't do that. you pass in a ref to the hash. no copies are made and
    you still isolate the code logic so it doesn't directly access any
    globals. having globals (really static data in this case) isn't bad but
    accessing them in a global way is bad and a very bad habit you need to
    unlearn.
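    uri's point can be shown directly: the subs receive a reference to the
    same underlying hash, so no copy is made and the caller's %data is
    updated in place (the sub names and the count key are invented for
    this sketch):

    ```perl
    use strict;
    use warnings;

    # Fill the hash through a reference; no copy of %data is made.
    sub get_data {
        my ($data) = @_;
        $data->{count} = 3;
    }

    # Modify the same underlying hash through the same reference.
    sub modify_data {
        my ($data) = @_;
        $data->{count} *= 2;
    }

    my %data;
    get_data(\%data);
    modify_data(\%data);
    print "count=$data{count}\n";   # count=6
    ```

    The call sites still document exactly which data each sub touches, but
    the cost is one reference (one scalar) per call, however large the
    hash grows.
    
    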

    c> or this:
    c> my %data;
    c> get_data(\%data);
    c> modify_data(\%data);

    c> when you can just as clearly do this:
    c> my %data;
    c> &get_data;
    c> &modify_data;

    STOP USING & for sub calls. this is true regardless of the globals!!

    and this is a case where isolation wins over your incorrect perception
    of clarity. if you wanted to move the subs elsewhere or reuse them you
    need the better api of passing in the hash ref. your code is hardwired
    to only use those variables and only be in that file with the
    globals. do you see the difference? you will now argue that this code
    will only live here. that is bogus as in other cases it won't stay there
    forever. then you have to rewrite all the code. learn the better api
    design now and practice it.

    c> You do not have to dig in obscure documentation because you can see
    c> clearly what the function is doing by the descriptive name. Also, I
    c> guess I want to stress that this is a specific script for a specific
    c> purpose, a special purpose tool as it were, and not a general purpose
    c> script that applies to a general problem. FOR THIS LIMITED PURPOSE I
    c> just don't see the point of passing an argument either by reference or
    c> by value.

    no. you don't get it. descriptive names can lie. you can't move the code
    as i said above. that is worse than your claim of 'clarity' which is
    actually false.

    c> I agree that for more substantial programs for a more general purpose
    c> that the principles expressed (about variable localization, passing
    c> identifiable arguments, returning specific values, etc.) are best
    c> practices.

    that is true for all sizes of programs. you haven't demonstrated the
    ability to code in that style and keep defending (poorly at that) why
    your global style is good or even better. it isn't good or better in any
    circumstances. passing in the refs is much cleaner, more maintainable,
    more extendable, easier to move, easier to reuse, etc. there isn't a
    loss in the bunch there.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Oct 29, 2009
    #20
