Initialising a hash

Discussion in 'Perl Misc' started by Dave Saville, Feb 5, 2014.

  1. This is a design question: How should 'small, mostly static key/ value
    database which occasionally need to be changed be stored and
    accessed?'. The obvious choices would be

    - store them in a text file in some "program-independent"
    format, similar to existing database of this kind, eg,
    services or protocols

    - put the database directly into the code in a suitable way

    Link with a third-party module which has a more-or-less up-to-date and
    more-or-less maintained or unmaintained list internally hard-coded into
    it and provides a more-or-less cumbersome and inefficient interface for
    accessing it seems to be a singularly 'efficient' way to combine the
    drawbacks of both approaches without providing the advantages of either
    and "But someone else wrote that!" is not really a reason to use the
    internally hard-coded list in this way: It can be transformed into a real
    database file easily or alternatively, it can be put directly into the
    code via copy'n'paste, thus yiedling either "can be maintained with
    regard for the code using it, could use different versions on different
    occasions (eg, a localized map), can be used by any kind of code" or
    "everything together in a single file which can be edited easily".

    I'd gladly listen to a more rational argument in favour of using the
    'module with hard-coded list in it' than "It already exists".
     
    Rainer Weikusat, Feb 6, 2014
    #21
    1. Advertisements

  2. Dave Saville

    Jim Gibson Guest

    The reasons to use HTTP::Status instead writing your own include, but
    are not limited to, the following:

    1. It already exists, so I won't have to waste my time re-writing it,
    and I can get on with the actual problem I am trying to solve.

    2. Somebody else wrote it, and they are probably a better programmer
    than I and more knowledgable about HTTP status codes.

    3. The module is complete and I won't have to worry about tracking down
    every possible status code or searching down the relevant RFCs.

    4. The CPAN module comes with documentation built-in.

    5. The CPAN module comes with test code to ensure it has been installed
    correctly. These tests are performed automatically when I install the
    module, and can be rerun at any time very easily. Tests are performed
    automatically on any module that is submitted to CPAN, and on platforms
    for which I do not have access.

    6. Documentation is available on the web.

    7. I only have to put one line in my program to use it.

    8. Lots of people use the module, so there is a good chance that any
    bugs will be fixed. (That is certainly not true for the programs I
    write, for which I am usually the only user.)

    9. If the module does need maintenance, there is a good chance the
    module author will do it.

    10. HTTP status codes haven't changed in a long time, and probably
    won't change in a non-compatible manner (e.g., more codes may be added,
    but the current codes are not likely to change.)


    Your aversion to using CPAN modules is well-known to readers of this
    news group. I do not share that opinion, and your constant harping
    against third party software becomes a little tiresome. Maybe you could
    just limit yourself to critiquing the specifics of individual modules
    instead of making blanket statements against all of them.

    Lots of people have put in a lot of time and energy providing CPAN
    modules. I even wrote a couple myself, and any reports of bugs in those
    would immediately spur me into action to try to fix them. I suspect
    most CPAN module authors would feel the same.

    Of course, there are no doubt bad CPAN modules, and lots that have been
    abandoned by their authors. It is alway a problem separating the wheat
    from the chaff. Discussion on Usenet is one way to discover good
    modules and bad ones. Generalization that any use of third-party
    modules is bad serves no purpose.

    Thank you for your many positive contributions to this group. You are
    obviously very knowledgable about Perl.
     
    Jim Gibson, Feb 7, 2014
    #22
    1. Advertisements

  3. Not applicable since HTTP::Status is just window dressing on top of an
    otherwise inaccessible hash table. The hash table can be put into 'other
    code' via copy'n'paste which makes using it less work than going through
    the procedural interface.
    The 'bulk' of HTTP::Status is

    sub status_message ($) { $StatusCode{$_[0]}; }

    according to a clpm chestnut, this is bad because it means passing an
    array to status_message will result in the array size being interpreted
    as status code instead of the first element of the array (this is also
    undocumented). Hash lookups in Perl aren't exactly rocket science, so
    your statement doesn't really seem to be applicable here, either.
    Not applicable since you can use the table the module contains, as I've
    already written a couple of times. Also, that you're willing to believe
    that the table is complete doesn't mean it actually is so and there's
    still related problem of how to update it in case this becomes
    necessary.
    If you aren't using the module, you won't need its documentation.
    There's nothing to test here, since we're still talking about a hash
    lookup.
    See above.
    This is wrong on two accounts:

    1. You are 'putting' all the lines contained in the module in your
    program, you just don't have to write them.

    2. You still need to call the function which does the hash lookup for
    you everytime you want to use it.
    Not applicable in this case.
    Unsubstantiated belief. According to my experience, this is also usually
    wrong: If the code is found lacking, chances are that this is the case
    because you're using it in a way the author doesn't which means you get
    to fix it yourself (not that I'd mind doing that -- if the issue affects
    me, I'm naturally in the position to do something about it).
    I have no idea what this is supposed to express here.
    This is a statement about me and not about the module in question. It is
    also a wrong statement as I was criticizing a specific module and have
    praised other modules in the past, eg, most-recently, File::Temp.
    Lost of people have put a lot of time and energy into all kinds of tasks
    such as 'building a perpetuum mobile' or 'making gold out of lesser
    materials'.

    [...]
    I usually have the exact opposite impression: It is assumed than
    "downloading something from the internet" would be a worthy end in
    itself, IOW, anything someone else wrote and published on the web is
    prima facie better than anything not published on the web. This is
    illogical, not the least because "putting something on CPAN" comes with
    a considerable, bureaucratic/ administrative overhead and not everyone
    has so much spare time available and even if he has it, he might not
    care that much about "getting his name printed on the web". It doesn't
    follow that all these people are "inferior programmers" because they
    don't want to spend time on this (and vice versa).

    I'd rather advocate making intelligent choices here based on a default]
    policy of "if in doubt, don't use it": Make sure that you get the cream
    but stay clear of everything which is just 'average'/ mediocre.
     
    Rainer Weikusat, Feb 7, 2014
    #23
  4. Dave Saville

    Keith Keller Guest

    Clearly not, since the "intelligent choice" would be to submit a patch,
    not write your own code from scratch.

    --keith
     
    Keith Keller, Feb 8, 2014
    #24
  5. Am 07.02.2014 21:47, schrieb Rainer Weikusat:
    Even copy+paste means more work than just using HTTP::Status.
    It's not like you try to get Angular.js to run.

    ok, that's still shorter than any hand written code and much easier to
    maintain.
    a method call status_message
    is very unlikely to get called in list contet
    Well, HTTP::Status doesn't claim to be rocket science, it solves one
    little problem, but it solves ist.
    You can use it of course.
    But do you update it also?
    And even if you use it, it's still more time, more work and less
    maintainable than to use this module direct.

    Copy+Paste makes it even worse
    So you have to write your own documentation.
    Come on, it's past 1975, an undocumented, untested part of our code is a
    bug, nor a feature.
    It still has to work.
    Even when installed in customers environment or whereever.
    Again, 1975 is over.
    So, it's better if you copy+paste them??
    What's the problem.
    If you want, you can add any Memorize-ic function to avoid it.

    That's very applicable.

    O.K., so you copy+pastied it,
    what is better than?
    File::Temp is a module usually distributed,
    it's part of perldoc.
    Wow, you are using standard installed modules.
    HTTP::Status works and does what it claims.
    Year, people tried to build perpetuum mobile, well, they nether worked
    and aren't part of CPAN IIRC.
    Nobody said that,
    but most of the time it's worth to use it,
    and often it's at least better than to reinvent it,
    for at least at the first days,
    and only some problems are worth it to invest a manweek+ just because we
    are paronoid about internet solutions.

    Also, explain me, why copy+paste is superior than to just use what you
    intend to c+p?

    So, you have all this overhead available?
    Lucky guy!


    Be glad in 1975.

    Greetings,
    Janeks
     
    Janek Schleicher, Feb 8, 2014
    #25
  6. Even as high as DH5 we still sometimes see deliberate
    dishonesty, as when someone picks out minor points of an
    argument and refutes those. Sometimes the spirit in which this
    is done makes it more of a sophisticated form of ad hominem than
    actual refutation.

    http://www.paulgraham.com/disagree.html

    The exercise above, however, isn't even about 'picking out minor points'
    but really just a wholesale, malicious fabrication, in order to get to

    DH0. Name-calling.

    [...]

    u r a fag!!!!!!!!!!

    But it's important to realize that more articulate name-calling
    has just as little weight. A comment like

    The author is a self-important dilettante.

    is really nothing more than a pretentious version of "u r a
    fag."
    [same location]
     
    Rainer Weikusat, Feb 9, 2014
    #26
  7. This seems like a patently absurd statement to me ...
    The exact equivalents would:

    print(status_message(200));

    vs

    print($http_status{200});

    That's less code for the second case. Of course, using a name a la

    $htsc{200}

    would be as possible. But the difference is really rather negligent on
    the 'coding side'. Using direct hash lookups is - however - faster,
    which might have noticable effect when 'processing log files' as these
    can easily be 'large' (millions of lines) and it offers some otherwise
    unavailable features, eg, get access to a list of all status codes.

    [intelligible text restored]
    If it was called as a method (which would require creating an
    out-of-module constructor for HTTP::Status objects first), this wouldn't
    matter because prototypes aren't checked on method calls. Also, I didn't
    said that I'd agree this was a problem (I'm using prototypes myself,
    just not when posting code here), I just pointed out that there's a
    double standard applied here: Someone who posts code with prototypes
    used for something else than "enabling hand-optimized 'chaotic' syntax"
    is invariable told that he shouldn't be doing this. If this was a
    measure of 'bad code quality', it should surely apply to code written by
    "module authors" as well.
    In the context of "too difficult to solve the problem without the
    module", this doesn't seem to make sense. I didn't claim that the hash
    lookup subroutine wouldn't work, just that 'using a hash lookup
    instead' would amount to using a basic language feature.
    I was specifically writing about the problem of maintaining the
    table. This is easy for a separate database file, fairly easy if the
    code using the table contains it directly, and problematic when the
    table comes 'internally hard-coded' with some software distribution as
    any 'naive' update of that would overwrite local changes.

    You've now made the 'more work and less maintainable' statement
    twice. Would you care to elaborate why you consider this to be true?
    It's not self-evident to me.
    Documentation about 'using hashes' already comes with Perl. Also, your
    idea about 'documentation' seems a little unrealistic to me: Assuming
    that documentation is written at all (I assure you that the people who
    pay the salaries for me an my colleagues considers this a better spare
    time activity which should rather be avoided in order to perform 'useful
    work'), it is pretty certain that no one will ever read it on the
    grounds that this is just too much effort
    I'm unaware if 'software written in 1975' wasn't supposed to work as I
    was three by that time. I'm, however, inclined to believe that it was.
    In case the useful part of a module is smaller than the module itself
    and more useful without it, yes. There's - of course - also the option
    of putting the table into a proper database file and use that.
    The function call itself? How?
    There is no non-trivial code in this module. Because of this, the
    assumption isn't applicable here.
    Because 9. is an 'lone' assertion without any to substantiate it.
    I don't understand what this is supposed to communicate in the given
    context.
    The statement about me you deleted contained the claim that I was
    principally opposed to using third-party developed modules. This is
    wrong, as pointed out above.
    That's besides the point: The original statement was "lots of people put
    lots of time and effort into developing CPAN modules". As shown by the
    two examples itself, this is irrelevant in the given context: The result
    matters.
    Well, you just said that.
     
    Rainer Weikusat, Feb 9, 2014
    #27
  8. Dave Saville

    John Bokma Guest

    Assuming that one needs this, converting the status code of many lines
    to something human readable [1], the solution is simple: since per your
    earlier post the look up table (no matter where it resides) can be
    incomplete, one has to take care of the case the status code not being
    in the table (or even not valid at all). If an expensive way of
    looking-up the code is used, one can use a hash table to cache such look
    ups (as earlier has been suggested: memoization) and build the table
    "on-the-fly".

    unless ( exists $readable_for{ $status_code } ) {
    my $readable = expensive_look_up( $status_code );
    defined $readable
    or $readable = "No readable message for $status_code";
    $readabe_for{ $status_code } = $readable;
    ....
    }

    [1] my experience with log files is generating summaries, and hence
    doing the look up is done on the summarized data (of course).
     
    John Bokma, Feb 9, 2014
    #28
  9. Assuming that the interface provided by the module is not only ill
    thought-out but positively unusable, an obvious workaround which enables
    linking with it without actually having to use it - of course - to make an
    incremental copy of the locked-in, hard-coded hash table at run time: In
    this way, even the 'benefit' of "not having to write code which
    does hash lookups" goes obviously out of the window.
    'Memoization' in the context of Perl (AFAIK) comes from Mark Jason
    Dominus and it is supposed to refer to 'transparently avoiding repeated,
    expensive calculations by putting previously calculated results into a
    hash table using the arguments given to the function in question as a
    key and replacing a call to the "backing function" with a hash lookup
    whenever possible'. IIRC, his examples were naively implemented
    recursive functions, eg, calculating Fibnoacci numbers recursively. The
    key point here is that this can be made to work transparently with the
    help of function-manipulating functions (as one would expect in the
    context of a book named "Higher Order Perl"):

    We would like to tell Perl that we want caching behavior enabled
    on a function. Perl should be able to perform the required
    transformation automatically. Such automatic transformation of
    a function to add caching behavior is called memoization and the
    function is said to be memoized.3

    [...]

    3. The term memoization was coined in 1968 by Donald Michie.
    [Higher Order Perl, p. 69]

    Considering this, your use of the term for the proposed hack seems
    inapproriate to me.
     
    Rainer Weikusat, Feb 11, 2014
    #29
  10. Dave Saville

    John Bokma Guest

    John Bokma, Feb 11, 2014
    #30
  11. I don't quite understand what this is supposed to communicate in the
    given context, especially considering that this context is sufficiently
    context-less that nobody can possibly relate this text either to the
    original procedure ('individually hand-coded, incremental copying of an
    otherwise inaccessible hash table') nor to my text which quoted a
    definition of the term 'memoization' from Higher Order Perl and stated
    that this definition was not applicable to said procedure and hence, the
    term improperly used. Insofar you'd like to dispute the merit of the
    definition in it's own right or the applicability of the text I quoted
    in a discussion centering on Perl, I suggest that you do that instead.
     
    Rainer Weikusat, Feb 11, 2014
    #31
  12. Dave Saville

    John Bokma Guest

    I stand corrected, let's call my "hack": caching of the results of an
    expensive look up in order to avoid such expensive look ups in the near
    future (or 'caching') since it's (the code given) not a function.
    Which is (also) called 'automatic memoization' to distinguish this from
    explicit manually coded caching in the function itself, which also is
    called memoization by people who distinguish between those two using
    aforementioned terms (I mistakenly (?) assumed that this was the problem
    you had with my "hack"; that it wasn't automatic memoization, but
    explicitly manually coded).
     
    John Bokma, Feb 11, 2014
    #32
  13. Dave Saville

    Justin C Guest

    Oh look, he's fallen back into the kill file.


    Justin.
     
    Justin C, Feb 12, 2014
    #33
  14. As a matter of fact, I didn't write the text Mr Keller pierced together
    from disparate parts of a text I had written and he didn't post anything
    except some personal abuse targetted at me (remarkably similar to your
    own performance, actually).
     
    Rainer Weikusat, Feb 12, 2014
    #34
  15. The original paper is not accessible on the web, however, considering
    the use in 'Higher Order Perl' and this here:

    http://www.google.co.uk/url?q=http:...4QFjAE&usg=AFQjCNH61ymTLntkFXTnVL6s5tRpuPnHPg

    I'd still vote in favour of using the term 'memoization' to refer to an
    (automatic) operation which can be applied to a pre-existing function in order to
    turn it into a memoized function. This seems to make more sense in the
    context if AI/ machine learning where it came from. Insofar this refers
    to schematically performed manual code change, something like 'the
    memoization pattern' seems more appropriate.
     
    Rainer Weikusat, Feb 12, 2014
    #35
  16. Dave Saville

    John Bokma Guest

    http://www.cs.utexas.edu/~hunt/research/hash-cons/hash-cons-papers/michie-memo-nature-1968.pdf

    In which "Memo function" is used (at least I couldn't find 'memoization'
    doing a quick scan of the what seems to be scanned text).
    Fair enough. So I guess we can then also talk about a /do/ block [1]
    using the memoization pattern, or a Schwartzian transform [2] using the
    memoization pattern?

    [1]

    my $message = $message_for{ $code } // do {
    $message_for{ $code } = expensive_look_up( $code ) // $default_message
    };

    [2]

    my @sorted_files = map { $_->[ 0 ] }
    sort { $a->[ 1 ] <=> $b->[ 1 ] }
    map { [ $_, sequence_no( $_ ) ] } @unsorted_files;
     
    John Bokma, Feb 12, 2014
    #36
  17. Dave Saville

    John Bokma Guest

    Not a fan of pissing on someone else's language (well, except PHP,
    maybe, in the past, that is). Moreover, the classic "Design Patterns"
    was written before Java was released to the public ...
    Sure, I call it a cache as well. Not because I am "real people" (whatever
    that means), but because that's what it is at a low level.
    And yet there are plenty of papers written on automatic memoization,
    which would be a pleonasm if it was that simple.

    This technique is called “memoization.†The manual version of the
    technique generally involves building lookup tables and is a standard
    technique in dynamic programming. This process, however, is often
    tedious and time-consuming, and it requires significant mod-
    ification and debugging of existing code. An “automatic†memoization
    facility is one in which existing functions can be programmatically
    changed to cache previously seen results in a table,

    http://techdigest.jhuapl.edu/td/td1802/hall.pdf p254-255.

    (unless I misunderstand automatic memoization)

    Also http://acl.ldc.upenn.edu/J/J91/J91-1004.pdf calls it "Automatic
    memoization"; the use of a higher order function to add a cache to an
    existing function.

    Anyway, I like "memoization pattern" to refer to the
    look-up-and-add-if-new code but probably will keep calling it most of
    the time just caching ;-).
     
    John Bokma, Feb 12, 2014
    #37
  18. Dave Saville

    John Bokma Guest

    I hope I write it better this time:

    If memoization always stood for automatic memoization, the latter would
    be a pleonasm.
     
    John Bokma, Feb 12, 2014
    #38
  19. Dave Saville

    John Bokma Guest

    In "An Introduction to Algorithms" this is called memoization
    in the chapter on lineair programming. Due to the limits of the
    Pascal-like pseudo code used in the book that's all that can be
    done. So, I would call this (now) manually memoization.
    Since a caching is /programmatically/ added to fib, I would call this
    automatic memoization.
    From the documentation of "Memoize":

    You could do the memoization yourself, by rewriting the function, like this:

    # Compute Fibonacci numbers, memoized version
    { my @fib;
    sub fib {
    my $n = shift;
    return $fib[$n] if defined $fib[$n];
    return $fib[$n] = $n if $n < 2;
    $fib[$n] = fib($n-1) + fib($n-2);
    }
    }

    I read this as "this is manual memoization".
    Doesn't that depend on the programming language (in this case Perl)? At
    least having a peek at the source of Memoize gives me the impression
    that it's far from trivial in Perl :).
     
    John Bokma, Feb 13, 2014
    #39
  20. Assuming the following example:

    sub fob
    {
    $_[0] < 3 ? 1 : fob($_[0] - 2) + fob($_[0] - 1);
    }

    is fib now a memoized version of fob? And even if it was

    sub fub {
    my ($n) = @_;
    $n < 3 and return 1;
    fub($n - 1) + fub($n - 2);
    }

    would fib be a memoized version of fub?
    In contrast to this, there are actually two functions in this example:
    There's the one $orig refers to and a 2nd which calculates exactly the
    same result as $orig, except that transparent caching of already
    calculated results is performed. If 'memoizing fib' causes 'fib' to
    disappear entirely, how can the result be 'a memoized fib' instead of
    just 'fib'?

    [...]
    I read this as "this is equivalent to what memoize had produced had
    it been applied to some hypothetical 'alternate-reality fib function'
    with suitable parameters" (a subroutine which uses a cache to store
    intermediate results for re-use because that's faster than recalculating
    them). The result is nevertheless genuinely different from the last
    example because assuming there ever was a (all untested)

    sub fib
    {
    my $n = shift;
    return $n if $n < 2;
    fib($n - 1) + fib($n - 2);
    }

    the other code could have been derived from that by performing a source
    code transformation prior to compiling the code which could principally
    be performed automatically as well. This becomes possibly somewhat
    clearer when removing some of the 'human touch':

    sub fib
    {
    $_[0] < 2 ? $_[0] : fib($_[0] - 1) + fib($_[0] - 2);
    }

    and

    sub fib
    {
    state @fibs;

    if ($#fibs < $_[0]) {
    $fibs[$_[0]] = $_[0] < 2 ? $_[0] : fib($_[0] - 1) + fib($_[0] - 2);
    }

    $fibs[$_[0];
    }

    This contains the original code unchanged but 'annotated' with the code
    for the caching. Implementing this for side-effect free functions taking
    a single integer argument as Lisp macro shouldn't be overly difficult (I
    haven't tried, though).

    The other process actually creates a new function object at run time and
    arranges for the original one to call that instead of itself by changing
    the symbol table.
    With Perl, there's the additional complication that functions can be
    context-aware and produce completely different results depending on
    that,

    sub irregular
    {
    wantarray() ? 'Denkste' : -45;
    }

    but the main 'inherent' problem is generalizing the 'single integer
    argument' case to arbitrary argument lists while providing an 'it just
    works' implementation for 'most cases': This means a hash table has to
    be used instead of an array and that the argument list needs to be
    transformed into a suitable key. This is non-trivial because of the
    possibity that arguments with the same 'effective content' might
    stringify differently, eg, when passing [1,2,3] as an argument, and that
    semantically equivalent (in the sense of producing identical results)
    argument lists might be different.
     
    Rainer Weikusat, Feb 13, 2014
    #40
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.