Initialising a hash

Rainer Weikusat · Feb 6, 2014

Ben Morrow said:
Quoth Rainer Weikusat said:

"Someone else wrote it and I know how to get it for free!" may also be a
sensible approach, but only [BLAH BLAH NIH BLAH]

Click to expand...

Oh, do shut up.

This is a design question: How should 'small, mostly static key/ value
database which occasionally need to be changed be stored and
accessed?'. The obvious choices would be

- store them in a text file in some "program-independent"
format, similar to existing database of this kind, eg,
services or protocols

- put the database directly into the code in a suitable way

Link with a third-party module which has a more-or-less up-to-date and
more-or-less maintained or unmaintained list internally hard-coded into
it and provides a more-or-less cumbersome and inefficient interface for
accessing it seems to be a singularly 'efficient' way to combine the
drawbacks of both approaches without providing the advantages of either
and "But someone else wrote that!" is not really a reason to use the
internally hard-coded list in this way: It can be transformed into a real
database file easily or alternatively, it can be put directly into the
code via copy'n'paste, thus yiedling either "can be maintained with
regard for the code using it, could use different versions on different
occasions (eg, a localized map), can be used by any kind of code" or
"everything together in a single file which can be edited easily".

I'd gladly listen to a more rational argument in favour of using the
'module with hard-coded list in it' than "It already exists".

Jim Gibson · Feb 7, 2014

Rainer said:
This is a design question: How should 'small, mostly static key/ value
database which occasionally need to be changed be stored and
accessed?'. The obvious choices would be

- store them in a text file in some "program-independent"
format, similar to existing database of this kind, eg,
services or protocols

- put the database directly into the code in a suitable way

Link with a third-party module which has a more-or-less up-to-date and
more-or-less maintained or unmaintained list internally hard-coded into
it and provides a more-or-less cumbersome and inefficient interface for
accessing it seems to be a singularly 'efficient' way to combine the
drawbacks of both approaches without providing the advantages of either
and "But someone else wrote that!" is not really a reason to use the
internally hard-coded list in this way: It can be transformed into a real
database file easily or alternatively, it can be put directly into the
code via copy'n'paste, thus yiedling either "can be maintained with
regard for the code using it, could use different versions on different
occasions (eg, a localized map), can be used by any kind of code" or
"everything together in a single file which can be edited easily".

I'd gladly listen to a more rational argument in favour of using the
'module with hard-coded list in it' than "It already exists".

The reasons to use HTTP::Status instead writing your own include, but
are not limited to, the following:

1. It already exists, so I won't have to waste my time re-writing it,
and I can get on with the actual problem I am trying to solve.

2. Somebody else wrote it, and they are probably a better programmer
than I and more knowledgable about HTTP status codes.

3. The module is complete and I won't have to worry about tracking down
every possible status code or searching down the relevant RFCs.

4. The CPAN module comes with documentation built-in.

5. The CPAN module comes with test code to ensure it has been installed
correctly. These tests are performed automatically when I install the
module, and can be rerun at any time very easily. Tests are performed
automatically on any module that is submitted to CPAN, and on platforms
for which I do not have access.

6. Documentation is available on the web.

7. I only have to put one line in my program to use it.

8. Lots of people use the module, so there is a good chance that any
bugs will be fixed. (That is certainly not true for the programs I
write, for which I am usually the only user.)

9. If the module does need maintenance, there is a good chance the
module author will do it.

10. HTTP status codes haven't changed in a long time, and probably
won't change in a non-compatible manner (e.g., more codes may be added,
but the current codes are not likely to change.)

Your aversion to using CPAN modules is well-known to readers of this
news group. I do not share that opinion, and your constant harping
against third party software becomes a little tiresome. Maybe you could
just limit yourself to critiquing the specifics of individual modules
instead of making blanket statements against all of them.

Lots of people have put in a lot of time and energy providing CPAN
modules. I even wrote a couple myself, and any reports of bugs in those
would immediately spur me into action to try to fix them. I suspect
most CPAN module authors would feel the same.

Of course, there are no doubt bad CPAN modules, and lots that have been
abandoned by their authors. It is alway a problem separating the wheat
from the chaff. Discussion on Usenet is one way to discover good
modules and bad ones. Generalization that any use of third-party
modules is bad serves no purpose.

Thank you for your many positive contributions to this group. You are
obviously very knowledgable about Perl.

Rainer Weikusat · Feb 7, 2014

Jim Gibson said:
The reasons to use HTTP::Status instead writing your own include, but
are not limited to, the following:

1. It already exists, so I won't have to waste my time re-writing it,
and I can get on with the actual problem I am trying to solve.

Not applicable since HTTP::Status is just window dressing on top of an
otherwise inaccessible hash table. The hash table can be put into 'other
code' via copy'n'paste which makes using it less work than going through
the procedural interface.

2. Somebody else wrote it, and they are probably a better programmer
than I and more knowledgable about HTTP status codes.

The 'bulk' of HTTP::Status is

sub status_message ($) { $StatusCode{$_[0]}; }

according to a clpm chestnut, this is bad because it means passing an
array to status_message will result in the array size being interpreted
as status code instead of the first element of the array (this is also
undocumented). Hash lookups in Perl aren't exactly rocket science, so
your statement doesn't really seem to be applicable here, either.

3. The module is complete and I won't have to worry about tracking down
every possible status code or searching down the relevant RFCs.

Not applicable since you can use the table the module contains, as I've
already written a couple of times. Also, that you're willing to believe
that the table is complete doesn't mean it actually is so and there's
still related problem of how to update it in case this becomes
necessary.

4. The CPAN module comes with documentation built-in.

If you aren't using the module, you won't need its documentation.

5. The CPAN module comes with test code to ensure it has been installed
correctly. These tests are performed automatically when I install the
module, and can be rerun at any time very easily. Tests are performed
automatically on any module that is submitted to CPAN, and on platforms
for which I do not have access.

There's nothing to test here, since we're still talking about a hash
lookup.

6. Documentation is available on the web.

See above.

7. I only have to put one line in my program to use it.

This is wrong on two accounts:

1. You are 'putting' all the lines contained in the module in your
program, you just don't have to write them.

2. You still need to call the function which does the hash lookup for
you everytime you want to use it.

8. Lots of people use the module, so there is a good chance that any
bugs will be fixed. (That is certainly not true for the programs I
write, for which I am usually the only user.)

Not applicable in this case.

9. If the module does need maintenance, there is a good chance the
module author will do it.

Unsubstantiated belief. According to my experience, this is also usually
wrong: If the code is found lacking, chances are that this is the case
because you're using it in a way the author doesn't which means you get
to fix it yourself (not that I'd mind doing that -- if the issue affects
me, I'm naturally in the position to do something about it).

10. HTTP status codes haven't changed in a long time, and probably
won't change in a non-compatible manner (e.g., more codes may be added,
but the current codes are not likely to change.)

I have no idea what this is supposed to express here.

Your aversion to using CPAN modules is well-known to readers of this
news group. I do not share that opinion, and your constant harping
against third party software becomes a little tiresome. Maybe you could
just limit yourself to critiquing the specifics of individual modules
instead of making blanket statements against all of them.

This is a statement about me and not about the module in question. It is
also a wrong statement as I was criticizing a specific module and have
praised other modules in the past, eg, most-recently, File::Temp.

Lots of people have put in a lot of time and energy providing CPAN
modules.

Lost of people have put a lot of time and energy into all kinds of tasks
such as 'building a perpetuum mobile' or 'making gold out of lesser
materials'.

[...]

Of course, there are no doubt bad CPAN modules, and lots that have been
abandoned by their authors. It is alway a problem separating the wheat
from the chaff. Discussion on Usenet is one way to discover good
modules and bad ones. Generalization that any use of third-party
modules is bad serves no purpose.

I usually have the exact opposite impression: It is assumed than
"downloading something from the internet" would be a worthy end in
itself, IOW, anything someone else wrote and published on the web is
prima facie better than anything not published on the web. This is
illogical, not the least because "putting something on CPAN" comes with
a considerable, bureaucratic/ administrative overhead and not everyone
has so much spare time available and even if he has it, he might not
care that much about "getting his name printed on the web". It doesn't
follow that all these people are "inferior programmers" because they
don't want to spend time on this (and vice versa).

I'd rather advocate making intelligent choices here based on a default]
policy of "if in doubt, don't use it": Make sure that you get the cream
but stay clear of everything which is just 'average'/ mediocre.

Keith Keller · Feb 8, 2014

The 'bulk' of HTTP::Status is

sub status_message ($) { $StatusCode{$_[0]}; }

according to a clpm chestnut, this is bad because it means passing an
array to status_message will result in the array size being interpreted
as status code instead of the first element of the array (this is also
undocumented).
[snip]

I'd rather advocate making intelligent choices

Clearly not, since the "intelligent choice" would be to submit a patch,
not write your own code from scratch.

--keith

Janek Schleicher · Feb 8, 2014

Am 07.02.2014 21:47, schrieb Rainer Weikusat:

Not applicable since HTTP::Status is just window dressing on top of an
otherwise inaccessible hash table. The hash table can be put into 'other
code' via copy'n'paste which makes using it less work than going through
the procedural interface.

Even copy+paste means more work than just using HTTP::Status.
It's not like you try to get Angular.js to run.

The 'bulk' of HTTP::Status is

sub status_message ($) { $StatusCode{$_[0]}; }

ok, that's still shorter than any hand written code and much easier to
maintain.

according to a clpm chestnut, this is bad because it means passing an
array to status_message will

a method call status_message
is very unlikely to get called in list contet

result in the array size being interpreted
as status code instead of the first element of the array (this is also
undocumented). Hash lookups in Perl aren't exactly rocket science, so
your statement doesn't really seem to be applicable here, either.

Well, HTTP::Status doesn't claim to be rocket science, it solves one
little problem, but it solves ist.

Not applicable since you can use the table the module contains, as I've
already written a couple of times.

You can use it of course.
But do you update it also?
And even if you use it, it's still more time, more work and less
maintainable than to use this module direct.

Also, that you're willing to believe
that the table is complete doesn't mean it actually is so and there's
still related problem of how to update it in case this becomes
necessary.

Copy+Paste makes it even worse

If you aren't using the module, you won't need its documentation.

So you have to write your own documentation.
Come on, it's past 1975, an undocumented, untested part of our code is a
bug, nor a feature.

There's nothing to test here, since we're still talking about a hash
lookup.

It still has to work.
Even when installed in customers environment or whereever.
Again, 1975 is over.

This is wrong on two accounts:

1. You are 'putting' all the lines contained in the module in your
program, you just don't have to write them.

So, it's better if you copy+paste them??

2. You still need to call the function which does the hash lookup for
you everytime you want to use it.

What's the problem.
If you want, you can add any Memorize-ic function to avoid it.

Not applicable in this case.

That's very applicable.

Unsubstantiated belief.
Why?

According to my experience, this is also usually
wrong: If the code is found lacking, chances are that this is the case
because you're using it in a way the author doesn't which means you get
to fix it yourself (not that I'd mind doing that -- if the issue affects
me, I'm naturally in the position to do something about it).

O.K., so you copy+pastied it,
what is better than?

This is a statement about me and not about the module in question. It is
also a wrong statement as I was criticizing a specific module and have
praised other modules in the past, eg, most-recently, File::Temp.

File::Temp is a module usually distributed,
it's part of perldoc.
Wow, you are using standard installed modules.

Lost of people have put a lot of time and energy into all kinds of tasks
such as 'building a perpetuum mobile' or 'making gold out of lesser
materials'.

HTTP::Status works and does what it claims.
Year, people tried to build perpetuum mobile, well, they nether worked
and aren't part of CPAN IIRC.

I usually have the exact opposite impression: It is assumed than
"downloading something from the internet" would be a worthy end in
itself, IOW, anything someone else wrote and published on the web is
prima facie better than anything not published on the web.

Nobody said that,
but most of the time it's worth to use it,
and often it's at least better than to reinvent it,
for at least at the first days,
and only some problems are worth it to invest a manweek+ just because we
are paronoid about internet solutions.

Also, explain me, why copy+paste is superior than to just use what you
intend to c+p?

This is
illogical, not the least because "putting something on CPAN" comes with
a considerable, bureaucratic/ administrative overhead and not everyone
has so much spare time available and even if he has it ...

So, you have all this overhead available?
Lucky guy!

Be glad in 1975.

Greetings,
Janeks

Rainer Weikusat · Feb 9, 2014

Keith Keller said:
The 'bulk' of HTTP::Status is

sub status_message ($) { $StatusCode{$_[0]}; }

according to a clpm chestnut, this is bad because it means passing an
array to status_message will result in the array size being interpreted
as status code instead of the first element of the array (this is also
undocumented).
[snip]

I'd rather advocate making intelligent choices

Click to expand...

Clearly not, since the "intelligent choice" would be to submit a patch,
not write your own code from scratch.

Even as high as DH5 we still sometimes see deliberate
dishonesty, as when someone picks out minor points of an
argument and refutes those. Sometimes the spirit in which this
is done makes it more of a sophisticated form of ad hominem than
actual refutation.

http://www.paulgraham.com/disagree.html

The exercise above, however, isn't even about 'picking out minor points'
but really just a wholesale, malicious fabrication, in order to get to

DH0. Name-calling.

[...]

u r a fag!!!!!!!!!!

But it's important to realize that more articulate name-calling
has just as little weight. A comment like

The author is a self-important dilettante.

is really nothing more than a pretentious version of "u r a
fag."
[same location]

Rainer Weikusat · Feb 9, 2014

Janek Schleicher said:
Am 07.02.2014 21:47, schrieb Rainer Weikusat:

Even copy+paste means more work than just using HTTP::Status.
It's not like you try to get Angular.js to run.

This seems like a patently absurd statement to me ...

The 'bulk' of HTTP::Status is

sub status_message ($) { $StatusCode{$_[0]}; }

Click to expand...

ok, that's still shorter than any hand written code and much easier to
maintain.

The exact equivalents would:

print(status_message(200));

vs

print($http_status{200});

That's less code for the second case. Of course, using a name a la

$htsc{200}

would be as possible. But the difference is really rather negligent on
the 'coding side'. Using direct hash lookups is - however - faster,
which might have noticable effect when 'processing log files' as these
can easily be 'large' (millions of lines) and it offers some otherwise
unavailable features, eg, get access to a list of all status codes.

[intelligible text restored]

a method call status_message
is very unlikely to get called in list contet

If it was called as a method (which would require creating an
out-of-module constructor for HTTP::Status objects first), this wouldn't
matter because prototypes aren't checked on method calls. Also, I didn't
said that I'd agree this was a problem (I'm using prototypes myself,
just not when posting code here), I just pointed out that there's a
double standard applied here: Someone who posts code with prototypes
used for something else than "enabling hand-optimized 'chaotic' syntax"
is invariable told that he shouldn't be doing this. If this was a
measure of 'bad code quality', it should surely apply to code written by
"module authors" as well.

Well, HTTP::Status doesn't claim to be rocket science, it solves one
little problem, but it solves ist.

In the context of "too difficult to solve the problem without the
module", this doesn't seem to make sense. I didn't claim that the hash
lookup subroutine wouldn't work, just that 'using a hash lookup
instead' would amount to using a basic language feature.

You can use it of course. But do you update it also?
And even if you use it, it's still more time, more work and less
maintainable than to use this module direct.

I was specifically writing about the problem of maintaining the
table. This is easy for a separate database file, fairly easy if the
code using the table contains it directly, and problematic when the
table comes 'internally hard-coded' with some software distribution as
any 'naive' update of that would overwrite local changes.

You've now made the 'more work and less maintainable' statement
twice. Would you care to elaborate why you consider this to be true?
It's not self-evident to me.

Copy+Paste makes it even worse
Why?

So you have to write your own documentation.
Come on, it's past 1975, an undocumented, untested part of our code is
a bug, nor a feature.

Documentation about 'using hashes' already comes with Perl. Also, your
idea about 'documentation' seems a little unrealistic to me: Assuming
that documentation is written at all (I assure you that the people who
pay the salaries for me an my colleagues considers this a better spare
time activity which should rather be avoided in order to perform 'useful
work'), it is pretty certain that no one will ever read it on the
grounds that this is just too much effort

It still has to work.
Even when installed in customers environment or whereever.
Again, 1975 is over.

I'm unaware if 'software written in 1975' wasn't supposed to work as I
was three by that time. I'm, however, inclined to believe that it was.

So, it's better if you copy+paste them??

In case the useful part of a module is smaller than the module itself
and more useful without it, yes. There's - of course - also the option
of putting the table into a proper database file and use that.

What's the problem.
If you want, you can add any Memorize-ic function to avoid it.

The function call itself? How?

That's very applicable.

There is no non-trivial code in this module. Because of this, the
assumption isn't applicable here.

Why?

Because 9. is an 'lone' assertion without any to substantiate it.

O.K., so you copy+pastied it, what is better than?

I don't understand what this is supposed to communicate in the given
context.

File::Temp is a module usually distributed,
it's part of perldoc.
Wow, you are using standard installed modules.

The statement about me you deleted contained the claim that I was
principally opposed to using third-party developed modules. This is
wrong, as pointed out above.

HTTP::Status works and does what it claims.
Year, people tried to build perpetuum mobile, well, they nether worked
and aren't part of CPAN IIRC.

That's besides the point: The original statement was "lots of people put
lots of time and effort into developing CPAN modules". As shown by the
two examples itself, this is irrelevant in the given context: The result
matters.

Nobody said that,
but most of the time it's worth to use it,
and often it's at least better than to reinvent it,

Well, you just said that.

John Bokma · Feb 9, 2014

Rainer Weikusat said:
would be as possible. But the difference is really rather negligent on
the 'coding side'. Using direct hash lookups is - however - faster,
which might have noticable effect when 'processing log files' as these
can easily be 'large' (millions of lines)

Assuming that one needs this, converting the status code of many lines
to something human readable [1], the solution is simple: since per your
earlier post the look up table (no matter where it resides) can be
incomplete, one has to take care of the case the status code not being
in the table (or even not valid at all). If an expensive way of
looking-up the code is used, one can use a hash table to cache such look
ups (as earlier has been suggested: memoization) and build the table
"on-the-fly".

unless ( exists $readable_for{ $status_code } ) {
my $readable = expensive_look_up( $status_code );
defined $readable
or $readable = "No readable message for $status_code";
$readabe_for{ $status_code } = $readable;
....
}

[1] my experience with log files is generating summaries, and hence
doing the look up is done on the summarized data (of course).

Rainer Weikusat · Feb 11, 2014

John Bokma said:
Rainer Weikusat said:

would be as possible. But the difference is really rather negligent on
the 'coding side'. Using direct hash lookups is - however - faster,
which might have noticable effect when 'processing log files' as these
can easily be 'large' (millions of lines)

Click to expand...

Assuming that one needs this, converting the status code of many lines
to something human readable [1], the solution is simple:
[...]

If an expensive way of looking-up the code is used, one can use a hash
table to cache such look ups

Assuming that the interface provided by the module is not only ill
thought-out but positively unusable, an obvious workaround which enables
linking with it without actually having to use it - of course - to make an
incremental copy of the locked-in, hard-coded hash table at run time: In
this way, even the 'benefit' of "not having to write code which
does hash lookups" goes obviously out of the window.

(as earlier has been suggested: memoization)

'Memoization' in the context of Perl (AFAIK) comes from Mark Jason
Dominus and it is supposed to refer to 'transparently avoiding repeated,
expensive calculations by putting previously calculated results into a
hash table using the arguments given to the function in question as a
key and replacing a call to the "backing function" with a hash lookup
whenever possible'. IIRC, his examples were naively implemented
recursive functions, eg, calculating Fibnoacci numbers recursively. The
key point here is that this can be made to work transparently with the
help of function-manipulating functions (as one would expect in the
context of a book named "Higher Order Perl"):

We would like to tell Perl that we want caching behavior enabled
on a function. Perl should be able to perform the required
transformation automatically. Such automatic transformation of
a function to add caching behavior is called memoization and the
function is said to be memoized.3

[...]

3. The term memoization was coined in 1968 by Donald Michie.
[Higher Order Perl, p. 69]

Considering this, your use of the term for the proposed hack seems
inapproriate to me.

John Bokma · Feb 11, 2014

[ Memoization ]

Considering this, your use of the term for the proposed hack seems
inapproriate to me.

automatic memoization [1].

[1] https://www.google.com/search?q=automatic memoization filetype:pdf

Rainer Weikusat · Feb 11, 2014

John Bokma said:
[ Memoization ]

Considering this, your use of the term for the proposed hack seems
inapproriate to me.

Click to expand...

automatic memoization [1].

[1] https://www.google.com/search?q=automatic memoization filetype:pdf

I don't quite understand what this is supposed to communicate in the
given context, especially considering that this context is sufficiently
context-less that nobody can possibly relate this text either to the
original procedure ('individually hand-coded, incremental copying of an
otherwise inaccessible hash table') nor to my text which quoted a
definition of the term 'memoization' from Higher Order Perl and stated
that this definition was not applicable to said procedure and hence, the
term improperly used. Insofar you'd like to dispute the merit of the
definition in it's own right or the applicability of the text I quoted
in a discussion centering on Perl, I suggest that you do that instead.

John Bokma · Feb 11, 2014

Rainer Weikusat said:
John Bokma said:

[ Memoization ]

Considering this, your use of the term for the proposed hack seems
inapproriate to me.

Click to expand...

automatic memoization [1].

[1] https://www.google.com/search?q=automatic memoization filetype:pdf

Click to expand...

I don't quite understand what this is supposed to communicate in the
given context, especially considering that this context is sufficiently
context-less that nobody can possibly relate this text either to the
original procedure ('individually hand-coded, incremental copying of an
otherwise inaccessible hash table')

I stand corrected, let's call my "hack": caching of the results of an
expensive look up in order to avoid such expensive look ups in the near
future (or 'caching') since it's (the code given) not a function.

nor to my text which quoted a definition of the term 'memoization'
from Higher Order Perl and stated

Which is (also) called 'automatic memoization' to distinguish this from
explicit manually coded caching in the function itself, which also is
called memoization by people who distinguish between those two using
aforementioned terms (I mistakenly (?) assumed that this was the problem
you had with my "hack"; that it wasn't automatic memoization, but
explicitly manually coded).

Justin C · Feb 12, 2014

The exercise above, however, isn't even about 'picking out minor points'
but really just a wholesale, malicious fabrication, in order to get to

Oh look, he's fallen back into the kill file.

Justin.

Rainer Weikusat · Feb 12, 2014

Justin C said:
Oh look, he's fallen back into the kill file.

As a matter of fact, I didn't write the text Mr Keller pierced together
from disparate parts of a text I had written and he didn't post anything
except some personal abuse targetted at me (remarkably similar to your
own performance, actually).

Rainer Weikusat · Feb 12, 2014

John Bokma said:
Rainer Weikusat said:

John Bokma said:

[ Memoization ]
Considering this, your use of the term for the proposed hack seems
inapproriate to me.

automatic memoization [1].

[1] https://www.google.com/search?q=automatic memoization filetype:pdf

Click to expand...

I don't quite understand what this is supposed to communicate in the
given context, especially considering that this context is sufficiently
context-less that nobody can possibly relate this text either to the
original procedure ('individually hand-coded, incremental copying of an
otherwise inaccessible hash table')

Click to expand...

I stand corrected, let's call my "hack": caching of the results of an
expensive look up in order to avoid such expensive look ups in the near
future (or 'caching') since it's (the code given) not a function.

nor to my text which quoted a definition of the term 'memoization'
from Higher Order Perl and stated

Click to expand...

Which is (also) called 'automatic memoization' to distinguish this from
explicit manually coded caching in the function itself,

The original paper is not accessible on the web, however, considering
the use in 'Higher Order Perl' and this here:

http://www.google.co.uk/url?q=http:...4QFjAE&usg=AFQjCNH61ymTLntkFXTnVL6s5tRpuPnHPg

I'd still vote in favour of using the term 'memoization' to refer to an
(automatic) operation which can be applied to a pre-existing function in order to
turn it into a memoized function. This seems to make more sense in the
context if AI/ machine learning where it came from. Insofar this refers
to schematically performed manual code change, something like 'the
memoization pattern' seems more appropriate.

John Bokma · Feb 12, 2014

Rainer Weikusat said:
[..]

Which is (also) called 'automatic memoization' to distinguish this from
explicit manually coded caching in the function itself,

Click to expand...

The original paper is not accessible on the web,

http://www.cs.utexas.edu/~hunt/research/hash-cons/hash-cons-papers/michie-memo-nature-1968.pdf

In which "Memo function" is used (at least I couldn't find 'memoization'
doing a quick scan of the what seems to be scanned text).

I'd still vote in favour of using the term 'memoization' to refer to
an (automatic) operation which can be applied to a pre-existing
function in order to turn it into a memoized function. This seems to
make more sense in the context if AI/ machine learning where it came
from. Insofar this refers to schematically performed manual code
change, something like 'the memoization pattern' seems more
appropriate.

Fair enough. So I guess we can then also talk about a /do/ block [1]
using the memoization pattern, or a Schwartzian transform [2] using the
memoization pattern?

[1]

my $message = $message_for{ $code } // do {
$message_for{ $code } = expensive_look_up( $code ) // $default_message
};

[2]

my @sorted_files = map { $_->[ 0 ] }
sort { $a->[ 1 ] <=> $b->[ 1 ] }
map { [ $_, sequence_no( $_ ) ] } @unsorted_files;

John Bokma · Feb 12, 2014

Ben Morrow said:
Quoth John Bokma said:

Rainer Weikusat said:

I'd still vote in favour of using the term 'memoization' to refer to
an (automatic) operation which can be applied to a pre-existing
function in order to turn it into a memoized function. This seems to
make more sense in the context if AI/ machine learning where it came
from. Insofar this refers to schematically performed manual code
change, something like 'the memoization pattern' seems more
appropriate.

Click to expand...

Fair enough. So I guess we can then also talk about a /do/ block [1]
using the memoization pattern, or a Schwartzian transform [2] using the
memoization pattern?

Click to expand...

'Patterns' are the sort of thing Java programmers talk about.

Not a fan of pissing on someone else's language (well, except PHP,
maybe, in the past, that is). Moreover, the classic "Design Patterns"
was written before Java was released to the public ...

Real people call this a 'cache'.

Sure, I call it a cache as well. Not because I am "real people" (whatever
that means), but because that's what it is at a low level.

(And yes, Rainer is right that memoisation usually refers to adding
caching in front of an existing naively-implemented function to make it
more efficient.)

And yet there are plenty of papers written on automatic memoization,
which would be a pleonasm if it was that simple.

This technique is called â€œmemoization.â€ The manual version of the
technique generally involves building lookup tables and is a standard
technique in dynamic programming. This process, however, is often
tedious and time-consuming, and it requires significant mod-
ification and debugging of existing code. An â€œautomaticâ€ memoization
facility is one in which existing functions can be programmatically
changed to cache previously seen results in a table,

http://techdigest.jhuapl.edu/td/td1802/hall.pdf p254-255.

(unless I misunderstand automatic memoization)

Also http://acl.ldc.upenn.edu/J/J91/J91-1004.pdf calls it "Automatic
memoization"; the use of a higher order function to add a cache to an
existing function.

Anyway, I like "memoization pattern" to refer to the
look-up-and-add-if-new code but probably will keep calling it most of
the time just caching ;-).

John Bokma · Feb 12, 2014

Ben Morrow said:
Going from step three to step four is non-trivial, hence the papers.

I hope I write it better this time:

If memoization always stood for automatic memoization, the latter would
be a pleonasm.

John Bokma · Feb 13, 2014

Ben Morrow said:
This is a modified function which uses a cache:

my @cache;
sub fib {
my ($n) = @_;
$n < 3 and return 1;
$cache[$n] ||= fib($n - 1) + fib($n - 2);
}

In "An Introduction to Algorithms" this is called memoization
in the chapter on lineair programming. Due to the limits of the
Pascal-like pseudo code used in the book that's all that can be
done. So, I would call this (now) manually memoization.

This is how one might memoise it manually:

# sub fib as the first example above

my @cache;
my $orig = \&fib;
no warnings "redefine";
*fib = sub {
my ($n) = @_;
$cache[$n] ||= $orig->($n);
};

Since a caching is /programmatically/ added to fib, I would call this
automatic memoization.

This is how one might memoise it automatically:

# sub fib as before

use Memoize;
memoize "fib";

From the documentation of "Memoize":

You could do the memoization yourself, by rewriting the function, like this:

# Compute Fibonacci numbers, memoized version
{ my @fib;
sub fib {
my $n = shift;
return $fib[$n] if defined $fib[$n];
return $fib[$n] = $n if $n < 2;
$fib[$n] = fib($n-1) + fib($n-2);
}
}

I read this as "this is manual memoization".

Going from step three to step four is non-trivial, hence the papers.

Doesn't that depend on the programming language (in this case Perl)? At
least having a peek at the source of Memoize gives me the impression
that it's far from trivial in Perl

.

Rainer Weikusat · Feb 13, 2014

John Bokma said:
Ben Morrow said:

This is a modified function which uses a cache:

my @cache;
sub fib {
my ($n) = @_;
$n < 3 and return 1;
$cache[$n] ||= fib($n - 1) + fib($n - 2);
}

Click to expand...

In "An Introduction to Algorithms" this is called memoization
in the chapter on lineair programming. Due to the limits of the
Pascal-like pseudo code used in the book that's all that can be
done. So, I would call this (now) manually memoization.

Assuming the following example:

sub fob
{
$_[0] < 3 ? 1 : fob($_[0] - 2) + fob($_[0] - 1);
}

is fib now a memoized version of fob? And even if it was

sub fub {
my ($n) = @_;
$n < 3 and return 1;
fub($n - 1) + fub($n - 2);
}

would fib be a memoized version of fub?

This is how one might memoise it manually:

# sub fib as the first example above

my @cache;
my $orig = \&fib;
no warnings "redefine";
*fib = sub {
my ($n) = @_;
$cache[$n] ||= $orig->($n);
};

Click to expand...

Since a caching is /programmatically/ added to fib, I would call this
automatic memoization.

In contrast to this, there are actually two functions in this example:
There's the one $orig refers to and a 2nd which calculates exactly the
same result as $orig, except that transparent caching of already
calculated results is performed. If 'memoizing fib' causes 'fib' to
disappear entirely, how can the result be 'a memoized fib' instead of
just 'fib'?

[...]

From the documentation of "Memoize":

You could do the memoization yourself, by rewriting the function, like this:

# Compute Fibonacci numbers, memoized version
{ my @fib;
sub fib {
my $n = shift;
return $fib[$n] if defined $fib[$n];
return $fib[$n] = $n if $n < 2;
$fib[$n] = fib($n-1) + fib($n-2);
}
}

I read this as "this is manual memoization".

I read this as "this is equivalent to what memoize had produced had
it been applied to some hypothetical 'alternate-reality fib function'
with suitable parameters" (a subroutine which uses a cache to store
intermediate results for re-use because that's faster than recalculating
them). The result is nevertheless genuinely different from the last
example because assuming there ever was a (all untested)

sub fib
{
my $n = shift;
return $n if $n < 2;
fib($n - 1) + fib($n - 2);
}

the other code could have been derived from that by performing a source
code transformation prior to compiling the code which could principally
be performed automatically as well. This becomes possibly somewhat
clearer when removing some of the 'human touch':

sub fib
{
$_[0] < 2 ? $_[0] : fib($_[0] - 1) + fib($_[0] - 2);
}

and

sub fib
{
state @fibs;

if ($#fibs < $_[0]) {
$fibs[$_[0]] = $_[0] < 2 ? $_[0] : fib($_[0] - 1) + fib($_[0] - 2);
}

$fibs[$_[0];
}

This contains the original code unchanged but 'annotated' with the code
for the caching. Implementing this for side-effect free functions taking
a single integer argument as Lisp macro shouldn't be overly difficult (I
haven't tried, though).

The other process actually creates a new function object at run time and
arranges for the original one to call that instead of itself by changing
the symbol table.

Doesn't that depend on the programming language (in this case Perl)? At
least having a peek at the source of Memoize gives me the impression
that it's far from trivial in Perl .

With Perl, there's the additional complication that functions can be
context-aware and produce completely different results depending on
that,

sub irregular
{
wantarray() ? 'Denkste' : -45;
}

but the main 'inherent' problem is generalizing the 'single integer
argument' case to arbitrary argument lists while providing an 'it just
works' implementation for 'most cases': This means a hash table has to
be used instead of an array and that the argument list needs to be
transformed into a suitable key. This is non-trivial because of the
possibity that arguments with the same 'effective content' might
stringify differently, eg, when passing [1,2,3] as an argument, and that
semantically equivalent (in the sense of producing identical results)
argument lists might be different.

Help with a hash	4	Apr 11, 2012
Read a hash	5	Apr 28, 2011
traversing a hash structure	3	Aug 29, 2011
data to hash	1	Jul 29, 2011
Push regex search result into hash with multiple values	14	May 19, 2014
hash of arrays	1	Sep 13, 2012
multiple text replacements from a hash	2	Apr 1, 2013
simple hash	3	Oct 2, 2009

Initialising a hash

Rainer Weikusat

Jim Gibson

Rainer Weikusat

Keith Keller

Janek Schleicher

Rainer Weikusat

Rainer Weikusat

John Bokma

Rainer Weikusat

John Bokma

Rainer Weikusat

John Bokma

Justin C

Rainer Weikusat

Rainer Weikusat

John Bokma

John Bokma

John Bokma

John Bokma

Rainer Weikusat

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads