Rainer Weikusat
NB: This is a lengthy multi-section text starting with an explanation
of 'the basics' for anybody who might have been so unfortunate to
encounter Perl at version 5.16 or later. There's something which isn't
an explanation at the end. Any kind of constructive criticism is
welcome.
NB^2: I wrote this because I wanted to write it for some time. In line
with 'the base of any sound order is a large wastepaper bin' (Kurt
Tucholsky), "Stop shaking your dick in public, people are staring
already" and similar cheesinesses are just a short way into my
scorefile, alongside the large set of 'less than instructive posters'
who reside there already.
1. Objects, References and Methods
----------------------------------
One of the more unusual features of Perl OO is that there is no such
thing as a unique, language-enforced representation for 'object
instances'. Perl has a number of different kinds of built-in objects,
the most prominent ones being scalars, arrays and hashes. Glob objects
are used to store symbol table entries and also to access file and
directory handles. Code objects represent subroutines. There's also a
probably lesser-known one, namely, the lvalue object. Presumably (I
didn't test this) it is used as the actual 'return value' of an lvalue
subroutine and it can also be 'caught in flight' when using certain
built-in subroutines/operators, eg
    perl -e '$w = "Hello"; $lv = \substr($w, 1, 3); print $lv, "\n"; $$lv = "aird"; print $w, "\n"'
will print
    LVALUE(0x620d70)
    Hairdo
References to objects of all these kinds can be created by applying
the backslash operator to an object of a certain kind or by using a
special 'anonymous object' construction syntax, eg
    \%hash
returns a reference to the hash %hash and
    [1,2,3]
returns a reference to an anonymous array containing the values 1, 2
and 3.
Any reference can be associated with a package with the help of the
bless function. When a reference is associated with a package, the
syntax
    $ref->name(argument, ...)
can be used to request a 'method call': This will search for a
subroutine named name in the package $ref is currently associated with
and in all packages this package declared as its parent
packages. Because this 'subroutine search' happens at runtime using
the string 'name' as key, this is really a message-passing mechanism
rather than something like the 'virtual methods' provided by C++.
The implication of this is that any Perl object can acquire OO-like
features by attaching a reference to the object to a package while
otherwise retaining its built-in behaviour.
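A minimal sketch of the above, assuming a made-up package named Greeter with a single method: references to four different kinds of built-in objects are blessed into the same package, the method is looked up by a string at runtime and the references still work as ordinary references afterwards.

```perl
use strict;
use warnings;
use Scalar::Util ();

package Greeter;    # hypothetical package name

# Report which kind of built-in object the invocant refers to.
sub kind
{
    my $self = shift;
    return Scalar::Util::reftype($self);
}

package main;

my $text = 'some text';
my @objects = (
    bless(\$text, 'Greeter'),           # scalar
    bless([1, 2, 3], 'Greeter'),        # array
    bless({ a => 1 }, 'Greeter'),       # hash
    bless(sub { 42 }, 'Greeter'),       # code
);

# the method name is just a string which is looked up at runtime
my $method = 'kind';
my @kinds = map { $_->$method() } @objects;
print join(' ', @kinds), "\n";          # prints SCALAR ARRAY HASH CODE

# the blessed references still behave like ordinary references
print $objects[1][0], ' ', $objects[3]->(), "\n";   # prints 1 42
```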
Example: File Handle Objects
----------------------------
A relatively simple, uncommon example: In a certain program, I'm using
Linux 'queued realtime signals' for I/O event notification. This
requires using fcntl on the corresponding file descriptor to set the
owner of the file and the signal which is supposed to be sent in case
of an 'I/O readiness' event, and manipulating the file status flags to
set or clear the O_ASYNC flag when such signals should or shouldn't be
sent. Setting the flags with fcntl is an all-or-nothing operation
which changes all flags of a file descriptor. In order to modify
individual flags selectively, the corresponding bits need to be set or
cleared in an integer representing the set of all flag bits. Since
file handles are nowadays usually references to anonymous globs, they
can be blessed into a package in order to support calling methods on
them. In this case, this would be an enable_async method which enables
asynchronous notifications and a disable_async method. The current
flag set is stored in the scalar slot of the glob. Because the glob is
still a glob, it can otherwise be used just like any other file
handle.
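What such a package could look like, as a rough sketch and not the code from the program mentioned above: the package name AsyncHandle and the adopt, set_flags and async_enabled subroutines are made up for this example, and setting the owner and the signal is left out entirely. It assumes a system where Fcntl provides O_ASYNC (eg, Linux).

```perl
use strict;
use warnings;

package AsyncHandle;    # hypothetical package name

use Fcntl qw(F_GETFL F_SETFL O_ASYNC);

# Bless an existing file handle and cache its current flag set in the
# scalar slot of the glob.
sub adopt
{
    my ($class, $fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0);
    defined($flags) or die("F_GETFL: $!");
    ${*$fh} = $flags + 0;       # '0 but true' becomes 0
    return bless($fh, $class);
}

# Replace the complete flag set and remember the new value.
sub set_flags
{
    my ($self, $flags) = @_;
    fcntl($self, F_SETFL, $flags) or die("F_SETFL: $!");
    ${*$self} = $flags;
    return $self;
}

sub enable_async  { $_[0]->set_flags(${*{$_[0]}} | O_ASYNC) }
sub disable_async { $_[0]->set_flags(${*{$_[0]}} & ~O_ASYNC) }
sub async_enabled { (${*{$_[0]}} & O_ASYNC) != 0 }

package main;

use Socket;

socketpair(my $here, my $there, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die("socketpair: $!");

AsyncHandle->adopt($here);
$here->enable_async;
my $was_enabled = $here->async_enabled;
$here->disable_async;

# the blessed glob is still an ordinary file handle
syswrite($there, "ping\n");
my $line = <$here>;
print $line;
```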
,----
| This is what the Perl 6 OO design document refers to as 'It is too
| minimal': It supports too many features many people never use because
| they have no idea what they'd be good for (because
| $better_known_oo_language didn't provide them)
`----
2. And Records?
---------------
Unfortunately, something Perl doesn't provide is a complex datatype
whose components can be accessed by name instead of using an integer
(array) or string (hash) key. Such record or structure types (or
constructs conceptually derived from them) are usually used as object
representation in other languages, not least because they nest
properly.
Example in C
------------
Assuming the type definitions
    struct person {
        char *forename, *surname;
    };
and
    struct citizen {
        struct person person;
        char *nationality;
    };
and the variable definitions
    struct citizen someone = { .person = { .forename = "Zarathustra" } };
    struct person *person;
the person pointer can point to a struct citizen (because the embedded
struct person is its first member),
    person = (struct person *)&someone;
and can be used to access the 'struct person' fields,
    puts(person->forename);
This obviously lends itself to implementing 'data inheritance' in
addition to 'method inheritance'.
Hashes perhaps?
---------------
An obvious idea ('the obvious idea') for providing a similar facility
in Perl is to use hashes because these are 'dictionary' (or
'associative array') data structures mapping string keys to arbitrary
'other' (scalar) values.
,----
| At this point in the story, the PHP (or Java) programmer
| probably leans back with a sigh of relief because 'the solution' just
| presented itself and thus, there's no further requirement to think
| about all this complicated stuff just in order to "get shit done
| quickly" aka 'web development' (the 'get ...' is a quote from an
| article about 'web frameworks' I read a while ago).
`----
But often, hashes/references to anonymous hashes are not really a
good choice for representing objects. 'A hash' is essentially an
allocation scheme for array slots which is supposed to map all members
of a 'large' (or numerically discontinuous) set of 'keys' to such
array slots. It does so by running a (seriously) lossy compression
function on the key in order to transform it into a slot number, based
on the assumption that the actual number of different keys in the hash
will always be much smaller than the number of elements in the key
set. This is combined with a mechanism to deal with so-called 'hash
collisions', cases where the result of the lossy compression function
is identical for two keys which are actually different. *If* the
number of actual collisions in the hash table is small, key insert,
delete and lookup operations performed on it will complete in constant
time, although they are still fairly expensive because of the
compression function. In order to ensure that the number of collisions
will be small, hash tables are usually (and possibly significantly)
larger than what would be required if the keys could just be packed
into successive slots of the table.
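To illustrate the 'lossy compression' and collision part, here is a toy sketch. The slot_for subroutine is invented for this example and is not the function perl uses internally, and the key names are arbitrary. With nine keys and eight slots, at least one slot necessarily ends up with more than one key.

```perl
use strict;
use warnings;

# Toy 'lossy compression function': turns a string key into one of
# $nslots slot numbers. NOT the function perl uses internally.
sub slot_for
{
    my ($key, $nslots) = @_;
    my $h = 0;
    $h = ($h * 33 + ord($_)) & 0xffffffff for split(//, $key);
    return $h % $nslots;
}

my $nslots = 8;
my @keys = qw(forename surname nationality street city zip phone email birthday);

# group the keys by the slot they end up in
my %by_slot;
push(@{$by_slot{slot_for($_, $nslots)}}, $_) for @keys;

# nine keys, eight slots: at least one slot must hold several keys
my @collisions = grep { @$_ > 1 } values(%by_slot);
print join(' ', @$_), "\n" for @collisions;
```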
Another problem with using hashes as object representation is that the
'namespace' of each individual hash is 'global': If two related
packages use the same name for an object property, they will end up
using the same 'virtual property slot' and there's no easy way to
prevent that.
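A small sketch of how this goes wrong, assuming two made-up classes Connection and LoggedConnection which both happen to use a property called 'name':

```perl
use strict;
use warnings;

package Connection;     # hypothetical base class

sub new
{
    my ($class, $host) = @_;
    my $self = bless({}, $class);
    $self->{name} = $host;      # 'name' means the host name here
    return $self;
}

sub host { $_[0]->{name} }

package LoggedConnection;   # hypothetical derived class

our @ISA = ('Connection');

sub new
{
    my ($class, $host, $logfile) = @_;
    my $self = Connection::new($class, $host);
    $self->{name} = $logfile;   # 'name' means the log file name here
    return $self;
}

package main;

my $conn = LoggedConnection->new('example.org', '/tmp/conn.log');
print $conn->host, "\n";        # prints /tmp/conn.log
```

The base class method silently returns the value the derived class stored because both share the one 'name' key.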
A possible way would be to prefix the package name to each property
name and hope that all other packages also do this. But since package
names are often longish, this is an unwieldy workaround. A workaround
for the workaround would be to use declared constants whose values are
the long property names and whose names are short enough to be used
comfortably. But - alas - the {} autoquote the key if it looks like
a simple identifier and this means that
    use constant NAME => __PACKAGE__.'name';
    $h->{NAME} = 'Paul';
doesn't work: The key used is the string 'NAME', not the value of the
constant. A workaround for the workaround for the workaround would
be to invoke the 'constant subroutine' explicitly,
    $h->{NAME()} = 'Paul';
but the two added noise characters are rather ugly.
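The difference can be seen by looking at the keys which actually end up in the hash. This is a sketch using a made-up Person package; the '::' separator in the constant value is just a choice made for the example.

```perl
use strict;
use warnings;

package Person;     # hypothetical package name

# the '::' separator is a choice made for this example
use constant NAME => __PACKAGE__ . '::name';

my $h = {};
$h->{NAME} = 'Paul';        # autoquoted: the key is the string 'NAME'
$h->{NAME()} = 'Emil';      # explicit call: the key is 'Person::name'
$h->{+NAME} = 'Erich';      # unary plus also prevents autoquoting

package main;

my @keys = sort(keys(%$h));
print join(' ', @keys), "\n";       # prints NAME Person::name
print $h->{'Person::name'}, "\n";   # prints Erich
```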
Lastly, the hash lookup, which is a rather costly operation, happens
every time such a named property is accessed.
A Better Idea: Arrays
---------------------
Accessing slots of an array is a cheap O(1) operation and arrays don't
need to be larger than necessary to accommodate the number of elements
stored in them in order to work around the fact that slots are
allocated based on the result of a (seriously) lossy compression
function applied to the key. Since the Perl OO system is 'too minimal'
aka 'provides too many weird features', array references can also
serve as 'invocant objects' for method calls.
Two remaining problems are
1. Obviously, accessing instance properties by index number is not a
good idea except in very trivial cases.
2. Slot allocation for related classes.
(1) can be solved by declaring named constants with integer values and
using these in the source code. The nice properties of this are that
names declared in different packages don't collide, that the
translation into slot numbers happens at compile time and that the []
require no syntactical workarounds for this case.
    use constant NAME => 0;
    $h->[NAME] = 'Emil';
works as intended.
It is possible to solve (2) by managing slot numbers manually and
such a scheme can be made to work for single-inheritance class
hierarchies by using a 'magical name' to refer to the last slot number
used by a certain package. Slot numbers for a 'derived class' can then
be calculated relative to the 'last used slot' number of the
superclass. But this is still fairly inconvenient and writing out
    use constant PROP_A => 0;
    use constant PROP_B => 1;
    use constant PROP_C => 2;
requires a lot of typing.
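A sketch of such a manual scheme, assuming LAST_SLOT as the 'magical name' (the constant and the two classes are made up for this example):

```perl
use strict;
use warnings;

package Person;     # hypothetical base class

use constant {
    FORENAME  => 0,
    SURNAME   => 1,
    LAST_SLOT => 1,     # the 'magical name': last slot used by Person
};

package Citizen;    # hypothetical derived class

our @ISA = ('Person');

# slot numbers relative to the last slot used by the superclass
use constant NATIONALITY => Person::LAST_SLOT() + 1;
use constant LAST_SLOT   => NATIONALITY;

package main;

my $c = bless([], 'Citizen');
$c->[Person::FORENAME] = 'Emil';
$c->[Citizen::NATIONALITY] = 'German';
print Citizen::NATIONALITY(), "\n";     # prints 2
```

Every package has to keep its LAST_SLOT up to date by hand whenever a property is added, which is exactly the inconvenient part.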
Automatic Compile-Time Allocation of Array Slots
------------------------------------------------
The
    use something (...)
feature can be used to execute a subroutine named something::import at
compile time. By combining this with the existing constant module and
some fairly simple state-tracking code, it is possible to create slot
name constants based on a use statement in a way which works for
single-inheritance class hierarchies:
---------------
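package slots;

use feature 'state';
use constant;

sub import
{
    # next free slot number per package, keyed by package name
    state %counters;
    my ($ctr, $fields);

    if (ref($_[1])) {
        # use slots [...]: base class, numbering starts at 0
        $fields = $_[1];
        $ctr = 0;
    } else {
        # use slots ('Super', [...]): continue after the superclass,
        # whose own use slots must have been compiled already
        $ctr = $counters{$_[1]};
        $fields = $_[2];
    }

    # hand { NAME => slot number, ... } to constant::import
    @_ = ($_[0], {map { $_, $ctr++; } @$fields});
    $counters{caller()} = $ctr;

    # goto leaves the caller as seen by constant::import unchanged, so
    # the constants end up in the package containing the use statement
    goto &constant::import;
}

1;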
---------------
Assuming this code exists as slots.pm somewhere where perl can find
it, a non-derived class could declare a set of 'slot names' by doing
    use slots [qw(FORENAME SURNAME)];
and a class derived from this class could do
    use slots ('Person', ['NATIONALITY']);
('Person' being the name of the superclass package) to request an
additional field which will have the next 'free' slot number.
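Put together, this could be used as in the following sketch. In order to get a single runnable file, the slots package is inlined in a BEGIN block and registered in %INC instead of living in slots.pm. The Person and Citizen classes, their methods and the sample values are made up for the example.

```perl
use strict;
use warnings;

# The slots package from this post, inlined so that this is a single file.
BEGIN {
    package slots;

    use feature 'state';
    use constant;

    sub import
    {
        state %counters;
        my ($ctr, $fields);

        if (ref($_[1])) {
            $fields = $_[1];
            $ctr = 0;
        } else {
            $ctr = $counters{$_[1]};
            $fields = $_[2];
        }

        @_ = ($_[0], {map { $_, $ctr++; } @$fields});
        $counters{caller()} = $ctr;
        goto &constant::import;
    }

    $INC{'slots.pm'} = __FILE__;    # makes 'use slots' skip the file search
}

package Person;     # hypothetical base class

use slots [qw(FORENAME SURNAME)];

sub new
{
    my ($class, $forename, $surname) = @_;
    my $self = bless([], $class);
    $self->[FORENAME] = $forename;
    $self->[SURNAME] = $surname;
    return $self;
}

sub full_name { join(' ', @{$_[0]}[FORENAME, SURNAME]) }

package Citizen;    # hypothetical derived class

our @ISA = ('Person');
use slots ('Person', ['NATIONALITY']);

sub new
{
    my ($class, $forename, $surname, $nationality) = @_;
    my $self = Person::new($class, $forename, $surname);
    $self->[NATIONALITY] = $nationality;
    return $self;
}

sub nationality { $_[0]->[NATIONALITY] }

package main;

my $c = Citizen->new('Emil', 'Mustermann', 'German');
# prints Emil Mustermann (German), slot 2
print $c->full_name, ' (', $c->nationality, '), slot ', Citizen::NATIONALITY(), "\n";
```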
NB: This is an idea I had yesterday and since the implementation was
so exceedingly simple, I thought sharing it might be a good idea. I
expect that there are situations this simple code doesn't handle but
it is 'good enough' to be useful.