Entities.pm - How does decode_entities work?

Dave Saville

I needed to massage some text to put in a web page. I started out with
some s/!/!;/g; type lines and then twigged that perl would most
likely have a module to do it. HTML::Entities.pm. This works fine, but
I then wanted to see *how* it did it.

Having found the module, Entities.pm, I copied it to a tmp directory
and modified the start of my test script from

use HTML::Entities;

to

use lib '.';
use Entities;

I then started sticking in print statements and eventually worked out
how the encode worked. I then tried to do the same with the decode
only to get an error:

Undefined subroutine &Entities::decode_entities called at try.pl line
18.

I then see that the sub line in Entities.pm is sub
decode_entities_old. OK so it's not amazing it could not find it. But
the question is how on earth does it work when the use HTML::Entities
is in effect? Which it does. I ran a search down the entire perl tree
looking for any file with a "sub decode_entities" in it and
Entities.pm is the only file and then it is decode_entities_old. So
how *does* it work?

Is there some way to find out where perl is getting a particular
routine from - rather like the *nix command line "which"?

TIA
 
Uri Guttman

DS> I needed to massage some text to put in a web page. I started out with
DS> some s/!/!;/g; type lines and then twigged that perl would most
DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
DS> I then wanted to see *how* it did it.

DS> Having found the module, Entities.pm, I copied it to a tmp directory
DS> and modified the start of my test script from

DS> use HTML::Entities;

DS> to

DS> use lib '.';
DS> use Entities;

what happened to the HTTP:: part? why did you think you could drop it?


DS> I then started sticking in print statements and eventually worked out
DS> how the encode worked. I then tried to do the same with the decode
DS> only to get an error:

DS> Undefined subroutine &Entities::decode_entities called at try.pl line
DS> 18.

you broke its exporting.

uri
 
Uri Guttman

DS> I then see that the sub line in Entities.pm is sub
DS> decode_entities_old. OK so it's not amazing it could not find it. But
DS> the question is how on earth does it work when the use HTML::Entities
DS> is in effect? Which it does. I ran a search down the entire perl tree
DS> looking for any file with a "sub decode_entities" in it and
DS> Entities.pm is the only file and then it is decode_entities_old. So
DS> how *does* it work?

DS> Is there some way to find out where perl is getting a particular
DS> routine from - rather like the *nix command line "which"?

if you read the source and look for decode_entities there is a comment
which says where it is located.

uri
 
Dave Saville

DS> I needed to massage some text to put in a web page. I started out with
DS> some s/!/!;/g; type lines and then twigged that perl would most
DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
DS> I then wanted to see *how* it did it.

DS> Having found the module, Entities.pm, I copied it to a tmp directory
DS> and modified the start of my test script from

DS> use HTML::Entities;

DS> to

DS> use lib '.';
DS> use Entities;

UG> what happened to the HTTP:: part? why did you think you could drop it?

Er, what HTTP part? Did you mean the HTML part? I dropped it because
my test version was not in an HTML directory. It was in the same
directory as the test code.

DS> I then started sticking in print statements and eventually worked out
DS> how the encode worked. I then tried to do the same with the decode
DS> only to get an error:

DS> Undefined subroutine &Entities::decode_entities called at try.pl line
DS> 18.

UG> you broke its exporting.

How? It can find encode_entities fine. Why didn't that break? The
EXPORT lines don't mention the higher HTML layer except in the package
header and I removed that portion of it.
 
Dave Saville

DS> I then see that the sub line in Entities.pm is sub
DS> decode_entities_old. OK so it's not amazing it could not find it. But
DS> the question is how on earth does it work when the use HTML::Entities
DS> is in effect? Which it does. I ran a search down the entire perl tree
DS> looking for any file with a "sub decode_entities" in it and
DS> Entities.pm is the only file and then it is decode_entities_old. So
DS> how *does* it work?

DS> Is there some way to find out where perl is getting a particular
DS> routine from - rather like the *nix command line "which"?

UG> if you read the source and look for decode_entities there is a comment
UG> which says where it is located.

Yes I see - a require for HTML::Parser - But Parser does not have a
decode_entities so I repeat *how* does the routine reference get
resolved. I do not understand.
 
Uri Guttman

DS> I needed to massage some text to put in a web page. I started out with
DS> some s/!/!;/g; type lines and then twigged that perl would most
DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
DS> I then wanted to see *how* it did it.

DS> Having found the module, Entities.pm, I copied it to a tmp directory
DS> and modified the start of my test script from

DS> use lib '.';
DS> use Entities;

DS> Er, what HTTP part? Did you mean the HTML part? I dropped it because
DS> my test version was not in an HTML directory. It was in the same
DS> directory as the test code.

DS> I then started sticking in print statements and eventually worked out
DS> how the encode worked. I then tried to do the same with the decode
DS> only to get an error:

DS> Undefined subroutine &Entities::decode_entities called at try.pl line
DS> 18.

DS> How? It can find encode_entities fine. Why didn't that break? The
DS> EXPORT lines don't mention the higher HTML layer except in the package
DS> header and I removed that portion of it.

because you used only Entities. rtfm on how use works. it first finds
and loads the module you requested (that worked). but then it calls the
import method on the class name passed to use. the module's class is
HTML::Entities but you used only Entities so it called
Entities->import() which doesn't exist and so nothing got exported.

you must use the proper class name when using a module to get
exporting.

uri
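
The mechanism described above can be sketched in a few lines. This is only an illustration, not code from HTML::Entities: the package My::Greeter and the names greet and wave are hypothetical, and the module is defined inline so the example runs on its own. It shows that `use` amounts to a require followed by a call to import on the name given, and that Perl silently skips the call when that name has no import method.

```perl
use strict;
use warnings;

# Hypothetical module, defined inline so the sketch is self-contained.
package My::Greeter;
use Exporter 'import';          # borrow Exporter's import method
our @EXPORT_OK = ('greet');
sub greet { return "hello" }

package main;

# `use My::Greeter 'greet';` is roughly
#   BEGIN { require My::Greeter; My::Greeter->import('greet'); }
My::Greeter->import('greet');
my $imported = defined &main::greet ? 1 : 0;

# import called on a name that is not a real package: Perl treats a
# missing import method as a no-op, so nothing is exported and no
# error is raised.
Greeter->import('wave');
my $missing = defined &main::wave ? 1 : 0;

print "greet imported: $imported, wave imported: $missing\n";
```

The silent no-op is why a mismatch between the name passed to `use` and the `package` line inside the file produces an "Undefined subroutine" error at call time rather than a complaint at load time.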
 
Uri Guttman

DS> I then see that the sub line in Entities.pm is sub
DS> decode_entities_old. OK so it's not amazing it could not find it. But
DS> the question is how on earth does it work when the use HTML::Entities
DS> is in effect? Which it does. I ran a search down the entire perl tree
DS> looking for any file with a "sub decode_entities" in it and
DS> Entities.pm is the only file and then it is decode_entities_old. So
DS> how *does* it work?

DS> Is there some way to find out where perl is getting a particular
DS> routine from - rather like the *nix command line "which"?

DS> Yes I see - a require for HTML::Parser - But Parser does not have a
DS> decode_entities so I repeat *how* does the routine reference get
DS> resolved. I do not understand.

it is in XS as the comment says. so it is somewhere else in the build
for HTML::Parser. you need to explore deeper. and it will be in c for
speed.

uri
 
Xho Jingleheimerschmidt

Dave said:
Yes I see - a require for HTML::Parser - But Parser does not have a
decode_entities so I repeat *how* does the routine reference get
resolved. I do not understand.

The module is not pure Perl. It links in a C library. That is what
XSLoader does.

Xho
 
Dave Saville

DS> I needed to massage some text to put in a web page. I started out with
DS> some s/!/!;/g; type lines and then twigged that perl would most
DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
DS> I then wanted to see *how* it did it.
DS> Having found the module, Entities.pm, I copied it to a tmp directory
DS> and modified the start of my test script from
DS> use lib '.';
DS> use Entities;

DS> Er, what HTTP part? Did you mean the HTML part? I dropped it because
DS> my test version was not in an HTML directory. It was in the same
DS> directory as the test code.


DS> I then started sticking in print statements and eventually worked out
DS> how the encode worked. I then tried to do the same with the decode
DS> only to get an error:
DS> Undefined subroutine &Entities::decode_entities called at try.pl line
DS> 18.

DS> How? It can find encode_entities fine. Why didn't that break? The
DS> EXPORT lines don't mention the higher HTML layer except in the package
DS> header and I removed that portion of it.

UG> because you used only Entities. rtfm on how use works. it first finds
UG> and loads the module you requested (that worked). but then it calls the
UG> import method on the class name passed to use. the module's class is
UG> HTML::Entities but you used only Entities so it called
UG> Entities->import() which doesn't exist and so nothing got exported.

UG> you must use the proper class name when using a module to get
UG> exporting.

Please read what I wrote. :)

Calling script has use "Entities;"

Entities.pm has "package Entities;"

HTML:: is not mentioned anywhere else.
 
Dave Saville

DS> Yes I see - a require for HTML::Parser - But Parser does not have a
DS> decode_entities so I repeat *how* does the routine reference get
DS> resolved. I do not understand.

UG> it is in XS as the comment says. so it is somewhere else in the build
UG> for HTML::Parser. you need to explore deeper. and it will be in c for
UG> speed.

OK found it thanks. You are right it's in c - never thought to look in
other than .pm files :-(
 
Dave Saville

UG> it is in XS as the comment says. so it is somewhere else in the build
UG> for HTML::Parser. you need to explore deeper. and it will be in c for
UG> speed.

Having had a poke around the c code, I still don't understand *how*
the routine is found.

HTML::Entities exports encode_entities and decode_entities, which it
does not have, plus a few other things.
It requires HTML::Parser (which requires HTML::Entities) which does
have decode_entities buried in the .XS but not in the .pm. My
understanding of "use" is that it looks for <whatever>.pm in @INC
directories or if of the form FOO::bar it looks for FOO/bar.pm in
@INC. Further, "use lib some-directory;" prepends that directory to
@INC. so

use lib dir1;
use lib dir2;

results in searching dir2, dir1 and then @INC.
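
A quick way to check that search order is to look at the front of @INC after the two `use lib` lines. This is a minimal sketch: dir1 and dir2 are placeholder names (lib.pm adds them whether or not they exist), and the last lines just show how a Foo::Bar style name maps to a relative file path.

```perl
use strict;
use warnings;

# Placeholder directory names; they do not need to exist.
use lib 'dir1';
use lib 'dir2';

# Each `use lib` unshifts onto @INC, so the directory named last
# is searched first.
my @front = @INC[0, 1];
print "search order starts: @front\n";

# `use HTML::Entities` looks for this relative path in each @INC entry.
(my $relative = 'HTML::Entities') =~ s{::}{/}g;
$relative .= '.pm';
print "file searched for: $relative\n";
```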

So HTML::Entities is exporting a routine it does not have and
HTML::Parser is supplying it but does not appear to export it - and it
works.
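
The puzzle goes away once you see that a sub does not have to be written with `sub name` in the package that exports it: anything can install a code reference into that package's symbol table. The pure-Perl sketch below only imitates what the XS side does. My::Provider, My::Consumer and decode_thing are hypothetical names, and the glob assignment stands in for the registration that compiled code performs when it is loaded.

```perl
use strict;
use warnings;

# Stand-in for the compiled side: it installs a sub into ANOTHER
# package's symbol table.
package My::Provider;
sub bootstrap {
    no strict 'refs';
    *{'My::Consumer::decode_thing'} = sub { return "decoded(" . $_[0] . ")" };
    return;
}

# Stand-in for the exporting module: note there is no
# `sub decode_thing` anywhere in this package.
package My::Consumer;
use Exporter 'import';
our @EXPORT = ('decode_thing');
My::Provider::bootstrap();

package main;
My::Consumer->import;    # what `use My::Consumer;` does after loading
my $out = decode_thing('text');
print "$out\n";
```

A text search for "sub decode_thing" over these packages finds nothing, yet the export works, because Exporter only needs the name to exist in the symbol table by the time import runs.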

What would be nice would be:

print which <some exported thingy> and it tells you *where* it comes
from.
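
Something close to that "which" can be put together with the core B module, which exposes where a code reference was defined. This is a sketch under assumptions: which_sub and local_helper are hypothetical names, and List::Util::sum is used only because it is a core sub implemented in XS, so the example does not depend on HTML::Parser being installed.

```perl
use strict;
use warnings;
use B ();
use List::Util ();    # core module with XS-implemented subs

# Hypothetical helper: report the package and name a code ref was
# defined under, the file recorded for it, and whether it is XS.
sub which_sub {
    my ($code) = @_;
    my $cv = B::svref_2object($code);
    return {
        package => $cv->GV->STASH->NAME,
        name    => $cv->GV->NAME,
        file    => $cv->FILE // '(unknown)',
        is_xs   => $cv->XSUB ? 1 : 0,    # non-zero only for XS subs
    };
}

sub local_helper { return 1 }

my $xs   = which_sub(\&List::Util::sum);
my $perl = which_sub(\&local_helper);

for my $info ($xs, $perl) {
    printf "%s::%s xs=%d file=%s\n",
        @{$info}{qw(package name is_xs file)};
}
```

Passing `\&decode_entities` from a script that has `use HTML::Entities;` should work the same way; for an XS sub the file reported is typically the C source name rather than a .pm file, which is the hint to look outside the Perl sources.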
 
Dave Saville

UG> it is in XS as the comment says. so it is somewhere else in the build
UG> for HTML::Parser. you need to explore deeper. and it will be in c for
UG> speed.

Just read up on XS which I had not met before - I *think* I now
understand. A module defined with XS puts its "exported" stuff direct
to the interpreter using the perl API - which I guess is searched
*before* @INC.
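
One refinement to that picture, offered as a sketch rather than a full account: @INC is only consulted when `use` or `require` needs to find a file, while a sub call is resolved through the package's symbol table, so the two are never searched one before the other. The example uses the core List::Util module as a stand-in, and no_such_sub is a deliberately made-up name.

```perl
use strict;
use warnings;
use List::Util ();    # core module whose subs are implemented in XS

# %INC records which file satisfied the `use` (located via @INC).
my $loaded_from = $INC{'List/Util.pm'};

# Sub lookups do not touch @INC: they check the package's symbol table,
# where the XS loader has already registered the compiled subs.
my $in_symtab = exists &List::Util::sum         ? 1 : 0;
my $not_there = exists &List::Util::no_such_sub ? 1 : 0;

print "loaded from: $loaded_from\n";
print "sum present: $in_symtab, no_such_sub present: $not_there\n";
```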
 
Uri Guttman

DS> On Thu, 16 Dec 2010 23:40:01 UTC, "Uri Guttman" wrote:

DS> Please read what I wrote. :)

DS> Calling script has use "Entities;"

DS> Entities.pm has "package Entities;"

DS> HTML:: is not mentioned anywhere else.

from /usr/lib/perl5/HTML/Entities.pm:

package HTML::Entities;

and since it doesn't even export that sub anymore, you need to check the
XS code and see what package it uses. of course it will be
HTML::Entities since that is the proper name for the module. again, you
need to use what IT wants and not what you think it wants to import the
sub.

uri
 
