Entities.pm - How does decode_entities work?

Discussion in 'Perl Misc' started by Dave Saville, Dec 16, 2010.

  1. Dave Saville

    Dave Saville Guest

    I needed to massage some text to put in a web page. I started out with
    some s/!/!;/g; type lines and then twigged that perl would most
    likely have a module to do it. HTML::Entities.pm. This works fine, but
    I then wanted to see *how* it did it.

    Having found the module, Entities.pm, I copied it to a tmp directory
    and modified the start of my test script from

    use HTML::Entities;

    to

    use lib '.';
    use Entities;

    I then started sticking in print statements and eventually worked out
    how the encode worked. I then tried to do the same with the decode
    only to get an error:

    Undefined subroutine &Entities::decode_entities called at try.pl line
    18.

    I then see that the sub line in Entities.pm is sub
    decode_entities_old. OK so it's not amazing it could not find it. But
    the question is how on earth does it work when the use HTML::Entities
    is in effect? Which it does. I ran a search down the entire perl tree
    looking for any file with a "sub decode_entities" in it and
    Entities.pm is the only file and then it is decode_entities_old. So
    how *does* it work?

    Is there some way to find out where perl is getting a particular
    routine from - rather like the *nix command line "which"?

    TIA

    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 16, 2010
    #1

  2. Dave Saville

    Uri Guttman Guest

    >>>>> "DS" == Dave Saville <> writes:

    DS> I needed to massage some text to put in a web page. I started out with
    DS> some s/!/!;/g; type lines and then twigged that perl would most
    DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
    DS> I then wanted to see *how* it did it.

    DS> Having found the module, Entities.pm, I copied it to a tmp directory
    DS> and modified the start of my test script from

    DS> use HTML::Entities;

    DS> to

    DS> use lib '.';
    DS> use Entities;

    what happened to the HTTP:: part? why did you think you could drop it?


    DS> I then started sticking in print statements and eventually worked out
    DS> how the encode worked. I then tried to do the same with the decode
    DS> only to get an error:

    DS> Undefined subroutine &Entities::decode_entities called at try.pl line
    DS> 18.

    you broke its exporting.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Dec 16, 2010
    #2

  3. Dave Saville

    Uri Guttman Guest

    >>>>> "DS" == Dave Saville <> writes:

    DS> I then see that the sub line in Entities.pm is sub
    DS> decode_entities_old. OK so it's not amazing it could not find it. But
    DS> the question is how on earth does it work when the use HTML::Entities
    DS> is in effect? Which it does. I ran a search down the entire perl tree
    DS> looking for any file with a "sub decode_entities" in it and
    DS> Entities.pm is the only file and then it is decode_entities_old. So
    DS> how *does* it work?

    DS> Is there some way to find out where perl is getting a particular
    DS> routine from - rather like the *nix command line "which"?

    if you read the source and look for decode_entities there is a comment
    which says where it is located.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Dec 16, 2010
    #3
  4. Dave Saville

    Dave Saville Guest

    On Thu, 16 Dec 2010 17:44:02 UTC, "Uri Guttman" <>
    wrote:

    > >>>>> "DS" == Dave Saville <> writes:

    >
    > DS> I needed to massage some text to put in a web page. I started out with
    > DS> some s/!/!;/g; type lines and then twigged that perl would most
    > DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
    > DS> I then wanted to see *how* it did it.
    >
    > DS> Having found the module, Entities.pm, I copied it to a tmp directory
    > DS> and modified the start of my test script from
    >
    > DS> use HTML::Entities;
    >
    > DS> to
    >
    > DS> use lib '.';
    > DS> use Entities;
    >
    > what happened to the HTTP:: part? why did you think you could drop it?
    >


    Er, what HTTP part? Did you mean the HTML part? I dropped it because
    my test version was not in an HTML directory. It was in the same
    directory as the test code.


    >
    > DS> I then started sticking in print statements and eventually worked out
    > DS> how the encode worked. I then tried to do the same with the decode
    > DS> only to get an error:
    >
    > DS> Undefined subroutine &Entities::decode_entities called at try.pl line
    > DS> 18.
    >
    > you broke its exporting.
    >


    How? It can find encode_entities fine. Why didn't that break? The
    EXPORT lines don't mention the higher HTML layer except in the package
    header and I removed that portion of it.

    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 16, 2010
    #4
  5. Dave Saville

    Dave Saville Guest

    On Thu, 16 Dec 2010 17:45:55 UTC, "Uri Guttman" <>
    wrote:

    > >>>>> "DS" == Dave Saville <> writes:

    >
    > DS> I then see that the sub line in Entities.pm is sub
    > DS> decode_entities_old. OK so it's not amazing it could not find it. But
    > DS> the question is how on earth does it work when the use HTML::Entities
    > DS> is in effect? Which it does. I ran a search down the entire perl tree
    > DS> looking for any file with a "sub decode_entities" in it and
    > DS> Entities.pm is the only file and then it is decode_entities_old. So
    > DS> how *does* it work?
    >
    > DS> Is there some way to find out where perl is getting a particular
    > DS> routine from - rather like the *nix command line "which"?
    >
    > if you read the source and look for decode_entities there is a comment
    > which says where it is located.


    Yes, I see - a require for HTML::Parser. But Parser does not have a
    decode_entities, so I repeat: *how* does the routine reference get
    resolved? I do not understand.

    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 16, 2010
    #5
  6. Dave Saville

    Uri Guttman Guest

    >>>>> "DS" == Dave Saville <> writes:

    DS> On Thu, 16 Dec 2010 17:44:02 UTC, "Uri Guttman" <>
    DS> wrote:

    >> >>>>> "DS" == Dave Saville <> writes:

    DS> I needed to massage some text to put in a web page. I started out with
    DS> some s/!/!;/g; type lines and then twigged that perl would most
    DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
    DS> I then wanted to see *how* it did it.

    DS> Having found the module, Entities.pm, I copied it to a tmp directory
    DS> and modified the start of my test script from

    DS> use HTML::Entities;

    DS> to

    DS> use lib '.';
    DS> use Entities;

    >> what happened to the HTTP:: part? why did you think you could drop it?

    DS> Er, what HTTP part? Did you mean the HTML part? I dropped it because
    DS> my test version was not in an HTML directory. It was in the same
    DS> directory as the test code.

    DS> I then started sticking in print statements and eventually worked out
    DS> how the encode worked. I then tried to do the same with the decode
    DS> only to get an error:

    DS> Undefined subroutine &Entities::decode_entities called at try.pl line
    DS> 18.

    >> you broke its exporting.

    DS> How? It can find encode_entities fine. Why didn't that break? The
    DS> EXPORT lines don't mention the higher HTML layer except in the package
    DS> header and I removed that portion of it.

    because you used only Entities. rtfm on how use works. it first finds
    and loads the module you requested (that worked). but then it calls the
    import method on the class name passed to use. the module's class is
    HTML::Entities but you used only Entities so it called
    Entities->import() which doesn't exist and so nothing got exported.

    you must use the proper class name when using a module to get
    exporting.
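
    The two steps described above (load the file, then call import on the
    name given to use) can be sketched roughly as follows. This is only an
    illustration: List::Util and its sum function stand in for any
    exporting module, and No::Such::Package is a made-up name that is not
    defined anywhere.

```perl
use strict;
use warnings;

# "use List::Util qw(sum);" behaves roughly like this BEGIN block:
# first the file is located and loaded, then import is called on the
# class name that was given to "use".
BEGIN {
    require List::Util;          # finds List/Util.pm via @INC and loads it
    List::Util->import('sum');   # asks the module to export sum to us
}

my $total = sum(1, 2, 3);
print "total: $total\n";

# Calling import on a class that has no import method is quietly
# skipped instead of being a fatal error. No::Such::Package is a
# hypothetical name used only for this demonstration.
my $survived = eval { No::Such::Package->import; 1 } ? 1 : 0;
print "import on an unknown class survived: $survived\n";
```

    Because a missing import method is skipped silently, a mismatch between
    the name passed to use and the package declared in the file shows up
    only later, when an expected sub turns out not to be there.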

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Dec 16, 2010
    #6
  7. Dave Saville

    Uri Guttman Guest

    >>>>> "DS" == Dave Saville <> writes:

    DS> On Thu, 16 Dec 2010 17:45:55 UTC, "Uri Guttman" <>
    DS> wrote:

    >> >>>>> "DS" == Dave Saville <> writes:

    DS> I then see that the sub line in Entities.pm is sub
    DS> decode_entities_old. OK so it's not amazing it could not find it. But
    DS> the question is how on earth does it work when the use HTML::Entities
    DS> is in effect? Which it does. I ran a search down the entire perl tree
    DS> looking for any file with a "sub decode_entities" in it and
    DS> Entities.pm is the only file and then it is decode_entities_old. So
    DS> how *does* it work?

    DS> Is there some way to find out where perl is getting a particular
    DS> routine from - rather like the *nix command line "which"?

    >> if you read the source and look for decode_entities there is a comment
    >> which says where it is located.

    DS> Yes I see - a require for HTML::parser - But Parser does not have a
    DS> decode_entities so I repeat *how* does the routine reference get
    DS> resolved. I do not understand.

    it is in XS as the comment says. so it is somewhere else in the build
    for HTML::Parser. you need to explore deeper. and it will be in C for
    speed.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Dec 16, 2010
    #7
  8. Dave Saville wrote:
    > On Thu, 16 Dec 2010 17:45:55 UTC, "Uri Guttman" <>
    > wrote:
    >
    >>>>>>> "DS" == Dave Saville <> writes:

    >> DS> I then see that the sub line in Entities.pm is sub
    >> DS> decode_entities_old. OK so it's not amazing it could not find it. But
    >> DS> the question is how on earth does it work when the use HTML::Entities
    >> DS> is in effect? Which it does. I ran a search down the entire perl tree
    >> DS> looking for any file with a "sub decode_entities" in it and
    >> DS> Entities.pm is the only file and then it is decode_entities_old. So
    >> DS> how *does* it work?
    >>
    >> DS> Is there some way to find out where perl is getting a particular
    >> DS> routine from - rather like the *nix command line "which"?
    >>
    >> if you read the source and look for decode_entities there is a comment
    >> which says where it is located.

    >
    > Yes I see - a require for HTML::parser - But Parser does not have a
    > decode_entities so I repeat *how* does the routine reference get
    > resolved. I do not understand.


    The module is not pure Perl. It links in a C library. That is what
    XSLoader does.

    Xho
     
    Xho Jingleheimerschmidt, Dec 17, 2010
    #8
  9. Dave Saville

    Dave Saville Guest

    On Thu, 16 Dec 2010 23:40:01 UTC, "Uri Guttman" <>
    wrote:

    > >>>>> "DS" == Dave Saville <> writes:
    >
    > DS> On Thu, 16 Dec 2010 17:44:02 UTC, "Uri Guttman" <>
    > DS> wrote:
    >
    > >> >>>>> "DS" == Dave Saville <> writes:
    >
    > DS> I needed to massage some text to put in a web page. I started out with
    > DS> some s/!/!;/g; type lines and then twigged that perl would most
    > DS> likely have a module to do it. HTML::Entities.pm. This works fine, but
    > DS> I then wanted to see *how* it did it.
    >
    > DS> Having found the module, Entities.pm, I copied it to a tmp directory
    > DS> and modified the start of my test script from
    >
    > DS> use HTML::Entities;
    >
    > DS> to
    >
    > DS> use lib '.';
    > DS> use Entities;
    >
    > >> what happened to the HTTP:: part? why did you think you could drop it?
    >
    > DS> Er, what HTTP part? Did you mean the HTML part? I dropped it because
    > DS> my test version was not in an HTML directory. It was in the same
    > DS> directory as the test code.
    >
    > DS> I then started sticking in print statements and eventually worked out
    > DS> how the encode worked. I then tried to do the same with the decode
    > DS> only to get an error:
    >
    > DS> Undefined subroutine &Entities::decode_entities called at try.pl line
    > DS> 18.
    >
    > >> you broke its exporting.
    >
    > DS> How? It can find encode_entities fine. Why didn't that break? The
    > DS> EXPORT lines don't mention the higher HTML layer except in the package
    > DS> header and I removed that portion of it.
    >
    > because you used only Entities. rtfm on how use works. it first finds
    > and loads the module you requested (that worked). but then it calls the
    > import method on the class name passed to use. the module's class is
    > HTML::Entities but you used only Entities so it called
    > Entities->import() which doesn't exist and so nothing got exported.
    >
    > you must use the proper class name when using a module to get
    > exporting.
    >


    Please read what I wrote. :)

    Calling script has "use Entities;"

    Entities.pm has "package Entities;"

    HTML:: is not mentioned anywhere else.


    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 17, 2010
    #9
  10. Dave Saville

    Dave Saville Guest

    On Thu, 16 Dec 2010 23:40:51 UTC, "Uri Guttman" <>
    wrote:

    > >>>>> "DS" == Dave Saville <> writes:


    <snip>

    > DS> Yes I see - a require for HTML::parser - But Parser does not have a
    > DS> decode_entities so I repeat *how* does the routine reference get
    > DS> resolved. I do not understand.
    >
    > it is in XS as the comment says. so it is somewhere else in the build
    > for HTML::parser. you need to explore deeper. and it will be in c for
    > speed.
    >


    OK, found it, thanks. You are right, it's in C - I never thought to look
    in anything other than .pm files :-(


    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 17, 2010
    #10
  11. Dave Saville

    Dave Saville Guest

    On Thu, 16 Dec 2010 23:40:51 UTC, "Uri Guttman" <>
    wrote:

    <snip>
    > it is in XS as the comment says. so it is somewhere else in the build
    > for HTML::parser. you need to explore deeper. and it will be in c for
    > speed.


    Having had a poke around the C code, I still don't understand *how*
    the routine is found.

    HTML::Entities exports encode_entities and decode_entities (the latter
    of which it does not define), plus a few other things.
    It requires HTML::Parser (which requires HTML::Entities), which does
    have decode_entities buried in the .xs but not in the .pm. My
    understanding of "use" is that it looks for <whatever>.pm in the @INC
    directories or, if the name is of the form FOO::bar, for FOO/bar.pm in
    @INC. Further, "use lib some-directory;" prepends that directory to
    @INC. So

    use lib 'dir1';
    use lib 'dir2';

    results in searching dir2, dir1 and then the rest of @INC.
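
    The search order is easy to check by printing the front of @INC. A
    small sketch, assuming dir1 and dir2 are just placeholder names (they
    do not have to exist for this to run):

```perl
use strict;
use warnings;

# Each "use lib" puts its directory at the FRONT of @INC at compile
# time, so the directory named last is searched first.
use lib 'dir1';
use lib 'dir2';

my @front = @INC[0, 1];
print "searched first:  $front[0]\n";
print "searched second: $front[1]\n";
```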

    So HTML::Entities is exporting a routine it does not have and
    HTML::Parser is supplying it but does not appear to export it - and it
    works.

    What would be nice would be:

    print which <some exported thingy> and it tells you *where* it comes
    from.
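
    Something close to that "which" can be sketched with the core B module,
    which can inspect a code reference and report the package it was
    compiled into. The helper name which_sub is made up for this example,
    and List::Util's first is only a convenient imported sub to point it
    at; treat it as a rough diagnostic, not a polished tool.

```perl
use strict;
use warnings;
use B ();
use List::Util qw(first);

# which_sub is a hypothetical helper: given a code reference, report
# the package and name the sub was originally defined under, the file
# recorded for it, and whether it is compiled (XS) code.
sub which_sub {
    my ($code) = @_;
    my $cv = B::svref_2object($code);   # B::CV object for the sub
    my $gv = $cv->GV;                   # the glob the sub belongs to
    return {
        package => $gv->STASH->NAME,
        name    => $gv->NAME,
        file    => $cv->FILE,
        is_xs   => $cv->XSUB ? 1 : 0,
    };
}

# "first" was imported into main, but it reports its real home.
my $imported = which_sub(\&first);
my $local    = which_sub(\&which_sub);

for my $info ($imported, $local) {
    printf "%s::%s (file %s, %s)\n",
        $info->{package}, $info->{name}, $info->{file},
        $info->{is_xs} ? 'XS' : 'pure Perl';
}
```

    For a sub implemented in XS the file reported is normally the C source
    it was built from, which is a hint that grepping the .pm files will not
    find it.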
    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 17, 2010
    #11
  12. Dave Saville

    Dave Saville Guest

    On Thu, 16 Dec 2010 23:40:51 UTC, "Uri Guttman" <>
    wrote:

    <snip>

    > it is in XS as the comment says. so it is somewhere else in the build
    > for HTML::parser. you need to explore deeper. and it will be in c for
    > speed.


    Just read up on XS, which I had not met before - I *think* I now
    understand. A module defined with XS puts its "exported" stuff directly
    into the interpreter using the perl API - which I guess is searched
    *before* @INC.
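
    One way to test that guess: as far as I can tell, compiled subs end up
    as ordinary entries in a package's symbol table, and a sub call is
    looked up there, while @INC is only consulted when a file is loaded by
    use or require. A small sketch, using utf8::is_utf8 (which is built
    into the perl binary) purely as a handy example of a compiled sub:

```perl
use strict;
use warnings;
use B ();

# utf8::is_utf8 is implemented in C, yet it is reachable like any
# other sub because it has an entry in the utf8:: symbol table.
my $in_stash = exists $utf8::{is_utf8} ? 1 : 0;
my $defined  = defined &utf8::is_utf8  ? 1 : 0;

# B can tell whether a sub is compiled code rather than Perl source.
my $is_xs = B::svref_2object(\&utf8::is_utf8)->XSUB ? 1 : 0;

print "entry in symbol table: $in_stash\n";
print "sub is defined:        $defined\n";
print "implemented in C:      $is_xs\n";
```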
    --
    Regards
    Dave Saville
     
    Dave Saville, Dec 17, 2010
    #12
  13. Dave Saville

    Uri Guttman Guest

    >>>>> "DS" == Dave Saville <> writes:

    DS> On Thu, 16 Dec 2010 23:40:01 UTC, "Uri Guttman" <>
    DS> wrote:

    DS> Please read what I wrote. :)

    DS> Calling script has "use Entities;"

    DS> Entities.pm has "package Entities;"

    DS> HTML:: is not mentioned anywhere else.

    from /usr/lib/perl5/HTML/Entities.pm:

    package HTML::Entities;

    and since it doesn't even export that sub anymore, you need to check the
    XS code and see what package it uses. of course it will be
    HTML::Entities since that is the proper name for the module. again, you
    need to use what IT wants and not what you think it wants to import the
    sub.
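
    a rough sketch of the effect, with everything in one file. the package
    Entities below is a stand-in for the renamed copy, and the pure-Perl
    sub under HTML::Entities is only a stand-in for what the compiled code
    is assumed to install; neither is the real module source.

```perl
use strict;
use warnings;

# Stand-in for the renamed copy: it lists both subs for export but
# only defines one of them itself.
package Entities;
use Exporter 'import';
our @EXPORT = qw(encode_entities decode_entities);
sub encode_entities { my ($text) = @_; return "encoded:$text" }

# Stand-in for the compiled code, assumed here to install its sub
# under the original package name rather than the renamed one.
package HTML::Entities;
sub decode_entities { my ($text) = @_; return "decoded:$text" }

package main;
Entities->import;    # what "use Entities;" does after loading the file

# The sub the copy defines itself is exported and works.
my $encoded = encode_entities('x');

# The other name was exported too, but nothing is defined behind it in
# package Entities, so the call fails at run time.
my $error = eval { decode_entities('x'); 1 } ? '' : $@;

print "encode worked: $encoded\n";
print "decode failed: $error";
```

    the export list only names subs in the exporting package, so a sub that
    lives under a different package name is never picked up by it.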

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Dec 17, 2010
    #13
