Confused about Schwartz idiom utilizing map & split

Discussion in 'Perl Misc' started by weston, Mar 3, 2006.

  1. weston

    weston Guest

    In an article on Stonehenge.com on using libxml2 to strip html from a
    document, I came across a part of the listing that I'm having trouble
    understanding. Randall apparently creates a hash of approved tags and
    their attributes with these lines:

    =9= my %PERMITTED =
    =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
    =11= split /\n/, <<'END';
    =12= a href name target class title
    =13= b
    =14= big
    =15= blockquote class
    ....
    =49= END

    (See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )

    I keep trying to parse line 10 in my head and am not getting a lot of
    mental traction in really understanding how this works. Anybody want to
    help?
     
    weston, Mar 3, 2006
    #1

  2. Dr.Ruud

    Dr.Ruud Guest

    weston schreef:
    > In an article on Stonehenge.com on using libxml2 to strip html from a
    > document, I came across a part of the listing that I'm having trouble
    > understanding. Randall apparently creates a hash of approved tags and
    > their attributes with these lines:
    >
    > =9= my %PERMITTED =
    > =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
    > =11= split /\n/, <<'END';
    > =12= a href name target class title
    > =13= b
    > =14= big
    > =15= blockquote class
    > ....
    > =49= END
    >
    > (See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )
    >
    > I keep trying to parse line 10 in my head and am not getting a lot of
    > mental traction in really understanding how this works. Anybody want
    > to help?


    Maybe this helps:

    #!/usr/bin/perl
    use strict; use warnings;
    use Data::Dumper;

    my %PERMITTED =
    map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
    split /\n/, <<'END';
    a href name target class title
    b
    big
    blockquote class
    ....
    END

    # A leading '*' in the name makes Dump label the output as %PERMITTED
    print Data::Dumper->Dump( [\%PERMITTED], [qw(*PERMITTED)] ), "\n";

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Mar 4, 2006
    #2

  3. >>>>> "weston" == weston <> writes:

    weston> In an article on Stonehenge.com on using libxml2 to strip html from a
    weston> document, I came across a part of the listing that I'm having trouble
    weston> understanding. Randall apparently creates a hash of approved tags and
    weston> their attributes with these lines:

    weston> =9= my %PERMITTED =
    weston> =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
    weston> =11= split /\n/, <<'END';
    weston> =12= a href name target class title
    weston> =13= b
    weston> =14= big
    weston> =15= blockquote class
    weston> ....
    weston> =49= END

    weston> (See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )

    weston> I keep trying to parse line 10 in my head and am not getting a lot of
    weston> mental traction in really understanding how this works. Anybody want to
    weston> help?

    Heh.

    The split on line 11 creates elements like:

    "a href name target class title",
    "b",
    "big",
    "blockquote class",

    etc. The map at the beginning of line 10 sets $_ to each of those in turn,
    and expects a list-valued return from the block.

    The split in the middle of line 10 breaks each of those elements listed above
    into a list, and assigns the first to $k, and any remaining ones to @v.

    The second map on line 10 turns @v into a list that alternates each element
    of @v with the value 1, and the surrounding braces turn that list into a
    hashref whose keys are the elements of @v, each with the value 1. That
    hashref is returned together with $k as a two-element list, and those
    key/value pairs are what end up in %PERMITTED.
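
    The steps described above can be traced one at a time. This is only a
    minimal sketch using the four data lines quoted in the question; the
    intermediate names (@lines, $tag, @attrs, @pairs) are made up for
    illustration and do not appear in the article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: the split on line 11 turns the here-doc into one string per line.
my @lines = split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

my %PERMITTED;
for (@lines) {    # a foreach aliases $_ to each element, just as map does
    # Step 2: a bare split breaks $_ on whitespace; the first word is the tag.
    my ($tag, @attrs) = split;

    # Step 3: the inner map yields the flat list (attr, 1, attr, 1, ...).
    my @pairs = map { $_, 1 } @attrs;

    # Step 4: the braces turn that flat list into a hashref.
    $PERMITTED{$tag} = { @pairs };
}

# Attributes recorded for <a>, and the number recorded for <b>.
print join(',', sort keys %{ $PERMITTED{a} }), "\n";
print scalar(keys %{ $PERMITTED{b} }), "\n";
```

    A tag with no attributes, such as b, ends up with an empty hashref, because
    @attrs is empty and so the inner map returns an empty list.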

    But didn't I say all this in the article? :)

    print "Just another Perl hacker,"; # the original
    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
     
    Randal L. Schwartz, Mar 4, 2006
    #3
  4. weston <> wrote:

    > In an article on Stonehenge.com on using libxml2 to strip html from a
    > document, I came across a part of the listing that I'm having trouble
    > understanding. Randall apparently creates a hash of approved tags and
    > their attributes with these lines:


    > =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }


    > I keep trying to parse line 10 in my head and am not getting a lot of
    > mental traction in really understanding how this works. Anybody want to
    > help?



    Does this help?

    ------------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my %PERMITTED =
        map {
            my($k, @v) = split;     # 1st space-sep'd field is tag, rest are its attrs
            ($k, {map {$_, 1} @v})  # a 2-element list. 1st is tag,
                                    # 2nd is a hash-ref with keys as attr names,
                                    # and values set to one
        }
        split /\n/, <<'END';
    a href name target class title
    b
    big
    blockquote class
    END

    print Dumper \%PERMITTED;
    ------------------------------


    Or maybe it would help to "unroll" the maps into foreachs:

    ------------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my %PERMITTED;

    foreach (split /\n/, <<'END')
    a href name target class title
    b
    big
    blockquote class
    END
    {
        my($k, @v) = split;
        my %h;
        foreach ( @v ) {    # "unroll" {map {$_, 1} @v}
            $h{$_} = 1;
        }
        $PERMITTED{$k} = \%h;
    }

    print Dumper \%PERMITTED;
    ------------------------------


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Mar 4, 2006
    #4
  5. Anno Siegel

    Anno Siegel Guest

    weston <> wrote in comp.lang.perl.misc:
    > In an article on Stonehenge.com on using libxml2 to strip html from a
    > document, I came across a part of the listing that I'm having trouble
    > understanding. Randall apparently creates a hash of approved tags and


    Who is this Randall you speak of?

    > their attributes with these lines:


    Randal's code constructs a hash of hashes. The first word in each data
    line is a primary key. The rest of the words in each line (if any)
    become the keys of an inner hash, all with the value 1. Presumably
    the inner hash represents a set of whatever, associated with the primary
    key.
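
    To illustrate the "set" reading: with this shape, asking whether an
    attribute is allowed on a tag is a plain hash lookup. This is a rough
    sketch only; the helper name is_permitted is hypothetical (it is not from
    the article), and the data is cut down to three tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same shape as the article's hash: tag => { attribute => 1, ... }
my %PERMITTED = (
    a          => { map { $_, 1 } qw(href name target class title) },
    blockquote => { class => 1 },
    b          => {},
);

# Hypothetical helper: true if $attr is allowed on $tag.
# Testing the tag first avoids autovivifying $PERMITTED{$tag}
# for tags that are not in the hash at all.
sub is_permitted {
    my ($tag, $attr) = @_;
    return exists($PERMITTED{$tag}) && exists($PERMITTED{$tag}{$attr});
}

print is_permitted('a', 'href')     ? "yes\n" : "no\n";
print is_permitted('b', 'href')     ? "yes\n" : "no\n";
print is_permitted('script', 'src') ? "yes\n" : "no\n";
```

    Without the first exists test, looking up $PERMITTED{script}{src} would
    quietly create an empty $PERMITTED{script} entry as a side effect.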

    > =9= my %PERMITTED =
    > =10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
    > =11= split /\n/, <<'END';
    > =12= a href name target class title
    > =13= b
    > =14= big
    > =15= blockquote class
    > ....
    > =49= END


    How does it do that? Rewriting the code with fewer map's and more
    variable names may help. (untested)

    my @lines = split /\n/, <<'END';
    a href name target class title
    b
    big
    blockquote class
    END

    my %PERMITTED;

    for my $line ( @lines ) {
        # split needs $line passed explicitly here, since $_ is not set
        my ($primary_key, @words) = split ' ', $line; # ($k, @v) in the original code
        # build wordlist
        my @wordlist; # alternating one word and one 1 (for hash initialization)
        for my $word ( @words ) {
            push @wordlist, ( $word => 1);
        }
        # build a hash out of @wordlist and assign it to its place
        $PERMITTED{ $primary_key} = { @wordlist};
    }

    > I keep trying to parse line 10 in my head and am not getting a lot of
    > mental traction in really understanding how this works. Anybody want to
    > help?


    Line 10 does basically what the (outer) for-loop does in my code. The
    inner for-loop does the job of the nested map.

    Randal's code is that of a fluent speaker of Perl. Its parts (the two map's)
    are two well-known idioms for hash-building. Applied together, they may
    look like a mess, but once you recognize the pattern of each, their
    interaction becomes clear too.
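
    The two idioms can be looked at in isolation before they are combined.
    This is a small sketch; the variable names and sample data are invented for
    illustration and are not taken from the article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Idiom 1: turn a list into a "set" hash whose values are all 1.
my @attrs = qw(href name target);
my %set   = map { $_ => 1 } @attrs;

# Idiom 2: build a hash from records by returning a (key, value)
# pair from the map block for each record.
my @records = ('a href name', 'b');
my %first_to_rest = map {
    my ($key, @rest) = split;    # splits $_ on whitespace
    ($key => [@rest]);           # two-element list: key, arrayref
} @records;

# Combined: the value from idiom 2 is the set built by idiom 1.
my %combined = map {
    my ($key, @rest) = split;
    ($key => { map { $_ => 1 } @rest });
} @records;

print join(' ', sort keys %set), "\n";
print scalar @{ $first_to_rest{a} }, "\n";
print exists $combined{a}{name} ? "name ok\n" : "name missing\n";
```

    The inner map gets its own $_, so nesting it inside the outer block does
    not disturb the record the outer map is working on.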

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Mar 4, 2006
    #5
  6. Dr.Ruud

    Dr.Ruud Guest

    Tad McClellan schreef:

    > print Dumper \%PERMITTED;



    Alternative:

    # a leading '*' in the name makes the output read "%var = (...)"
    print Data::Dumper->Dump( [\%var], ['*var'] );

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Mar 4, 2006
    #6
