Anyone care to explain this one?

Discussion in 'Perl Misc' started by David Liang, Sep 11, 2009.

  1. David Liang

    David Liang Guest

    So, on my machine this gives me

    $ echo "abc" | perl -pe 'tr/a-z/a-m/cd'
    abck

    From reading the man pages it seems to me it should have deleted the
    complement of a-z unless a character is in the replacement list, but
    where did the "k" come from?

    Even more perplexing is

    $ echo "abc123op" | perl -pe 'tr/a-z/0-k/cd'
    abcabcop:

    My LC_COLLATE is "C", LANG is "en_US.utf8", and I'm running Perl
    5.10.0.
    David Liang, Sep 11, 2009
    #1

  2. Alan Curry

    Alan Curry Guest

    David Liang wrote:
    >So, on my machine this gives me
    >
    >$ echo "abc" | perl -pe 'tr/a-z/a-m/cd'
    >abck
    >
    >From reading the man pages it seems to me it should have deleted the
    >complement of a-z unless a character is in the replacement list, but
    >where did the "k" come from?


    echo "abc" generates 4 characters including the newline. So you have the
    equivalent of

    perl -e '$_ = "abc\n" ; tr/a-z/a-m/cd; print'

    Notice that your "abck" was not followed by a newline on perl's stdout! The
    result when I ran it actually looked like this:

    $ echo "abc" | perl -pe 'tr/a-z/a-m/cd'
    abck$

    with the shell prompt glued to the k.

    Why k? Well, what is the complement of the set a-z? It's the set of all
    characters that aren't a-z. The first character that's not in a-z is "\0".
    The next is "\1", then "\2", etc. So, with the complement written out, the
    equivalent of your original operation is:

    perl -e '$_ = "abc\n" ; tr/\0-`{-\377/a-m/d; print'

    (I'm not sure \377 is the proper upper limit in this age of large charsets,
    but you get the idea.) The "`" is the character before "a" in ASCII, and the
    "{" is the character after "z".

    So what happened? The 13 replacement characters a-m were matched up against
    the first 13 characters in the search list:

    \0  \1  \2  \3  \4  \5  \6  \7  \10 \11 \12 \13 \14
    a   b   c   d   e   f   g   h   i   j   k   l   m

    \12 (octal 12, decimal 10) is also known as \n, the newline character. So it
    got translated to k. The "abc" input characters didn't match anything in the
    search list (they belong to the a-z set that was complemented out) so they
    pass through unchanged. If you had provided any input characters that were
    neither a-z nor \0-\14 they would have been matched and removed because of
    the /d modifier.
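
    The matching described above can be modelled in a few lines. This is only a
    sketch in Python, not Perl's actual implementation: the helper names
    `expand` and `tr_cd` are made up for illustration, only simple ranges like
    a-m are handled, and the character set is assumed to stop at \377 (256
    characters).

```python
def expand(spec):
    """Expand a simple tr-style list such as "a-m" into a list of characters."""
    chars, i = [], 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return chars


def tr_cd(text, search, replacement, limit=256):
    """Model tr/search/replacement/cd on single-byte text (assumed limit of 256)."""
    kept = set(expand(search))
    # /c: the effective search list is every character NOT in `search`, in code order.
    complement = [chr(c) for c in range(limit) if chr(c) not in kept]
    # Pair the complemented list with the replacement list, position by position.
    table = dict(zip(complement, expand(replacement)))
    # /d: complemented characters with no replacement partner map to "" (deleted).
    return "".join(ch if ch in kept else table.get(ch, "") for ch in text)


print(repr(tr_cd("abc\n", "a-z", "a-m")))   # 'abck', as in the post
print(repr(tr_cd("abc!\n", "a-z", "a-m")))  # 'abck': "!" is outside a-z and \0-\14, so it is deleted
```

    The second call shows the /d half of the behaviour: "!" is in the
    complemented search list but has no partner in a-m, so it disappears.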

    I don't know if it would ever be a good idea to use the /c modifier, the /d
    modifier, and a non-empty replacement list all in a single tr/// operation.
    Having explained in detail what it did and why, it seems like even if that's
    what you wanted to do, you should find a less obfuscated way to do it.

    >
    >Even more perplexing is
    >
    >$ echo "abc123op" | perl -pe 'tr/a-z/0-k/cd'
    >abcabcop:


    Just like above, this means a-z pass through unchanged, but this time the
    replacement list is much longer. "0" is "\x30" and "k" is "\x6b" in ASCII, so
    the replacement list has 60 characters, and input characters "\0" through
    "\x3b" are translated to 0-k. This happens to include 0-9 ("\x30" through
    "\x39") being translated to "\x60" through "\x69" ("`" followed by a-i),
    which is why 123 became abc. And the newline became a colon this time. The
    complete translation table you've asked for is:

    \0  \1  \2  \3  \4  \5  \6  \7  \10 \11 \12 \13 \14 \15 \16 \17
    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?

    \20 \21 \22 \23 \24 \25 \26 \27 \30 \31 \32 \33 \34 \35 \36 \37
    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O

    SPC !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _

    0   1   2   3   4   5   6   7   8   9   :   ;
    `   a   b   c   d   e   f   g   h   i   j   k

    and anything in the input that's neither a-z nor \0-\x3b would be deleted,
    but once again you didn't include any of those.
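
    For anyone who wants to check that table rather than read it, it can be
    rebuilt mechanically. Again this is a Python sketch under the same
    assumption of a 256-character set, with illustrative variable names; it is
    not how Perl builds its table internally.

```python
kept = {chr(c) for c in range(ord("a"), ord("z") + 1)}          # search list a-z
replacement = [chr(c) for c in range(ord("0"), ord("k") + 1)]   # replacement list 0-k

# Complement of a-z over an assumed single-byte range, in code-point order.
complement = [chr(c) for c in range(256) if chr(c) not in kept]
table = dict(zip(complement, replacement))   # zip stops at the shorter list

# a-z pass through; complemented characters are translated or, lacking a partner, deleted.
result = "".join(ch if ch in kept else table.get(ch, "") for ch in "abc123op\n")

print(len(table))          # 60 entries: \0 through \x3b
print(repr(table["\n"]))   # ':'
print(repr(table["1"]))    # 'a'
print(repr(result))        # 'abcabcop:'
```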

    --
    Alan Curry
    Alan Curry, Sep 11, 2009
    #2

  3. David Liang

    David Liang Guest

    Ahh... Thanks for that elucidating explanation!
    David Liang, Sep 11, 2009
    #3
  4. David Liang

    David Liang Guest

    On Sep 11, 3:53 am, Alan Curry wrote:

    > I don't know if it would ever be a good idea to use the /c modifier, the /d
    > modifier, and a non-empty replacement list all in a single tr/// operation.
    > Having explained in detail what it did and why, it seems like even if that's
    > what you wanted to do, you should find a less obfuscated way to do it.
    >


    I was trying out those strange cases because I was writing a semi-
    clone of the transliteration operator for Python:

    http://github.com/bmdavll/StringTransform

    Thanks again for the explanation--it really cleared up the whole
    complements deal for me.
    David Liang, Sep 19, 2009
    #4
