handle tab-delimited file

Discussion in 'Perl Misc' started by Ela, Mar 15, 2008.

  1. Ela

    Ela Guest

    \t matches BOTH tab and space.

    How can I split the following line into 2 words instead of 5?

    1234\tI am a boy\n
    Ela, Mar 15, 2008
    #1

  2. Ela wrote:
    > \t matches BOTH tab and space.


    No, it doesn't.

    > How can I split the following line into 2 words instead of 5?
    >
    > 1234\tI am a boy\n


    split /\t/, "1234\tI am a boy\n"

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Mar 15, 2008
    #2

  3. Ela wrote:

    > \t matches BOTH tab and space.



    No, it doesn't.

    \s matches tab and space (and 3 other characters).

    Is that what you meant?

    (we wouldn't need to ask this if you had posted real Perl code.)


    > How can I split the following line into 2 words instead of 5?
    >
    > 1234\tI am a boy\n



    use PSI::ESP;

    By splitting on \t rather than splitting on \s
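
    A minimal sketch of that difference, assuming the line from the question is held in a double-quoted string (the variable names here are only illustrative):

```perl
use strict;
use warnings;

# The sample line from the question; "\t" in a double-quoted string is a real tab.
my $line = "1234\tI am a boy\n";
chomp $line;                       # remove the trailing newline

my @on_tab = split /\t/, $line;    # separator is a tab only
my @on_ws  = split /\s/, $line;    # separator is any whitespace character

printf "%d fields when splitting on \\t\n", scalar @on_tab;
printf "%d fields when splitting on \\s\n", scalar @on_ws;
```

    Splitting on \t leaves "I am a boy" intact as the second field, while splitting on \s also breaks at each of the three spaces.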


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Mar 15, 2008
    #3
  4. Ben Bullock

    Ben Bullock Guest

    On Sat, 15 Mar 2008 14:10:12 +0000, Tad J McClellan wrote:

    > \s matches tab and space (and 3 other characters).


    Don't forget your Ogham space mark:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Unicode::UCD 'charinfo';

    # Print every BMP code point (skipping surrogates and noncharacters)
    # whose character matches the given regex, then print the total.
    sub count_match
    {
        my ($re) = @_;
        my $c = 0;
        for my $n (0x00 .. 0xD7FF, 0xE000 .. 0xFDCF, 0xFDF0 .. 0xFFFD) {
            if (chr($n) =~ /$re/) {
                my $ci = charinfo($n);
                printf "%02X which is %s matches\n", $n, $ci->{name};
                $c++;
            }
        }
        print "There are $c characters matching \"$re\".\n";
    }
    count_match('\s');

    which gives:

    09 which is <control> matches
    0A which is <control> matches
    0C which is <control> matches
    0D which is <control> matches
    20 which is SPACE matches
    1680 which is OGHAM SPACE MARK matches
    180E which is MONGOLIAN VOWEL SEPARATOR matches
    2000 which is EN QUAD matches
    2001 which is EM QUAD matches
    2002 which is EN SPACE matches
    2003 which is EM SPACE matches
    2004 which is THREE-PER-EM SPACE matches
    2005 which is FOUR-PER-EM SPACE matches
    2006 which is SIX-PER-EM SPACE matches
    2007 which is FIGURE SPACE matches
    2008 which is PUNCTUATION SPACE matches
    2009 which is THIN SPACE matches
    200A which is HAIR SPACE matches
    2028 which is LINE SEPARATOR matches
    2029 which is PARAGRAPH SEPARATOR matches
    202F which is NARROW NO-BREAK SPACE matches
    205F which is MEDIUM MATHEMATICAL SPACE matches
    3000 which is IDEOGRAPHIC SPACE matches
    There are 23 characters matching "\s".
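
    For the original tab-delimited question, the practical consequence might be sketched like this. The record below is hypothetical: it is the line from the question with one ordinary space swapped for U+3000 IDEOGRAPHIC SPACE from the list above.

```perl
use strict;
use warnings;

# Hypothetical record: a tab separates the two fields, and the second
# field contains U+3000 IDEOGRAPHIC SPACE in place of one ordinary space.
my $line = "1234\tI\x{3000}am a boy";

my @by_tab = split /\t/, $line;   # breaks at the tab only
my @by_ws  = split /\s/, $line;   # breaks at the tab, U+3000 and both spaces

print scalar(@by_tab), "\n";
print scalar(@by_ws), "\n";
```

    Splitting on \t keeps the second field whole no matter which kinds of space it contains.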
    Ben Bullock, Mar 16, 2008
    #4