Don't understand this syntax -

Discussion in 'Perl Misc' started by Larry, Feb 16, 2005.

  1. Larry

    Larry Guest

    I'm a novice trying to learn some perl, and am trying to understand some
    syntax used in a formmail script to separate a list of names.

    The use is - (split /\s*,\s*/, $recipient)

    I understand split, but can't see why the \s*,\s* is used as the marker. It
    seems to evaluate to s,s so why the escapes and asterisks?

    I'm sure it's quite simple, but I can't see it??

    Thanks much,
    Larry L
    Larry, Feb 16, 2005
    #1

  2. Larry wrote:

    > I'm a novice trying to learn some perl, and am trying to understand some
    > syntax used in a formmail script to separate a list of names.


    First off, understand this: If you're referring to Matt Wright's formmail,
    don't even bother trying to understand it. It's horribly written, and the
    only useful purpose it has is to serve as an example of how *not* to write
    Perl.

    > The use is - (split /\s*,\s*/, $recipient)
    >
    > I understand split, but can't see why the \s*,\s* is used as the marker.
    > It seems to evaluate to s,s so why the escapes and asterisks?


    Check the docs for the function - "perldoc -f split" and see what it says.
    Notice how the first argument is listed as /PATTERN/? That means it's a
    regex, so the escapes and asterisks have the same meaning they do in other
    regexes. Let's break it down:

    /      # Begin the pattern
    \s*    # Match any number (including zero) of whitespace (\s) characters
    ,      # Match a comma
    \s*    # Again, any number of whitespace characters
    /      # End pattern

    So ",", " ,", and " , " would all match the pattern.
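
    As a rough illustration of the breakdown above, here is a minimal sketch
    of the pattern applied to a recipient list. The addresses and the
    @recipients array are made up for the example; they are not taken from
    the formmail script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical recipient list with inconsistent spacing around the commas.
my $recipient = 'alice@example.com ,bob@example.com,  carol@example.com , dave@example.com';

# The pattern consumes each comma together with any whitespace on either
# side of it, so the resulting fields carry no stray spaces.
my @recipients = split /\s*,\s*/, $recipient;

print "[$_]\n" for @recipients;
```

    Splitting on a bare ',' instead would leave the surrounding spaces
    attached to the neighbouring fields.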

    Have a look at the following perldocs for more about regexes:

    perldoc perlrequick
    perldoc perlretut
    perldoc perlre

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
    Sherm Pendley, Feb 16, 2005
    #2

  3. Larry

    Guest

    Sherm Pendley wrote:

    > Check the docs for the function - "perldoc -f split" and see what it says.
    > Notice how the first argument is listed as /PATTERN/? That means it's a
    > regex, so the escapes and asterisks have the same meaning they do in other
    > regexes. Let's break it down:


    if I have a string:

    my $string = "hello, how are you?";
    my @string = split ' ', $string;

    The extra white spaces are all removed as if I had used /\s+/, except,
    the docs say that they are not quite the same because leading white
    space is removed with ' '. By 'leading white space' do they mean
    before each word or just at the beginning of the string? Because this
    produces completely different results:

    my $string = "hello::::how::::are:::::::you?";
    my @string = split ':', $string;

    now you get a bunch of null fields throughout the array.

    There seems to be more to the special properties of ' ' than the docs
    state.
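
    The colon case can be reproduced with a short sketch like the one below.
    The @single and @runs names are only illustrative; the point is that ':'
    is an ordinary one-character pattern, while quantifying it collapses each
    run of colons into one delimiter.

```perl
use strict;
use warnings;

my $string = "hello::::how::::are:::::::you?";

# A literal ':' is an ordinary pattern: every colon is its own delimiter,
# so each run of colons produces empty fields between the words.
my @single = split ':', $string;

# Quantifying the delimiter treats a whole run of colons as one separator.
my @runs = split /:+/, $string;

printf "split ':'  gives %d fields\n", scalar @single;
printf "split /:+/ gives %d fields\n", scalar @runs;
```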

    wana
    , Feb 16, 2005
    #3
  4. wrote:

    > if I have a string:
    >
    > my $string = "hello, how are you?";
    > my @string = split ' ', $string;
    >
    > The extra white spaces are all removed as if I had used /\s+/, except,
    > the docs say that they are not quite the same because leading white
    > space is removed with ' '. By 'leading white space' do they mean
    > before each word or just at the beginning of the string?


    The beginning of the string. Try it and see:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my $string = ' Hello, I am fine, thanks.';

    my @string = split ' ', $string;
    print Dumper(\@string);

    my @string2 = split /\s+/, $string;
    print Dumper(\@string2);

    Notice the difference. Splitting on ' ' doesn't give an empty element at the
    beginning of the list, whereas splitting on /\s+/ does.

    > There seems to be more to the special properties of ' ' than the docs
    > state.


    Not at all. The docs state:

    ... If PATTERN is also omitted, splits on whitespace (after skipping
    any leading whitespace). ...

    Later on, they say:

    As a special case, specifying a PATTERN of space (' ') will
    split on white space just as "split" with no arguments does.
    Thus, "split(' ')" can be used to emulate awk's default behavior,
    whereas "split(/ /)" will give you as many null initial
    fields as there are leading spaces.
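
    To see the difference the docs describe, a small sketch along these lines
    may help. The input string and the array names are assumptions chosen
    for the example.

```perl
use strict;
use warnings;

my $string = '   a b';   # example input with three leading spaces

my @awk    = split ' ',   $string;   # special case: leading whitespace is skipped
my @space  = split / /,   $string;   # one empty field per leading space
my @spaces = split /\s+/, $string;   # one empty field for the whole leading run

print scalar(@awk), " ", scalar(@space), " ", scalar(@spaces), "\n";   # prints "2 5 3"
```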

    sherm--

    Sherm Pendley, Feb 16, 2005
    #4
  5. Larry

    Guest

    Sherm Pendley wrote:

    > > There seems to be more to the special properties of ' ' than the docs
    > > state.
    >
    > Not at all. The docs state:
    >
    > ... If PATTERN is also omitted, splits on whitespace (after skipping
    > any leading whitespace). ...
    >
    > Later on, they say:
    >
    > As a special case, specifying a PATTERN of space (' ') will
    > split on white space just as "split" with no arguments does.
    > Thus, "split(' ')" can be used to emulate awk's default behavior,
    > whereas "split(/ /)" will give you as many null initial
    > fields as there are leading spaces.


    so

    my @string = split ' ', $string;

    is like

    $string =~ s/^\s*//;
    my @string = split /\s+/, $string;

    That gives ' ' two special properties that make it different from 'c', where
    c is any single character other than a space. The first is that leading
    white space is trimmed. The second is that one or more white space
    characters are matched as the delimiter. I just thought that maybe this
    wasn't clear in the docs' explanation.


    The comparison to awk is wasted on me because I don't have experience
    with it. I only started learning how to use a Unix-type system
    relatively recently. It's too bad it took Apple so long to adopt Jobs'
    concept of a Unix OS for the Mac or I might have had an additional 15
    years or so experience with it. I know, there's Linux, but hardware
    support was poor until now.

    wana
    , Feb 16, 2005
    #5
  6. wrote:

    > so
    >
    > my @string = split ' ', $string;
    >
    > is like
    >
    > $string =~ s/^\s*//;
    > my @string = split /\s+/, $string;


    Except that it doesn't change $string, so it's more like:

    $string =~ m/^\s*(.*)/s;         # /s lets . match newlines, so $1 is the whole rest
    my @string = split /\s+/, $1;    # splitting an empty $1 already yields ()
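
    One way to check that kind of equivalence is to compare the built-in
    special case against a non-mutating emulation on a few inputs. This is
    only a sketch: split_like_awk is a hypothetical helper written for the
    comparison, and the test strings are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate split ' ' without touching the caller's string.
sub split_like_awk {
    my ($text) = @_;            # work on a copy of the argument
    $text =~ s/^\s+//;          # drop leading whitespace from the copy only
    return split /\s+/, $text;  # an empty string yields an empty list
}

my @mismatches;
for my $string ('  Hello, I am fine ', '0', '   ', "a\nb") {
    my @builtin  = split ' ', $string;
    my @emulated = split_like_awk($string);
    my $same = @builtin == @emulated
            && join('|', @builtin) eq join('|', @emulated);
    push @mismatches, $string unless $same;
    print $same ? "same\n" : "differ\n";
}
```

    Inputs such as '0' and strings containing newlines are included because
    they are easy to get wrong in a hand-rolled version.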

    > It's too bad it took Apple so long to adopt Jobs'
    > concept of a Unix OS for the Mac or I might have had an additional 15
    > years or so experience with it.


    There was MacPerl. There was also MachTen, although that wasn't all that
    great an environment to work in.

    But anyway - I don't see the reference to awk as being all that important.
    "Splits on whitespace (after skipping any leading whitespace)" is pretty
    clear all by itself, and you don't have to be familiar with awk or any
    other Unix-isms to understand that.

    sherm--

    Sherm Pendley, Feb 16, 2005
    #6
  7. Larry

    Larry Guest

    Sherm Pendley wrote:
    >Larry wrote:
    >
    >> I'm a novice trying to learn some perl, and am trying to understand some
    >> syntax used in a formmail script to separate a list of names.

    >
    >First off, understand this: If you're referring to Matt Wright's formmail,
    >don't even bother trying to understand it. It's horribly written, and the
    >only useful purpose it has is to serve as an example of how *not* to write
    >Perl.
    >
    >> The use is - (split /\s*,\s*/, $recipient)
    >>
    >> I understand split, but can't see why the \s*,\s* is used as the marker.
    >> It seems to evaluate to s,s so why the escapes and asterisks?

    >
    >Check the docs for the function - "perldoc -f split" and see what it says.
    >Notice how the first argument is listed as /PATTERN/? That means it's a
    >regex, so the escapes and asterisks have the same meaning they do in other
    >regexes. Let's break it down:
    >
    > / # Begin the pattern
    > \s* # Match any number (including zero) of whitespace (\s)
    > # characters
    > , # Match a comma
    > \s* # Again, any number of whitespace characters
    > / # End pattern
    >
    >So "," " ," " , " would all match the pattern.
    >
    >Have a look at the following perldocs for more about regexes:
    >
    > perldoc perlrequick
    > perldoc perlretut
    > perldoc perlre
    >
    >sherm--


    Sherm,

    Thanks much. It was, of course "casual to the most obvious observer". I had
    actually looked at numerous lists of the common escape characters, but none
    had the \s on them. I've used lots of others, but apparently just never needed
    a space before, so didn't know that one.

    Back to studying regexes a little more closely.

    Larry L
    Larry, Feb 16, 2005
    #7
  8. Larry wrote:

    > had actually looked at numerous lists of the common escape characters, but
    > none had the \s on them. I've used lots of others, but apparently just
    > never needed a space before, so didn't know that one.


    A bit of a nit-pick: "\s" doesn't just match "a space", it matches *any*
    whitespace character. That includes spaces, tabs, carriage returns,
    newlines, and possibly other whitespace characters I'm forgetting at the
    moment.

    sherm--

    Sherm Pendley, Feb 16, 2005
    #8
  9. Sherm Pendley wrote:
    > Larry wrote:
    >
    >>had actually looked at numerous lists of the common escape characters, but
    >>none had the \s on them. I've used lots of others, but apparently just
    >>never needed a space before, so didn't know that one.

    >
    > A bit of a nit-pick: "\s" doesn't just match "a space", it matches *any*
    > whitespace character. That includes spaces, tabs, carriage returns,
    > newlines, and possibly other whitespace characters I'm forgetting at the
    > moment.


    \s covers those four and the formfeed character. If you want the vertical tab
    as well, you have to use the POSIX character class [:space:], which is
    written [[:space:]] inside a pattern.
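
    A quick way to see which characters each class matches on a given perl is
    a sketch like the following. The %chars table is just an illustrative
    list. Note that the vertical-tab result for \s depends on the Perl
    version: as far as I know, releases from 5.18 onward also match it with
    \s, so the distinction described above applies to older perls.

```perl
use strict;
use warnings;

# Illustrative table of whitespace characters to probe.
my %chars = (
    'space'           => " ",
    'tab'             => "\t",
    'newline'         => "\n",
    'carriage return' => "\r",
    'form feed'       => "\f",
    'vertical tab'    => "\x0B",
);

for my $name (sort keys %chars) {
    my $c = $chars{$name};
    # Report whether \s and the POSIX class each match this character.
    printf "%-16s \\s: %-3s [[:space:]]: %s\n",
        $name,
        ($c =~ /\s/          ? 'yes' : 'no'),
        ($c =~ /[[:space:]]/ ? 'yes' : 'no');
}
```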


    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Feb 17, 2005
    #9