Help with split using multiple delimiters

Discussion in 'Perl Misc' started by geeknc@yahoo.com, Jul 27, 2005.

  1. Guest

    I have a file that contains 5 elements per line, each separated by white
    space; however, the 4th element is surrounded by quotes.

    Each line in a file looks like this:

    ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD

    I was hoping to do something like this....

    ($a,$b,$c,$d,$e) = split(/split on white space or "...."/, $string);

    and end up with....

    $a = "ItemA";
    $b = "ItemB";
    $c = "1.1.1.1.1";
    $d = "xxx xx xxxxxx";
    $e = "ItemD";

    I have tried multiple delimiters, but nothing seems to return 5
    elements. Thank you, in advance, for any help you can offer.
     
    , Jul 27, 2005
    #1

  2. don't use split--use a regex.

    ($a, $b, $c, $d, $e) = $string =~
    /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

    or if using $_

    ($a, $b, $c, $d, $e) = /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

    you can wrap each element in double quotes later.

    you may be able to do

    @array = /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

    for (@array) {
    $_ = qq{"$_"};
    }
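
    A minimal, self-contained sketch of the fixed-position approach above,
    assuming the quoted field is always the fourth one. The sub name
    parse_fixed is hypothetical, and this version swaps "(.+)" for
    "([^"]*)" and anchors the pattern so that malformed lines fail to
    match instead of matching loosely.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the five fields, or an empty list when the
# line does not have the expected shape (fourth field in double quotes).
sub parse_fixed {
    my ($line) = @_;
    return $line =~ /^\s*(\S+)\s+(\S+)\s+(\S+)\s+"([^"]*)"\s+(\S+)\s*$/;
}

my @fields = parse_fixed('ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD');
print join('|', @fields), "\n" if @fields;
```

    Checking whether the returned list is empty is a cheap way to catch
    lines that do not fit the assumed layout.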
     
    it_says_BALLS_on_your forehead, Jul 27, 2005
    #2

  3. Paul Lalli Guest

    wrote:
    > I have a file that contains 5 elements per line, each separated by white
    > space; however, the 4th element is surrounded by quotes.


    Can you explain what was wrong with the solution you found in the FAQ?
    You did, of course, search the FAQ before asking hundreds of other
    people for help, right?

    perldoc -q split
    How can I split a [character] delimited string except when
    inside [character]? (Comma-separated files)

    In your case, the first [character] is a space, the second is a
    double quote.

    Paul Lalli
     
    Paul Lalli, Jul 27, 2005
    #3
  4. James Taylor Guest

    In article <>,
    <> wrote:
    >
    > don't use split--use a regex.
    >
    > ($a, $b, $c, $d, $e) = $string =~
    > /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;


    If you don't know in advance which fields will be quoted,
    you can use this regex instead:

    my ($a, $b, $c, $d, $e) = $string =~ /("[^"]*"|\S+)/g;
    # but then you need to remove any quotes by saying:
    s/^"([^"]*)"$/$1/ foreach $a, $b, $c, $d, $e;

    If you don't mind the fields all going in one array, you
    could do it all in one go like this:

    my @fields;
    push @fields, $+ while $string =~ /"([^"]*)"|(\S+)/g;

    Of course, nothing stops you then assigning the @fields
    array to individual scalar variables:

    my ($a, $b, $c, $d, $e) = @fields;
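
    For anyone wanting to try this, here is a self-contained sketch of the
    one-line loop wrapped in a sub. The name split_quoted is hypothetical.
    It relies on $+ holding the text of whichever capture group actually
    matched, so quoted fields arrive with their quotes already stripped.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-line loop.
sub split_quoted {
    my ($string) = @_;
    my @fields;
    # Group 1 matches a quoted field (without its quotes), group 2 a bare
    # word; $+ is whichever of the two took part in the match.
    push @fields, $+ while $string =~ /"([^"]*)"|(\S+)/g;
    return @fields;
}

my @fields = split_quoted('ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD');
print "$_\n" for @fields;
```

    In real code it may be wiser to avoid lexical variables named $a and
    $b, since those names are used by sort; names like $first and $second
    sidestep any confusion.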

    If a single line while loop with a fairly simple regex seems too
    easy or too efficient, you can always spend time reading up on
    the various CPAN modules suggested by the FAQ (perldoc -q split),
    work out how to set up the necessary OO object instances, how
    to call the provided methods to get the result you require,
    test that it does what you expect, pray that there are no
    earlier versions of the module around that are buggy, pray
    that no future versions will be buggy, load the whole module
    at compile time and hope that this and the method call interface
    don't hit performance too much, and then sit back and enjoy
    the somewhat dubious pleasures of OPC (Other People's Code)
    in the knowledge that at least you didn't have to do the
    work yourself. (Irony intended.)

    Even if you wanted to use a module, I note that the FAQ
    entry "How can I split a [character] delimited string except
    when inside [character]?" recommends the use of Text::CVS or
    Text::CVS_XS but I don't believe CVS is what's needed here. :)

    --
    James Taylor, London, UK PGP key: 3FBE1BF9
    To protect against spam, the address in the "From:" header is not valid.
    In any case, you should reply to the group so that everyone can benefit.
    If you must send me a private email, use james at oakseed demon co uk.
     
    James Taylor, Jul 29, 2005
    #4
  5. i don't know if that would work because of greedy matching. you may
    need a ? after your asterisk, to make it stingy matching.
     
    it_says_BALLS_on_your forehead, Jul 29, 2005
    #5
  6. Anno Siegel Guest

    James Taylor <> wrote in comp.lang.perl.misc:
    > In article <>,
    > <> wrote:


    [...]

    > Even if you wanted to use a module, I note that the FAQ
    > entry "How can I split a [character] delimited string except
    > when inside [character]?" recommends the use of Text::CVS or
    > Text::CVS_XS but I don't believe CVS is what's needed here. :)


    That must be a typo in the FAQ. s/CVS/CSV/g.

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Jul 29, 2005
    #6
  7. James Taylor Guest

    Simon, I'm not sure which bit of my post you were replying
    to, or even if it was me you were replying to, as you did
    not quote any context. I will therefore attempt to rebuild
    the relevant context below with the correct attributions.
    You probably need to get a better news reader if you can.

    In article <>,
    <> wrote:
    >
    > In article <>,
    > James Taylor wrote:
    > >
    > > If you don't know in advance which fields will be quoted,
    > > you can use this regex instead:
    > >
    > > my ($a, $b, $c, $d, $e) = $string =~ /("[^"]*"|\S+)/g;
    > > # but then you need to remove any quotes by saying:
    > > s/^"([^"]*)"$/$1/ foreach $a, $b, $c, $d, $e;
    > >
    > > If you don't mind the fields all going in one array, you
    > > could do it all in one go like this:
    > >
    > > my @fields;
    > > push @fields, $+ while $string =~ /"([^"]*)"|(\S+)/g;

    >
    > i don't know if that would work because of greedy matching. you may
    > need a ? after your asterisk, to make it stingy matching.


    If we're sure that the OP's input lines contain simple
    double quoted strings that do not themselves contain double
    quotes (and this is what his example illustrated) then a
    greedy [^"]* will swallow everything up to the next double
    quote just as we require. Obviously, if the closing quote was
    missing, it wouldn't capture the correct thing. (I think it
    would backtrack and treat the opening quote as part of a
    space delimited word instead). The OP could check there are
    an even number of double quotes beforehand by saying:

    die "Bad input line: $string\n" if $string =~ tr/"// % 2;
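
    As a small sketch of that check in isolation (the sub name
    quotes_balanced is hypothetical): tr/"// with an empty replacement
    list only counts the double quotes and leaves the string unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the line holds an even number of
# double quotes, false otherwise.
sub quotes_balanced {
    my ($string) = @_;
    # Counts the quotes without modifying $string.
    my $count = ($string =~ tr/"//);
    return $count % 2 == 0;
}

for my $line ('A B "c d" E', 'A B "c d E') {
    print "$line => ", (quotes_balanced($line) ? 'ok' : 'unbalanced quotes'), "\n";
}
```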

    If the input lines were similar to CSV in allowing strings
    that themselves contain double quotes, doubled up like this:

    ItemA ItemB 1.1.1.1.1 "He said ""Hello"" to me" ItemD

    then a more complex regex would be required. If this is what the
    OP wants he can ask, but I don't believe it is. What he shouldn't
    do, though, is use Text::ParseWords because, contrary to popular
    belief, it doesn't handle CSV-style quotes.
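
    For illustration only, one possible shape for such a regex is sketched
    below. It assumes that a doubled "" inside a quoted field stands for
    one literal double quote and that fields are still separated by white
    space. The sub name split_csv_quoted is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: like the simple loop, but a quoted field may
# contain "" to stand for one literal double quote.
sub split_csv_quoted {
    my ($string) = @_;
    my @fields;
    while ($string =~ /"((?:[^"]|"")*)"|(\S+)/g) {
        my $field = $+;
        # Collapse doubled quotes only when the quoted branch matched.
        $field =~ s/""/"/g if defined $1;
        push @fields, $field;
    }
    return @fields;
}

my @fields = split_csv_quoted(
    'ItemA ItemB 1.1.1.1.1 "He said ""Hello"" to me" ItemD'
);
print "$_\n" for @fields;
```

    This is a sketch under the stated assumptions, not a CSV parser; for
    real CSV data a dedicated module such as Text::CSV is the safer choice.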

    --
    James Taylor, London, UK PGP key: 3FBE1BF9
    To protect against spam, the address in the "From:" header is not valid.
    In any case, you should reply to the group so that everyone can benefit.
    If you must send me a private email, use james at oakseed demon co uk.
     
    James Taylor, Jul 29, 2005
    #7
  8. James Taylor Guest

    In article <dccplg$oqk$-Berlin.DE>,
    Anno Siegel <-berlin.de> wrote:
    >
    > James Taylor wrote:
    > >
    > > Even if you wanted to use a module, I note that the FAQ
    > > entry "How can I split a [character] delimited string except
    > > when inside [character]?" recommends the use of Text::CVS or
    > > Text::CVS_XS but I don't believe CVS is what's needed here. :)

    >
    > That must be a typo in the FAQ. s/CVS/CSV/g.


    Who's responsible for maintaining the FAQ?
    What's the correct procedure for nudging them?

    --
    James Taylor, London, UK PGP key: 3FBE1BF9
    To protect against spam, the address in the "From:" header is not valid.
    In any case, you should reply to the group so that everyone can benefit.
    If you must send me a private email, use james at oakseed demon co uk.
     
    James Taylor, Jul 29, 2005
    #8
  9. > James Taylor wrote:
    > If you don't know in advance which fields will be quoted,
    > you can use this regex instead:



    ...so based on that (you said fieldS), the greedy matching would have
    caused the regex to do something that was unintended.

    > James Taylor also wrote:
    > If this is what the
    > OP wants he can ask, but I don't believe it is.


    ...referring to nested quotes. you're right, he didn't ask that. nor did
    i assume he did. the example that he gave suggests that the 4th field
    would always be the quoted field, so that's why i gave him the simple
    regex that i did.

    i was simply pointing out what i thought was an oversight in your
    regex, because my interpretation was that you thought the OP may have
    to deal with multiple quoted fields, and if that were the case, the
    default greedy matching would eat up all but the last quote.
     
    it_says_BALLS_on_your forehead, Jul 29, 2005
    #9
  10. Guest

    "it_says_BALLS_on_your forehead" <> wrote:
    > > James Taylor wrote:
    > > If you don't know in advance which fields will be quoted,
    > > you can use this regex instead:

    >
    > ...so based on that (you said fieldS), the greedy matching would have
    > caused the regex to do something that was unintended.


    Can you illustrate this alleged problem?

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Jul 29, 2005
    #10
  11. whoops, you're right. the [^"] deals with that, so it wouldn't be a
    problem. and nested quotes would be a problem regardless of whether the
    repetition specifier (?) was used. sorry about that...


    i was thinking something like:
    my $str = q{"one" "two" "three" "four" "five"};
    my @fields = $str =~ /(".*")/g;

    ...which would put the whole string in $fields[0].
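
    A short sketch comparing the three patterns discussed in this thread on
    that same string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = q{"one" "two" "three" "four" "five"};

# Greedy .* runs on to the last quote in the string: one big element.
my @greedy     = $str =~ /(".*")/g;
# Non-greedy .*? stops at the first closing quote: five elements.
my @non_greedy = $str =~ /(".*?")/g;
# [^"]* cannot cross a quote at all, greedy or not: five elements.
my @negated    = $str =~ /("[^"]*")/g;

printf "%d %d %d\n", scalar @greedy, scalar @non_greedy, scalar @negated;
# prints "1 5 5"
```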

    again, sorry about that James Taylor.
     
    it_says_BALLS_on_your forehead, Jul 29, 2005
    #11
  12. James Taylor Guest

    In article <>,
    <> wrote:
    >
    > sorry about that James Taylor.


    No problem Simon. :)

    --
    James Taylor, London, UK PGP key: 3FBE1BF9
    To protect against spam, the address in the "From:" header is not valid.
    In any case, you should reply to the group so that everyone can benefit.
    If you must send me a private email, use james at oakseed demon co uk.
     
    James Taylor, Jul 29, 2005
    #12