Split and RegEx Help

Discussion in 'Perl Misc' started by Vaughn Sargent, Aug 31, 2004.

  1. Hi,

    I have some flat file data coming in from a client that is ~
    delimited. I have no control over the incoming data so I'm stuck with
    the ~. What I need to do is split the data into fields (which should
    be easy using split) however, some text fields may contain a ~ but
    it's not the delimiter, it's just part of the text field. All text
    fields are enclosed with double quotes. Number and date fields are
    not. So I may receive data such as the following 5 field line:

    "Field - One"~234.00~"Field ~ 3"~20040830~"Field 5"

    What I want returned is:

    Field 1: "Field - One"
    Field 2: 234.00
    Field 3: "Field ~ 3"
    Field 4: 20040830
    Field 5: "Field 5"

    But using split on ~, Perl returns:

    Field 1: "Field - One"
    Field 2: 234.00
    Field 3: "Field
    Field 4: 3"
    Field 5: 20040830
    Field 6: "Field 5"

    It would also be possible for any text field to contain more than one
    ~ as a non-delimiter which may or may not be next to each other.

    I guess what I would like to tell perl is "Hey, Perl, split this line
    for me using ~ as the delimiter but ~ isn't a delimiter if there are
    double quotes around it."

    I thought it might be possible to use split and tack on a regular
    expression. I'm a newbie when it comes to regular expressions.
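
One way to express that rule with a regular expression is to match the fields themselves rather than split on the delimiter. The following is only a minimal sketch: split_tilde_fields is a hypothetical helper name, and it assumes that a quoted field never contains an embedded double quote and that no field is empty (an empty field would simply be skipped).

```perl
use strict;
use warnings;

# Hypothetical helper: capture each field instead of splitting on "~".
# A field is either a double-quoted string (which may contain "~")
# or a run of characters that are not "~".
sub split_tilde_fields {
    my ($line) = @_;
    return $line =~ /("[^"]*"|[^~]+)/g;
}

my $line   = '"Field - One"~234.00~"Field ~ 3"~20040830~"Field 5"';
my @fields = split_tilde_fields($line);

# Print the fields numbered from 1, quotes included.
print "Field $_: $fields[$_ - 1]\n" for 1 .. @fields;
```

Because the quoted alternative is tried first, a ~ between double quotes is consumed as part of the field and is never treated as a separator.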

    If anyone can help me out I'd be very grateful.

    Vaughn
     
    Vaughn Sargent, Aug 31, 2004
    #1

  2. Vaughn Sargent wrote:
    > I have some flat file data coming in from a client that is ~
    > delimited. I have no control over the incoming data so I'm stuck with
    > the ~. What I need to do is split the data into fields (which should
    > be easy using split) however, some text fields may contain a ~ but
    > it's not the delimiter, it's just part of the text field. All text
    > fields are enclosed with double quotes.


    You may want to have a look at Text::CSV.
    Although it uses a comma as the separator for the data fields it should be
    trivial to copy the source code and modify it to use the ~ instead.

    jue
     
    Jürgen Exner, Aug 31, 2004
    #2

  3. Uri Guttman

    >>>>> "JE" == Jürgen Exner <> writes:

    JE> Vaughn Sargent wrote:
    >> I have some flat file data coming in from a client that is ~
    >> delimited. I have no control over the incoming data so I'm stuck with
    >> the ~. What I need to do is split the data into fields (which should
    >> be easy using split) however, some text fields may contain a ~ but
    >> it's not the delimiter, it's just part of the text field. All text
    >> fields are enclosed with double quotes.


    JE> You may want to have a look at Text::CSV. Although it uses a
    JE> comma as the separator for the data fields it should be trivial to
    JE> copy the source code and modify it to use the ~ instead.

    without even looking, i wager it has an option to set the separator. it
    is too easy and such a commonly needed feature to believe it doesn't
    support that.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Aug 31, 2004
    #3
  4. Uri Guttman wrote:
    >>>>>> "JE" == Jürgen Exner <> writes:

    >
    >> Vaughn Sargent wrote:
    >>> I have some flat file data coming in from a client that is ~
    >>> delimited. I have no control over the incoming data so I'm stuck with
    >>> the ~. What I need to do is split the data into fields (which should
    >>> be easy using split) however, some text fields may contain a ~ but
    >>> it's not the delimiter, it's just part of the text field. All text
    >>> fields are enclosed with double quotes.
    >
    >> You may want to have a look at Text::CSV. Although it uses a
    >> comma as the separator for the data fields it should be trivial to
    >> copy the source code and modify it to use the ~ instead.

    >
    > without even looking, i wager it has an option to set the separator.
    > it is too easy and such a commonly needed feature to believe it
    > doesn't support that.


    For a second you really scared me because I didn't check, either.

    However according to the module doc on CPAN the standard Text::CSV does not
    support changing the separator character (I win).
    For that you need to use Text::CSV_XS (you win):

    new(\%attr)

    sep_char


    The char used for separating fields, by default a comma. (,)
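
As a rough illustration of that option, here is a minimal sketch that parses the sample line from the original post with Text::CSV_XS. It assumes the module is installed from CPAN; binary => 1 is an extra precaution for non-ASCII text and is not something the thread calls for.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# sep_char replaces the default comma with "~".
# binary => 1 is an assumption added here, not part of the thread.
my $csv = Text::CSV_XS->new({ sep_char => '~', binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

my $line = '"Field - One"~234.00~"Field ~ 3"~20040830~"Field 5"';

$csv->parse($line) or die "Parse failed: " . $csv->error_diag;

# fields() returns the parsed values with the surrounding quotes removed.
my @fields = $csv->fields;

print "Field $_: $fields[$_ - 1]\n" for 1 .. @fields;
```

Note that the parser returns the field contents without the enclosing double quotes, so the third field comes back as Field ~ 3 rather than "Field ~ 3".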

    Now, what do we do with the prices?

    jue
     
    Jürgen Exner, Aug 31, 2004
    #4
  5. Uri Guttman

    >>>>> "JE" == Jürgen Exner <> writes:

    JE> Uri Guttman wrote:
    >>>>>>> "JE" == Jürgen Exner <> writes:

    >>
    >>> Vaughn Sargent wrote:
    >>>> I have some flat file data coming in from a client that is ~
    >>>> delimited. I have no control over the incoming data so I'm stuck with
    >>>> the ~. What I need to do is split the data into fields (which should
    >>>> be easy using split) however, some text fields may contain a ~ but
    >>>> it's not the delimiter, it's just part of the text field. All text
    >>>> fields are enclosed with double quotes.
    >>
    >>> You may want to have a look at Text::CSV. Although it uses a
    >>> comma as the separator for the data fields it should be trivial to
    >>> copy the source code and modify it to use the ~ instead.

    >>
    >> without even looking, i wager it has an option to set the separator.
    >> it is too easy and such a commonly needed feature to believe it
    >> doesn't support that.


    JE> For a second you really scared me because I didn't check, either.

    JE> However according to the module doc on CPAN the standard Text::CSV does not
    JE> support changing the separator character (I win).
    JE> For that you need to use Text::CSV_XS (you win):

    JE> new(\%attr)

    JE> sep_char


    JE> The char used for separating fields, by default a comma. (,)

    JE> Now, what do we do with the prices?

    well, i say it is a push (pun intended!).

    odd how the xs version which usually requires more work has the option.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Aug 31, 2004
    #5
  6. Tore Aursand

    On Mon, 30 Aug 2004 20:28:20 -0700, Vaughn Sargent wrote:
    > I have some flat file data coming in from a client that is ~ delimited.
    > I have no control over the incoming data so I'm stuck with the ~. What
    > I need to do is split the data into fields (which should be easy using
    > split) however, some text fields may contain a ~ but it's not the
    > delimiter, it's just part of the text field. All text fields are
    > enclosed with double quotes.


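    You could try the Text::ParseWords module, which comes with Perl as a
    core module:

    #!/usr/bin/perl
    #
    use strict;
    use warnings;
    use Data::Dumper;
    use Text::ParseWords;    # module names are case-sensitive

    while ( <DATA> ) {
        chomp;
        # Split on ~ but ignore any ~ inside quotes. The second argument
        # (keep) is 0, so the surrounding quotes are stripped.
        my @fields = quotewords( '~', 0, $_ );
        print Dumper( \@fields );
    }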

    __DATA__
    "Field - One"~234.00~"Field ~ 3"~20040830~"Field 5"


    --
    Tore Aursand <>
    "Life is pleasant. Death is peaceful. It's the transition that's
    troublesome." (Isaac Asimov)
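
The quotewords() call in this post passes 0 as its second (keep) argument, which removes the double quotes from each field, whereas the output asked for in the original post keeps them. A minimal sketch comparing the two settings on the sample line follows. One caveat worth hedging: quotewords() also gives special meaning to single quotes and backslashes, so data containing those characters may need more care than this sketch shows.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = '"Field - One"~234.00~"Field ~ 3"~20040830~"Field 5"';

# keep = 0: the surrounding quotes are removed from each field.
my @stripped = quotewords('~', 0, $line);

# keep = 1: the quotes are left in place.
my @kept = quotewords('~', 1, $line);

print "stripped: $_\n" for @stripped;
print "kept:     $_\n" for @kept;
```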
     
    Tore Aursand, Aug 31, 2004
    #6
  7. Uri Guttman <> wrote in message news:<>...
    > >>>>> "JE" == Jürgen Exner <> writes:

    >
    > JE> Uri Guttman wrote:
    > >>>>>>> "JE" == Jürgen Exner <> writes:

    >
    > >>> Vaughn Sargent wrote:
    > >>>> I have some flat file data coming in from a client that is ~
    > >>>> delimited. I have no control over the incoming data so I'm stuck with
    > >>>> the ~. What I need to do is split the data into fields (which should
    > >>>> be easy using split) however, some text fields may contain a ~ but
    > >>>> it's not the delimiter, it's just part of the text field. All text
    > >>>> fields are enclosed with double quotes.
    > >>
    > >>> You may want to have a look at Text::CSV. Although it uses a
    > >>> comma as the separator for the data fields it should be trivial to
    > >>> copy the source code and modify it to use the ~ instead.
    > >>
    > >> without even looking, i wager it has an option to set the separator.
    > >> it is too easy and such a commonly needed feature to believe it
    > >> doesn't support that.

    >
    > JE> For a second you really scared me because I didn't check, either.
    >
    > JE> However according to the module doc on CPAN the standard Text::CSV does not
    > JE> support changing the separator character (I win).
    > JE> For that you need to use Text::CSV_XS (you win):
    >
    > JE> new(\%attr)
    >
    > JE> sep_char
    >
    >
    > JE> The char used for separating fields, by default a comma. (,)
    >
    > JE> Now, what do we do with the prices?
    >
    > well, i say it is a push (pun intended!).
    >
    > odd how the xs version which usually requires more work has the option.
    >
    > uri



    Thank you very much. I installed the Text::CSV_XS module and it works
    just as I needed it to. A ~ inside a text field is no longer treated
    as a delimiter, because text fields are double quoted. I can also set
    the delimiter to ~ instead of the default ,

    Thanks again!
    Vaughn
     
    Vaughn Sargent, Aug 31, 2004
    #7