Parsing CSV and "&nbsp;"

Discussion in 'Perl Misc' started by hotkitty, Oct 9, 2008.

  1. hotkitty

    hotkitty Guest

    I'm trying to parse the following csv file in a linux environment:

    "this is row1 column0","this is row1 column1","this is row1
    column2","this is row1 column3","this
    is row1 column4"
    "this is row2 column0","this is row2 column1","this is row2
    column2","this is row2 column3","this is
    row2 column4"

    Pretty standard CSV but with the last column running onto the next
    line it gets screwed up somehow as my script doesn't recognize when a
    new row starts. I tried substituting the carriage return but still no
    luck. When I open up the file on my windows box w/ notepad I get the
    following (notice the " &nbsp;&nbsp;" that is added to the end of the
    line):
    "this is row1 column0","this is row1 column1","this is row1
    column2","this is row1 column3","this
    is row1 column4"" &nbsp;&nbsp;"
    "this is row2 column0","this is row2 column1","this is row2
    column2","this is row2 column3","this is
    row2 column4"" &nbsp;&nbsp;"

    Maybe I'm just overlooking some simple solution but how do I deal w/
    the " &nbsp;&nbsp;" as Linux doesn't recognize it?

    Thanks in advance. My code is as follows:

    my $csv = Text::CSV->new(); # I've also tried setting binary to 1, but it still didn't work
    open (CSV, "<", $thisfile) or die $!;
    while (<CSV>) {
        if ($csv->parse($_)) {
            my @csvcolumns = $csv->fields();
            my $newstuff = "$csvcolumns[1]";
            open(OUT, ">>$thatfile");
            print OUT "$newstuff\n";
            close(OUT);
        }
    }
     
    hotkitty, Oct 9, 2008
    #1

  2. hotkitty

    Ben Morrow Guest

    Quoth hotkitty <>:
    > I'm trying to parse the following csv file in a linux environment:
    >
    > "this is row1 column0","this is row1 column1","this is row1
    > column2","this is row1 column3","this
    > is row1 column4"
    > "this is row2 column0","this is row2 column1","this is row2
    > column2","this is row2 column3","this is
    > row2 column4"
    >
    > Pretty standard CSV but with the last column running onto the next
    > line it gets screwed up somehow as my script doesn't recognize when a
    > new row starts.


    How do you know when a new record starts? Can you guarantee that a line
    beginning with " is always the start of a new record? If so, then
    running the file through something like

    perl -lne'if (/^"/) { print $line; $line = "" } $line .= $_;
    END { print $line }'

    may help. If any of your fields could end with a space (so the
    terminating " might wrap onto the next line), or could end up not being
    quoted, you might have a problem.
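
Written as a script rather than a one-liner, the same idea might look like the sketch below. It assumes, as above, that a line starting with `"` always begins a new record. `join_records` is a hypothetical helper name, the sample data is made up, and gluing continuation lines back with a single space is a guess about how the original text was wrapped.

```perl
use strict;
use warnings;

# Join physical lines into logical records, assuming a line that starts
# with a double quote always begins a new record (hypothetical helper).
sub join_records {
    my ($fh) = @_;
    my @records;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate CRLF line endings
        next unless length $line;         # ignore empty lines
        if ($line =~ /^"/ or !@records) {
            push @records, $line;         # start of a new record
        } else {
            $records[-1] .= " $line";     # continuation of the previous record
        }
    }
    return @records;
}

# Small in-memory demonstration with made-up data.
my $data = qq{"a","b","c\nd"\n"e","f is\nlong","g"\n};
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my @records = join_records($fh);
close $fh;
print "$_\n" for @records;
```

Each joined record is then a single line that can be handed to a CSV parser. The caveat above still applies: a field whose text happens to wrap right before a `"` would be split in the wrong place.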

    > I tried substituting the carriage return but still no
    > luck. When I open up the file on my windows box w/ notepad I get the
    > following (notice the " &nbsp;&nbsp;" that is added to the end of the
    > line):
    > "this is row1 column0","this is row1 column1","this is row1
    > column2","this is row1 column3","this
    > is row1 column4"" &nbsp;&nbsp;"
    > "this is row2 column0","this is row2 column1","this is row2
    > column2","this is row2 column3","this is
    > row2 column4"" &nbsp;&nbsp;"
    >
    > Maybe I'm just overlooking some simple solution but how do I deal w/
    > the " &nbsp;&nbsp;" as Linux doesn't recognize it?


    Where did it come from? How did you transfer the file from Linux to Windows:
    did you somehow use a web browser or a stupid mail client or something
    else that has messed up the file? If " &nbsp;&nbsp;" is never part of
    valid data then removing it is as simple as adding

    s/" &nbsp;&nbsp;"//;

    to the start of the above.

    Ben

    --
    Raise your hand if you're invulnerable.
     
    Ben Morrow, Oct 9, 2008
    #2

  3. hotkitty wrote:
    > I'm trying to parse the following csv file in a linux environment:
    >
    > "this is row1 column0","this is row1 column1","this is row1
    > column2","this is row1 column3","this
    > is row1 column4"
    > "this is row2 column0","this is row2 column1","this is row2
    > column2","this is row2 column3","this is
    > row2 column4"
    >
    > Pretty standard CSV but with the last column running onto the next
    > line it gets screwed up somehow as my script doesn't recognize when a
    > new row starts.


    A pragmatic approach:
    Collect lines until you have an even number of quotes:
    <untested>
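    my $line = '';
    while (defined(my $next = <$src>)) {    # stop at EOF instead of looping forever
        chomp $next;
        $line .= $next;
        last if ($line =~ tr/"//) % 2 == 0; # even number of quotes: record complete
    }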
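
Wrapped up as a reusable routine, the quote-counting idea might look like the following sketch. `read_record` is a hypothetical name, the sample data is made up, and joining the pieces with a single space is an assumption about how the text was wrapped. Because CSV escapes an embedded quote by doubling it, escaped quotes leave the parity unchanged and the count stays valid for them.

```perl
use strict;
use warnings;

# Read one logical record: keep appending physical lines until the number
# of double quotes collected so far is even. Returns undef at end of input.
sub read_record {
    my ($fh) = @_;
    my $record;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;                              # strip LF or CRLF
        next if !defined($record) && !length($line);       # skip blank lines between records
        $record = defined($record) ? "$record $line" : $line;
        return $record if (($record =~ tr/"//) % 2 == 0);  # quotes balanced
    }
    return $record;    # undef at EOF, or an unterminated trailing record
}

# Small in-memory demonstration with made-up data.
my $data = qq{"r1 c0","r1\nc1"\n\n"r2 c0","r2 ""x""\nc1"\n};
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my @records;
while (defined(my $rec = read_record($fh))) {
    push @records, $rec;
}
close $fh;
print "$_\n" for @records;
```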

    --
    These are my personal views and not those of Fujitsu Siemens Computers!
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize (T. Pratchett)
    Company Details: http://www.fujitsu-siemens.com/imprint.html
     
    Josef Moellers, Oct 9, 2008
    #3
  4. hotkitty

    Guest

    On Wed, 8 Oct 2008 16:58:03 -0700 (PDT), hotkitty <> wrote:

    >I'm trying to parse the following csv file in a linux environment:


    This is one way. My criterion: buffer a row until a line ends with a double quote (/"$/),
    and at EOF just process whatever is left in the buffer. Otherwise there is no way to delineate the rows.

    sln

    #############
    # Csv1 Regex
    #############

    use strict;
    use warnings;

    my $fname = 'c:\temp\junk.csv';
    # parenthesized open: with "open CSV, $fname or die" the die can never fire
    open(my $fh, '<', $fname) or die "can't open $fname: $!";

    my ($row, $tmp) = ('', '');
    my ($parsing, $count) = (1, 1);

    while ($parsing)
    {
        my $line = <$fh>;
        if (!defined $line)
        {
            $parsing = 0;    # eof: process whatever is left in the buffer
        } else {
            $tmp = $line;
            $tmp =~ s/\s+$//s;
            $row .= " $tmp" if (length($tmp));
            ## buffer until '"$' or eof
            next if ($tmp !~ /"$/);
        }
        print " (".$count++.") ----------\n";
        while ($row =~ /\s*"+\s*([^"]*?)\s*"+\s*|\s*([^,\n]+)\s*/g)
        {
            my $val = defined $1 ? $1 : $2;
            print "val = $val\n";
            # ... push @ary, $val;
        }
        $row = $tmp = '';
    }

    close $fh;

    __END__

    output:

    (1) ----------
    val = this is row1 column0
    val = this is row1 column1
    val = this is row1 column2
    val = this is row1 column3
    val = this is row1 column4
    (2) ----------
    val = this is row2 column0
    val = this is row2 column1
    val = this is row2 column2
    val = this is row2 column3
    val = this is row2 column4
    (3) ----------
    val = this is row3 column0
    val = this is row3 column1


    junk.csv
    ----------
    "this is row1 column0","this is row1 column1","this is row1
    column2","this is row1 column3","this
    is row1 column4"
    "this is row2 column0","this is row2 column1","this is row2
    column2","this is row2 column3","this is
    row2 column4"

    "this is row3 column0","this is row3 column1",
     
    , Oct 9, 2008
    #4
  5. hotkitty

    Guest

    On Thu, 09 Oct 2008 18:52:50 GMT, wrote:

    >This is one way. I used the criteria buffer a row until /"$/, eol is found,
    >and just process remaining buffer if EOF. Otherwise, there is no delineation of rows..

    Btw, Excel will not do this correctly. Either you have to generate a
    proper csv file (with EOR definition), or do this kind of a fix-up using your own criteria.
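
One thing that may be worth trying before any hand-rolled fix-up: if the line breaks are real newlines inside the quoted fields, the file may already be valid CSV, and the trouble would come from the `while (<CSV>)` loop handing `parse` one physical line at a time. A minimal sketch under that assumption, using Text::CSV's `getline` so the module reads the records itself. It needs Text::CSV from CPAN, and the sample data and the `@second_column` name are made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module, not part of core Perl

# binary => 1 permits newlines inside quoted fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "cannot create Text::CSV object: " . Text::CSV->error_diag();

# Made-up sample: the second field of the first record contains a newline.
my $data = qq{"r1 c0","r1\nc1","r1 c2"" x"\n"r2 c0","r2 c1","r2 c2"\n};
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";

my @second_column;
while (my $fields = $csv->getline($fh)) {
    # getline returns an array reference holding one complete record
    (my $value = $fields->[1]) =~ s/\s*\n\s*/ /g;   # fold embedded newlines into spaces
    push @second_column, $value;
}
close $fh;
print "$_\n" for @second_column;
```

If the line breaks turn out to be wrapping added in transit rather than part of the data, this will not help and a fix-up along the lines already posted is still needed.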
     
    , Oct 9, 2008
    #5
  6. hotkitty

    hotkitty Guest

    On Oct 9, 3:11 pm, wrote:
    > Btw, Excel will not do this correctly. Either you have to generate a
    > proper csv file (with EOR definition), or do this kind of a fix-up using your own criteria.


    I apologize for the late reply but appreciate the quick responses you
    have given me. Perhaps I am doing something wrong w/ the above
    suggestions but I'll keep cracking away. Here is the actual .csv file
    I am trying to parse:
    http://www.nasdaq.com//asp/symbols.asp?exchange=Q&start=0

    thx
     
    hotkitty, Oct 11, 2008
    #6
  7. hotkitty

    Guest

    On Fri, 10 Oct 2008 17:33:21 -0700 (PDT), hotkitty <> wrote:

    >I apologize for the late reply but appreciate the quick responses you
    >have given me. Perhaps I am doing something wrong w/ the above
    >suggestions but I'll keep cracking away. Here is the actual .csv file
    >I am trying to parse:
    >http://www.nasdaq.com//asp/symbols.asp?exchange=Q&start=0
    >
    >thx


    Actually, Josef Möllers posted a pragmatic method and it works (Occam's razor),
    i.e. counting the number of quotes.
    And it makes sense: normally the end of a record is the eol, but in this case
    the quoted fields are scattered over multiple lines. A record therefore ends only where
    both conditions hold: eol AND an even number of double quotes so far.
    This file loaded up in Excel right away and parsed fine, although all the junk was left in there.

    >A pragmatic approach:
    >Collect lines until you have an even number of quotes:
    ><untested>
    >my $line;
    >while (1) {
    > $line .= <$src>;
    > chomp $line;
    > last if ($line =~ tr/"//) %2 == 0;
    >}


    With a little extra effort I cleaned up the parsing and it works fine.

    Good Luck...
    sln


    #############
    # Csv2 Regex
    #############

    # http://www.nasdaq.com//asp/symbols.asp?exchange=Q&start=0

    use strict;
    use warnings;

    my $fname = 'c:\temp\junkie.csv';
    # parenthesized open: with "open CSV, $fname or die" the die can never fire
    open(my $fh, '<', $fname) or die "can't open $fname: $!";

    my ($row, $tmp) = ('', '');
    my ($parsing, $count) = (1, 1);

    while ($parsing)
    {
        ## Buffer until a full row
        ## -------------------------
        my $line = <$fh>;
        if (!defined $line) {
            $parsing = 0; # eof, parse what's left
        } else {
            ## -------------------------------
            $tmp = $line;
            $tmp =~ s/\s+$//s;
            next if (!length($tmp));
            $row .= " $tmp";
            next if (($row =~ tr/"//) % 2 != 0); # Odd number of double quotes? Keep buffering.
        } # Good to go, parse it ...

        print " (".$count++.") ----------\n";

        # parse the row
        # -------------------
        while ($row =~ /\s*"\s*([^"]*?)\s*"\s*,|\s*"\s*(.*?)\s*"\s*$/g)
        {
            my $val = $1;
            if (defined $2) {
                # do some cleanup
                # ----------------
                $val = $2;
                $val =~ s/""/"/g;
                $val =~ s/\.\.\. More\.\.\.//ig;
                $val =~ s/&nbsp;/ /ig;
            }
            print "val = $val\n";
        }
        $row = '';
    }

    close $fh;

    __END__

    Partial output:

    (1) ----------
    val = NASDAQ Securities as of 12/31/2008
    val =
    val =
    val =
    (2) ----------
    val = Name
    val = Symbol
    val = Security Type
    val = Shares Outstanding
    val = Market Value (millions)
    val = Description (as filed with the SEC)
    (3) ----------
    val = 012 Smile.Communications Ltd.
    val = SMLC
    val = Ordinary Shares
    val = 25,360,000
    val = $136.4
    val = Prior to our October 2007 public offering, we were a wholly-owned subsidiary of Internet Gold, a public company traded on the NASDAQ Global Market and the Tel Aviv Stock Exchange, whose shares
    are included in the TASE-100 Index. Internet Gold currently owns approximately 72.4% of our ordinary shares. In November 2004, Internet Gold became our sole shareholder after purchasing our ordinary
    shares from our prior shareholders. As part of its internal restructuring in 2006, Internet Gold transferred its communications and media operations into two operating subsidiaries. Internet Gold
    transferred to us its broadband and traditional voice services businesses, which we refer to in this annual report as the Communications Business.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f06%2f30%2f0001178913-08-001700.html#FIS_COMPANY_INFORMATION"
    (4) ----------
    val = 1-800 FLOWERS.COM, Inc.
    val = FLWS
    val = Common Stock
    val = 26,528,000
    val = $116.5
    val = For more than 30 years, 1-800-FLOWERS.COM, Inc. - "Your Florist of Choice(R)" - has been providing customers around the world with the freshest flowers and finest selection of plants,
    gift baskets, gourmet foods and confections, and plush stuffed animals perfect for every occasion. 1-800-FLOWERS.COM(R) offers the best of both worlds: exquisite, florist-designed arrangements
    individually created by some of the nation's top floral artists and hand-delivered the same day, and spectacular flowers delivered through its "Fresh From Our Growers(TM)" program. Customers can
    "call, click, or come in" to shop 1-800-FLOWERS.COM(R) 24 hours a day, 7 days a week at 1-800-356-9377 or www.1800flowers.com. Sales and Service Specialists are available 24/7, and fast and
    reliable delivery is offered same day, any day. As always, 100 percent satisfaction and freshness is guaranteed. The 1-800-FLOWERS.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2007%2f09%2f13%2f0001084869-07-000018.html#FIS_BUSINESS"
    (5) ----------
    val = 1st Constitution Bancorp (NJ)
    val = FCCY
    val = Common Stock
    val = 3,998,000
    val = $35.0
    val = 1st Constitution Bancorp (the “Company”) is a bank holding company registered under the Bank Holding Company Act of 1956, as amended. The Company was organized under the laws of the State of New
    Jersey in February 1999 for the purpose of acquiring all of the issued and outstanding stock of 1st Constitution Bank (the “Bank”) and thereby enabling the Bank to operate within a bank holding
    company structure. The Company became an active bank holding company on July 1, 1999. The Bank is a wholly-owned subsidiary of the Company. Other than its investment in the Bank, the Company currently
    conducts no other significant business activities. The main office of the Company and the Bank is located at 2650 Route 130 North, Cranbury, New Jersey 08512, and the telephone number is (609)
    655-4500. 1st Constitution Bank The Bank, a commercial bank formed under the laws of the State of New Jersey, engages in the business of commercial and retail banking.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f04%2f15%2f0001214659-08-000838.html#FIS_BUSINESS"
    (6) ----------
    val = 1st Pacific Bancorp (CA)
    val = FPBN
    val = Common Stock
    val = 4,970,000
    val = $25.3
    val = 1st Pacific Bancorp (the "Company", "we", "our", or "us") is a California corporation incorporated on August 4, 2006 and is registered with the Board of Governors of the Federal Reserve System
    as a bank holding company under the Bank Holding Company Act of 1956, as amended. 1st Pacific Bank of California (the "Bank") is a wholly-owned bank subsidiary of the Company and was incorporated in
    California on April 17, 2000. The Bank is a California corporation licensed to operate as a commercial bank under the California Banking Law by the California Department of Financial Institutions (the
    "DFI"). In accordance with the Federal Deposit Insurance Act, the Federal Deposit Insurance Corporation (the "FDIC") insures the deposits of the Bank. The Bank is a member of the Federal Reserve
    System. "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f03%2f31%2f0001047469-08-003795.html#FIS_BUSINESS"
    (7) ----------
    val = 1st Source Corporation
    val = SRCE
    val = Common Stock
    val = 24,110,000
    val = $530.2
    val = 1st Source Corporation, an Indiana corporation incorporated in 1971, is a bank holding company headquartered in South Bend, Indiana that provides, through our subsidiaries (collectively referred
    to as "1st Source"), a broad array of financial products and services. 1st Source Bank and First National Bank, Valparaiso (collectively referred to as the "Banks"), our banking subsidiaries, offer
    commercial and consumer banking services, trust and investment management services, and insurance to individual and business clients through most of our 83 banking center locations in 17 counties in
    Indiana and Michigan. 1st Source Bank's Specialty Finance Group, with 24 locations nationwide, offers specialized financing services for new and used private and cargo aircraft, automobiles and light
    trucks for leasing and rental agencies, medium and heavy duty trucks, construction equipment, and environmental equipment.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f02%2f22%2f0000034782-08-000022.html#FIS_BUSINESS"
    (8) ----------
    val = 21st Century Holding Company
    val = TCHC
    val = Common Stock
    val = 8,014,000
    val = $33.7
    val = 21st Century Holding Company (“21st Century,” “Company,” “we,” “us”) is an insurance holding company, which, through our subsidiaries and our contractual relationships with our independent
    agents and general agents, controls substantially all aspects of the insurance underwriting, distribution and claims process. We are authorized to underwrite homeowners’ property and casualty
    insurance, commercial general liability insurance, personal automobile insurance and commercial automobile insurance in various states with various lines of authority through our wholly owned
    subsidiaries, Federated National Insurance Company (“Federated National”) and American Vehicle Insurance Company (“American Vehicle”). The insurable events during 2007 and 2006 did not include any
    weather related catastrophic events such as the well publicized series of hurricanes that occurred in Florida during 2005 and 2004.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f03%2f17%2f0001144204-08-015873.html#FIS_BUSINESS"
    (9) ----------
    val = 3Com Corporation
    val = COMS
    val = Common Stock
    val = 405,283,000
    val = $911.9
    val = We provide secure, converged networking solutions on a global scale to organizations of all sizes. Our products and solutions enable customers to manage business-critical voice, video and data
    in a secure, scalable, reliable and efficient network environment. We deliver networking products and services for enterprises that view their networks as mission critical, and value cost-effective
    superior performance. Our products form integrated solutions and function in multi-vendor environments based upon open, not proprietary, platforms. Our products are sold on a worldwide basis through a
    combination of value added partners and direct sales representatives. We deliver products and solutions that support the increasingly complex and demanding application environments in today’s
    businesses. We aspire to be one of the leading enterprise networking companies by delivering innovative, secure, feature-rich products and solutions built on open platform technology.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2007%2f07%2f31%2f0000950135-07-004539.html#FIS_BUSINESS"
    (10) ----------
    val = 3D Systems Corporation
    val = TDSC
    val = Common Stock
    val = 22,365,000
    val = $188.5
    val = 3D Systems Corporation (“3D Systems” or the “Company”) is a holding company that operates through subsidiaries in the United States, Europe and the Asia-Pacific region. We design, develop,
    manufacture, market and service a suite of additive manufacturing solutions including 3-D modeling, rapid prototyping and manufacturing systems and related products and materials that enable complex
    three-dimensional objects to be produced directly from computer data. Our customers use our proprietary systems to produce physical objects from digital data using commonly available computer-aided
    design software, often referred to as CAD software, or other digital-media devices such as engineering scanners and MRI or CT medical scanners.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f03%2f17%2f0000950144-08-002028.html#FIS_BUSINESS"
    (11) ----------
    val = 3SBio Inc.
    val = SSRX
    val = American Depositary Shares
    val = 21,797,000
    val = $126.4
    val = We commenced business operations in 1993 through Shenyang Sunshine Pharmaceutical Co., Ltd., or Shenyang Sunshine, a limited liability company established in China. Prior to our initial public
    offering in February 2007, we established a holding company structure through the following series of corporate reorganization transactions: • we formed Collected Mind Limited, a British Virgin
    Islands company, in July 2006; • Collected Mind Limited acquired 100% of the equity interests of Shenyang Sunshine, which was reorganized as a wholly foreign owned enterprise,
    or WFOE, in July 2006; and • we incorporated 3SBio Inc., an exempted company in the Cayman Islands, which acquired 100% equity interest in Collected Mind in September 2006.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2007%2f06%2f29%2f0001193125-07-146810.html#FIS_COMPANY_INFORMATION"
    (12) ----------
    val = 51job, Inc.
    val = JOBS
    val = American Depositary Shares
    val = 28,260,000
    val = $241.3
    val = We commenced our business in 1998. Since our inception, we have conducted substantially all of our operations in China. In March 2000, our founders incorporated a new holding company, now called
    51job, Inc., as an exempted limited liability company in the Cayman Islands under the Cayman Islands Companies Law (2004 Revision). Subsequently, 51job, Inc. acquired 51net.com Inc., or 51net, a
    British Virgin Islands company, and other subsidiaries to become the holding company of our corporate group. We operate as a foreign investment enterprise in China through our wholly owned
    subsidiaries, 51net, which is the registered owner of some of our trademarks and our domain name, 51net Beijing and 51net HR, which are both Cayman Islands companies, as well as our affiliated Chinese
    entities, the primary ones being: • Shanghai Qianjin Advertising Co., Ltd. "http://secfilings.nasdaq.com/edgar_conv_html%2f2007%2f06%2f28%2f0001145549-07-001142.html#FIS_COMPANY_INFORMATION"
    (13) ----------
    val = 8x8 Inc
    val = EGHT
    val = Common Stock
    val = 62,175,000
    val = $42.3
    val = Statements contained in this annual report on Form 10-K, or Annual Report, regarding our expectations, beliefs, estimates, intentions or strategies are forward-looking statements within the
    meaning of Section 27A of the Securities Act and Section 21E of the Exchange Act. Any statements contained herein that are not statements of historical fact may be deemed to be forward-looking
    statements. For example, words such as "may," "will," "should," "estimates," "predicts," "potential," "continue," "strategy," "believes," "anticipates," "plans," "expects," "intends," and similar
    expressions are intended to identify forward-looking statements. You should not place undue reliance on these forward-looking statements. Actual results and trends may differ materially from
    historical results or those projected in any such forward-looking statements depending on a variety of factors.
    "http://secfilings.nasdaq.com/edgar_conv_html%2f2007%2f06%2f29%2f0001023731-07-000014.html#FIS_BUSINESS"
    (14) ----------
    val = A-Power Energy Generation Systems, Ltd.
    val = APWR
    val = Common Stock
    val = 32,707,000
    val = $136.4
    val = A-Power A-Power Energy Generated Systems, Ltd. (formerly known as China Energy Technology Limited) was incorporated under the laws of the British Virgin Islands on May 14, 2007. Until January
    18, 2008, A-Power was a wholly-owned subsidiary of Chardan South China Acquisition Corporation. Chardan Chardan South China Acquisition Corporation was a blank check corporation organized under the
    laws of the State of Delaware on March 10, 2005. Chardan was originally incorporated as “Chardan China Acquisition Corp. III,” but changed its name to Chardan South China Acquisition Corporation on
    July 14, 2005. Chardan was formed to effect a business combination with an unidentified operating business that had its primary operating facilities located in the PRC in any city or province south of
    the Yangtze River. "http://secfilings.nasdaq.com/edgar_conv_html%2f2008%2f07%2f11%2f0001144204-08-039652.html#FIS_COMPANY_INFORMATION"
    (15) ----------
     
    , Oct 11, 2008
    #7
  8. hotkitty

    Guest

    On Sat, 11 Oct 2008 21:47:27 GMT, wrote:

    [snip]

    Small changes ..

    - For performance, the transliteration (tr) now counts the quotes in the $tmp string.
    - Added the span modifier (/s) on the regex loop.
    Thus the option below to keep newlines and leave the original formatting intact,
    i.e. bullet point locations, etc.
    Just (un)comment the block that is needed. Try it both ways.


    #############
    # Csv3 Regex
    #############

    # http://www.nasdaq.com//asp/symbols.asp?exchange=Q&start=0

    use strict;
    use warnings;

    my $fname = 'c:\temp\symbols.csv';
    open CSV, '<', $fname or die "can't open $fname: $!";

    my ($row, $tmp) = ('','');
    my ($parsing, $records, $quotes) = (1,1,0);

    while ($parsing)
    {
        ## Buffer until a full row
        ## -------------------------
        if (!defined($_ = <CSV>)) {
            $parsing = 0;              # eof, parse what's left
            last if (!length($row));   # nothing buffered, so no empty trailing record
        } else {
            $tmp = $_;

            ## this block will trim newlines ---
            $tmp =~ s/\s+$//s;
            next if (!length($tmp));
            $row .= " $tmp";
            ## ---

            ## this block will keep newlines ---
            # $row .= $tmp;
            ## ---

            $quotes += $tmp =~ tr/"//;
            next if (!($quotes % 2 == 0));  # Even number of double quotes?
        }                                   # Good to go, parse it ...

        print " (".$records++.") ----------\n";

        ## Parse the row
        ## -------------------
        while ($row =~ /\s*"\s*([^"]*?)\s*"\s*,|\s*"\s*(.*?)\s*"\s*$/gs)   # span lines
        {
            my $val = $1;
            if (defined $2) {
                # cleanup the description field
                # ------------------------------
                $val = $2;
                $val =~ s/""/"/g;
                $val =~ s/\.\.\. More\.\.\.//ig;
                $val =~ s/&nbsp;/ /ig;
            }
            print "val = $val\n";
        }
        $row = '';
        $quotes = 0;
    }
    close CSV;

    __END__
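
For anyone who wants to try the buffering idea in isolation, here is a minimal sketch, not the code from the post above. It assumes (as that code does) that a record is complete once the running count of double quotes is even. The name read_records and the sample data are made up for illustration, and the input comes from an in-memory filehandle so it runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the complete CSV records read from $fh.
# A record is treated as complete once it holds an even number of double
# quotes, i.e. every quoted field that was opened has been closed again.
sub read_records {
    my ($fh) = @_;
    my @records;
    my ($row, $quotes) = ('', 0);
    while (defined(my $line = <$fh>)) {
        $line =~ s/\s+$//;                        # trim newline and trailing blanks
        next unless length $line;                 # skip empty physical lines
        $row .= length $row ? " $line" : $line;   # join continuation lines with a space
        $quotes += ($line =~ tr/"//);             # tr with no replacement only counts
        next if $quotes % 2;                      # odd count: still inside a quoted field
        push @records, $row;
        ($row, $quotes) = ('', 0);
    }
    push @records, $row if length $row;           # unterminated trailing data, if any
    return @records;
}

# Sample input shaped like the file in the original question (invented data).
my $data = <<'CSV';
"row1 col0","row1
col1","row1 col2"
"row2 col0","says ""hi""","row2
col2"
CSV

open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my @records = read_records($fh);
close $fh;

print scalar(@records), " records\n";
print "$_\n" for @records;
```

Doubled quotes inside a field add two to the count, so they do not disturb the parity check. Like the trim-newlines variant above, the sketch skips blank physical lines, including blank lines that fall inside a quoted field.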
     
    , Oct 13, 2008
    #8
  9. hotkitty

    hotkitty Guest

    On Oct 13, 12:28 pm, wrote:
    > On Sat, 11 Oct 2008 21:47:27 GMT, wrote:
    >
    > [snip]


    This works great! Now, I realize that my next question should be
    categorized in the beginner's group but for whatever reason I will
    post here:
    How would I just print out every 4th occurrence of $val (i.e. the
    Market Value column)?
     
    hotkitty, Oct 19, 2008
    #9
  10. hotkitty

    Guest

    On Sun, 19 Oct 2008 08:42:04 -0700 (PDT), hotkitty <> wrote:

    >On Oct 13, 12:28 pm, wrote:
    >> On Sat, 11 Oct 2008 21:47:27 GMT, wrote:
    >>
    >> [snip]
    >>
    >> Small changes ..
    >>
    >> - For performance, the transliteration (tr) now counts the quotes in the $tmp string.
    >> - Added the span modifier (/s) on the regex loop.
    >>   Thus the option below to keep newlines and leave the original formatting intact,
    >>   i.e. bullet point locations, etc.
    >>   Just (un)comment the block that is needed. Try it both ways.
    >>
    >> #############
    >> # Csv3 Regex
    >> #############
    >>
    >> #http://www.nasdaq.com//asp/symbols.asp?exchange=Q&start=0
    >>
    >> use strict;
    >> use warnings;
    >>
    >> my $fname = 'c:\temp\symbols.csv';
    >> open CSV, $fname or die "can't open $fname...";
    >>
    >> my ($row, $tmp) = ('','');
    >> my ($parsing, $records, $quotes) = (1,1,0);

    my $MarketValueTotal = 0;
    >>
    >> while ($parsing)
    >> {
    >>         ## Buffer until a full row
    >>         ## -------------------------
    >>         if (!($_ = <CSV>)) {
    >>                 $parsing = 0; # eof, parse what's left
    >>         } else {
    >>                 $tmp = $_;
    >>
    >>                 ## this block will trim newlines ---
    >>                   $tmp =~ s/\s+$//s;
    >>                   next if (!length($tmp));
    >>                   $row .= " $tmp";
    >>                 ## ---
    >>
    >>                 ## this block will keep newlines ---
    >>                   # $row .= $tmp;
    >>                 ## ---
    >>
    >>                 $quotes += $tmp =~ tr/"//;
    >>                 next if (!($quotes % 2 == 0));  # Even number of double quotes?
    >>         }                                       # Good to go, parse it ...
    >>
    >>         print " (".$records++.") ----------\n";
    >>
    >>         ## Parse the row
    >>         ## -------------------


               my $field = 0;   # zero-based field index, reset for every record
    >>         while ($row =~ /\s*"\s*([^"]*?)\s*"\s*,|\s*"\s*(.*?)\s*"\s*$/gs)   # span lines
    >>         {
    >>                 my $val = $1;
    >>                 if (defined $2) {
    >>                         # cleanup the description field
    >>                         # ------------------------------
    >>                         $val = $2;
    >>                         $val =~ s/""/"/g;
    >>                         $val =~ s/\.\.\. More\.\.\.//ig;
    >>                         $val =~ s/&nbsp;/ /ig;
    >>                 }

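                    if ($field++ == 4)    # field 4 (zero-based) is the Market Value column
                    {
                        if ($val =~ /^[\$,\.\d]+$/)
                        {
                            $val =~ s/[\$,]//g;    # strip "$" and thousands separators before adding
                            $MarketValueTotal += $val;
                            print "val = $val\n";
                        }
                        else { print STDERR "'$val' is not numeric, record = ".($records-1)."\n"; }
                    }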


    >>   #              print "val = $val\n";
    >>         }
    >>         $row = '';
    >>         $quotes = 0;}
    >>
    >> close CSV;
    >>

    print STDERR "Market Value Total = $MarketValueTotal (in millions)\n";

    >> __END__

    >


    output:

    'Market Value (millions)' is not numeric, record = 2
    'N/A' is not numeric, record = 3122
    Market Value Total = 2650685.5 (in millions)

    >This works great! Now, I realize that my next question should be
    >categorized in the beginner's group but for whatever reason I will
    >post here:
    >How would I just print out every 4th occurrence of $val (i.e. the
    >Market Value column)?
    >



    Not bad, NASDAQ ~ 2.6 trillion dollars.

    sln
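
The validate-then-sum step can also be pulled out into a small helper. This is only a sketch under assumptions: to_number and the third row are invented for illustration (the first two rows reuse values from the output earlier in the thread), and the pattern is stricter than the /^[\$,\.\d]+$/ test above, which would also let through strings such as "$" or "..".

```perl
use strict;
use warnings;

# Hypothetical helper: turn a display value such as "$911.9" or "405,283,000"
# into a plain number. Returns undef for anything else, e.g. "N/A".
sub to_number {
    my ($text) = @_;
    (my $clean = $text) =~ s/[\$,]//g;    # drop "$" and thousands separators
    return $clean =~ /^\d+(?:\.\d+)?$/ ? $clean + 0 : undef;
}

# Field 4 (zero-based) holds the market value in millions.
my @rows = (
    [ '3Com Corporation',       'COMS', 'Common Stock', '405,283,000', '$911.9' ],
    [ '3D Systems Corporation', 'TDSC', 'Common Stock', '22,365,000',  '$188.5' ],
    [ 'Example Holdings',       'XMPL', 'Common Stock', '1,000',       'N/A'    ],  # made-up row
);

my ($total, @skipped) = (0);
for my $row (@rows) {
    my $value = to_number($row->[4]);
    if (defined $value) { $total += $value }
    else                { push @skipped, $row->[1] }   # remember the symbol we could not sum
}

printf "total = %.1f (in millions)\n", $total;
print  "skipped: @skipped\n";
```

Returning undef instead of printing inside the helper leaves it to the caller whether a non-numeric value is an error, a warning on STDERR, or silently skipped.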
     
    , Oct 20, 2008
    #10