tab delimited file processing problem

Discussion in 'Perl Misc' started by Domenico Discepola, Jul 17, 2003.

  1. Hi all. I have constructed a script that uses Win32::OLE to save an Excel
    workbook as a tab-delimited text file (TSV file). This works fine. My next
    step is to perform formatting on each field per line in the TSV file while
    retaining the # of fields. The problem lies with "empty" cells in the 1st
    column of the Excel file.

    Example Excel file row:
    col A's value=<empty>
    col B's value = "1"
    col C's value = "2"
    <end of row>

    When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
    the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
    /^\t12$/ (using regex notation). In other words, I lose the existence of
    col A (which I need).

    I was thinking of the following solution:
    s/^\t/\s\t/, $my_line;
    but could there be a 'better' way to handle it?

    Any suggestions on how to best solve this problem would be appreciated.

    Thanks in advance.
    Domenico Discepola, Jul 17, 2003
    #1
    1. Advertising

  2. Domenico Discepola

    Greg Bacon Guest

    In article <>,
    Domenico Discepola <> wrote:

    : [...]
    : Example Excel file row:
    : col A's value=<empty>
    : col B's value = "1"
    : col C's value = "2"
    : <end of row>
    :
    : When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
    : the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
    : /^\t12$/ (using regex notation). In other words, I lose the existence of
    : col A (which I need).

    How have you lost column A? Consider the example below:

    C:\Temp>type try
    #! perl

    use warnings;
    use strict;

    use Data::Dumper;

    my $data = "\t12";
    my @fields = split /\t/, $data;

    print Dumper \@fields;

    C:\Temp>perl try
    $VAR1 = [
    '',
    '12'
    ];

    Are you sure there wasn't a TAB between the 1 and the 2? Even so,
    you're still happy; note that the first element of @fields is empty:

    #! perl

    use warnings;
    use strict;

    use Data::Dumper;

    my $data = "\t1\t2";
    my @fields = split /\t/, $data;

    print Dumper \@fields;

    C:\Temp>perl try
    $VAR1 = [
    '',
    '1',
    '2'
    ];

    How were you trying to extract the fields in your TSV file?

    Technical side note: what you're calling tab-delimited is really
    tab-separated. Using [TAB] to make things stand out, a tab-delimited
    record would look like

    [TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]

    Tab-*separated*, however, would look like

    field_1[TAB]field_2[TAB]...[TAB]field_n

    Hope this helps,
    Greg
    --
    It remains true today as it did in fascist Italy, socialist Germany, New
    Deal America, and socialist Russia: freedom has no greater opponents than
    those who despise and demonize commercial society.
    -- Lew Rockwell
    Greg Bacon, Jul 17, 2003
    #2
    1. Advertising

  3. "Greg Bacon" <> wrote in message
    news:...
    > In article <>,
    > Domenico Discepola <> wrote:
    >
    > : [...]
    > : Example Excel file row:
    > : col A's value=<empty>
    > : col B's value = "1"
    > : col C's value = "2"
    > : <end of row>
    > :
    > : When you use Win32::OLE to "tell" Excel to save this as a TSV file

    (using
    > : the SaveAs method), a hex-dump of the resultant TSV file reveals row1

    as:
    > : /^\t12$/ (using regex notation). In other words, I lose the existence

    of
    > : col A (which I need).
    >
    > How have you lost column A? Consider the example below:
    >
    > C:\Temp>type try
    > #! perl
    >
    > use warnings;
    > use strict;
    >
    > use Data::Dumper;
    >
    > my $data = "\t12";
    > my @fields = split /\t/, $data;
    >
    > print Dumper \@fields;
    >
    > C:\Temp>perl try
    > $VAR1 = [
    > '',
    > '12'
    > ];
    >
    > Are you sure there wasn't a TAB between the 1 and the 2? Even so,
    > you're still happy; note that the first element of @fields is empty:
    >
    > #! perl
    >
    > use warnings;
    > use strict;
    >
    > use Data::Dumper;
    >
    > my $data = "\t1\t2";
    > my @fields = split /\t/, $data;
    >
    > print Dumper \@fields;
    >
    > C:\Temp>perl try
    > $VAR1 = [
    > '',
    > '1',
    > '2'
    > ];
    >
    > How were you trying to extract the fields in your TSV file?
    >
    > Technical side note: what you're calling tab-delimited is really
    > tab-separated. Using [TAB] to make things stand out, a tab-delimited
    > record would look like
    >
    > [TAB]field_1[TAB]field_2[TAB]...[TAB]field_n[TAB]
    >
    > Tab-*separated*, however, would look like
    >
    > field_1[TAB]field_2[TAB]...[TAB]field_n
    >
    > Hope this helps,
    > Greg
    > --
    > It remains true today as it did in fascist Italy, socialist Germany, New
    > Deal America, and socialist Russia: freedom has no greater opponents than
    > those who despise and demonize commercial society.
    > -- Lew Rockwell


    1st off, you're right - my regex describing my output from Excel was
    incorrect - it should have read (as you suggested) /^\t1\t2$/
    2nd, I did mean tab separated (instead of tab delimited). 3rd, thanks for
    mentioning the Data::Dumper module, I hadn't heard of that - I'll check it
    out. Right off the bat it seems to help diagnosing my problem.
    Domenico Discepola, Jul 18, 2003
    #3
  4. Domenico Discepola

    Bart Lateur Guest

    Domenico Discepola wrote:

    >Example Excel file row:
    >col A's value=<empty>
    >col B's value = "1"
    >col C's value = "2"
    ><end of row>
    >
    >When you use Win32::OLE to "tell" Excel to save this as a TSV file (using
    >the SaveAs method), a hex-dump of the resultant TSV file reveals row1 as:
    >/^\t12$/ (using regex notation). In other words, I lose the existence of
    >col A (which I need).


    No you don't. The first column comes before the first tab.

    --
    Bart.
    Bart Lateur, Jul 18, 2003
    #4
  5. Domenico Discepola

    Bill Smith Guest

    "Domenico Discepola" <> wrote in message
    news:...
    > Hi all. I have constructed a script that uses Win32::OLE to save an

    Excel
    > workbook as a tab-delimited text file (TSV file). This works fine.

    My next
    > step is to perform formatting on each field per line in the TSV file

    while
    > retaining the # of fields. The problem lies with "empty" cells in the

    1st
    > column of the Excel file.


    I remember having similar problems in the distant past. I used native
    EXCEL commands to create the TSV file. Details of the file format
    depended on the version of EXCEL. This is probably not a problem to
    you, but beware!

    Bill
    Bill Smith, Jul 18, 2003
    #5
  6. "Bill Smith" <> wrote in message
    news:pxWRa.31700$...
    >
    > "Domenico Discepola" <> wrote in message
    > news:...
    > > Hi all. I have constructed a script that uses Win32::OLE to save an

    > Excel
    > > workbook as a tab-delimited text file (TSV file). This works fine.

    > My next
    > > step is to perform formatting on each field per line in the TSV file

    > while
    > > retaining the # of fields. The problem lies with "empty" cells in the

    > 1st
    > > column of the Excel file.

    >
    > I remember having similar problems in the distant past. I used native
    > EXCEL commands to create the TSV file. Details of the file format
    > depended on the version of EXCEL. This is probably not a problem to
    > you, but beware!
    >
    > Bill
    >
    >


    As an added note, if one particular row in Excel has fewer columns than
    columns in surrounding rows, Excel will not pad the remaining columns with
    tabs... In other words, you can end up with a TSV file having a different
    number of fields:

    Example Excel file:
    row1: x, y, z
    row2: a, b, c, d

    resultant TSV file (in regex format):
    row1: /^x\ty\tz$/
    row2: /^a\tb\tc\td$/

    So, as you see, if you are trying to manipulate a TSV file and expect the
    same number of fields per line, you must account for it programmatically as
    Excel does not do it for you.
    Domenico Discepola, Jul 18, 2003
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Niraj Agarwal
    Replies:
    3
    Views:
    3,513
    Niraj Agarwal
    Jan 13, 2004
  2. Replies:
    5
    Views:
    4,817
  3. RyanL
    Replies:
    6
    Views:
    671
    Paul McGuire
    Aug 28, 2007
  4. Replies:
    1
    Views:
    318
    ZedGama3
    Apr 14, 2004
  5. Srikant
    Replies:
    1
    Views:
    464
    Greg Bacon
    Sep 29, 2007
Loading...

Share This Page