How to determine end-of-line sequence?

Discussion in 'Perl Misc' started by J Krugman, Dec 16, 2003.

  1. J Krugman

    J Krugman Guest

    I'm writing a cgi script that uploads and processes files. These
    files can come from Windows, Linux, or Macintosh machines, so the
    script doesn't know ahead of time the end-of-line conventions used
    by the uploaded files. What is the best way for a script to
    determine the end-of-line sequence used by a given text file?

    TIA,

    jill
    J Krugman, Dec 16, 2003
    #1

  2. David Efflandt

    On Tue, 16 Dec 2003, J Krugman <> wrote:
    >
    > I'm writing a cgi script that uploads and processes files. These
    > files can come from Windows, Linux, or Macintosh machines, so the
    > script doesn't know ahead of time the end-of-line conventions used
    > by the uploaded files. What is the best way for a script to
    > determine the end-of-line sequence used by a given text file?


    It is probably best to convert the file to the line ending you want,
    regardless of what it has now. Following is a script I wrote a while ago
    that converts any text file to the line-ending type selected by the
    script name; it could easily be modified (shortened) to convert to a
    single type:

    #!/usr/bin/perl -w
    # txtconv - convert text to or from other OS
    # Symlink or rename any of: tounix, 2unix, todos, 2dos, tomac, 2mac
    # follow with list of files on commandline
    # "tomac" may conflict with a program name
    use strict;

    # Choose the target line ending before touching any file, so an
    # unrecognized script name cannot leave a file truncated.
    my $end;
    if (lc($0) =~ /2unix|tounix/) { $end = "\012" }
    elsif (lc($0) =~ /2dos|todos/) { $end = "\015\012" }
    elsif (lc($0) =~ /2mac|tomac/) { $end = "\015" }
    else { die "file not converted, read $0\n" }

    while(@ARGV) {
        my $file = shift @ARGV;
        unless (open(FILE, '+<', $file)) {warn "Can't open $file: $!\n"; next;}
        # 2 = LOCK_EX (exclusive lock); binmode so raw bytes are read and written
        flock(FILE,2); seek(FILE,0,0); binmode FILE;
        print "$file before ", -s $file;
        my @lines = <FILE>; seek(FILE,0,0); truncate(FILE,0);
        # CRLF is tried first so it is replaced as one unit
        foreach (@lines) { s/(\015\012|[\015\012])/$end/g; print FILE $_; }
        close FILE;
        print " after ", -s $file,"\n";
    }
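
    For the original question (finding out which line ending a file
    uses) and for the "shortened" single-type variant mentioned above, a
    minimal sketch might look like the following. The names detect_eol and
    normalize_eol are hypothetical helpers, not part of the script above,
    and the detection is only a guess based on counting: a file with mixed
    or tied line endings will get an arbitrary answer.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: guess the most common line ending in a string.
    # Returns "\015\012" (DOS), "\012" (Unix), "\015" (old Mac),
    # or undef if the text contains no line ending at all.
    sub detect_eol {
        my ($text) = @_;
        my %count = ("\015\012" => 0, "\012" => 0, "\015" => 0);
        # CRLF is listed first in the alternation, so a CRLF pair is
        # counted once and never as a separate CR plus LF.
        $count{$1}++ while $text =~ /(\015\012|\012|\015)/g;
        my ($best) = sort { $count{$b} <=> $count{$a} } keys %count;
        return $count{$best} ? $best : undef;
    }

    # Hypothetical helper: convert every line ending to one target
    # (Unix "\012" unless another ending is passed in).
    sub normalize_eol {
        my ($text, $end) = @_;
        $end = "\012" unless defined $end;
        $text =~ s/\015\012|[\015\012]/$end/g;
        return $text;
    }
    ```

    In a CGI script, the uploaded data would be read with binmode set on
    the handle and then passed through normalize_eol before processing, so
    the rest of the script only ever sees one convention.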

    --
    David Efflandt - All spam ignored http://www.de-srv.com/
    David Efflandt, Dec 17, 2003
    #2

  3. J Krugman

    J Krugman Guest

    In <> (David Efflandt) writes:

    >On Tue, 16 Dec 2003, J Krugman <> wrote:
    >>
    >> I'm writing a cgi script that uploads and processes files. These
    >> files can come from Windows, Linux, or Macintosh machines, so the
    >> script doesn't know ahead of time the end-of-line conventions used
    >> by the uploaded files. What is the best way for a script to
    >> determine the end-of-line sequence used by a given text file?


    >It is probably best to correct it to what you want regardless of what it
    >is. Following is a script I wrote awhile ago that converts any text file
    >type to what you want based on the script name, but could be easily
    >modified (shortened) to convert to a single type:


    >#!/usr/bin/perl -w
    ># txtconv - convert text to or from other OS
    ># Symlink or rename any of: tounix, 2unix, todos, 2dos, tomac, 2mac
    ># follow with list of files on commandline
    ># "tomac" may conflict with a program name
    >while(@ARGV) {
    > my $file = shift @ARGV;
    > unless (open(FILE,"+< $file")) {warn "Can't open $file: $!\n"; next;}
    > flock(FILE,2); seek(FILE,0,0); binmode FILE;
    > print "$file before ", -s $file;
    > my @lines = <FILE>; seek(FILE,0,0); truncate(FILE,0);
    > if (lc($0) =~ /2unix|tounix/) { $end = "\012" }
    > elsif (lc($0) =~ /2dos|todos/) { $end = "\015\012" }
    > elsif (lc($0) =~ /2mac|tomac/) { $end = "\015" }
    > else { die "file not converted, read $0\n" }
    > foreach (@lines) { s/(\015\012|[\015\012])/$end/g; print FILE $_; }
    > close FILE;
    > print " after ", -s $file,"\n";
    >}


    Cool. Thanks!

    jill
    J Krugman, Dec 17, 2003
    #3
  4. Anno Siegel

    Anno Siegel Guest

    David Efflandt <> wrote in comp.lang.perl.misc:

    [...]

    > is. Following is a script I wrote awhile ago that converts any text file
    > type to what you want based on the script name, but could be easily
    > modified (shortened) to convert to a single type:
    >
    > #!/usr/bin/perl -w
    > # txtconv - convert text to or from other OS
    > # Symlink or rename any of: tounix, 2unix, todos, 2dos, tomac, 2mac
    > # follow with list of files on commandline
    > # "tomac" may conflict with a program name
    > while(@ARGV) {
    > my $file = shift @ARGV;
    > unless (open(FILE,"+< $file")) {warn "Can't open $file: $!\n"; next;}
    > flock(FILE,2); seek(FILE,0,0); binmode FILE;
      ^^^^^^^^^^^^^
    > print "$file before ", -s $file;
    > my @lines = <FILE>; seek(FILE,0,0); truncate(FILE,0);
    > if (lc($0) =~ /2unix|tounix/) { $end = "\012" }
    > elsif (lc($0) =~ /2dos|todos/) { $end = "\015\012" }
    > elsif (lc($0) =~ /2mac|tomac/) { $end = "\015" }
    > else { die "file not converted, read $0\n" }
    > foreach (@lines) { s/(\015\012|[\015\012])/$end/g; print FILE $_; }
    > close FILE;
    > print " after ", -s $file,"\n";
    > }


    Just out of interest -- why are you locking the file? I mean, EOL
    conversion is not something that is normally done concurrently :)

    Anno
    Anno Siegel, Dec 17, 2003
    #4
  5. David Efflandt

    On 17 Dec 2003, Anno Siegel <-berlin.de> wrote:
    > David Efflandt <> wrote in comp.lang.perl.misc:
    >
    > [...]
    >
    >> is. Following is a script I wrote awhile ago that converts any text file
    >> type to what you want based on the script name, but could be easily
    >> modified (shortened) to convert to a single type:
    >>
    >> #!/usr/bin/perl -w
    >> # txtconv - convert text to or from other OS
    >> # Symlink or rename any of: tounix, 2unix, todos, 2dos, tomac, 2mac
    >> # follow with list of files on commandline
    >> # "tomac" may conflict with a program name
    >> while(@ARGV) {
    >> my $file = shift @ARGV;
    >> unless (open(FILE,"+< $file")) {warn "Can't open $file: $!\n"; next;}
    >> flock(FILE,2); seek(FILE,0,0); binmode FILE;
    >  ^^^^^^^^^^^^^
    >> print "$file before ", -s $file;
    >> my @lines = <FILE>; seek(FILE,0,0); truncate(FILE,0);
    >> if (lc($0) =~ /2unix|tounix/) { $end = "\012" }
    >> elsif (lc($0) =~ /2dos|todos/) { $end = "\015\012" }
    >> elsif (lc($0) =~ /2mac|tomac/) { $end = "\015" }
    >> else { die "file not converted, read $0\n" }
    >> foreach (@lines) { s/(\015\012|[\015\012])/$end/g; print FILE $_; }
    >> close FILE;
    >> print " after ", -s $file,"\n";
    >> }

    >
    > Just out of interest -- why are you locking the file? I mean, EOL
    > conversion is not something that is normally done concurrently :)


    Just force of habit from working with CGI. When converting a list of
    files, the lock hopefully keeps anything else from modifying a file in
    the middle of its conversion. I once had a situation where two scripts
    that merely read the same file line by line at the same time would
    confuse the file pointer or something, and both would loop endlessly
    (maybe something peculiar to SunOS a while ago).
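
    The locking habit described above can be sketched with the named
    constants from the core Fcntl module instead of the bare 2, and with
    the result of flock checked. The helper name with_locked_file is
    hypothetical. Note that flock locks are advisory on Unix-like systems:
    they only hold off other processes that also call flock on the same
    file.

    ```perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);    # exports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB

    # Hypothetical helper: open $path for read/write, take an exclusive
    # lock, run $code->($fh), then close the handle (which drops the lock).
    sub with_locked_file {
        my ($path, $code) = @_;
        open(my $fh, '+<', $path) or die "Can't open $path: $!\n";
        flock($fh, LOCK_EX)       or die "Can't lock $path: $!\n";
        binmode $fh;              # work on raw bytes, no EOL translation
        my $result = $code->($fh);
        close($fh) or die "Can't close $path: $!\n";
        return $result;
    }
    ```

    Any other script that takes the same lock before touching the file
    will block in its own flock call until the conversion has finished.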

    --
    David Efflandt - All spam ignored http://www.de-srv.com/
    David Efflandt, Dec 18, 2003
    #5
