I'm writing a CGI script that uploads and processes files. These
files can come from Windows, Linux, or Macintosh machines, so the
script doesn't know ahead of time the end-of-line conventions used
by the uploaded files. What is the best way for a script to
determine the end-of-line sequence used by a given text file?
It is probably best to convert the file to the line ending you want,
regardless of what it uses now. Following is a script I wrote a while ago
that converts any text file to the type you want, chosen by the name the
script is invoked under. It could easily be modified (shortened) to
convert to a single type:
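#!/usr/bin/perl -w
# txtconv - convert text files to or from another OS's line endings
# Symlink or rename to any of: tounix, 2unix, todos, 2dos, tomac, 2mac
# and follow with a list of files on the command line.
# "tomac" may conflict with an existing program name.
use strict;

# Pick the target line ending from the script name before touching any
# file, so that an unrecognized name cannot leave a file truncated.
my $end;
if    (lc($0) =~ /2unix|tounix/) { $end = "\012" }       # LF
elsif (lc($0) =~ /2dos|todos/)   { $end = "\015\012" }   # CR LF
elsif (lc($0) =~ /2mac|tomac/)   { $end = "\015" }       # CR
else  { die "file not converted, read $0\n" }

foreach my $file (@ARGV) {
    my $fh;
    # Open read/write so the file can be rewritten in place.
    unless (open($fh, '+<', $file)) { warn "Can't open $file: $!\n"; next; }
    flock($fh, 2);     # 2 = LOCK_EX (exclusive lock)
    binmode $fh;       # no CR LF translation on Windows
    print "$file before ", -s $file;
    my @lines = <$fh>;
    seek($fh, 0, 0);
    truncate($fh, 0);
    # Try CR LF first so a DOS line ending is replaced as one unit.
    foreach (@lines) { s/\015\012|[\015\012]/$end/g; print $fh $_; }
    close $fh;
    print " after ", -s $file, "\n";
}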
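
If you really do need to know which convention an upload uses (to
report it, say), counting the three possible sequences in a sample of
the file is usually enough. What follows is only a rough sketch and not
part of txtconv: the names detect_eol and normalize_eol and the 8 KB
sample size are my own assumptions, and ties are broken in favor of
CR LF, then LF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of txtconv): guess the dominant line ending
# in a string of raw bytes. Returns "\015\012", "\012", "\015", or undef
# when the data contains no line ending at all.
sub detect_eol {
    my ($data) = @_;
    my $crlf = () = $data =~ /\015\012/g;        # CR LF pairs (DOS/Windows)
    my $cr   = () = $data =~ /\015(?!\012)/g;    # lone CR (old Macintosh)
    my $lf   = () = $data =~ /(?<!\015)\012/g;   # lone LF (Unix)
    return unless $crlf || $cr || $lf;
    return "\015\012" if $crlf >= $cr && $crlf >= $lf;
    return $lf >= $cr ? "\012" : "\015";
}

# Hypothetical helper: rewrite every line ending in $data as $end,
# using the same substitution as the script above.
sub normalize_eol {
    my ($data, $end) = @_;
    $data =~ s/\015\012|[\015\012]/$end/g;
    return $data;
}

my %name = ("\015\012" => 'DOS', "\012" => 'Unix', "\015" => 'Mac');

# Report the guessed convention for each file named on the command line.
for my $file (@ARGV) {
    my $fh;
    unless (open($fh, '<', $file)) { warn "Can't open $file: $!\n"; next; }
    binmode $fh;                 # read raw bytes, no translation
    my $buf = '';
    read($fh, $buf, 8192);       # sample the first 8 KB only
    close $fh;
    my $eol = detect_eol($buf);
    print "$file: ", (defined $eol ? $name{$eol} : 'no line endings found'), "\n";
}
```

A sample that happens to end between the CR and LF of a pair will count
one stray CR, which should not change the majority in any realistic
file. In a CGI script you would more likely skip the detection and just
call normalize_eol on the uploaded data with the ending you want.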