How to determine end-of-line sequence?

J Krugman

I'm writing a CGI script that uploads and processes files. These
files can come from Windows, Linux, or Macintosh machines, so the
script doesn't know ahead of time the end-of-line conventions used
by the uploaded files. What is the best way for a script to
determine the end-of-line sequence used by a given text file?

TIA,

jill
 
David Efflandt

It is probably best to convert it to what you want regardless of what it
is. Following is a script I wrote a while ago that converts any text file
to the line-ending type you want, based on the name the script is run
under. It could easily be modified (shortened) to convert to a single type:

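#!/usr/bin/perl -w
# txtconv - convert text to or from other OS
# Symlink or rename any of: tounix, 2unix, todos, 2dos, tomac, 2mac
# follow with list of files on commandline
# "tomac" may conflict with a program name
use strict;

# Choose the target line ending once, before any file is touched, so an
# unrecognized script name cannot leave a file truncated.
my $end;
if (lc($0) =~ /2unix|tounix/) { $end = "\012" }
elsif (lc($0) =~ /2dos|todos/) { $end = "\015\012" }
elsif (lc($0) =~ /2mac|tomac/) { $end = "\015" }
else { die "file not converted, read $0\n" }

while(@ARGV) {
    my $file = shift @ARGV;
    # Three-argument open, so odd characters in $file cannot change the mode.
    unless (open(FILE,'+<',$file)) {warn "Can't open $file: $!\n"; next;}
    # Exclusive lock (2 is LOCK_EX), rewind, and turn off newline translation.
    flock(FILE,2); seek(FILE,0,0); binmode FILE;
    print "$file before ", -s $file;
    # Read everything into memory, then empty the file for rewriting in place.
    my @lines = <FILE>; seek(FILE,0,0); truncate(FILE,0);
    # Match CRLF before a lone CR or LF so a pair becomes a single $end.
    foreach (@lines) { s/(\015\012|[\015\012])/$end/g; print FILE $_; }
    close FILE;
    print " after ", -s $file,"\n";
}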
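
If the script really does need to know which convention an upload used (the original question) rather than just normalizing it, one possible approach is to count each kind of line ending in the raw data and take the most common. This is only a sketch: count_eols and detect_eol are made-up names, the data is assumed to have been read with binmode in effect, and ties between styles are broken arbitrarily.

```perl
use strict;
use warnings;

# Count each line-ending style in a string of raw (binmode) file data.
sub count_eols {
    my ($data) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    # CRLF is tried first so a pair is not counted as one CR plus one LF.
    while ($data =~ /(\015\012|\015|\012)/g) {
        my $eol = $1;
        if    ($eol eq "\015\012") { $count{crlf}++ }
        elsif ($eol eq "\015")     { $count{cr}++ }
        else                       { $count{lf}++ }
    }
    return %count;
}

# Return the most frequent line-ending sequence, or undef if none was found.
sub detect_eol {
    my ($data) = @_;
    my %count = count_eols($data);
    my %seq   = (crlf => "\015\012", cr => "\015", lf => "\012");
    # Highest count first; equal counts fall back to name order (arbitrary).
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $count{$best} ? $seq{$best} : undef;
}
```

For a file with mixed endings, the counts from count_eols show how mixed it is, which may be a reason to normalize it anyway.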
 
J Krugman

Cool. Thanks!

jill
 
Anno Siegel

[...]
flock(FILE,2); seek(FILE,0,0); binmode FILE;
^^^^^^^^^^^^^
[...]

Just out of interest -- why are you locking the file? I mean, EOL
conversion is not something that is normally done concurrently :)

Anno
 
David Efflandt

Just out of interest -- why are you locking the file? I mean, EOL
conversion is not something that is normally done concurrently :)

Just force of habit from working with CGI: when processing a list of
files, the lock means that (hopefully) nothing else will modify a file in
the middle of converting it. I had one situation where two scripts merely
reading the same file line by line at the same time would confuse the
file pointer or something, and both would loop endlessly (maybe something
peculiar to SunOS a while ago).
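
For what it's worth, flock is advisory: it only protects against other programs that also call flock on the same file. A rough sketch of the habit described above, wrapped in a helper (with_locked_file is a made-up name, not part of the script in this thread), might look like this:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: run $code on $path while holding an exclusive lock.
sub with_locked_file {
    my ($path, $code) = @_;
    open(my $fh, '+<', $path) or die "Can't open $path: $!\n";
    binmode $fh;
    # Blocks until every other cooperating process has released its lock.
    flock($fh, LOCK_EX) or die "Can't lock $path: $!\n";
    my $result = $code->($fh);
    # Closing the handle also releases the lock.
    close($fh) or die "Can't close $path: $!\n";
    return $result;
}
```

A caller would pass a code reference that reads or rewrites the file through the handle it is given, for example with_locked_file($file, sub { my ($fh) = @_; ... }).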
 
