Why does chomp leave newlines?

Mark Healey · May 8, 2004

First, some fragments.

I get the array thusly:

@searchTerms = split(/\n/, $queryHash{"searchText"});

I later print it:

sub printSearchTerms
{
    foreach (@searchTerms)
    {
        chomp;
        print("$_ \n");
    }
}

And yet I get:

point loma
 
mission hills
 
hillcrest
 
bankers hill
 
university heights 

What's up?
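A quick way to see what is happening, as a sketch that assumes a Unix-like perl and a made-up textarea value. split(/\n/) has already consumed every LF, so chomp finds nothing to remove, while the CR that a browser sends before each LF stays at the end of each element. That leftover CR is the likely source of the extra line breaks in the output.

```perl
use strict;
use warnings;

# Made-up stand-in for the textarea value, with browser-style CR LF breaks.
my $searchText  = "point loma\015\012mission hills\015\012";
my @searchTerms = split(/\n/, $searchText);

foreach (@searchTerms) {
    my $removed = chomp;                 # number of characters chomp removed
    (my $shown = $_) =~ s/\015/\\r/g;    # make any leftover CR visible
    print "removed $removed, left: '$shown'\n";
}
```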

David Efflandt · May 8, 2004

perldoc -f chomp

It removes the input record separator ($/), which by default is "\n",
the logical newline of the OS it is running on. If the newlines in your
data are CR-LF and the default newline on your OS is not, then you may
need to set $/ = "\015\012"; before using chomp for that.
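A minimal sketch of that advice, assuming a Unix-like perl (where "\n" is "\012") and a made-up input line. It chomps a CR LF terminated line once with the default $/ and once with $/ localised to "\015\012". In the original script, though, the text has already been split on /\n/, so each element ends in a lone "\015"; the CR LF setting helps when chomping whole lines before they are split.

```perl
use strict;
use warnings;

# Made-up sample: one line as a browser would send it, ending in CR LF.
my $line = "point loma\015\012";

# The default $/ is "\n" (a single LF on Unix-like systems), so chomp
# strips only the LF and the CR stays behind.
my $with_default = $line;
chomp $with_default;            # now "point loma\015"

# Localising $/ to CR LF lets chomp strip both characters; the previous
# value of $/ is restored when the block ends.
my $with_crlf = $line;
{
    local $/ = "\015\012";
    chomp $with_crlf;           # now "point loma"
}

printf "default: %d chars, CRLF: %d chars\n",
    length $with_default, length $with_crlf;
```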

Mark Healey · May 8, 2004

Since this is supposed to be a CGI script, and I don't know what OSes
are going to be making requests, is there any way to set $/ to several
different possibilities, such as CRLF, CR alone or LF alone?

I'd still like a function that removes all leading and trailing
whitespace. I suppose I could do it with regexps, but that would be
kind of ugly.

Bob Walton · May 8, 2004

Mark said:
....
Since this is supposed to be a CGI script, and I don't know what OSes
are going to be making requests, is there any way to set $/ to several
different possibilities, such as CRLF, CR alone or LF alone?

No. The value of $/ is a string, not a regexp. You could, however, chomp
once for each possibility, something like [untested]:

{ local $/ = "\n";   chomp }
{ local $/ = "\r";   chomp }
{ local $/ = "\r\n"; chomp }  # probably not needed: LF, then CR, already strips CR LF
# etc.?
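A runnable sketch of the chomp-once-per-possibility idea, wrapped in a helper whose name, chomp_newline, is made up for illustration. It uses the octal escapes "\012" and "\015" rather than "\n" and "\r", so LF is always stripped before CR on ASCII-based platforms. Stripping LF first means a trailing CR LF is removed completely, which is why a separate CR LF pass should be unnecessary.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing LF, then one trailing CR,
# so LF, CR and CR LF line endings are all stripped.
sub chomp_newline {
    my ($s) = @_;
    { local $/ = "\012"; chomp $s; }   # strip one trailing LF
    { local $/ = "\015"; chomp $s; }   # then one trailing CR
    return $s;
}

my @samples = ("hillcrest\015\012", "hillcrest\012", "hillcrest\015", "hillcrest");
my @cleaned = map { chomp_newline($_) } @samples;
print join("|", @cleaned), "\n";
```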

I'd still like a function that removes all leading and trailing
whitespace. I suppose I could do it with regexps, but that would be
kind of ugly.

Why ugly? It should be simple [untested]:

sub trim {
    my $s = shift;
    $s =~ s/^\s+//;    # strip leading whitespace
    $s =~ s/\s+$//;    # strip trailing whitespace (including CR and LF)
    return $s;
}
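A usage sketch for a trim() sub in the original script's situation, with its two matches written as s/// substitutions. The input string is made up. Because \s also matches CR, the "\015" that split(/\n/) leaves on each element is removed along with any stray spaces.

```perl
use strict;
use warnings;

# trim(): remove leading and trailing whitespace with s/// substitutions.
sub trim {
    my $s = shift;
    $s =~ s/^\s+//;    # leading whitespace
    $s =~ s/\s+$//;    # trailing whitespace, including CR and LF
    return $s;
}

# Made-up form input using browser-style CR LF line breaks.
my $searchText  = "point loma\015\012 mission hills \015\012hillcrest\015\012";
my @searchTerms = map { trim($_) } split /\n/, $searchText;
print "[$_]\n" for @searchTerms;
```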

Dave Cross · May 8, 2004

All modern browsers submit \r\n when ENTER is pressed
while a cursor is inside a multiline text area box for a
cgi form action submission. This is independent of all
operating systems.

Of course, you can never be sure that your input is coming from a browser.

Dave...

Alan J. Flavell · May 8, 2004

This is codified, e.g. for HTML 4.01, in the appropriate parts of
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.3

What they submit is a CR followed by an LF.

I don't see how a browser can be expected to submit something that's a
logical Perl concept (\r and/or \n) rather than real control
characters. See perlport, where it's explained that an appropriate
notation in Perl for the ASCII CR LF sequence would be \015\012.

And just to correct the sloppy wording: hitting Enter in a textarea
input control does not in itself submit anything. The newline(s)
would be part of the data when the form is finally submitted by other
means.

No. \r\n would be \012\015 on at least one operating system, and
something else again on an EBCDIC-based architecture. What would be
submitted by the client, and received by the server, would still be
\015\012.

Of course, you can never be sure that your input is coming from a browser.

That should be irrelevant. The HTML specification covers the
interworking requirements for all kinds of client, not only browsers
/per se/.

But certainly it would seem wise to tolerate other newline
conventions, no matter what the specification might demand. Contrary
to the issue addressed a bit earlier in this thread, I don't see any
way to handle that solely by means of settings of $/ - it's necessary
to either do some kind of harmonisation separately, or to write code
which explicitly handles any of the plausible representations.
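One way to do that harmonisation, sketched with a hypothetical helper named normalize_newlines: rewrite CR LF, lone CR and lone LF to the logical "\n" first, then split on /\n/ as the original script does. CR LF has to come first in the alternation so it is replaced as a single line break rather than two.

```perl
use strict;
use warnings;

# Hypothetical helper: map CR LF, lone CR and lone LF to the logical
# newline "\n". CR LF is listed first so it counts as one break.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\n/g;
    return $text;
}

# Made-up input mixing all three conventions.
my $raw   = "point loma\015\012mission hills\015hillcrest\012bankers hill";
my @terms = split /\n/, normalize_newlines($raw);
print join("|", @terms), "\n";
```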

gnari · May 8, 2004

....

I'd still like a function that removes all leading and trailing
whitespace. I suppose I could do it with regexps, but that would be
kind of ugly.

change your split to:
my ($tmp) = $queryHash{"searchText"} =~ /^ *(.*?) *$/s;
@searchTerms = split(/ *[\r\n]+ */, $tmp);

and drop the chomp;

this will remove all leading and trailing spaces, including
the ones around the newlines
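A runnable version of that suggestion, with sample data made up for illustration. As first posted, the line "my ($tmp=...) =~ /.../s" does not compile (my cannot wrap an assignment in its parentheses), and a plain match would not copy anything into $tmp anyway. This sketch therefore captures in list context, and makes the group non-greedy; with a greedy (.*) the trailing spaces would always end up inside the capture.

```perl
use strict;
use warnings;

# Made-up stand-in for the CGI parameter hash in the question.
my %queryHash = (searchText => "  point loma \015\012mission hills\015\012 hillcrest  \015\012");

# Capture the text between leading and trailing spaces; list context
# puts the captured group into $tmp.
my ($tmp) = $queryHash{"searchText"} =~ /^ *(.*?) *$/s;

# Split on any run of CR/LF characters, swallowing adjacent spaces.
my @searchTerms = split(/ *[\r\n]+ */, $tmp);
print "[$_]\n" for @searchTerms;
```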

gnari

Dave Cross · May 8, 2004

This is related to modern browser behavior in what way?

It's not. I'm simply pointing out for the benefit of the original poster
that you should never assume that you know how the input to your CGI
program is generated.

Of course, you know this and you're just arguing for the sake of it.

Dave...

Joe Smith · May 9, 2004

Purl said:
Which is a \r\n submission. Car - automobile, horse - equine.

Are you aware that there are instances where "\r\n" is not the
same as "\015\012"? When dealing with data read from a
network connection, it is better to use "\015\012".

Anal.
Purl Gurl

You've got no argument from me there.
-Joe

Joe Smith · May 10, 2004

Purl said:
If you would, provide a case example of \r\n not
being the same as \015\012 syntax.

Any time you read a text file on MacOS Classic.
The end-of-line character, \015, is converted to \n on input
and \n on output is converted to \015. If the text file does
happen to contain \012, it is converted to \r on input and
\r on output is converted to \012.

This means that many Perl scripts written for Unix can run
unmodified on MacOS Classic, when it comes to reading and
writing lines in files on the native file system. This also
means that Unix perl scripts doing I/O to TCP/IP sockets have
problems on MacOS Classic if they use the logical end-of-line
character (\n) instead of the ASCII code for linefeed (\012).

References:

perldoc -f binmode

Mac OS, all variants of Unix, and Stream_LF files on
VMS use a single character to end each line in the
external representation of text (even though that
single character is CARRIAGE RETURN on Mac OS and
LINE FEED on Unix and most VMS files). In other
systems like OS/2, DOS and the various flavors of
MS-Windows your program sees a "\n" as a simple
"\cJ", but what's stored in text files are the two
characters "\cM\cJ". That means that, if you don't
use binmode() on these systems, "\cM\cJ" sequences
on disk will be converted to "\n" on input, and any
"\n" in your program will be converted back to
"\cM\cJ" on output. This is what you want for text
files, but it can be disastrous for binary files.

perldoc Socket

Also, some common socket "newline" constants are provided:
the constants "CR", "LF", and "CRLF", as well as $CR, $LF,
and $CRLF, which map to "\015", "\012", and "\015\012". If
you do not want to use the literal characters in your
programs, then use the constants provided here. They are
not exported by default, but can be imported individually,
and with the ":crlf" export tag:

use Socket qw(:DEFAULT :crlf);
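A small sketch of Joe's point about network data, using the constants from the Socket excerpt above. The HTTP request and reply strings are made-up examples, and no connection is actually opened; the idea is simply that protocol lines are built and split with $CRLF rather than "\n".

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT :crlf);   # :crlf exports CR, LF, CRLF, $CR, $LF, $CRLF

# Build a protocol line with an explicit CR LF terminator instead of
# "\n", so the bytes sent do not depend on the platform.
my $request = "GET / HTTP/1.0$CRLF$CRLF";

# Splitting received data on the same constant recovers the lines
# (split drops the trailing empty fields).
my $reply = "HTTP/1.0 200 OK${CRLF}Content-Type: text/html$CRLF$CRLF";
my @header_lines = split /$CRLF/, $reply;
print scalar(@header_lines), " header lines\n";
```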

-Joe

Alan J. Flavell · May 10, 2004

Any time you read a text file on MacOS Classic.
The end-of-line character, \015, is converted to \n on input

That's very confusing. On MacOS Classic, surely \n _is_ \015: there
is no conversion involved. See perldoc perlport, one version of which
says:

Perl uses "\n" to represent the "logical" newline, where
what is logical may depend on the platform in use. In
MacPerl, "\n" always means "\015". In DOSish perls, "\n"
usually means "\012", but when accessing a file in "text"
mode, STDIO translates it to (or from) "\015\012", depending
on whether you're reading or writing. Unix does the
same thing on ttys in canonical mode. "\015\012" is commonly
referred to as CRLF.

and so on.

and \n on output is converted to \015. If the text file does
happen to contain \012, it is converted to \r on input and
\r on output is converted to \012.

Again, no "conversion" takes place, as I understand the message of
perlport. The only "conversion" needed is in the heads of certain
folks.

This means that many Perl scripts written for Unix can run
unmodified on MacOS Classic, when it comes to reading and
writing lines in files on the native file system.

This also means that Unix perl scripts doing I/O to TCP/IP sockets
have problems on MacOS Classic if they use the logical end-of-line
character (\n) instead of the ASCII code for linefeed (\012).

Which is why the FAQs say don't do that. THERE IS NO PROBLEM, other
than the ones created by a refusal to read the documentation.

[your useful additional references snipped for brevity, but I think
they support my contention that Perl does not perform any
"conversion" in this situation.]
