string length and newlines

Rob · Jan 10, 2008

I am trying to perform client-side input validation for a textarea to
determine that the number of characters doesn't exceed a certain
length. Currently, I am just using str.length, but if the textarea
contains newlines, str.length is inaccurate. If I view the string, I
will see something like '123\n456', but when this gets passed back to
the server, '\n' will be changed to '\r\n' and what had a length of 7
characters now has a length of 8. Is the best way to approach this to
search the string for newlines and add 1 to the length count or is
there a simpler way?

Thanks for your help,
Rob

jhurstus · Jan 10, 2008

The most compact code I can think of would be:

var num_newlines = some_string.split("\n").length - 1;
var augmented_length = some_string.length + num_newlines;

-Joey
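Packaged as a function, Joey's two lines might look like the following sketch. The name augmentedLength is made up for illustration. The sketch assumes the value contains only bare \n line breaks, as Firefox reports them; a value that already contains \r\n would be over-counted.

```javascript
// Hypothetical helper: length of a textarea value once each bare "\n"
// has been expanded to "\r\n" for submission.
// Assumes the value contains only "\n" line breaks (no "\r").
function augmentedLength(someString) {
  // split() yields one more piece than there are separators.
  const numNewlines = someString.split("\n").length - 1;
  return someString.length + numNewlines;
}

console.log(augmentedLength("123\n456")); // 8
```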

Anthony Levensalor · Jan 10, 2008

some_string.replace(/\n/g/, ' ');

will replace the newlines in the string with a single space

Dr J R Stockton · Jan 10, 2008

In comp.lang.javascript message of Thu, 10 Jan 2008 12:26:44, Anthony
Levensalor wrote:

Using split() that way requires creating a new string for each line
(plus the array holding them), which could be a little slow.

some_string.replace(/\n/g/, ' ');

will replace the newlines in the string with a single space

If the third slash is first removed.

If the line separation always contains a match to \n, then

X = some_string.replace(/[^\n]/g, "")

should give a count of new lines, *possibly* quicker. Untested.
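Stockton's replace leaves X holding only the newline characters, so X.length is the count. The sketch below wraps it in a function and, for comparison, adds a match()-based count that was not proposed in the thread. Both function names are hypothetical, and neither has been timed here; which one is quicker will depend on the JavaScript engine.

```javascript
// Delete everything that is not a newline, then measure what remains.
function countNewlinesByReplace(s) {
  return s.replace(/[^\n]/g, "").length;
}

// Alternative (not from the thread): collect the matches directly.
// match() with a global regex returns null when nothing matches.
function countNewlinesByMatch(s) {
  return (s.match(/\n/g) || []).length;
}

console.log(countNewlinesByReplace("123\n456\n789")); // 2
console.log(countNewlinesByMatch("123\n456\n789"));   // 2
```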

Anthony Levensalor · Jan 10, 2008

If the third slash is first removed.

D'oh! Thanks for the catch

If the line separation always contains a match to \n, then

X = some_string.replace(/[^\n]/g, "")

should give a count of new lines, *possibly* quicker. Untested.

Actually, I'm betting it's a ton quicker, but I am waaaaaay too busy to
test that at the moment.

~A!

David Mark · Jan 11, 2008

I am trying to perform client-side input validation for a textarea to
determine that the number of characters doesn't exceed a certain
length. Currently, I am just using str.length, but if the textarea
contains newlines, str.length is inaccurate. If I view the string, I
will see something like '123\n456', but when this gets passed back to
the server, '\n' will be changed to '\r\n' and what had a length of 7
characters now has a length of 8. Is the best way to approach this to
search the string for newlines and add 1 to the length count or is
there a simpler way?

Thanks for your help,
Rob

What happens if you add a maxlength attribute to the textarea? If it
still allows the wrong number of characters, strip out the \r's on the
server before storing the data.

Thomas 'PointedEars' Lahn · Jan 11, 2008

Rob said:
[...] Currently, I am just using str.length, but if the textarea contains
newlines, str.length is inaccurate.

It isn't.

If I view the string, I will see something like '123\n456', but when this
gets passed back to the server, '\n' will be changed to '\r\n' and what
had a length of 7 characters now has a length of 8. Is the best way to
approach this to search the string for newlines and add 1 to the length
count

No.

or is there a simpler way?

Your problem is server-side, not client-side. And since you can't expect
consistent results from the client, you should replace all \r and \r\n with
\n server-side before, as I suppose, storing it in the database.
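A minimal sketch of that server-side step, assuming the server code is JavaScript (for example Node.js). The helper name is hypothetical; other server languages offer the same regular-expression replace.

```javascript
// Hypothetical server-side helper: canonicalize line breaks to "\n"
// before storing, whatever the client sent.
// /\r\n?/ matches "\r\n" as a unit, so a CRLF pair becomes one "\n"
// rather than two; a lone "\r" is converted as well.
function canonicalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

const fromBrowser = "123\r\n456\r789\n";
const stored = canonicalizeNewlines(fromBrowser);
console.log(JSON.stringify(stored)); // "123\n456\n789\n"
console.log(stored.length);          // 12
```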

PointedEars

Bart Van der Donck · Jan 13, 2008

David said:
What happens if you add a maxlength attribute to the textarea?

MAXLENGTH is only for INPUT elements:
http://www.w3.org/TR/html4/interact/forms.html#h-17.7

If it still allows the wrong number of characters, strip out the \r's
on the server before storing the data.

I don't believe this is the most robust strategy. It is the browser
itself that silently converts \n (or \r) into \r\n before the data is
sent to the server. The script at the server only reads out what was
offered. Consider the following test:

<form method="get" name="f">
<textarea name="t" rows="10" cols="10"></textarea>
<br>
<input type="button" value="Show length"
onClick="alert(document.forms['f'].elements['t'].value.length)">
<input type="submit">
</form>

The results are different: e.g. Vista MSIE7 shows 2 characters for an
EOL, and Firefox 1. But that has no direct importance in the OP's
case, since what matters is the data as offered to the server.
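One way to make the client-side count independent of which browser produced the value is to convert every line break to CR LF before measuring, since HTML 4 specifies that line breaks in submitted form data are represented as CR LF pairs. A sketch, with a hypothetical helper name:

```javascript
// Hypothetical helper: the length a textarea value will have once its line
// breaks are sent as CR LF pairs. "\r\n", lone "\r" and lone "\n" are all
// converted, so MSIE-style values (already CRLF) and Firefox-style values
// (bare LF) give the same answer.
function submittedLength(value) {
  // The alternation tries "\r\n" first, so an existing pair is not doubled.
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

console.log(submittedLength("123\n456"));   // 8 (Firefox-style value)
console.log(submittedLength("123\r\n456")); // 8 (MSIE7-style value)
```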

http://www.rfc-editor.org/EOLstory.txt says:

| ASCII text (ed.: like percent-encoded form-data) transmitted across
| the network *must* use the two-character sequence: CR LF (ed.: \r\n).
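To see that two-character sequence in an encoded form body, the sketch below percent-encodes a single field. It assumes a modern environment with URLSearchParams (which did not exist in 2008) and normalizes line breaks itself, because URLSearchParams on its own encodes a bare \n as %0A. The helper name is hypothetical.

```javascript
// Sketch: encode one field as application/x-www-form-urlencoded,
// converting every line break to CR LF first.
function encodeField(name, value) {
  const crlf = value.replace(/\r\n|\r|\n/g, "\r\n");
  return new URLSearchParams({ [name]: crlf }).toString();
}

console.log(encodeField("t", "123\n456")); // t=123%0D%0A456
```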

I don't agree with your suggestion to store end-of-line characters as
\n by force; I would always store \r\n, as offered by the browser.

To calculate the length, I would use a regular expression to replace
\r\n by a single character.
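A sketch of that suggestion, with a hypothetical helper name: every line break, whether reported as \r\n (MSIE7) or \n (Firefox), counts as one character. The server would then have to apply the same rule for the two counts to agree.

```javascript
// Count every line break as a single character, however it is encoded.
function logicalLength(value) {
  // Replace each CRLF pair (or lone CR / LF) with one "\n".
  return value.replace(/\r\n|\r|\n/g, "\n").length;
}

console.log(logicalLength("123\r\n456")); // 7
console.log(logicalLength("123\n456"));   // 7
```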

David Mark · Jan 14, 2008

MAXLENGTH is only for INPUT elements: http://www.w3.org/TR/html4/interact/forms.html#h-17.7

My mistake.

I don't believe this is the most robust strategy. It is the browser
itself that silently converts \n (or \r) into \r\n before the data is
sent to the server. The script at the server only reads out what was
offered. Consider the following test:

But the database should store in a predetermined canonical form,
regardless of what the browser says. Whether that is \n, \n\r or \r
is up to the DBA.

I don't agree with your suggestion to store end-of-line characters as
\n by force; I would always store \r\n, as offered by the browser.

As offered by which browser? As mentioned, some don't send \r\n.

To calculate the length, I would use a regular expression to replace
\r\n by a single character.

Then how could you store what is offered by the browser?

Bart Van der Donck · Jan 14, 2008

David said:
But the database should store in a predetermined canonical form,
regardless of what the browser says. Whether that is \n, \n\r or \r
is up to the DBA.

You probably mean '\r\n' instead of '\n\r'. I would say that it's
rather up to the operating system. I haven't seen a case where the DBA
interferes with these OS settings when it comes to _storing_ data.

From http://en.wikipedia.org/wiki/Newline :
\n: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS
X, etc.), BeOS, Amiga, RISC OS, and others
\r\n: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M,
MP/M, DOS, OS/2, Microsoft Windows
\r: Commodore machines, Apple II family and Mac OS up to version 9
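Since one string can mix these conventions (LF on Unix-like systems, CR LF on DOS/Windows, CR on classic Mac OS), a quick tally can show what a given submission actually contains. The helper below is a hypothetical sketch:

```javascript
// Hypothetical helper: tally which line-end conventions a string contains.
// The alternation tries "\r\n" first, so a CRLF pair is not also counted
// as a lone CR followed by a lone LF.
function tallyLineEndings(text) {
  const counts = { crlf: 0, lf: 0, cr: 0 };
  for (const m of text.match(/\r\n|\r|\n/g) || []) {
    if (m === "\r\n") counts.crlf++;
    else if (m === "\n") counts.lf++;
    else counts.cr++;
  }
  return counts;
}

console.log(tallyLineEndings("dos\r\nunix\nold mac\r"));
// { crlf: 1, lf: 1, cr: 1 }
```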

As offered by which browser? As mentioned, some don't send \r\n.

When a browser doesn't send '\r\n', it violates the RFC (see quotation
above from http://www.rfc-editor.org/EOLstory.txt). The word *must*
means:

| MUST This word, or the terms "REQUIRED" or "SHALL", mean that the
| definition is an absolute requirement of the specification.

http://www.faqs.org/rfcs/rfc2119.html

One can safely conclude that a browser which doesn't send '\r\n' is a
bad browser.

Then how could you store what is offered by the browser?

The browser *must* offer '\r\n' anyhow, so in theory there can be no
discussion. It is the operating system which decides which newline
character it uses internally. You are right that the stored data might
not be identical to the data that was offered by the browser regarding
line-ends. But this is not important for browsers, because any stored
line-end *must* be sent over the network again as '\r\n', no matter
how it was stored at server.

David Mark · Jan 14, 2008

You probably mean '\r\n' instead of '\n\r'. I would say that it's

Yes. CRLF.

rather up to the operating system. I haven't seen a case where the DBA
interferes with these OS settings when it comes to _storing_ data.

One can safely conclude that a browser which doesn't send '\r\n' is a
bad browser.

I re-read the OP as I thought it had implied that some browsers were
sending \n alone. If they all send \r\n and a text field is used in
the database (which would likely be the norm in this case), then you
are right on all counts.

The issue is only related to client-side validation. If the client
counts \n as one character, then it will disagree with the server-side
validation. Your suggestion to convert two characters to one before
client-side validation doesn't seem to address the issue (though I may
be missing something.) It seems more logical to me to do the opposite
(you know it will be sent as two, so count it as two in the client.)
If the database stores it as one, there is no harm done.
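As one possible use of counting each line break as two on the client, the sketch below trims a value so that it fits a limit without cutting a CR LF pair in half. The helper name and the choice to trim rather than reject are assumptions; the count is in UTF-16 code units, as String.length is.

```javascript
// Hypothetical helper: trim a textarea value so it fits in `max` characters
// when each line break is counted as two (CR LF), never splitting a line
// break. Counts UTF-16 code units, so a surrogate pair could be split at
// the limit.
function truncateToLimit(value, max) {
  // Work on a "\n"-only copy so each line break is one code unit here.
  const text = value.replace(/\r\n|\r/g, "\n");
  let used = 0;
  let end = 0;
  while (end < text.length) {
    const cost = text[end] === "\n" ? 2 : 1; // "\n" will be sent as "\r\n"
    if (used + cost > max) break;
    used += cost;
    end++;
  }
  return text.slice(0, end);
}

console.log(JSON.stringify(truncateToLimit("12\n34", 4))); // "12\n"
console.log(JSON.stringify(truncateToLimit("12\n34", 3))); // "12"
```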

Steve Swift · Jan 15, 2008

David said:
I re-read the OP as I thought it had implied that some browsers were
sending \n alone. If they all send \r\n and a text field is used in
the database (which would likely be the norm in this case), then you
are right on all counts.

I have a related question. Many of my webpages use simple flat files as
their "database" with one line added per transaction. This is fine until
the data to be stored comes from a TEXTAREA, because that can contain
embedded CRLF/CR/LF sequences which would screw up the lines in my file.

I've adopted the convention of converting CRLF or CR or LF into x'0102'
on the assumption that no one (certainly no one in their right mind)
will ever enter hex 01 or 02 characters into a text area. I'm curious to
know if anyone sees a problem with this; I've not encountered one in
many years of practice.
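A sketch of that convention, assuming the flat-file code is JavaScript. The helper names are hypothetical, and decoding the marker back to \n (rather than \r\n) is an assumption; like the original convention, it relies on users never entering \x01\x02 themselves.

```javascript
// Before appending a record to a one-line-per-transaction flat file, turn
// every CRLF, CR or LF into the two control characters 0x01 0x02.
const LINE_END_MARKER = "\x01\x02";

function encodeForFlatFile(text) {
  // Alternation tries "\r\n" first, so a CRLF pair becomes ONE marker.
  return text.replace(/\r\n|\r|\n/g, LINE_END_MARKER);
}

// Reverse mapping; restoring "\n" here is an assumption.
function decodeFromFlatFile(record) {
  return record.split(LINE_END_MARKER).join("\n");
}

const record = encodeForFlatFile("line one\r\nline two\nline three");
console.log(/[\r\n]/.test(record));                   // false
console.log(JSON.stringify(decodeFromFlatFile(record))); // "line one\nline two\nline three"
```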

Bart Van der Donck · Jan 15, 2008

Steve said:
I have a related question. Many of my webpages use simple flat files as
their "database" with one line added per transaction. This is fine until
the data to be stored comes from a TEXTAREA, because that can contain
embedded CRLF/CR/LF sequences which would screw up the lines in my file.

Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA that are not transmitted as '\r\n'
are in violation of the RFC. This is an old and widespread convention; I
would be surprised to see any browser which would behave differently
(I would immediately send a bug report anyway).

I've adopted the convention of converting CRLF or CR or LF into x'0102'
on the assumption that no one (certainly no one in their right mind)
will ever enter hex 01 or 02 characters into a text area.

You should be pretty safe. MSIE, FF and Opera don't allow \x01 and
\x02 to be typed inside form elements; CTRL+A and CTRL+B are shortcuts
to browser functions.

I'm curious to know if anyone sees a problem with this; I've not
encountered one in many years of practice.

I think you have a robust solution. A good deal of the ASCII control
characters were actually meant for this purpose; you see them all the
time on older mainframe systems.

Steve Swift · Jan 16, 2008

Bart said:
Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA that are not transmitted as '\r\n'
are in violation of the RFC.

Bart, thank you for confirming what I'd noticed in practice.
I do, however, have a few examples where single x'0A' characters have
found their way into my data files, and since this is the linend
sequence on my linux server, it caused problems.

I checked my code 'till I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no re-occurrences of the problem.
I'm just waiting for the browser that sends x'0A0D' now, but hope to
retire before that occurs.

Bart Van der Donck · Jan 16, 2008

Steve said:
Bart, thank you for confirming what I'd noticed in practice.
I do, however, have a few examples where single x'0A' characters have
found their way into my data files, and since this is the linend
sequence on my linux server, it caused problems.

I checked my code 'till I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no re-occurrences of the problem.

I'm thinking of 4 possibilities:

- manual crafting of the URL (?data=one%0Atwo)
- an incorrect browser violating RFC
- an error in the regular expression or its execution order; in your
  case it's necessary to first do:
    '\r\n' -> '\x01\x02'
  before
    '\r' -> '\x01\x02' and '\n' -> '\x01\x02'
- incorrect URL parsing of the server script like ?data=one%250Atwo
  or something with percent-encoding under UTF-8 (headache warning)

I would go for your pragmatic approach as well.
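The point about execution order can be demonstrated in a few lines: converting lone CR and LF before CR LF turns one line break into two markers. Both functions are illustrative only.

```javascript
// Why the order of replacements matters when mapping line ends to the
// "\x01\x02" marker.
const MARK = "\x01\x02";

function wrongOrder(s) {
  // CR first: "\r\n" becomes MARK + "\n", then the "\n" becomes a second MARK.
  return s.replace(/\r/g, MARK).replace(/\n/g, MARK);
}

function rightOrder(s) {
  // "\r\n" first, then whatever lone CR or LF remains.
  return s.replace(/\r\n/g, MARK).replace(/[\r\n]/g, MARK);
}

const input = "one\r\ntwo";
console.log(wrongOrder(input) === "one" + MARK + MARK + "two"); // true
console.log(rightOrder(input) === "one" + MARK + "two");        // true
```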

Dr J R Stockton · Jan 16, 2008

Steve said:
I checked my code 'till I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no re-occurrences of the problem.
I'm just waiting for the browser that sends x'0A0D' now, but hope to
retire before that occurs.

Whenever data is of possibly uncertain origin, it is well to assume the
worst of the characters which come between the lines.

In (past?) Delphi, for example, one could by various editing generate a
source file in which most line separations were CRLF but some were just
LF (or maybe just CR). Unfortunately, the IDE editor believed both, but
the compiler did not accept a lone LF as a line end.

Therefore, in Delphi, with
<statement1> CR LF
// comment LF
<statement2> CR LF
<statement3> CR LF

<statement2> would not be compiled. An LF between statements would not
matter so much, since, in Delphi, newline is a terminator only for that
type of comment, and not for code statements.

One needs an algorithm to convert bad newlines to good ones.
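A minimal sketch of the detection half of such an algorithm, assuming CR LF is the wanted convention: it lists each line whose terminator differs, which would flag the bare-LF comment line in the Delphi example above. The helper name is hypothetical.

```javascript
// Hypothetical helper: list the line ends that differ from the expected
// one (CR LF by default). Line numbers are 1-based and refer to the line
// that the terminator ends.
function findStrayLineEnds(text, expected = "\r\n") {
  const strays = [];
  let line = 1;
  // "\r\n" is tried first, so a CRLF pair counts as one terminator.
  for (const m of text.matchAll(/\r\n|\r|\n/g)) {
    if (m[0] !== expected) strays.push({ line, ending: m[0] });
    line++;
  }
  return strays;
}

const source = "statement1\r\n// comment\nstatement2\r\nstatement3\r\n";
console.log(findStrayLineEnds(source)); // [ { line: 2, ending: '\n' } ]
```

Once the strays are known, a single pass such as source.replace(/\r\n|\r|\n/g, "\r\n") rewrites every line end to CR LF.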

Thomas 'PointedEars' Lahn · Jan 16, 2008

Dr J R Stockton said:
One needs an algorithm to convert bad newlines to good ones.

man recode
man iconv

PointedEars
