Split string on empty line

Sen Haerens

I'm using string.split(/^$/m, 2) on a curl output to separate header
and body. There’s an empty line between them. ^$ doesn’t seem to work...

Example curl output:
HTTP/1.1 404 Not Found
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>test</TITLE>
</HEAD><BODY>
<H1>test</H1>
The requested URL was not found on this server.<P>
<HR>
</BODY></HTML>
 
RobG

Sen said:
I'm using string.split(/^$/m, 2) on a curl output to separate header and
body. There’s an empty line between them. ^$ doesn’t seem to work...

First the caveat: not all UAs support the use of regular expressions as
arguments to split().

Now the problem: you haven't given any regular expression to match. The
'^' character means match the pattern when it occurs at the start of a
line, it does not match the start of a line itself. Similarly, $ does
not match the end of a line (which is different to how they are treated
in regular expressions in some other environments).

If you are looking for an empty line, then match the pattern that
represents two consecutive newlines. Where text is input to a browser
through a form control, in Firefox the required pattern is \n\n and in
IE it is \r\n\r\n. Other patterns may be needed for other browsers.
Since your text is generated elsewhere, you may need some other pattern.

You can match different patterns simultaneously using '|' (which means
or) between the patterns:

/\n\n|\r\n\r\n/


will match either \n\n or \r\n\r\n - i.e. a sequence of two consecutive
new lines in both Firefox and IE (presuming that there is absolutely
nothing on each line).

A safer pattern that allows for possible white space on the 'empty' line is:

/\n\s*\n|\r\n\s*\r\n/
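
One thing to watch when such a pattern is combined with the limit argument from the original question: split(pattern, 2) truncates the result to two pieces rather than leaving the remainder in the second piece, so a body that itself contains an empty line would be cut short. A minimal sketch that avoids this by locating only the first match with exec() and slicing around it. The function name splitHeaderBody and the sample text are made up for illustration:

```javascript
// Hypothetical helper: split at the FIRST "empty" line only.
function splitHeaderBody(text) {
  var separator = /\n\s*\n|\r\n\s*\r\n/; // the pattern suggested above
  var match = separator.exec(text);
  if (!match) {
    // No empty line found: treat everything as header.
    return { header: text, body: "" };
  }
  return {
    header: text.slice(0, match.index),
    body: text.slice(match.index + match[0].length)
  };
}

// Made-up sample with CRLF line endings and an empty line inside the body.
var sample = "HTTP/1.1 404 Not Found\r\n" +
  "Content-Type: text/html\r\n" +
  "\r\n" +
  "<HTML>\r\n" +
  "\r\n" +
  "</HTML>";

var parts = splitHeaderBody(sample);
var limited = sample.split(/\n\s*\n|\r\n\s*\r\n/, 2);

console.log(JSON.stringify(parts.body)); // whole body, inner empty line kept
console.log(JSON.stringify(limited[1])); // only the text up to the next empty line
```

Whether the truncation matters depends on the body; for HTML with blank lines in it, it usually does.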

Example curl output:
HTTP/1.1 404 Not Found
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

That does not have any empty lines, so how should head and body be
separated? Here is a small test case based on your sample text:


<form action="">
<textarea id="ta" rows="10" cols="60">HTTP/1.1 404 Not Found

Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1</textarea>
<input type="button" value="Split header &amp; body"
onclick="splitHB(this.form.ta.value);">
</form>

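<script type="text/javascript">
// Split the text at each "empty" line (LF or CRLF line endings,
// optional white space on the line) and show the first two parts.
function splitHB(txt)
{
  var bits = txt.split(/\n\s*\n|\r\n\s*\r\n/);
  alert('Header:\n' + bits[0]
    + '\n\nBody:\n' + bits[1]);
}
</script>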


Shows an alert with:

Header:
HTTP/1.1 404 Not Found

Body:
Date: Wed, 22 Feb 2006 00:01:45 GMT
Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1


[...]
 
Sen Haerens

Similarly, $ does not match the end of a line (which is different to
how they are treated in regular expressions in some other environments).

That was the confusing part. ^$ works perfectly in other programs.

A safer pattern that allows for possible white space on the 'empty' line is:
/\n\s*\n|\r\n\s*\r\n/

This one pulled off the trick.

Thanks a lot!
Sen
 
Thomas 'PointedEars' Lahn

RobG said:
First the caveat: not all UAs support the use of regular expressions as
arguments to split().

Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0, JScript
3.0/IE 4.0) do. The issue is a different one here: The `m' modifier,
specified for Regular Expressions in ECMAScript Edition 3, is not supported
before JavaScript 1.5 (Gecko-based incl. NN6+) and JScript 3.0 (IE 4.0+
based).

Now the problem: you haven't given any regular expression to match. The
'^' character means match the pattern when it occurs at the start of a
line, it does not match the start of a line itself. Similarly, $ does
not match the end of a line (which is different to how they are treated
in regular expressions in some other environments).

IBTD. `$' matches end-of-line with the `m' modifier, or end-of-input
without that modifier. It does not match the newline character (sequence),
_at the end of a line_, though.
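
To illustrate the point about `$' and the `m' modifier, here is a small sketch of what split(/^$/m) does in an implementation that follows ECMAScript Edition 3 or later. The sample strings are invented. The pattern matches the empty string between line terminators, and both \r and \n count as line terminators, so CRLF input is cut in more places than one might expect:

```javascript
// Made-up samples: the same text with LF and with CRLF line endings.
var lf = "A: 1\n\nbody";
var crlf = "A: 1\r\n\r\nbody";

// With `m', ^ matches after any line terminator and $ before any line
// terminator; /^$/m therefore matches an empty string, not the newlines.
var lfParts = lf.split(/^$/m);
var crlfParts = crlf.split(/^$/m);

console.log(JSON.stringify(lfParts));   // ["A: 1\n","\nbody"]
console.log(JSON.stringify(crlfParts)); // ["A: 1\r","\n","\r","\nbody"]

// With a limit of 2, as in the original question, the body is lost for
// CRLF input because only the first two pieces are returned.
var crlfLimited = crlf.split(/^$/m, 2);
console.log(JSON.stringify(crlfLimited)); // ["A: 1\r","\n"]
```

So ^$ does match, but it leaves the newline characters attached to the pieces.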

If you are looking for an empty line, then match the pattern that
represents two consecutive newlines. Where text is input to a browser
through a form control, in Firefox the required pattern is \n\n and in
IE it is \r\n\r\n. Other patterns may be needed for other browsers.
Since your text is generated elsewhere, you may need some other pattern.

You can match different patterns simultaneously using '|' (which means
or) between the patterns:

/\n\n|\r\n\r\n/

This can be simplified, and made more compatible, as /(\r\n?|\n){2}/.

will match either \n\n or \r\n\r\n - i.e. a sequence of two consecutive
new lines in both Firefox and IE (presuming that there is absolutely
nothing on each line).

A safer pattern that allows for possible white space on the 'empty' line
is:

/\n\s*\n|\r\n\s*\r\n/

Consequently,

/(\r\n?|\n)\s*(\r\n?|\n)/

or

/(\r\n?|\n)\s*\1/

if backreferences are supported within the expression.
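
One caveat when passing these patterns to split(): per ECMAScript Edition 3, the text matched by capturing parentheses is spliced into the resulting array, so header and body would not end up at indices 0 and 1. A hedged sketch with invented sample text follows. The non-capturing form (?:...) used in it is an Edition 3 feature and is assumed to be available; it cannot be combined with the \1 backreference variant:

```javascript
var text = "Header: x\r\n\r\nbody"; // made-up sample

// Capturing group: the last captured newline sequence appears in the result.
var withCapture = text.split(/(\r\n?|\n){2}/);
console.log(JSON.stringify(withCapture)); // ["Header: x","\r\n","body"]

// Non-capturing group: only header and body remain.
var withoutCapture = text.split(/(?:\r\n?|\n){2}/);
console.log(JSON.stringify(withoutCapture)); // ["Header: x","body"]

// Same idea for the variant that tolerates white space on the "empty" line.
var tolerant = "Header: x\r\n \t\r\nbody".split(/(?:\r\n?|\n)\s*(?:\r\n?|\n)/);
console.log(JSON.stringify(tolerant)); // ["Header: x","body"]
```

Some older implementations are known to leave the captures out, so the capturing forms may behave differently from one browser to another.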

[completed quotation]
^

That does not have any empty lines,

There can be no completely empty line here, really. Every line that
conforms to the Internet Message Format (RFC822/STD11), as a line in
a HTTP/1.1 response, must be ended with <CR><LF>.
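
If the text really does keep its <CR><LF> line endings, no regular expression is needed to find the empty line. A minimal sketch under that assumption; the name parseResponse and the sample response are made up, and folded or repeated header fields are not handled:

```javascript
// Hypothetical helper; assumes the raw text still has CRLF line endings.
function parseResponse(raw) {
  var CRLF = "\r\n";
  var end = raw.indexOf(CRLF + CRLF); // position of the first empty line
  var head = end === -1 ? raw : raw.slice(0, end);
  var body = end === -1 ? "" : raw.slice(end + 2 * CRLF.length);
  var lines = head.split(CRLF);
  var fields = {};
  // lines[0] is the status line; the rest are "Name: value" header fields.
  for (var i = 1; i < lines.length; i++) {
    var colon = lines[i].indexOf(":");
    if (colon > 0) {
      var name = lines[i].slice(0, colon).toLowerCase();
      fields[name] = lines[i].slice(colon + 1).replace(/^\s+|\s+$/g, "");
    }
  }
  return { statusLine: lines[0], fields: fields, body: body };
}

var response = parseResponse(
  "HTTP/1.1 404 Not Found\r\n" +
  "Content-Type: text/html; charset=iso-8859-1\r\n" +
  "\r\n" +
  "<HTML></HTML>"
);
console.log(response.statusLine);
console.log(response.fields["content-type"]);
console.log(response.body);
```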


PointedEars
 
Sen Haerens

RobG wrote:
Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0, JScript
3.0/IE 4.0) do. The issue is a different one here: The `m' modifier,
specified for Regular Expressions in ECMAScript Edition 3, is not supported
before JavaScript 1.5 (Gecko-based incl. NN6+) and JScript 3.0 (IE 4.0+
based).

The browser used is Safari 2.0.3.

There can be no completely empty line here, really. Every line that
conforms to the Internet Message Format (RFC822/STD11), as a line in
a HTTP/1.1 response, must be ended with <CR><LF>.

That clears it all up. Thank you!
 
Thomas 'PointedEars' Lahn

Sen Haerens wrote:

[Restored context]
[...] Thomas 'PointedEars' Lahn [...] said:
Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0,
JScript 3.0/IE 4.0) do. The issue is a different one here: The `m'
modifier, specified for Regular Expressions in ECMAScript Edition 3,
is not supported before JavaScript 1.5 (Gecko-based incl. NN6+) and
JScript 3.0 (IE 4.0+ based).

The browser used is Safari 2.0.3.

Apple Safari 2.0.3's WebCore 417.8 and WebKit/417.9 (released January 10,
2006) should implement at least KJS 3.4.1 (KDE 3.4.1 was released May 31,
2005), which supports both a Regular Expression object reference as
argument to String.prototype.split(), and the `m' modifier for Regular
Expressions:

<URL:http://developer.kde.org/documentation/library/3.4-api/kjs/html/string__object_8cpp-source.html>
<URL:http://developer.kde.org/documentation/library/3.4-api/kjs/html/regexp__object_8cpp-source.html>

However, it is unclear what the test browser has to do with the target
browser here.

That clears it all up. Thank you!

You are welcome :)


PointedEars
 
