Split string on empty line

Discussion in 'Javascript' started by Sen Haerens, Feb 22, 2006.

  1. Sen Haerens

    Sen Haerens Guest

    I'm using string.split(/^$/m, 2) on a curl output to separate header
    and body. There’s an empty line between them. ^$ doesn’t seem to work...

    Example curl output:
    HTTP/1.1 404 Not Found
    Date: Wed, 22 Feb 2006 00:01:45 GMT
    Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>test</TITLE>
    </HEAD><BODY>
    <H1>test</H1>
    The requested URL was not found on this server.<P>
    <HR>
    </BODY></HTML>
     
    Sen Haerens, Feb 22, 2006
    #1

  2. RobG

    RobG Guest

    Sen Haerens wrote:
    > I'm using string.split(/^$/m, 2) on a curl output to separate header and
    > body. There’s an empty line between them. ^$ doesn’t seem to work...


    First the caveat: not all UAs support the use of regular expressions as
    arguments to split().

    Now the problem: you haven't given any regular expression to match. The
    '^' character means match the pattern when it occurs at the start of a
    line, it does not match the start of a line itself. Similarly, $ does
    not match the end of a line (which is different to how they are treated
    in regular expressions in some other environments).

    If you are looking for an empty line, then match the pattern that
    represents two consecutive newlines. Where text is input to a browser
    through a form control, in Firefox the required pattern is \n\n and in
    IE it is \r\n\r\n. Other patterns may be needed for other browsers.
    Since your text is generated elsewhere, you may need some other pattern.

    You can match different patterns simultaneously using '|' (which means
    or) between the patterns:

    /\n\n|\r\n\r\n/


    will match either \n\n or \r\n\r\n - i.e. a sequence of two consecutive
    new lines in both Firefox and IE (presuming that there is absolutely
    nothing on each line).

    A safer pattern that allows for possible white space on the 'empty' line is:

    /\n\s*\n|\r\n\s*\r\n/
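
    As a rough illustration of the pattern above, the sketch below runs it
    against the same text with LF and with CRLF line endings, and with an
    'empty' line that holds two spaces. The sample strings and the helper
    name splitOnBlankLine are made up for the example; they are not from
    the original post.

```javascript
// Pattern from the post: a blank line, possibly containing white space,
// with either LF or CRLF line endings.
const BLANK_LINE = /\n\s*\n|\r\n\s*\r\n/;

// Hypothetical helper: split text into the chunks separated by blank lines.
function splitOnBlankLine(text) {
  return text.split(BLANK_LINE);
}

const lfText = "Header: a\n\nbody";
const crlfText = "Header: a\r\n\r\nbody";
const spacedText = "Header: a\n  \nbody"; // 'empty' line holding two spaces

// Each call yields two pieces: "Header: a" and "body".
console.log(splitOnBlankLine(lfText));
console.log(splitOnBlankLine(crlfText));
console.log(splitOnBlankLine(spacedText));
```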


    > Example curl output:
    > HTTP/1.1 404 Not Found
    > Date: Wed, 22 Feb 2006 00:01:45 GMT
    > Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
    > Transfer-Encoding: chunked
    > Content-Type: text/html; charset=iso-8859-1


    That does not have any empty lines, so how should head and body be
    separated? Here is a small test case based on your sample text:


    <form action="">
    <textarea id="ta" rows="10" cols="60">HTTP/1.1 404 Not Found

    Date: Wed, 22 Feb 2006 00:01:45 GMT
    Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1</textarea>
    <input type="button" value="Split header &amp; body"
    onclick="splitHB(this.form.ta.value);">
    </form>

    <script type="text/javascript">
    // Split the text on blank lines and show the first two pieces.
    function splitHB(txt)
    {
      var bits = txt.split(/\n\s*\n|\r\n\s*\r\n/);
      alert('Header:\n' + bits[0]
          + '\n\nBody:\n' + bits[1]);
    }
    </script>


    Shows an alert with:

    Header:
    HTTP/1.1 404 Not Found

    Body:
    Date: Wed, 22 Feb 2006 00:01:45 GMT
    Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1
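
    One case the test above does not exercise is a body that itself
    contains blank lines: split() then returns more than two pieces, and a
    limit of 2 simply drops the later ones. A minimal sketch of one way
    around that, assuming the same pattern, is to locate only the first
    match and slice around it. The function name splitHeaderBody and the
    sample text are illustrative only.

```javascript
// Same blank-line pattern as in the test case above.
const SEPARATOR = /\n\s*\n|\r\n\s*\r\n/;

// Hypothetical helper: cut the text at the first blank line only.
function splitHeaderBody(text) {
  const match = SEPARATOR.exec(text);
  if (match === null) {
    // No blank line found: treat everything as header.
    return { header: text, body: "" };
  }
  return {
    header: text.slice(0, match.index),
    body: text.slice(match.index + match[0].length),
  };
}

const sample =
  "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n" +
  "<P>one</P>\r\n\r\n<P>two</P>";
const parts = splitHeaderBody(sample);
console.log(parts.header); // status line and header field
console.log(parts.body);   // both paragraphs, blank line included
```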


    [...]


    --
    Rob
     
    RobG, Feb 22, 2006
    #2

  3. Sen Haerens

    Sen Haerens Guest

    On 2006-02-22 02:22:09 +0100, RobG said:

    > Similarly, $ does not match the end of a line (which is different to
    > how they are treated in regular expressions in some other environments).


    That was the confusing part. ^$ works perfectly in other programs.

    > A safer pattern that allows for possible white space on the 'empty' line is:
    > /\n\s*\n|\r\n\s*\r\n/


    This one pulled off the trick.

    Thanks a lot!
    Sen
     
    Sen Haerens, Feb 22, 2006
    #3
  4. RobG wrote:

    > Sen Haerens wrote:
    >> I'm using string.split(/^$/m, 2) on a curl output to separate header and
    >> body. There’s an empty line between them. ^$ doesn’t seem to work...

    >
    > First the caveat: not all UAs support the use of regular expressions as
    > arguments to split().


    Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0, JScript
    3.0/IE 4.0) do. The issue is a different one here: The `m' modifier,
    specified for Regular Expressions in ECMAScript Edition 3, is not supported
    before JavaScript 1.5 (Gecko-based incl. NN6+) and JScript 3.0 (IE 4.0+
    based).

    > Now the problem: you haven't given any regular expression to match. The
    > '^' character means match the pattern when it occurs at the start of a
    > line, it does not match the start of a line itself. Similarly, $ does
    > not match the end of a line (which is different to how they are treated
    > in regular expressions in some other environments).


    I beg to differ. `$' matches end-of-line with the `m' modifier, or end-of-input
    without that modifier. It does not match the newline character (sequence),
    _at the end of a line_, though.
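
    To make that concrete, here is a small sketch, assuming a current
    ECMAScript engine in which CR and LF each count as a line terminator
    for `^' and `$' under the `m' modifier. It shows that /^$/m matches a
    zero-width position rather than the newline characters, so the pieces
    returned by split() keep their line breaks; with CRLF input the first
    empty match even falls between the CR and the LF. The sample strings
    are made up for the example.

```javascript
// /^$/m matches an empty position between two line terminators; it
// consumes no characters, so the terminators stay in the split pieces.
const EMPTY_LINE = /^$/m;

const lfParts = "head\n\nbody".split(EMPTY_LINE);
console.log(JSON.stringify(lfParts)); // ["head\n","\nbody"]

// With CRLF line endings, CR and LF are each a line terminator, so the
// first empty match sits between the CR and the LF of the first CRLF.
const crlfParts = "head\r\n\r\nbody".split(EMPTY_LINE, 2);
console.log(JSON.stringify(crlfParts)); // ["head\r","\n"]
```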

    > If you are looking for an empty line, then match the pattern that
    > represents two consecutive newlines. Where text is input to a browser
    > through a form control, in Firefox the required pattern is \n\n and in
    > IE it is \r\n\r\n. Other patterns may be needed for other browsers.
    > Since your text is generated elsewhere, you may need some other pattern.
    >
    > You can match different patterns simultaneously using '|' (which means
    > or) between the patterns:
    >
    > /\n\n|\r\n\r\n/


    This can be simplified, and made more compatible, as /(\r\n?|\n){2}/.

    > will match either \n\n or \r\n\r\n - i.e. a sequence of two consecutive
    > new lines in both Firefox and IE (presuming that there is absolutely
    > nothing on each line).
    >
    > A safer pattern that allows for possible white space on the 'empty' line
    > is:
    >
    > /\n\s*\n|\r\n\s*\r\n/


    Consequently,

    /(\r\n?|\n)\s*(\r\n?|\n)/

    or

    /(\r\n?|\n)\s*\1/

    if backreferences are supported within the expression.
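
    Two details are worth checking before using these patterns with
    split(); the sketch below does so, assuming a current ECMAScript
    engine. First, because \r\n? can backtrack to match a lone CR, the
    patterns without a backreference also match a single CRLF (CR as one
    line break, LF as the next), whereas the backreference variant does
    not. Second, split() inserts the text of capturing groups into the
    array it returns. The final pattern, blankLine, is an alternative
    added for this example and is not one proposed in the thread; it
    ignores bare CR line endings.

```javascript
const oneCrlf = "a\r\nb";     // a single line break, no empty line
const twoCrlf = "a\r\n\r\nb"; // an empty line between a and b

// Patterns from the post.
const repeated = /(\r\n?|\n){2}/;
const twoGroups = /(\r\n?|\n)\s*(\r\n?|\n)/;
const backref = /(\r\n?|\n)\s*\1/;

// \r\n? may match just CR, leaving LF for the second line break,
// so the first two patterns match a single CRLF.
console.log(repeated.test(oneCrlf));  // true
console.log(twoGroups.test(oneCrlf)); // true
console.log(backref.test(oneCrlf));   // false

// Captured text is spliced into the array returned by split().
console.log(JSON.stringify(twoCrlf.split(backref))); // ["a","\r\n","b"]

// Alternative for this example (not from the thread): no capturing
// groups, and CR is only accepted directly before LF.
const blankLine = /\r?\n[ \t]*\r?\n/;
console.log(blankLine.test(oneCrlf));                  // false
console.log(JSON.stringify(twoCrlf.split(blankLine))); // ["a","b"]
```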

    [completed quotation]
    >> Example curl output:
    >> HTTP/1.1 404 Not Found
    >> Date: Wed, 22 Feb 2006 00:01:45 GMT
    >> Server: Apache/1.3.33 (Darwin) PHP/5.1.2 mod_perl/1.29
    >> Transfer-Encoding: chunked
    >> Content-Type: text/html; charset=iso-8859-1
    >>

    ^
    >
    > That does not have any empty lines,


    There can be no completely empty line here, really. Every line that
    conforms to the Internet Message Format (RFC822/STD11), as a line in
    a HTTP/1.1 response, must be ended with <CR><LF>.
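
    Given that, header and body in an unmodified response should be
    separated by the first CRLF CRLF sequence. The sketch below relies on
    that assumption, and on nothing having rewritten the line endings
    along the way. The function name parseResponse and the sample text are
    hypothetical, and header continuation lines are not handled.

```javascript
// Hypothetical helper: split a raw HTTP response at the first CRLF CRLF
// and collect the header fields into a plain object.
function parseResponse(raw) {
  const end = raw.indexOf("\r\n\r\n");
  const head = end === -1 ? raw : raw.slice(0, end);
  const body = end === -1 ? "" : raw.slice(end + 4);

  const lines = head.split("\r\n");
  const statusLine = lines.shift();
  const headers = {};
  for (const line of lines) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed lines
    // Field names are case-insensitive, so normalise to lower case.
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { statusLine, headers, body };
}

const raw =
  "HTTP/1.1 404 Not Found\r\n" +
  "Content-Type: text/html; charset=iso-8859-1\r\n" +
  "\r\n" +
  "<H1>test</H1>\r\n";
const response = parseResponse(raw);
console.log(response.statusLine);
console.log(response.headers["content-type"]);
console.log(response.body);
```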


    PointedEars
     
    Thomas 'PointedEars' Lahn, Feb 22, 2006
    #4
  5. Sen Haerens

    Sen Haerens Guest

    On 2006-02-22 15:23:34 +0100, Thomas 'PointedEars' Lahn said:

    > RobG wrote:
    > Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0, JScript
    > 3.0/IE 4.0) do. The issue is a different one here: The `m' modifier,
    > specified for Regular Expressions in ECMAScript Edition 3, is not supported
    > before JavaScript 1.5 (Gecko-based incl. NN6+) and JScript 3.0 (IE 4.0+
    > based).


    The browser used is Safari 2.0.3.

    > There can be no completely empty line here, really. Every line that
    > conforms to the Internet Message Format (RFC822/STD11), as a line in
    > a HTTP/1.1 response, must be ended with <CR><LF>.


    That clears it all up. Thank you!
     
    Sen Haerens, Feb 24, 2006
    #5
  6. Sen Haerens wrote:

    [Restored context]
    > [...] Thomas 'PointedEars' Lahn [...] said:
    >> RobG wrote:
    >>> First the caveat: not all UAs support the use of regular expressions
    >>> as arguments to split().

    >> Most certainly those built after June 1997 (JavaScript 1.2/NN 4.0,
    >> JScript 3.0/IE 4.0) do. The issue is a different one here: The `m'
    >> modifier, specified for Regular Expressions in ECMAScript Edition 3,
    >> is not supported before JavaScript 1.5 (Gecko-based incl. NN6+) and
    >> JScript 3.0 (IE 4.0+ based).

    >
    > The browser used is Safari 2.0.3.


    Apple Safari 2.0.3's Webcore 417.8 and WebKit/417.9 (released January 10,
    2006) should implement at least KJS 3.4.1 (KDE 3.4.1 was released May 31,
    2005), which supports both a Regular Expression object reference as
    argument to String.prototype.split(), and the `m' modifier for Regular
    Expressions:

    <URL:http://developer.kde.org/documentation/library/3.4-api/kjs/html/string__object_8cpp-source.html>
    <URL:http://developer.kde.org/documentation/library/3.4-api/kjs/html/regexp__object_8cpp-source.html>

    However, it is unclear what the test browser has to do with the target
    browser here.

    >> There can be no completely empty line here, really. Every line that
    >> conforms to the Internet Message Format (RFC822/STD11), as a line in
    >> a HTTP/1.1 response, must be ended with <CR><LF>.

    >
    > That clears it all up. Thank you!


    You are welcome :)


    PointedEars
     
    Thomas 'PointedEars' Lahn, Feb 24, 2006
    #6
