Remove trailing comments exercise

Discussion in 'Javascript' started by Csaba Gabor, Nov 4, 2009.

  1. Csaba Gabor Guest

    I'm looking for a
    function stripEndComments(code) {
    // remove trailing comments and whitespace from
    /* the end of code, which is presumed to be valid
    // javascript */
    ... }


    My previous post at
    http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/2aa9a60623eb5883/
    may amount to more than just an exercise, so I am
    slicing off part of it into an independent exercise
    (and this one IS just an exercise).

    Assume the use of the function
    function checkSyntax(code) {
      // Returns false if code is not syntactically OK.
      // Otherwise returns the browser's (string) interpretation of the
      // code, encapsulated in an anonymous function.
      try {
        var f = new Function(code);
        return f.toString();
      } catch (err) {
        return false; // syntax error
      }
    }
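
A minimal sketch of how checkSyntax can serve as a pass/fail test. It assumes an engine in which the Function constructor throws a SyntaxError for invalid source text (current browsers and Node.js do). The sample strings are made up for this illustration.

```javascript
function checkSyntax(code) {
  try {
    return new Function(code).toString();
  } catch (err) {
    return false; // syntax error
  }
}

// Only the truthiness of the result is used here; the returned text
// differs between engines.
var samples = [
  "foo + bar",             // valid
  "foo + bar // comment",  // valid, ends in a line comment
  "foo + bar x y",         // invalid: two stray identifiers
  "foo + /* unterminated"  // invalid: the comment never closes
];

var results = samples.map(function (s) {
  return checkSyntax(s) !== false;
});

console.log(results);
```

Identifiers such as foo and bar are never evaluated, because the function is only constructed and not called.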


    Some examples:
    foo + bar // two comments /* or one? *//
    => foo + bar

    "Foo" + "bar" /* three */ // lines
    // of comments /* should all be
    /* stripped off *////
    => "Foo" + "bar"


    For the rambunctious: remove trailing empty statements, too:
    code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
    => baz/* junk */+borf; fubar


    Csaba Gabor from Vienna
     
    Csaba Gabor, Nov 4, 2009
    #1

  2. SAM Guest

    On 11/4/09 12:51 PM, Csaba Gabor wrote:
    > I'm looking for a
    > function stripEndComments(code) {
    > // remove trailing comments and whitespace from
    > /* the end of code, which is presumed to be valid
    > // javascript */
    > ... }

    (...)
    > For the rambunctious: remove trailing empty statements, too:
    > code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
    > => baz/* junk */+borf; fubar


    I get,
    Firefox.3 :
    baz + borf;
    fubar;
    IE.5, 6 and 7 :
    baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;

    not yet finished ?

    --
    sm
     
    SAM, Nov 4, 2009
    #2

  3. Csaba Gabor Guest

    On Nov 4, 1:11 pm, SAM <>
    wrote:
    > On 11/4/09 12:51 PM, Csaba Gabor wrote:
    >
    > > I'm looking for a
    > > function stripEndComments(code) {
    > >   // remove trailing comments and whitespace from
    > >   /* the end of code, which is presumed to be valid
    > >   // javascript */
    > >   ... }

    > (...)
    > > For the rambunctious: remove trailing empty statements, too:
    > > code = "baz/* junk */+borf; fubar  ; /* more junk */ ; ;; ;"
    > > => baz/* junk */+borf; fubar

    >
    > I get,
    > Firefox.3 :
    >      baz + borf;
    >      fubar;
    > IE.5, 6 and 7 :
    >      baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;
    >
    > not yet finished ?


    Hi SAM, what you have shown is what FF/IE returns if
    you put the mentioned strings into a function and then
    do a .toString() on it. FF cleans all comments
    whereas IE leaves them in.

    However, in this exercise, I'd like to strip the TRAILING
    comments only, in as browser-independent a fashion
    as possible (without recasting the code string into
    a different form). The part to the right of the
    => above indicates the string that the desired
    function, stripEndComments, should return.
    Therefore, you can use checkSyntax as a false vs.
    nonempty-string check, but I don't think you'll find
    the actual nonempty string return values useful for the
    purposes of this exercise.
     
    Csaba Gabor, Nov 4, 2009
    #3
  4. On Nov 4, 11:51 am, Csaba Gabor wrote:
    > I'm looking for a
    > function stripEndComments(code) {
    > // remove trailing comments and whitespace from
    > /* the end of code, which is presumed to be valid
    > // javascript */
    > ... }
    >
    > My previous post at ...
    > may amount to more than just an exercise, so I am
    > slicing off part of it into an independent exercise
    > (and this one IS just an exercise).
    >
    > Assume the use of the function
    > function checkSyntax(code) {
    > // returns false if code is not syntactically OK
    > // returns browser's (string) interpretation of the code if it's
    > OK,
    > // encapsulated in an anonymous function
    > try {
    > var f = new Function(code);
    > return f.toString(); }
    > catch (err) { return false; } } // syntax error
    >
    > Some examples:
    > foo + bar // two comments /* or one? *//
    > => foo + bar
    >
    > "Foo" + "bar" /* three */ // lines
    > // of comments /* should all be
    > /* stripped off *////
    > => "Foo" + "bar"
    >
    > For the rambunctious: remove trailing empty statements, too:
    > code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
    > => baz/* junk */+borf; fubar


    This problem includes the problem of not reacting to comment
    delimiters whenever they appear in strings in the source code. For
    example, stripping everything from the // to the end of the line in
    the following would be disastrous:-

    var prefixToIRI = {
    'xsd':'http://www.w3.org/2001/XMLSchema',
    'env':'http://schemas.xmlsoap.org/soap/envelope/',
    'xsi':'http://www.w3.org/2001/XMLSchema-instance',
    'xml':'http://www.w3.org/XML/1998/namespace',
    'xmlns':'http://www.w3.org/2000/xmlns'
    };

    So for this task it seems necessary to identify the string literals
    within the source, which is getting towards tokenising the source.
    Tokenising the source was already implied in the task of verifying the
    syntax of the code (along with identifying comments) so maybe this
    stage should not be separated from the previous task if you genuinely
    want all the comments removed.
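
A small sketch of the failure described above, assuming a deliberately naive helper (naiveStripLineComments is a hypothetical name, not something from the thread) that cuts every line at its first "//".

```javascript
// Naive approach: cut each line at the first "//".
function naiveStripLineComments(source) {
  return source.split("\n").map(function (line) {
    return line.replace(/\/\/.*$/, "");
  }).join("\n");
}

var source = [
  "var prefixToIRI = {",
  "  'xsd':'http://www.w3.org/2001/XMLSchema'",
  "};"
].join("\n");

var damaged = naiveStripLineComments(source);
console.log(damaged);
// The second line is cut off after 'http:', leaving an unterminated string.
```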

    Richard.
     
    Richard Cornford, Nov 4, 2009
    #4
  5. Csaba Gabor Guest

    On Nov 4, 1:37 pm, Richard Cornford <>
    wrote:
    > On Nov 4, 11:51 am, Csaba  Gabor wrote:
    >
    > > I'm looking for a
    > > function stripEndComments(code) {
    > >   // remove trailing comments and whitespace from
    > >   /* the end of code, which is presumed to be valid
    > >   // javascript */
    > >   ... }

    >
    > > My previous post at ...
    > > may amount to more than just an exercise, so I am
    > > slicing off part of it into an independent exercise
    > > (and this one IS just an exercise).

    >
    > > Assume the use of the function
    > > function checkSyntax(code) {
    > >   // returns false if code is not syntactically OK
    > >   // returns browser's (string) interpretation of the code if it's
    > > OK,
    > >   //     encapsulated in an anonymous function
    > >   try {
    > >     var f = new Function(code);
    > >     return f.toString(); }
    > >   catch (err) { return false; }  } // syntax error

    >
    > > Some examples:
    > > foo + bar // two comments /* or one? *//
    > > => foo + bar

    >
    > > "Foo" + "bar" /* three */   // lines
    > > // of comments /* should all be
    > > /* stripped off *////
    > > => "Foo" + "bar"

    >
    > > For the rambunctious: remove trailing empty statements, too:
    > > code = "baz/* junk */+borf; fubar  ; /* more junk */ ; ;; ;"
    > > => baz/* junk */+borf; fubar

    >
    > This problem includes the problem of not reacting to comment
    > delimiters whenever they appear in strings in the source code. For
    > example, stripping everything from the // to the end of the line in
    > the following would be disastrous:-


    I don't want to strip all comments, just those at the
    very tail end of the code string (as the 3rd example suggests).
    For example:
    foo(); // comment1
    bar(); // comment2
    =>
    foo(); // comment1
    bar()

    > var prefixToIRI = {
    >     'xsd':'http://www.w3.org/2001/XMLSchema',
    >     'env':'http://schemas.xmlsoap.org/soap/envelope/',
    >     'xsi':'http://www.w3.org/2001/XMLSchema-instance',
    >     'xml':'http://www.w3.org/XML/1998/namespace',
    >     'xmlns':'http://www.w3.org/2000/xmlns'
    >
    > };
    >
    > So for this task it seems necessary to identify the string literals
    > within the source, which is getting towards tokenising the source.


    Hopefully, we can stay away from tokenising. If we do have to
    enter the business of tokenising (in any substantive way) to
    solve this problem, it would no longer be an exercise.
    Perhaps it is better to use the browser's embedded parser to help out.

    > Tokenising the source was already implied in the task of verifying the
    > syntax of the code (along with identifying comments) so maybe this
    > stage should not be separated from the previous task if you genuinely
    > want all the comments removed.


    Removing all the comments would seem to be a messier
    problem (which I haven't thought about in this context).
    I've done this (removed all comments) in the past for
    PHP code, and it was around 60 lines of somewhat
    intricate code (in parsing the original code string).
    But I do not advocate such an approach for this exercise.

    > Richard
     
    Csaba Gabor, Nov 4, 2009
    #5
  6. Stevo Guest

    Csaba Gabor wrote:
    > I'm looking for a
    > function stripEndComments(code) {
    > // remove trailing comments and whitespace from
    > /* the end of code, which is presumed to be valid
    > // javascript */
    > ... }
    >
    >
    > My previous post at
    > http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/2aa9a60623eb5883/
    > may amount to more than just an exercise, so I am
    > slicing off part of it into an independent exercise
    > (and this one IS just an exercise).


    Why are you talking about this as an exercise all the time? Is that your
    way of getting people to write your code for you? Pretend it's just
    an abstract exercise for fun?
     
    Stevo, Nov 4, 2009
    #6
  7. SAM Guest

    On 11/4/09 1:37 PM, Csaba Gabor wrote:
    > On Nov 4, 1:11 pm, SAM <>
    > wrote:
    >> On 11/4/09 12:51 PM, Csaba Gabor wrote:
    >>
    >>> I'm looking for a
    >>> function stripEndComments(code) {
    >>> // remove trailing comments and whitespace from
    >>> /* the end of code, which is presumed to be valid
    >>> // javascript */
    >>> ... }

    >> (...)
    >>> For the rambunctious: remove trailing empty statements, too:
    >>> code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
    >>> => baz/* junk */+borf; fubar

    >> I get,
    >> Firefox.3 :
    >> baz + borf;
    >> fubar;
    >> IE.5, 6 and 7 :
    >> baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;
    >>
    >> not yet finished ?

    >
    > Hi SAM, what you have shown is what FF/IE returns if
    > you put the mentioned strings into a function and then
    > do a .toString() on it. FF cleans all comments
    > whereas IE leaves them in.


    Yes (the function checkSyntax() you've given).

    > However, in this exercise, I'd like to strip the TRAILING
    > comments only, in an as browser independent fashion
    > as possible (without recasting the code string into
    > a different form).


    javascript:alert("baz/* junk */+borf; fubar ; /* more junk */ ; ;; ; ;;".replace(/\/[/\*][^\*]+\*\/|\s+|\s*;(?=\s*;)/g,''))

    ==> baz+borf;fubar;;

    can't remove the last ';'

    > The part to the right of the
    > => above indicates the string that the desired
    > function, stripEndComments, should return.
    > Therefore, you can use checkSyntax as a false vs.
    > nonempty-string check, but I don't think you'll find
    > the actual nonempty string return values useful for the
    > purposes of this exercise.


    (I have not yet understood what "the" purpose is ... comments no ... but yes)

    javascript:alert("baz/* junk */+borf; fubar ; /* more junk */ ; ;; ; ;;".replace(/\/[/\*][^\*]+\*\/(?=\s*;)|\s+|;(?=\s*;)/g,''))

    ==> baz/*junk*/+borf;fubar;;

    --
    sm
     
    SAM, Nov 4, 2009
    #7
  8. abozhilov Guest

    On 4 Nov, 13:51, Csaba Gabor <> wrote:
    > I'm looking for a
    > function stripEndComments(code) {
    >   // remove trailing comments and whitespace from
    >   /* the end of code, which is presumed to be valid
    >   // javascript */
    >   ... }


    Something like this?

    code.replace(/\s*;[\s;]*/g, ';\n').replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');
     
    abozhilov, Nov 4, 2009
    #8
  9. Csaba Gabor Guest

    On Nov 4, 6:59 pm, abozhilov <> wrote:
    > On 4 Nov, 13:51, Csaba  Gabor <> wrote:
    >
    > > I'm looking for a
    > > function stripEndComments(code) {
    > >   // remove trailing comments and whitespace from
    > >   /* the end of code, which is presumed to be valid
    > >   // javascript */
    > >   ... }

    >
    > Something like this?
    >
    > code.replace(/\s*;[\s;]*/g, ';\n').replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');


    You might be able to figure out a way to do this
    with regular expressions, but I'm thinking that
    it will be VERY messy because you will have to
    account for strings and regular expressions such as:
    var code = "var messy='it was windy/*sunny*'+" and */cold/*"

    The first part of your code fails on:
    var code = "var semi=' ; ; ; '";

    While the second replace fails on
    var code = "var k=i + j /* // */";
     
    Csaba Gabor, Nov 4, 2009
    #9
  10. Csaba Gabor wrote:

    > abozhilov wrote:
    >> Csaba Gabor wrote:
    >> >   // remove trailing comments and whitespace from
    >> >   /* the end of code, which is presumed to be valid
    >> >   // javascript */
    >> >   ... }

    >>
    >> Something like this?
    >>
    >> code.replace(/\s*;[\s;]*/g, ';\n').replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');

    >
    > You might be able to figure out a way to do this
    > with regular expressions, but I'm thinking that
    > it will be VERY messy


    How fortunate then that you don't know what you are talking about.
    It is rather easy to do if you do it properly. For example:

    code = code.replace(
    /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
    function(m, p1, p2, p3, p4) {
    return (p3 || p4) ? "" : m;
    });

    > because you will have to
    > account for strings and regular expressions such as:
    > var code = "var messy='it was windy/*sunny*'+" and */cold/*"


    The concatenation here is rather pointless. Any tokenizer or parser will
    see this as equivalent to

    var code = "var messy='it was windy/*sunny*' and */cold/*"

    And

    var messy='it was windy/*sunny*' and */cold/*

    is not syntactically correct to begin with. Which also points out that
    there is no Regular Expression here.


    PointedEars
    --
    var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
    ) // Plone, register_function.js:16
     
    Thomas 'PointedEars' Lahn, Nov 4, 2009
    #10
  11. Thomas 'PointedEars' Lahn wrote:

    > Csaba Gabor wrote:
    >> [...] you will have to account for strings and regular expressions such
    >> as:
    >> var code = "var messy='it was windy/*sunny*'+" and */cold/*"

    ^ ^ ^
    > The concatenation here is rather pointless. [...]


    In fact, there is no concatenation here because it ...

    > is not syntactically correct to begin with. Which also points out that
    > there is no Regular Expression here.



    PointedEars
    --
    Anyone who slaps a 'this page is best viewed with Browser X' label on
    a Web page appears to be yearning for the bad old days, before the Web,
    when you had very little chance of reading a document written on another
    computer, another word processor, or another network. -- Tim Berners-Lee
     
    Thomas 'PointedEars' Lahn, Nov 4, 2009
    #11
  12. In comp.lang.javascript message <7766145b-786d-478a-8a6e-08f2e27826ba@l2g2000yqd.googlegroups.com>, Wed, 4 Nov 2009 03:51:10, Csaba Gabor
    <> posted:
    >I'm looking for a
    >function stripEndComments(code) {
    > // remove trailing comments and whitespace from
    > /* the end of code, which is presumed to be valid
    > // javascript */
    > ... }


    Whitespace is trivial.

    You must recognise strings, and not count // or /* within them.
    You must allow for RegExp literals such as /slash=\//.
    Remove all /* ... */ comments, or only those that come last on a line?
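
A brief sketch of the RegExp-literal hazard mentioned above. The source string is made up for the illustration; it contains a regular expression literal whose text includes the characters "//", followed by a real trailing comment.

```javascript
// Source text under inspection (the doubled backslash is string escaping;
// the code itself reads: var re = /slash=\//; // trailing comment).
var source = "var re = /slash=\\//; // trailing comment";

// Naive approach: cut at the first "//" found anywhere in the text.
var naive = source.replace(/\/\/.*$/, "");

console.log(naive);
// The cut lands inside the RegExp literal instead of at the comment.
```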

    --
    (c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
    Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
    Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
     
    Dr J R Stockton, Nov 4, 2009
    #12
  13. Csaba Gabor Guest

    On Nov 4, 9:06 pm, Csaba Gabor <> wrote:
    > On Nov 4, 6:59 pm, abozhilov <> wrote:
    > > On 4 Nov, 13:51, Csaba  Gabor <> wrote:

    >
    > > > I'm looking for a
    > > > function stripEndComments(code) {
    > > >   // remove trailing comments and whitespace from
    > > >   /* the end of code, which is presumed to be valid
    > > >   // javascript */
    > > >   ... }

    >
    > You might be able to figure out a way to do this
    > with regular expressions, but I'm thinking that
    > it will be VERY messy because you will have to
    > account for strings and regular expressions such as:


    > var code = "var messy='it was windy/*sunny*'+" and */cold/*"


    Oops, I see I've made a transcription error. It should read:
    var code = "var messy='it was windy/*sunny*'+' and */cold/*'"

    But the following may be slightly more interesting:
    var code =
    "var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf"
     
    Csaba Gabor, Nov 5, 2009
    #13
  14. Csaba Gabor wrote:

    > On Nov 4, 9:06 pm, Csaba Gabor <> wrote:
    >> You might be able to figure out a way to do this
    >> with regular expressions, but I'm thinking that
    >> it will be VERY messy because you will have to
    >> account for strings and regular expressions such as:
    >>
    >> var code = "var messy='it was windy/*sunny*'+" and */cold/*"

    >
    > Oops, I see I've made a transcription error. It should read:
    > var code = "var messy='it was windy/*sunny*'+' and */cold/*'"


    Still no RegExp here:

    var messy='it was windy/*sunny* and */cold/*'
    ^ ^

    > But the following may be slightly more interesting:
    > var code =
    > "var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf"


    You are still on the wrong track.

    var mess='it\\'s windy//*sunny* & */cold/*' //asdf
    ^ ^

    It is really merely an issue to recognize and ignore string literals first,
    then to recognize and ignore RegExp initializers outside of them. My
    replace function already implements the former; adapting it to also take
    care of the latter is left as an exercise to the reader.


    PointedEars
    --
    Prototype.js was written by people who don't know javascript for people
    who don't know javascript. People who don't know javascript are not
    the best source of advice on designing systems that use javascript.
    -- Richard Cornford, cljs, <f806at$ail$1$>
     
    Thomas 'PointedEars' Lahn, Nov 5, 2009
    #14
  15. Thomas 'PointedEars' Lahn <> writes:

    > Csaba Gabor wrote:
    >
    >> abozhilov wrote:
    >>> Csaba Gabor wrote:
    >>> >   // remove trailing comments and whitespace from
    >>> >   /* the end of code, which is presumed to be valid
    >>> >   // javascript */
    >>> >   ... }

    ....
    > How fortunate then that you don't know what you are talking about.
    > It is rather easy to do if you do it properly. For example:
    >
    > code = code.replace(
    > /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
    > function(m, p1, p2, p3, p4) {
    > return (p3 || p4) ? "" : m;
    > });


    The ('(?:[^']|\\')*') part fails to recognize the end of the following
    string literal:
    'foo \\'
    and will match up to the next "'". Ditto for double-quoted strings.
    Try
    ('(?:[^'\\]|\\[^])*')
    (Here I'm also allowing backslash-newline in string literals, even
    though it's not in the standard, otherwise replace "[^]" with ".").
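
One way to compare the two single-quoted-string patterns is to run each against a couple of sample literals. The inputs and the helper firstMatch are made up for this sketch; String.raw is used so that the backslashes are exactly the characters a tokenizer would see.

```javascript
// Original pattern from the quoted code and the suggested replacement,
// each reduced to its single-quoted alternative.
var original = /'(?:[^']|\\')*'/;
var suggested = /'(?:[^'\\]|\\[^])*'/;

// Sample source text: a literal with an escaped apostrophe, and a
// literal that ends in an escaped backslash.
var escapedQuote = String.raw`'it\'s' + 'x'`;
var trailingBackslash = String.raw`'foo \\' + 'x'`;

// Hypothetical helper: text of the first match, or null.
function firstMatch(re, text) {
  var m = re.exec(text);
  return m ? m[0] : null;
}

console.log(firstMatch(original, escapedQuote));
console.log(firstMatch(suggested, escapedQuote));
console.log(firstMatch(suggested, trailingBackslash));
```

On the escaped-apostrophe input the original pattern ends its match at the escaped quote, because [^'] consumes the backslash first; the suggested pattern reads through to the real closing quote on both inputs.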


    And it's easy to add standard (not-single-line) comments as well:
    (\/\*(?:[^*]*\*+)*\/)

    This only works in the absence of regexp literals.
    RegExps are harder to recognize, because it's the syntactic starting
    point that distinguishes the starting slash from a division.
    E.g.,
    /foo + 42/g
    might be a RegExp, if occurring in an expression context, but not
    if it occurs where an operator is expected:
    bar/foo + 42/g
    (I.e., it's not tokenizable without context information).

    And if you can't recognize regexps, you can mess up the recognition
    of comments and strings as well.
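
The context dependence can be seen directly in a few lines. The variable values below are hypothetical, chosen only so that the arithmetic is easy to follow.

```javascript
// Hypothetical values for the identifiers used in the example.
var foo = 2, bar = 84, g = 7;

// Operator context: both slashes are division signs (84/2 + 42/7).
var asDivision = bar/foo + 42/g;

// Expression context: the same characters form a RegExp literal
// with the g flag.
var asRegExp = /foo + 42/g;

console.log(typeof asDivision, asDivision);
console.log(asRegExp instanceof RegExp, asRegExp.source, asRegExp.global);
```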

    /L
    --
    Lasse Reichstein Holst Nielsen
    'Javascript frameworks is a disruptive technology'
     
    Lasse Reichstein Nielsen, Nov 5, 2009
    #15
  16. Csaba Gabor Guest

    On Nov 5, 7:19 am, Lasse Reichstein Nielsen <>
    wrote:
    > Thomas 'PointedEars' Lahn <> writes:
    > > Csaba Gabor wrote:

    >
    > >> abozhilov wrote:
    > >>> Csaba Gabor wrote:
    > >>> > // remove trailing comments and whitespace from
    > >>> > /* the end of code, which is presumed to be valid
    > >>> > // javascript */
    > >>> > ... }

    > ...
    > > How fortunate then that you don't know what you are talking about.
    > > It is rather easy to do if you do it properly.  For example:

    >
    > >   code = code.replace(
    > >     /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
    > >     function(m, p1, p2, p3, p4) {
    > >       return (p3 || p4) ? "" : m;
    > >     });

    >
    > The ('(?:[^']|\\')*') part fails to recognize the end of the following
    > string literal:
    >   'foo \\'
    > and will match up to the next "'". Ditto for double-quoted strings.
    > Try
    >   ('(?:[^'\\]|\\[^])*')
    > (Here I'm also allowing backslash-newline in string literals, even
    > though it's not in the standard, otherwise replace "[^]" with ".").


    Very interesting. I've not seen that [^] construct in
    javascript before. With a PHP regular expression if ] is
    the first character following the ^ in a character class,
    it means to exclude the right closing bracket ]. Evidently,
    PHP's [^]] translates to [^\]] in JS

    > And it's easy to add standard (not-single-line) comments as well:
    >   (\/\*(?:[^*]*\*+)*\/)


    Or: (\/\*.*?(?=\*\/)..)
    though I have not extensively tested it
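
A small comparison of the two block-comment patterns on made-up inputs. The third pattern, lazyAny, is not from the thread; it is a commonly used variant added here as an assumption, for contrast.

```javascript
var greedy = /\/\*(?:[^*]*\*+)*\//g;   // pattern from the quoted post
var lazyDot = /\/\*.*?(?=\*\/)../g;    // the alternative using .*?
var lazyAny = /\/\*[\s\S]*?\*\//g;     // assumption: lazy, crosses newlines

var twoComments = "a /* x */ b /* y */ c";
var multiLine = "a /* x\ny */ b";

var out = {
  greedyTwo: twoComments.replace(greedy, ""),
  lazyDotTwo: twoComments.replace(lazyDot, ""),
  lazyAnyTwo: twoComments.replace(lazyAny, ""),
  lazyDotMulti: multiLine.replace(lazyDot, ""),
  lazyAnyMulti: multiLine.replace(lazyAny, "")
};

console.log(JSON.stringify(out, null, 2));
```

In this run the greedy pattern swallows everything from the first /* to the last */, including the code between the two comments, and the dot-based pattern leaves the multi-line comment untouched because "." does not match a newline.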

    > This only works in the absence of regexp literals.
    > RegExps are harder to recognize, because it's the syntactic starting
    > point that distinguishes the starting slash from a division.
    > E.g.,
    >   /foo + 42/g  
    > might be a RegExp, if occurring in an expression context, but not
    > if it occurs where an operator is expected:
    >  bar/foo + 42/g
    > (I.e., it's not tokenizable without context information).
    >
    > And if you can't recognize regexps, you can mess up the
    > recognition of comments and strings as well.


    Indeed. Thanks for that nice reply Lasse. I would be highly
    curious to see a reg exp variant developed to completion.
    Perhaps there should be a separate 'Remove all comments' thread.

    My solution to the 'Remove trailing comments' exercise follows.
    My reason in posing the exercise was to highlight that in the
    best spirit of programming, one may use the browser's syntax
    checking capabilities to do the heavy lifting, rather than
    having to parse the entire code string manually.

    Reminder, I only want to remove the final comments at the end of
    the code, and not at the end of each line. In short, I want to
    be able to get at the last code that actually "does something"
    (or might be doing something).

    After getting rid of trailing whitespace and vacuous lines,
    we consider that there are exactly three situations. The final
    characters are either:
    1) Part of a comment started by //
    2) The end of a comment started by /*
    3) Not a comment

    How to test for this (and what to do when we know which case)?

    checkSyntax(code + ' x y') will pass iff case 1 holds
    and we have a // style comment. In that situation find
    the previous //, strip the final / and perform the test
    (on the stripped version). If it passes, recurse (since
    we're still in the comment). If it fails, strip off one
    more character from the end (the first / of the // pair),
    and recurse on that. We can't be too greedy in the
    passes case because we may have situations like ///

    If case 1, above, does not hold, and the code does not
    end with */, then it is evidently not part of a comment,
    so it is case 3, and we are done.

    Otherwise, find the prior /*. It is either the start
    of the comment or in the middle of it. To test for
    this, replace the /*...*/ with */
    If this passes the syntax check, then we are still
    in the middle of a comment, so we recurse on the just
    tested string. Otherwise, we're at the start of a
    comment so recurse on the just tested string less the
    final two characters.

    Here's the code:
    function stripEndComments(code) {
      // Trim trailing comments from code
      // First trim whitespace and vacuous statements
      code = code.replace(/(\s*;)*\s*$/,"");

      // Next check for double slash type of comment at end
      if (checkSyntax(code + ' x y')) {
        var pos = code.lastIndexOf("//"),
            cS = checkSyntax(code.substr(0,pos+1) + ' x y');
        return stripEndComments(code.substr(0,pos+!!cS));
      }

      // In this next case there are no more trailing comments
      if (code.substr(-2)!="*/") return code;

      // Here deal with /* ... /* ... */ comments: replace the last
      // /*...*/ with */ and test that string, as described above
      var c = code.substr(0,code.lastIndexOf("/*")) + "*/";
      return stripEndComments(c.substr(0,c.length-2*!checkSyntax(c)));
    }

    Csaba Gabor from Vienna
     
    Csaba Gabor, Nov 5, 2009
    #16
  17. Csaba Gabor Guest

    On Nov 5, 11:20 am, Csaba Gabor <> wrote:
    > On Nov 5, 7:19 am, Lasse Reichstein Nielsen <>
    > wrote:
    > > Thomas 'PointedEars' Lahn <> writes:
    > > > Csaba Gabor wrote:

    >
    > > >> abozhilov wrote:
    > > >>> Csaba Gabor wrote:
    > > >>> >   // remove trailing comments and whitespace from
    > > >>> >   /* the end of code, which is presumed to be valid
    > > >>> >   // javascript */
    > > >>> >   ... }


    > My solution to the 'Remove trailing comments' exercise follows.
    > My reason in posing the exercise was to highlight that in the
    > best spirit of programming, one may use the browser's syntax
    > checking capabilities to do the heavy lifting, rather than
    > having to parse the entire code string manually.
    >
    > Reminder, I only want to remove the final comments at the end of
    > the code, and not at the end of each line.  In short, I want to
    > be able to get at the last code that actually "does something"
    > (or might be doing something).
    >
    > After getting rid of trailing whitespace and vacuous lines,
    > we consider that there are exactly three situations.  The final
    > characters are either:
    > 1)  Part of a comment started by //
    > 2)  The end of a comment started by /*
    > 3)  Not a comment


    Slightly revised code:

    function stripEndComments(code) {
      // Trim trailing comments from code
      // First trim whitespace and vacuous statements
      code = code.replace(/[\s;]*\s*$/,"");

      // Next check for double slash type of comment at end
      if (checkSyntax(code + ' x y')) {
        var pos = code.lastIndexOf("//"),
            cS = checkSyntax(code.substr(0,pos+1) + ' x y');
        return stripEndComments(code.substr(0,pos+!!cS));
      }

      // In this next case there are no more trailing comments
      if (code.substr(code.length-2)!="*/") return code;

      // Here deal with /* ... /* ... */ comments: replace the last
      // /*...*/ with */ and test that string, as described earlier
      var c = code.substr(0,code.lastIndexOf("/*")) + "*/";
      return stripEndComments(c.substr(0,c.length-2*!checkSyntax(c)));
    }
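
The examples from the start of the thread can be run against a self-contained sketch of this approach. The version below is an adaptation, not the posted code verbatim: it uses slice in place of substr, and in the /* case it tests the string obtained by replacing the last /*...*/ with */, following the prose description of the algorithm. The nested-comment input is an extra case made up for this sketch.

```javascript
function checkSyntax(code) {
  try {
    return new Function(code).toString();
  } catch (err) {
    return false; // syntax error
  }
}

function stripEndComments(code) {
  // Trim trailing whitespace and empty statements.
  code = code.replace(/[\s;]*$/, "");

  // Case 1: the code ends inside a // comment.
  if (checkSyntax(code + " x y")) {
    var pos = code.lastIndexOf("//");
    var stillComment = checkSyntax(code.slice(0, pos + 1) + " x y");
    return stripEndComments(code.slice(0, stillComment ? pos + 1 : pos));
  }

  // Case 3: no trailing comment is left.
  if (code.slice(-2) !== "*/") return code;

  // Case 2: replace the last "/* ... */" with "*/" and test again.
  var c = code.slice(0, code.lastIndexOf("/*")) + "*/";
  return stripEndComments(checkSyntax(c) ? c : c.slice(0, -2));
}

var cases = [
  "foo + bar // two comments /* or one? *//",
  '"Foo" + "bar" /* three */ // lines\n' +
    "// of comments /* should all be\n" +
    "/* stripped off *////",
  "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;",
  "foo(); // comment1\nbar(); // comment2",
  "var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf",
  "foo /* a /* b */" // made-up case: /* inside a block comment
];

var stripped = cases.map(stripEndComments);
stripped.forEach(function (s) { console.log(JSON.stringify(s)); });
```

Each recursive call removes at least one character, so the recursion depth is bounded by the length of the trailing comment text.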


    What changed:
    code.substr(-2) => code.substr(code.length-2)
    since some IEs do not like a negative argument to .substr()
     
    Csaba Gabor, Nov 5, 2009
    #17
  18. SAM Guest

    On 11/5/09 11:20 AM, Csaba Gabor wrote:
    >
    > Very interesting. I've not seen that [^] construct in
    > javascript before. With a PHP regular expression if ] is
    > the first character following the ^ in a character class,
    > it means to exclude the right closing bracket ]. Evidently,
    > PHP's [^]] translates to [^\]] in JS


    The characters '(' and '[' do not have to be backslash-escaped
    when they are between [ ] or ( );
    only the closers ']' and ')' have to be.

    Other characters that may have to be escaped:
    o '-', except if it is at the very end
    (e.g. [m-s-] : one character from m to s, or the sign -)
    o '+', except if it is at the beginning
    (e.g. [+ms] : character m or s or +)

    >> And it's easy to add standard (not-single-line) comments as well:
    >> (\/\*(?:[^*]*\*+)*\/)

    >
    > Or: (\/\*.*?(?=\*\/)..)
    > though I have not extensively tested it


    It all depends on the way you code ...

    var reg = /(\/\*.*?(?=\*\/))/g;
    var reg = new RegExp('(/\\*.*?(?=\\*/))','g');

    <https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp>

    var myString = 'some blah /* comment ?!; comment-2 /|\ */ + no comment';
    myString = myString.replace(reg, '');
    alert(myString);

    But that Regexp doesn't work ...
    This one is a little better :
    var reg = new RegExp('(/\\*[^*]*\\*/)','g');

    alert(myString.replace(/(\/\*[^*]*\*\/)/g,''));
    or :
    alert(myString.replace(/\/\*[^*]*\*\//g,''));

    Of course, this RegExp doesn't work with :
    myString = 'some blah /* comment?!; comment-2* /|\ */ + no comment';
    where one '*' is introduced in the comment.

    alert(myString.replace(/\/\*([^*]|\*(?!\/))+\*\//g,''));
    OK (for "that" string !)
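
A short check of that last pattern on a made-up input containing a comment with an inner '*', an ordinary comment, and an empty comment. The relaxed variant is not from the post; it is an assumption that swaps + for * so that the empty comment is matched as well.

```javascript
var strict = /\/\*([^*]|\*(?!\/))+\*\//g;     // pattern from the post
var relaxed = /\/\*(?:[^*]|\*(?!\/))*\*\//g;  // assumption: * instead of +

var input = "a /* x* y */ b /* z */ c /**/ d";

var strictOut = input.replace(strict, "");
var relaxedOut = input.replace(relaxed, "");

console.log(JSON.stringify(strictOut));
console.log(JSON.stringify(relaxedOut));
```

Because the lookahead stops the group at the first */, neither pattern runs on into the next comment; the only difference in this run is the empty comment, which needs at least one character inside it to satisfy the + quantifier.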


    > Reminder, I only want to remove the final comments at the end of
    > the code,


    $ : to tell it's the end

    > and not at the end of each line. In short, I want to
    > be able to get at the last code that actually "does something"
    > (or might be doing something).
    >
    > After getting rid of trailing whitespace and vacuous lines,
    > we consider that there are exactly three situations. The final
    > characters are either:
    > 1) Part of a comment started by //
    > 2) The end of a comment started by /*
    > 3) Not a comment


    var reg = /[\/\s][\/*][^};]*(?![};])$/g;

    var strg = 'var f = function(){ foo(); /* comment */} //no se';
    alert(strg.replace(reg,''));

    var strg = 'var f = function(){ foo(); /* comment */} /*no se*/';
    alert(strg.replace(reg,''));

    both ==> var f = function(){ foo(); /* comment */}


    var strg = 'var f = function(){ foo(); // comment \n} /*no se*/';
    alert(strg.replace(reg,''));
    ==>
    var f = function(){ foo(); // comment
    }

    var strg = 'var f = function(){ foo(); // comment **\n} /*no se*/';
    alert(strg.replace(reg,''));
    ==>
    var f = function(){ foo(); // comment **
    }


    Not tested with IE ...


    You can try your regexps and your strings here:
    <http://www.regextester.com/>
    <http://www.google.com/search?q=tester+regex>
    <http://stephane.moriaux.pagesperso-orange.fr/truc/js_regexp_testeur>
    --
    sm
     
    SAM, Nov 5, 2009
    #18
  19. Lasse Reichstein Nielsen wrote:

    > Thomas 'PointedEars' Lahn <> writes:
    >> Csaba Gabor wrote:
    >>> abozhilov wrote:
    >>>> Csaba Gabor wrote:
    >>>> > // remove trailing comments and whitespace from
    >>>> > /* the end of code, which is presumed to be valid
    >>>> > // javascript */
    >>>> > ... }

    > ...
    >> How fortunate then that you don't know what you are talking about.
    >> It is rather easy to do if you do it properly. For example:
    >>
    >> code = code.replace(
    >> /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
    >> function(m, p1, p2, p3, p4) {
    >> return (p3 || p4) ? "" : m;
    >> });

    >
    > The ('(?:[^']|\\')*') part fails to recognize the end of the following
    > string literal:
    > 'foo \\'
    > and will match up to the next "'". Ditto for double-quoted strings.


    Not here (Iceweasel 3.5.4, JavaScript 1.8.1). Have you used "'foo \\'" or
    "'foo \\\\'" for the test? Because the latter is the representation of
    'foo \\' in a string value, while "'foo \\'" as a string value represents
    the syntactically invalid 'foo \' (which is why it must be matched up to the
    next apostrophe to be a string literal).

    /* 'foo \\' */
    var code = "'foo \\\\' '";

    /* ["'foo \\'", "'foo \\'"] */
    /('(?:[^']|\\')*')/.exec(code)

    If I am overlooking something, can you explain why the recognition of this
    string literal should fail?
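
The two levels of escaping are easy to mix up, so here is a small sketch that makes them visible. The helper `parses` is hypothetical; it borrows the `new Function` idea from `checkSyntax` in the original post to test whether a piece of source text is a valid expression.

```javascript
// Source text as the parser would see it, written as string values.
var unterminated = "'foo \\'";    // 7 characters: 'foo \'
var terminated   = "'foo \\\\'";  // 8 characters: 'foo \\'

// Hypothetical helper: true if src parses as an expression.
function parses(src) {
  try { new Function('return ' + src); return true; }
  catch (err) { return false; }
}

console.log(unterminated.length, parses(unterminated));
console.log(terminated.length, parses(terminated));
```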

    > [...]
    > And it's easy to add standard (not-single-line) comments as well:
    > (\/\*(?:[^*]*\*+)*\/)
    >
    > This only works in the absence of regexp literals.
    > RegExps are harder to recognize, because it's the syntactic starting
    > point that distinguishes the starting slash from a division.
    > E.g.,
    > /foo + 42/g
    > might be a RegExp, if occurring in an expression context, but not
    > if it occurs where an operator is expected:
    > bar/foo + 42/g
    > (I.e., it's not tokenizable without context information).
    >
    > And if you can't recognize regexps, you can mess up the recognition
    > of comments and strings as well.
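
When trying the block-comment pattern quoted above, note that `[^*]*` can also consume a `/`, so as far as I can tell the group is able to run past one `*/` and stop at a later one when the code holds several comments. The sketch below compares it with a commonly used variant that forbids `/` directly after a run of stars. The variable names are illustrative, and both patterns still ignore strings and regexp literals.

```javascript
var quoted  = /\/\*(?:[^*]*\*+)*\//g;
var variant = /\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\//g;

var code = 'a /* x */ b /* y */ c';
console.log(code.replace(quoted, ''));   // the code between the comments is lost
console.log(code.replace(variant, ''));  // only the comments are removed
```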


    Thank you. I am working on an ECMAScript-compliant source code parser and
    you have given me quite something to think about.


    PointedEars
    --
    Danny Goodman's books are out of date and teach practices that are
    positively harmful for cross-browser scripting.
    -- Richard Cornford, cljs, <cife6q$253$1$> (2004)
     
    Thomas 'PointedEars' Lahn, Nov 5, 2009
    #19
  20. Thomas 'PointedEars' Lahn <> writes:

    > Lasse Reichstein Nielsen wrote:
    >> The ('(?:[^']|\\')*') part fails to recognize the end of the following
    >> string literal:
    >> 'foo \\'
    >> and will match up to the next "'". Ditto for double-quoted strings.

    >
    > Not here (Iceweasel 3.5.4, JavaScript 1.8.1). Have you used "'foo \\'" or
    > "'foo \\\\'" for the test? Because the latter is the representation of
    > 'foo \\' in a string value, while "'foo \\'" as a string value represents
    > the syntactically invalid 'foo \' (which is why it must be matched up to the
    > next apostrophe to be a string literal).


    (I'll write all strings as string literals from here, to (try to) avoid
    confusion).

    To be honest, I didn't test it, and the argument for why it didn't
    work was wrong because of that.
    It still doesn't work, but for the opposite reason from my initial guess:
    it doesn't exclude "\\'" from ending the string literal, whereas I had
    guessed that it wouldn't correctly recognize "\\\\'" as ending it.

    Try:

    var code = "'abc\\'def'";
    // I.e., code holds one string literal containing an escaped quote
    var re = /('(?:[^']|\\')*')/g;
    alert(re.exec(code)[0]);

    It alerts the string "'abc\\'", i.e., the match ends at the first
    "'" even though that quote is escaped.

    The reason it does so is that [^'] matches a backslash too, and
    it has a higher priority than the alternative that comes after it,
    so the backslash is consumed before \\' is ever tried.

    The immediate fix of swapping the alternatives:
    var re = /('(?:\\'|[^'])*')/g;
    and giving \\' priority over [^'], will match "\\'" as a non-string-ender,
    but will also ignore "\\\\'". It's necessary to know whether there is an
    even number of backslashes before the quote in order to know whether it's
    escaped or not. The RegExp below is the simplest one I have found to do that.
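
The pattern referred to above is not reproduced in this copy of the post. As a stand-in, the sketch below uses a commonly seen form (an assumption, not necessarily the one meant) in which a backslash always consumes the character after it, so pairs of backslashes stay in step. The names `original`, `swapped`, `paired` and `firstMatch` are illustrative.

```javascript
var original = /'(?:[^']|\\')*'/;         // [^'] swallows the backslash first
var swapped  = /'(?:\\'|[^'])*'/;         // treats every \' as an escaped quote
var paired   = /'(?:[^'\\]|\\[\s\S])*'/;  // backslash + any one character

var escapedQuote = "'abc\\'def'";         // source text: 'abc\'def'
var escapedSlash = "'foo \\\\' + 'bar'";  // source text: 'foo \\' + 'bar'

function firstMatch(re, src) { return re.exec(src)[0]; }

console.log(firstMatch(original, escapedQuote)); // stops too early
console.log(firstMatch(swapped, escapedSlash));  // runs too far
console.log(firstMatch(paired, escapedQuote));
console.log(firstMatch(paired, escapedSlash));
```

The same idea applies to double-quoted strings by swapping the quote character in the pattern.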

    > /* 'foo \\' */
    > var code = "'foo \\\\' '";
    >
    > /* ["'foo \\'", "'foo \\'"] */
    > /('(?:[^']|\\')*')/.exec(code)
    >
    > If I am overlooking something, can you explain why the recognition of this
    > string literal should fail?


    It works. What fails is an escaped backslash before a quote:
    "'foo \\\\' + 'bar'"

    ....
    > Thank you. I am working on an ECMAScript-compliant source code parser and
    > you have given me quite something to think about.


    Glad to be of service :)
    ECMAScript syntax is ... interesting. Context-dependent lexing combined
    with semicolon-insertion gives ample room to make mistakes :)

    var b=2,g=1;
    var a = 84
    /b/g; // <- it's division :)
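
The snippet above can be run as is to see both readings of the same characters. This sketch only adds a second statement, with the illustrative name `re`, where `/b/g` starts an expression and is therefore lexed as a RegExp literal.

```javascript
var b = 2, g = 1;

// No semicolon is inserted after 84, because the next line continues
// the expression: this is 84 / b / g.
var a = 84
/b/g;

// At the start of an expression the same characters form a RegExp literal.
var re = /b/g;

console.log(a);                     // 42
console.log(re instanceof RegExp);  // true
```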

    /L
    --
    Lasse Reichstein Holst Nielsen
    'Javascript frameworks is a disruptive technology'
     
    Lasse Reichstein Nielsen, Nov 6, 2009
    #20