'\u000a' and '\u000d'

Discussion in 'Java' started by dimakura, Feb 19, 2007.

  1. dimakura

    dimakura Guest

    I found through a web search why I cannot use

    ////////////////////

    char c = '\u000a'

    ////////////////////

    but I cannot find why I cannot use

    ////////////////////

    // char c = '\u000a'

    ////////////////////

    Is it because '\u000a' is equivalent to \n and this type of comment
    is single-line?

    thanks,
    dimitri
    dimakura, Feb 19, 2007
    #1

  2. Oliver Wong

    Oliver Wong Guest

    "dimakura" <> wrote in message
    news:...
    >i found in the web-search why i can not use
    >
    > ////////////////////
    >
    > char c = '\u000a'
    >
    > ////////////////////
    >
    > but i can not find why i can not use
    >
    > ////////////////////
    >
    > // char c = '\u000a'
    >
    > ////////////////////
    >
    > is it because '\u000a' is equivivalent to \n and this type is comment
    > is a single-line?


    The conversion of Unicode escape sequences into characters happens
    after the source file is read, but before it is tokenized and parsed
    for compilation.

    So javac will read in the file, and thus get:

    ////////////////////
    // char c = '\u000a'
    ////////////////////

    Then it will convert unicode escape sequences to their equivalent
    characters and get:

    ////////////////////
    // char c = '
    '
    ////////////////////

    And then it will try to compile this code, and it'll fail with an
    error such as "unclosed character literal", because the second line
    now consists of a lone apostrophe.
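
The two stages described above can be imitated outside the compiler. The following is only a rough sketch, not javac's actual implementation: the class name UnicodeEscapePass and the method translate are made up for illustration, and the sketch deliberately ignores the JLS rule that a backslash preceded by an odd number of backslashes does not start an escape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapePass {
    // Matches a backslash, one or more 'u' characters, then four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Replaces every escape with the character it names, before any
    // notion of comments or literals exists (simplified, see lead-in).
    public static String translate(String source) {
        Matcher m = ESCAPE.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char ch = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps this string literal from being
        // translated by the real compiler.
        String line = "// char c = '\\u000a'";
        System.out.println(translate(line));
    }
}
```

Running main should print the commented line split in two, with the closing apostrophe alone on the second line, which is the text the tokenizer then sees.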

    - Oliver
    Oliver Wong, Feb 19, 2007
    #2

  3. On 19 Feb 2007 06:33:32 -0800, dimakura wrote:
    > is it because '\u000a' is equivivalent to \n and this type is
    > comment is a single-line?


    Yes.

    Read section 3.2 of the JLS, which describes translation of the input
    to the compiler. The unicode escape sequences are translated into
    their corresponding unicode characters, *then* the resulting sequence
    of characters is tokenized.

    So when you escape a line feed as you've done, you are essentially
    writing this (illegal) code:


    char c = '
    '

    i.e. the closing quote ends up on the following line.

    Similarly, commenting the line results in this invalid sequence:

    // char c = '
    '
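
For the two characters in the thread title, the usual way around this is to use the escape sequences \n and \r, which are handled later, when the character literal itself is tokenized. A minimal sketch, with a hypothetical class name LineTerminatorLiterals:

```java
public class LineTerminatorLiterals {
    // Escape sequences are interpreted inside the literal, so the
    // source line is not broken in two.
    public static char lineFeed() {
        return '\n';
    }

    public static char carriageReturn() {
        return '\r';
    }

    // Both literals carry the same code points as the Unicode escapes
    // from the thread title.
    public static boolean sameCodePoints() {
        return lineFeed() == 0x000a && carriageReturn() == 0x000d;
    }
}
```

Here sameCodePoints() returns true, so nothing is lost by avoiding the Unicode escape form.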

    /gordon

    --
    [ don't email me support questions or followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
    Gordon Beaton, Feb 19, 2007
    #3
  4. Chris Uppal

    Chris Uppal Guest

    dimakura wrote:

    > but i can not find why i can not use
    > // char c = '\u000a'
    > is it because '\u000a' is equivivalent to \n and this type is comment
    > is a single-line?


    Yes, exactly right.

    -- chris
    Chris Uppal, Feb 19, 2007
    #4
  5. Chris Uppal wrote:
    > dimakura wrote:
    >> but i can not find why i can not use
    >> // char c = '\u000a'
    >> is it because '\u000a' is equivivalent to \n and this type is comment
    >> is a single-line?

    > Yes, exactly right.


    And to test whether you've really understood,
    predict what the compiler will say to this:

    // char c = '\u000a//'

    // :)
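
One way to check a prediction is to put the line into a small class and compile it. This is a sketch with a hypothetical class name CommentPuzzle; the reasoning is that after translation the text following the line feed starts with // and so forms a second comment.

```java
public class CommentPuzzle {
    public static int answer() {
        // char c = '\u000a//'
        return 42;
    }
}
```

If that reasoning holds, the class compiles without complaint and answer() returns 42.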
    Andreas Leitgeb, Feb 19, 2007
    #5
  6. dimakura

    dimakura Guest

    On Feb 19, 7:10 pm, Andreas Leitgeb wrote:
    > Chris Uppal <-THIS.org> wrote:
    > > dimakura wrote:
    > >> but i can not find why i can not use
    > >> // char c = '\u000a'
    > >> is it because '\u000a' is equivivalent to \n and this type is comment
    > >> is a single-line?

    > > Yes, exactly right.

    >
    > And to test yourself, whether you've really understood,
    > predict what the compiler will say to that:
    >
    > // char c = '\u000a//'
    >
    > // :)



    Yes, I understand: the new line begins with a comment!
    OK.

    Just to test myself:

    this is not an error:

    // \u000a

    but this is an error:

    // \u000a something_else

    where "something_else" is anything other than whitespace or a correct
    Java-style comment
    dimakura, Feb 20, 2007
    #6
  7. dimakura wrote:
    > On Feb 19, 7:10 pm, Andreas Leitgeb <>
    > wrote:
    >> Chris Uppal <-THIS.org> wrote:
    >>> dimakura wrote:
    >>>> but i can not find why i can not use
    >>>> // char c = '\u000a'
    >>>> is it because '\u000a' is equivivalent to \n and this type is comment
    >>>> is a single-line?
    >>> Yes, exactly right.

    >> And to test yourself, whether you've really understood,
    >> predict what the compiler will say to that:
    >>
    >> // char c = '\u000a//'
    >>
    >> // :)

    >
    >
    > yes, i understand: new line begin with comment!
    > ok.
    >
    > just to test myself:
    >
    > it is not an error:
    >
    > // \u000a
    >
    > but error is
    >
    > // \u000a something_else
    >
    > where "something_else" is not spaces or something placed in correct
    > Java-style comment
    >


    It is not an error. It is two lines of code, and something_else is on
    the second line, not part of the one line comment. In the following
    valid program, ("Hello, world"); is neither spaces nor a Java-style comment.

    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println // \u000a ("Hello, world");
        }
    }
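
The same effect can be made observable without printing anything. In this sketch (the class name HiddenStatement is made up for illustration), the statement after the escape sits on its own line once the escape has been translated, so it is executed even though it looks commented out.

```java
public class HiddenStatement {
    public static int count() {
        int n = 0;
        // \u000a n++;
        return n;
    }
}
```

Here count() returns 1 rather than 0, because the increment is real code on the second line.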

    Patricia
    Patricia Shanahan, Feb 20, 2007
    #7
  8. On 20 Feb 2007 01:22:55 -0800, dimakura wrote:
    > but error is
    >
    > // \u000a something_else
    >
    > where "something_else" is not spaces or something placed in correct
    > Java-style comment


    Not just comments and whitespace. It's valid if something_else is
    anything that can appear at the start of a line, including statements
    or declarations, etc, in the context of the most recent non-comment
    before this line, e.g.:

    public class
    // \u000a Foo {
    }

    /gordon

    --
    [ don't email me support questions or followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
    Gordon Beaton, Feb 20, 2007
    #8
  9. Gordon Beaton wrote:
    >
    > Read section 3.2 of the JLS, which describes translation of the input
    > to the compiler. The unicode escape sequences are translated into
    > their corresponding unicode characters, *then* the resulting sequence
    > of characters is tokenized.
    >
    > So when you escape a line feed as you've done, you are essentially
    > writing this (illegal) code:
    >
    >
    > char c = '
    > '
    >
    > i.e. the closing quote ends up on the following line.
    >
    > Similarly, commenting the line results in this invalid sequence:
    >
    > // char c = '
    > '
    >
    > /gordon
    >


    Gordon:

    char c = \u0027\u002a\u0027\u003b

    Do you know why they would process the Unicode escapes before
    determining whether they are part of a comment or a literal? It does
    provide for some great obfuscation. I'm really glad it wasn't me that
    ran across this; I could have spent days trying to figure this one
    out :).
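
The line above can be dropped into a method as it stands. In this sketch (the class name EscapedTokens is an assumption), the four escapes stand for an apostrophe, an asterisk, an apostrophe and a semicolon, so the compiler ends up seeing an ordinary declaration of a char holding an asterisk.

```java
public class EscapedTokens {
    public static char value() {
        char c = \u0027\u002a\u0027\u003b
        return c;
    }
}
```

Calling value() returns '*', even though no quote or semicolon is visible on the declaration line.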

    --

    Knute Johnson
    email s/nospam/knute/
    Knute Johnson, Feb 20, 2007
    #9
  10. Chris Uppal

    Chris Uppal Guest

    Knute Johnson wrote:

    > char c = \u0027\u002a\u0027\u003b
    >
    > Do you know why they would process the unicode prior to determining if
    > it was part of a comment or literal first?


    I presume the idea is to allow the use of Unicode characters in identifiers and
    comments without making the source completely inaccessible to people using
    non-Unicode editors. Also to allow for the case where the source has to be
    manipulated by non-Unicode programs (source code control, and so on).
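
A small sketch of that use case, with a hypothetical class name EscapedIdentifier: the file is pure ASCII, yet one identifier contains a non-ASCII letter, and another is spelled with an escape and then referenced by its plain name.

```java
public class EscapedIdentifier {
    public static int total() {
        // Identifier containing U+00E9, written with ASCII characters only.
        int caf\u00e9 = 3;
        // The escape below is the letter x, so this declares plain x.
        int \u0078 = 4;
        return caf\u00e9 + x;
    }
}
```

Here total() returns 7, which shows the escaped and unescaped spellings name the same variable.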

    -- chris
    Chris Uppal, Feb 20, 2007
    #10
  11. Chris Uppal wrote:
    > Knute Johnson wrote:
    >
    >> char c = \u0027\u002a\u0027\u003b
    >>
    >> Do you know why they would process the unicode prior to determining if
    >> it was part of a comment or literal first?

    >
    > I presume the idea is to allow the use of Unicode characters in identifiers and
    > comments without making the source completely inaccessible to people using
    > non-Unicode editors. Also to allow for the case where the source has to be
    > manipulated by non-Unicode programs (source code control, and so on).
    >
    > -- chris
    >


    I guess you have to make the rule one way or the other, and this is the
    way they chose. It does make for some really interesting traps though.

    --

    Knute Johnson
    email s/nospam/knute/
    Knute Johnson, Feb 21, 2007
    #11
  12. dimakura

    dimakura Guest

    On Feb 20, 9:44 am, Gordon Beaton wrote:
    > On 20 Feb 2007 01:22:55 -0800, dimakura wrote:
    >
    > > but error is

    >
    > > // \u000a something_else

    >
    > > where "something_else" is not spaces or something placed in correct
    > > Java-style comment

    >
    > Not just comments and whitespace. It's valid if something_else is
    > anything that can appear at the start of a line, including statements
    > or declarations, etc, in the context of the most recent non-comment
    > before this line, e.g.:
    >
    > public class
    > // \u000a Foo {
    > }
    >
    > /gordon
    >
    > --
    > [ don't email me support questions or followups ]
    > g o r d o n + n e w s @ b a l d e r 1 3 . s e



    I agree, my formulation was not very precise.
    Thanks.
    dimakura, Feb 21, 2007
    #12