'\u000a' and '\u000d'

dimakura · Feb 19, 2007

I found through a web search why I cannot use

////////////////////

char c = '\u000a'

////////////////////

but I cannot find why I cannot use

////////////////////

// char c = '\u000a'

////////////////////

Is it because '\u000a' is equivalent to \n and this type of comment
is single-line?

thanks,
dimitri

Oliver Wong · Feb 19, 2007

dimakura said:
I found through a web search why I cannot use

////////////////////

char c = '\u000a'

////////////////////

but I cannot find why I cannot use

////////////////////

// char c = '\u000a'

////////////////////

Is it because '\u000a' is equivalent to \n and this type of comment
is single-line?

Unicode escape sequences are converted to characters after javac reads
the source file but before it tokenizes and parses that source for
compilation.

So javac will read in the file, and thus get:

////////////////////
// char c = '\u000a'
////////////////////

Then it will convert unicode escape sequences to their equivalent
characters and get:

////////////////////
// char c = '
'
////////////////////

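And then it will try to compile this code. The first line is a complete
comment, so the second line consists of a lone apostrophe, and compilation
fails with some sort of error like "Not expecting apostrophe here" (the
exact wording depends on the compiler).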

- Oliver

Gordon Beaton · Feb 19, 2007

dimakura said:
Is it because '\u000a' is equivalent to \n and this type of
comment is single-line?

Yes.

Read section 3.2 of the JLS, which describes translation of the input
to the compiler. The unicode escape sequences are translated into
their corresponding unicode characters, *then* the resulting sequence
of characters is tokenized.

So when you escape a line feed as you've done, you are essentially
writing this (illegal) code:

char c = '
'

i.e. the closing quote ends up on the following line.

Similarly, commenting the line results in this invalid sequence:

// char c = '
'

/gordon
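
A minimal sketch of the ordering described above, under stated assumptions: the class EscapeTranslationSketch and its translate method are made up for illustration and are not how javac is implemented. The sketch also ignores the JLS rule that a backslash preceded by an odd number of backslashes does not start an escape. It only shows that the escape is replaced before anything decides where a comment ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeTranslationSketch {
    // Matches a backslash, one or more 'u' characters, then four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces every escape with the character it names.
    // Simplification: the backslash-parity rule of the JLS is not modelled.
    static String translate(String source) {
        Matcher m = ESCAPE.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char ch = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps this escape literal inside the string.
        String line = "// char c = '\\u000a'";
        System.out.println(translate(line));
    }
}
```

Running main prints two lines: the comment up to the opening quote, then the closing quote alone on the next line, which is the invalid sequence shown above.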

Chris Uppal · Feb 19, 2007

dimakura said:
but i can not find why i can not use
// char c = '\u000a'
Is it because '\u000a' is equivalent to \n and this type of comment
is single-line?

Yes, exactly right.

-- chris

Andreas Leitgeb · Feb 19, 2007

Chris Uppal said:
Yes, exactly right.

And to test whether you've really understood,
predict what the compiler will say about this:

// char c = '\u000a//'

//

dimakura · Feb 20, 2007

Andreas Leitgeb said:
And to test whether you've really understood,
predict what the compiler will say about this:

// char c = '\u000a//'

//

Yes, I understand: the new line begins with a comment!
OK.

Just to test myself:

This is not an error:

// \u000a

but this is an error:

// \u000a something_else

where "something_else" is anything other than whitespace or a correctly
formed Java comment.

Patricia Shanahan · Feb 20, 2007

dimakura said:
Yes, I understand: the new line begins with a comment!
OK.

Just to test myself:

This is not an error:

// \u000a

but this is an error:

// \u000a something_else

where "something_else" is anything other than whitespace or a correctly
formed Java comment.

It is not an error. It is two lines of code, and something_else is on
the second line, not part of the one line comment. In the following
valid program, ("Hello, world"); is neither spaces nor a Java-style comment.

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println // \u000a ("Hello, world");
    }
}

Patricia

Gordon Beaton · Feb 20, 2007

dimakura said:
but this is an error:

// \u000a something_else

where "something_else" is anything other than whitespace or a correctly
formed Java comment.

Not just comments and whitespace. It's valid if something_else is
anything that can appear at the start of a line, including statements
or declarations, etc, in the context of the most recent non-comment
before this line, e.g.:

public class
// \u000a Foo {
}

/gordon
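
The same point as a runnable sketch (the class and method names are made up for illustration): the escape ends the one-line comment, so whatever follows it on that source line is compiled as ordinary code.

```java
public class CommentEndsAtEscape {
    static int value() {
        int x = 1;
        // the escape ends this comment here: \u000a x = 2;
        return x;
    }

    public static void main(String[] args) {
        // The assignment after the escape was compiled, so this prints 2.
        System.out.println(value());
    }
}
```

If the escape were removed from the comment line, the assignment would stay inside the comment and value() would return 1 instead.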

Knute Johnson · Feb 20, 2007

Gordon said:
Read section 3.2 of the JLS, which describes translation of the input
to the compiler. The unicode escape sequences are translated into
their corresponding unicode characters, *then* the resulting sequence
of characters is tokenized.

So when you escape a line feed as you've done, you are essentially
writing this (illegal) code:

char c = '
'

i.e. the closing quote ends up on the following line.

Similarly, commenting the line results in this invalid sequence:

// char c = '
'

/gordon

Gordon:

char c = \u0027\u002a\u0027\u003b

Do you know why they would process the Unicode escapes before determining
whether they are part of a comment or literal? It does provide for some
great obfuscation. I'm really glad it wasn't me that ran across this; I
could have spent days trying to figure this one out.
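
For reference, a small sketch showing that the escaped line above is an ordinary declaration once translated: \u0027 is an apostrophe, \u002a is an asterisk, and \u003b is a semicolon. The class and method names are made up for illustration.

```java
public class EscapedTokens {
    static char star() {
        // After translation the next line reads: char c = '*';
        char c = \u0027\u002a\u0027\u003b
        return c;
    }

    public static void main(String[] args) {
        System.out.println(star());
    }
}
```

The declaration compiles even though no quote or semicolon is visible in the source, because the compiler only tokenizes the text after the escapes have been replaced.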

Chris Uppal · Feb 20, 2007

Knute said:
char c = \u0027\u002a\u0027\u003b

Do you know why they would process the Unicode escapes before determining
whether they are part of a comment or literal?

I presume the idea is to allow the use of Unicode characters in identifiers and
comments without making the source completely inaccessible to people using
non-Unicode editors. Also to allow for the case where the source has to be
manipulated by non-Unicode programs (source code control, and so on).

-- chris

Knute Johnson · Feb 21, 2007

Chris said:
I presume the idea is to allow the use of Unicode characters in identifiers and
comments without making the source completely inaccessible to people using
non-Unicode editors. Also to allow for the case where the source has to be
manipulated by non-Unicode programs (source code control, and so on).

-- chris

I guess you have to make the rule one way or the other and this is the
way. It does make for some really interesting traps though.

dimakura · Feb 21, 2007

Gordon Beaton said:
Not just comments and whitespace. It's valid if something_else is
anything that can appear at the start of a line, including statements
or declarations, etc, in the context of the most recent non-comment
before this line, e.g.:

public class
// \u000a Foo {
}

/gordon

I agree, my formulation was not very precise.
Thanks.
