Enum enlightenment

Roedy Green

I wrote a simple enum-using class and decompiled it. Now all sorts of
things about enum make sense.

To understand this, paste the two listings below into separate documents
and view them side by side in your IDE.


Here is the original code -- an enum to track the various flavours of
Windows:

package com.mindprod.htmlmacros;

import java.util.EnumSet;
import java.util.Set;

/**
* enum of possible Windows OSes. May be used freely for any purpose
* but military.
* @author Roedy Green copyright 2005 Canadian Mind Products
*/
public enum WindowsOS {

WIN95( "W95", "Windows 95"),
WIN98( "W98", "Windows 98"),
WINME( "Me", "Windows Me"),
WINNT( "NT", "Windows NT" ),
WIN2K( "W2K", "Windows 2000" ),
WINXP( "XP", "Windows XP" ),
WIN2K3("W2K3","Windows 2003");

private String shortName;

private String longName;

private static boolean DEBUGGING = true;

/**
* Enum constant constructor that captures two extra facts about
* the enum.
* @param shortName short name for the OS, e.g. "Me"
* @param longName long name of the OS, e.g. "Windows XP"
*/
WindowsOS ( String shortName, String longName )
{
this.shortName = shortName;
this.longName = longName;
}

/**
* @return short name
*/
public String getShortName ()
{
return this.shortName;
}

/**
* @return long name
*/
public String getLongName ()
{
return this.longName;
}

/**
* Static method to construct a string mentioning multiple OSes,
* separated by slashes.
* @param choices EnumSet of just the OSes you want included
* @return a String of the form "Windows W95/W98/Me"
*/
public static String OSes( EnumSet<WindowsOS> choices )
{
StringBuilder sb = new StringBuilder( 40 );
for ( WindowsOS o : choices )
{
sb.append( '/' );
sb.append( o.shortName );
}
if ( sb.length() == 0 )
{
return "";
}
else
{
// chop leading / and prepend "Windows "
return "Windows " + sb.toString().substring( 1 );
}
}

/**
* test harness
*
* @param args not used
*/
public static void main ( String[] args )
{
if ( DEBUGGING )
{
// You don't use a constructor to create EnumSet objects.
EnumSet<WindowsOS> justThese = EnumSet.of( WIN2K, WINXP,
WINME );

// prints "Windows Me/W2K/XP"
// note they come out in proper order.
System.out.println( WindowsOS.OSes ( justThese ) );
}
}
}



^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Here is the decompiled code, showing what actually makes it into byte
code.


package com.mindprod.htmlmacros;

import java.io.PrintStream;
import java.util.EnumSet;
import java.util.Iterator;

public final class WindowsOS extends Enum
{

public static final WindowsOS[] values()
{
return (WindowsOS[])$VALUES.clone();
}

public static WindowsOS valueOf(String s)
{
return
(WindowsOS)Enum.valueOf(com/mindprod/htmlmacros/WindowsOS, s);
}

private WindowsOS(String s, int i, String s1, String s2)
{
super(s, i);
shortName = s1;
longName = s2;
}

public String getShortName()
{
return shortName;
}

public String getLongName()
{
return longName;
}

public static String OSes(EnumSet enumset)
{
StringBuilder stringbuilder = new StringBuilder(40);
WindowsOS windowsos;
for(Iterator iterator = enumset.iterator();
iterator.hasNext(); stringbuilder.append(windowsos.shortName))
{
windowsos = (WindowsOS)iterator.next();
stringbuilder.append('/');
}

if(stringbuilder.length() == 0)
return "";
else
return (new StringBuilder()).append("Windows ").append(stringbuilder.toString().substring(1)).toString();
}

public static void main(String args[])
{
if(DEBUGGING)
{
EnumSet enumset = EnumSet.of(WIN2K, WINXP, WINME);
System.out.println(OSes(enumset));
}
}

public static final WindowsOS WIN95;
public static final WindowsOS WIN98;
public static final WindowsOS WINME;
public static final WindowsOS WINNT;
public static final WindowsOS WIN2K;
public static final WindowsOS WINXP;
public static final WindowsOS WIN2K3;
private String shortName;
private String longName;
private static boolean DEBUGGING = true;
private static final WindowsOS $VALUES[];

static
{
WIN95 = new WindowsOS("WIN95", 0, "W95", "Windows 95");
WIN98 = new WindowsOS("WIN98", 1, "W98", "Windows 98");
WINME = new WindowsOS("WINME", 2, "Me", "Windows Me");
WINNT = new WindowsOS("WINNT", 3, "NT", "Windows NT");
WIN2K = new WindowsOS("WIN2K", 4, "W2K", "Windows 2000");
WINXP = new WindowsOS("WINXP", 5, "XP", "Windows XP");
WIN2K3 = new WindowsOS("WIN2K3", 6, "W2K3", "Windows 2003");
$VALUES = (new WindowsOS[] {
WIN95, WIN98, WINME, WINNT, WIN2K, WINXP, WIN2K3
});
}
}



Note how Java generates some methods for you in the same class!

It composes a values method and a valueOf method that does not need a
Class parameter.
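
A minimal sketch of the difference between the generated one-argument valueOf and the library's two-argument Enum.valueOf. The class name GeneratedMethodsDemo, the cut-down enum Os and the two helper methods are made up for illustration; they are not part of the original WindowsOS code.

```java
class GeneratedMethodsDemo {
    // Hypothetical cut-down enum standing in for WindowsOS.
    enum Os { WIN95, WIN98, WINME }

    /** Lookup through the compiler-generated one-argument valueOf. */
    static Os viaGenerated(String name) {
        return Os.valueOf(name);
    }

    /** Lookup through the library method, which needs the Class as well. */
    static Os viaLibrary(String name) {
        return Enum.valueOf(Os.class, name);
    }

    public static void main(String[] args) {
        // Both routes return the very same constant object.
        System.out.println(viaGenerated("WIN98") == viaLibrary("WIN98")); // true
        // values() is generated too; it returns a fresh copy of the constants.
        System.out.println(Os.values().length);                           // 3
    }
}
```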

It makes your constructor private.

It adds two extra hidden parameters to your constructor: the enum name
and the ordinal. This means the enum constants don't have to count
themselves or register themselves. That is all done at compile time.

It creates static finals for each enum constant and the code to
initialise them using your constructors.

It creates a constant array of enum objects, one of each flavour,
called $VALUES[] to use in the values method. It can also be used by
the name method to convert

In this case no enum constant had any of its own fields or methods.

Note the true enum class is hard-coded all over the place. This is
no object-type erasure crap.

The $VALUES array could have been used by methods like
first, last, count, ordinalToEnum, but I have not found any trace of
such methods. You can't get at the $VALUES without patching byte code
since that is not a legal Java identifier. So I guess every time you
want that information you need to do a values() to clone the array just
to find out how long it is, or to index it in a read-only way to
convert ordinal back to enum.
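
One way to avoid the repeated cloning is to call values() once and keep the copy in a private static field of the enum. This is only a sketch: the names OrdinalLookup, Os, CACHED, count, fromOrdinal, first and last are all hypothetical helpers, not anything the compiler generates.

```java
class OrdinalLookup {
    // Hypothetical cut-down enum standing in for WindowsOS.
    enum Os {
        WIN95, WIN98, WINME, WINNT, WIN2K, WINXP, WIN2K3;

        // Clone the compiler-generated array once, when the enum class initialises.
        private static final Os[] CACHED = values();

        /** Number of constants, without cloning on every call. */
        static int count() { return CACHED.length; }

        /** Convert an ordinal back to its constant. */
        static Os fromOrdinal(int ordinal) { return CACHED[ordinal]; }

        static Os first() { return CACHED[0]; }

        static Os last() { return CACHED[CACHED.length - 1]; }
    }

    public static void main(String[] args) {
        System.out.println(Os.count());                 // 7
        System.out.println(Os.fromOrdinal(2));          // WINME
        System.out.println(Os.values() == Os.values()); // false: each call clones
    }
}
```

The cached array must stay private, since anybody holding a reference could overwrite its elements.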

Note that the generic EnumSet handles all enums. There is no
corresponding customised code generated for the EnumSet.

It is not obvious from this code, but the bit masks used in EnumSet
computations are not built into the enum constants. They are generated
from the ordinal number as needed on the fly with shifting and
masking.
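
The idea can be sketched with a plain long used as the set. This is a simplified illustration, not the JDK's EnumSet source; it assumes the enum has at most 64 constants, and the names OrdinalBitMask, Os, maskOf, add and contains are made up.

```java
class OrdinalBitMask {
    // Hypothetical cut-down enum standing in for WindowsOS.
    enum Os { WIN95, WIN98, WINME, WINNT, WIN2K, WINXP, WIN2K3 }

    /** Bit for one constant, derived from its ordinal on demand. */
    static long maskOf(Os os) {
        return 1L << os.ordinal();
    }

    /** Returns the set with the given constant's bit switched on. */
    static long add(long set, Os os) {
        return set | maskOf(os);
    }

    /** Tests whether the given constant's bit is on. */
    static boolean contains(long set, Os os) {
        return (set & maskOf(os)) != 0;
    }

    public static void main(String[] args) {
        long set = 0L;
        set = add(set, Os.WIN2K);
        set = add(set, Os.WINXP);
        set = add(set, Os.WINME);
        System.out.println(Long.toBinaryString(set)); // 110100
        System.out.println(contains(set, Os.WINNT));  // false
    }
}
```

Because the bits are ordered by ordinal, walking them from low to high gives the constants back in declaration order, whatever order they were added in.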

It is also not obvious from this code, but EnumSet.of figures out the
class of the enums by looking up the class of the first parameter.
There is NOT an EnumSet class generated for each Enum class.
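
A rough sketch of how a factory like that can be written on top of the public API, recovering the element class from the first argument. The class EnumSetOfSketch and its of method are hypothetical re-implementations for illustration, not the library code.

```java
import java.util.EnumSet;

class EnumSetOfSketch {
    // Hypothetical cut-down enum standing in for WindowsOS.
    enum Os { WIN95, WIN98, WINME, WINNT, WIN2K, WINXP, WIN2K3 }

    /** Builds a set of any enum type; no per-enum class is needed. */
    @SafeVarargs
    static <E extends Enum<E>> EnumSet<E> of(E first, E... rest) {
        // The element type is recovered from the first argument at run time.
        Class<E> type = first.getDeclaringClass();
        EnumSet<E> result = EnumSet.noneOf(type);
        result.add(first);
        for (E e : rest) {
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints in ordinal order, not argument order.
        System.out.println(of(Os.WIN2K, Os.WINXP, Os.WINME)); // [WINME, WIN2K, WINXP]
    }
}
```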

--
Bush crime family lost/embezzled $3 trillion from Pentagon.
Complicit Bush-friendly media keeps mum. Rumsfeld confesses on video.
http://www.infowars.com/articles/us/mckinney_grills_rumsfeld.htm

Canadian Mind Products, Roedy Green.
See http://mindprod.com/iraq.html photos of Bush's war crimes
 
Roedy Green

Here is what happens when you give your enum constants their own
private methods and variables:


enum
.....
WIN2K( "W2K", "Windows 2000" )
{
private int p;
int cost ()
{
return 200;
}
} ,
WINXP( "XP", "Windows XP" )
{
private int q;
int cost ()
{
return 300;
}
} ,
WIN2K3("W2K3","Windows 2003");
....

This generates:


WINNT = new WindowsOS("WINNT", 3, "NT", "Windows NT");
WIN2K = new WindowsOS("WIN2K", 4, "W2K", "Windows 2000") {

int cost()
{
return 200;
}

private int p;

};
WINXP = new WindowsOS("WINXP", 5, "XP", "Windows XP") {

int cost()
{
return 300;
}

private int q;

};
WIN2K3 = new WindowsOS("WIN2K3", 6, "W2K3", "Windows 2003");


In other words, each of those little enum constants becomes its own
little anonymous inner class.
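
This can be observed at run time by comparing getClass() with getDeclaringClass(). A small sketch follows; the names ConstantBodyDemo and Os are made up, and the base cost() method returning 0 is an assumption added so the overrides can be called from outside (the snippet above does not declare one).

```java
class ConstantBodyDemo {
    // Hypothetical cut-down enum; two constants carry their own body.
    enum Os {
        WINNT,
        WIN2K { @Override int cost() { return 200; } },
        WINXP { @Override int cost() { return 300; } };

        // Assumed default, so cost() is visible on every constant.
        int cost() { return 0; }
    }

    public static void main(String[] args) {
        // A constant without a body is a direct instance of the enum class.
        System.out.println(Os.WINNT.getClass() == Os.class);                 // true
        // A constant with a body is an instance of a generated subclass.
        System.out.println(Os.WIN2K.getClass() == Os.class);                 // false
        System.out.println(Os.WIN2K.getClass().getSuperclass() == Os.class); // true
        // getDeclaringClass() always gives the enum itself.
        System.out.println(Os.WIN2K.getDeclaringClass() == Os.class);        // true
        System.out.println(Os.WINXP.cost());                                 // 300
    }
}
```

So code that needs the enum's type from a constant is safer using getDeclaringClass() than getClass().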

Martijn Mulder

Roedy said:
I wrote a simple enum-using class and decompiled it. <snip>

Tell me Roedy, how do you decompile a .class file? javap gives me an overview of
the methods in the class, not the code within the methods. The switches I tried
(-c, -h, -l, -p, -s, -v) did not give me a 'machine formatted' version of my
.java files.
 
Paul

Roedy said:
The $VALUES array could have been used by methods like
first, last, count, ordinalToEnum, but I have not found any trace of
such methods. You can't get at the $VALUES without patching byte code
since that is not a legal Java identifier. So I guess every time you
want that information you need to do a values() to clone the array just
to find out how long it is, or to index it in a read-only way to
convert ordinal back to enum.

In Java, you can use a dollar sign as part of a legal Java identifier. I
think the $VALUES array isn't created until some second pass after the
compiler has validated your actual .java file but before it translates
it into the implementation behind the idiom.

Maybe "legal java identifier" isn't what you meant, but that the symbol
is undefined.

public enum TestDollar
{
ONE, TWO;

private int $dollarvar;

public int get() { return $dollarvar; }
public void set(int d) { $dollarvar = d; }

public void voidfunc()
{
TestDollar[] vals = $VALUES;
// compiler says 'cannot find symbol' for $VALUES
}
}

--Paul
 
Roedy Green

Maybe "legal java identifier" isn't what you meant, but that the symbol
is undefined.

I scanned my text books and the web and could not get a definitive
answer on just what chars are allowed in identifiers:
1. in JVM byte code.
2. in java source.

I wanted not just to know what the current compiler lets you have, but
what the language standard guarantees.

I suppose it can be tested by experiment. Is eacute OK? Chinese
characters? Math symbols? The \u notation is pretty ugly. I'd need a
Unicode text editor to do the proper experiments.

My personal rule has been to use nothing but A-Z, a-z, 0-9 and _, with _
only in the middle of constant names.

A similar question is just how long can an Identifier be? Natural
limits due to bit sizes for field lengths are 31, 255 and 32,767. I
suppose that could be an implementation detail.


Roedy Green

Tell me Roedy, how do you decompile a .class file? javap gives me an overview of
the methods in the class, not the code within the methods. The switches I tried
(-c, -h, -l, -p, -s, -v) did not give me a 'machine formatted' version of my
.java files.

See http://mindprod.com/jgloss/decompiler.html
and http://mindprod.com/jgloss/disassembler.html

Tim Tyler

Roedy Green said:
A similar question is just how long can an Identifier be? Natural
limits due to bit sizes for field lengths are 31, 255 and 32,767. I
suppose that could be an implementation detail.

``The length of field and method names, field and method descriptors, and
other constant string values is limited to 65535 characters by the
16-bit unsigned length item of the CONSTANT_Utf8_info structure
(§4.4.7). Note that the limit is on the number of bytes in the encoding
and not on the number of encoded characters. UTF-8 encodes some
characters using two or three bytes. Thus, strings incorporating
multibyte characters are further constrained.''

- http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#88659
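
To make the bytes-versus-characters point concrete, here is a small sketch that counts how many bytes a name needs in the class file's modified UTF-8 encoding. The class and method names (Utf8NameLength, modifiedUtf8Length, fitsInClassFile) are made up for illustration; the per-character byte counts follow the one-, two- and three-byte groups of that encoding.

```java
class Utf8NameLength {
    /** Bytes needed for s in modified UTF-8, the encoding of CONSTANT_Utf8_info. */
    static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;          // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;          // includes U+0000, which gets two bytes
            } else {
                bytes += 3;          // everything else, one UTF-16 unit at a time
            }
        }
        return bytes;
    }

    /** True if the name fits under the 16-bit unsigned length item. */
    static boolean fitsInClassFile(String name) {
        return modifiedUtf8Length(name) <= 65535;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("longName"));  // 8
        System.out.println(modifiedUtf8Length("caf\u00e9")); // 5: e-acute takes two bytes
        System.out.println(modifiedUtf8Length("\u4e2d"));    // 3: one Chinese character
    }
}
```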
 
Raymond DeCampo

Roedy said:
I scanned my text books and the web and could not get a definitive
answer on just what chars are allowed in identifiers:
1. in JVM byte code.
2. in java source.

I wanted not just to know what the current compiler lets you have, but
what the language standard guarantees.

Well, did you try reading it?

http://java.sun.com/docs/books/jls/second_edition/html/j.title.doc.html

I suppose it can be tested by experiment. Is eacute OK? Chinese
characters? Math symbols? The \u notation is pretty ugly. I'd need a
Unicode text editor to do the proper experiments.

My personal rule has been to use nothing but A-Z, a-z, 0-9 and _, with _
only in the middle of constant names.

A similar question is just how long can an Identifier be? Natural
limits due to bit sizes for field lengths are 31, 255 and 32,767. I
suppose that could be an implementation detail.

HTH,
Ray
 
Roedy Green


The relevant section is:

http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#40625

That's Patricia Shanahan's job. I detest reading such lawyerly
documents that try their hardest to hide the plain meaning.

A straightforward reading of the standard would say you CAN put - in
your identifier names, but I know you can't.

The example he gives of a legal identifier violates the first Java
letter rule.

Perhaps a lawyer can make sense of what they are trying to say. For
mortals, a list of acceptable and unacceptable identifiers with reasons
says more than pages of BNF or explanation.

If the standard were literally true, Java foolishly refused to reserve
even the Unicode mathematical operators for future use.

Raymond DeCampo

Roedy said:
The relevant section is:

http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#40625

That's Patricia Shanahan's job. I detest reading such lawyerly
documents that try their hardest to hide the plain meaning.

A straightforward reading of the standard would say you CAN put - in
your identifier names, but I know you can't.

I don't see where you would get this from the standard.

The example he gives of a legal identifier violates the first Java
letter rule.

I don't know which example you mean. They all seem fine to me.

Perhaps a lawyer can make sense of what they are trying to say. For
mortals, a list of acceptable and unacceptable identifiers with reasons
says more than pages of BNF or explanation.

If the standard were literally true, Java foolishly refused to reserve
even the Unicode mathematical operators for future use.

I don't know where you are reading that into it.

Actually, after posting the link, I went in and read the above section
on my own. I was pretty disappointed that the real "specification" for
what characters may be included was punted on by saying it depends on
the results of java.lang.Character.isJavaIdentifierStart() and
java.lang.Character.isJavaIdentifierPart().

Delving into the documentation led me on a relatively uninteresting
excursion into Unicode land.
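
For anyone who wants to run the experiment without a Unicode editor, the two Character methods can be called directly, with the test characters written as \u escapes. A minimal sketch follows; the class IdentifierCheck and its isIdentifier helper are made up, and the helper checks characters only, so it does not reject keywords or the literals true, false and null.

```java
class IdentifierCheck {
    /** True if every character of s is allowed at its position in an identifier. */
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point so characters outside the BMP are handled too.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("$VALUES"));   // true: $ is allowed
        System.out.println(isIdentifier("caf\u00e9")); // true: e-acute is a letter
        System.out.println(isIdentifier("\u4e2d"));    // true: a Chinese character
        System.out.println(isIdentifier("\u2211x"));   // false: a math symbol
        System.out.println(isIdentifier("a-b"));       // false: no minus signs
        System.out.println(isIdentifier("2fast"));     // false: digit cannot start
    }
}
```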

Ray
 
Dale King

Raymond said:
Roedy Green wrote:



I don't know where you are reading that into it.

Actually, after posting the link, I went in and read the above section
on my own. I was pretty disappointed that the real "specification" for
what characters may be included was punted on by saying it depends on
the results of java.lang.Character.isJavaIdentifierStart() and
java.lang.Character.isJavaIdentifierPart().

Delving into the documentation led me on a relatively uninteresting
excursion into Unicode land.

I think the reason they don't give you the definitive list is that list
is not necessarily static. As characters get added to Unicode they can
get added to the list of acceptable letters for Java identifiers. They
don't want to update the language spec. as Unicode support expands in Java.

How would they specify it anyway? It would take pages to list all the
characters.

The rules are pretty broad. Almost anything that is a letter or digit
in Unicode is acceptable.

The one area that Sun fails in this regard is the support for encodings
to actually use this full Unicode set. They don't support the use of
byte order marks at the start of a Java source file to indicate UTF-8,
UTF-16BE, UTF-32, etc. Even Windows notepad supports that, but not Sun.
All they give you is the -encoding option which is not good enough.
 
Dale King

Tim said:
``The length of field and method names, field and method descriptors, and
other constant string values is limited to 65535 characters by the
16-bit unsigned length item of the CONSTANT_Utf8_info structure
(§4.4.7). Note that the limit is on the number of bytes in the encoding
and not on the number of encoded characters. UTF-8 encodes some
characters using two or three bytes. Thus, strings incorporating
multibyte characters are further constrained.''

- http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html#88659

Early on, 1.5 was supposed to include support for removing some of the
class file size limitations (particularly only 64K for a method body),
but somehow it didn't make the final cut.

It's still being worked on under JSR202.
 
