does the following code function as expected?

Aryeh M. Friedman

Assuming standard Java naming conventions, does the following code (in
the general case) do the following:

1. List all imported packages and/or explicitly named dotted classes
2. List all simple class names
3. Not list keywords, literals, instance names
4. For anything matching items 1 and 2, list them only once in the
output

import java.io.*;
import java.util.*;

public class Main
{
    public static void main(String[] args)
        throws Throwable
    {
        FileReader rd = new FileReader("Main.java");
        StreamTokenizer st = new StreamTokenizer(rd);
        boolean endImports=false;
        Set<String> out=new HashSet<String>();

        int token = st.nextToken();
        while (token != StreamTokenizer.TT_EOF)
        {
            token = st.nextToken();

            if(token==StreamTokenizer.TT_WORD)

                // this is completely naive but it is good enough for Aryeh's style

                if((Character.isUpperCase(st.sval.charAt(0))&&st.sval.indexOf(".")==-1)
                   ||(!Character.isUpperCase(st.sval.charAt(0))&&st.sval.indexOf(".")!=-1&&!endImports)){

                    if(Character.isUpperCase(st.sval.charAt(0))&&!endImports) {
                        endImports=true;
                        continue;
                    }

                    if(!st.sval.matches("[A-Z]*"))
                        out.add(st.sval);
                }
        }

        rd.close();

        System.out.println(out);
    }
}

Note on final applications: I want to write a tool that will
determine from source only what classes the current source file depends
on. After a little more processing the final output is a DAG
representing the order stuff would need to be compiled in for a
non-JIT compiler.
 
Lew

Aryeh said:
Assuming standard Java naming conventions does the following code (in
the general case) do the following:

Did you try it? What test cases did you try?

1. List all imported packages and/or explicitly named dotted classes

No. It'll also fail on static imports.

2. List all simple class names

Do you mean in the source? It'll fail on constants from a class, e.g.,

    char cb = Util.SOME_CONSTANT;

assuming there is no other use of Util than to bring in constants.

3. Not list keywords, literals, instance names

No. For example, it'll miss (i.e., list) StreamTokenizer.TT_EOF.

4. For anything matching items 1 and 2, list them only once in the
output

Are you asking if Set only holds one of each item, as determined by equals()?

Note on final applications: I want to write a tool that will
determine from source only what classes the current source file depends
on. After a little more processing the final output is a DAG
representing the order stuff would need to be compiled in for a
non-JIT compiler.

This approach will not work. It'll miss any occurrence of idioms such as
Class.forName( "foo.bar.jdbc.YourDriver" ), and its parsing is far too simple
to handle the cases it does attempt. Furthermore, it doesn't follow the
dependency chain past the direct references in the source.
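
To make the Class.forName point concrete, here is a minimal sketch. The class name ReflectiveDependency and the method load are made up for illustration; the idea is only that the dependency exists as strings assembled at run time, so no identifier in the source names the loaded class as a type.

```java
import java.util.List;

// Hypothetical illustration: the class being loaded is never referenced
// as a type in the source, only as pieces of a string.
class ReflectiveDependency {
    // Builds the class name at run time and instantiates it reflectively.
    static Object load(String pkg, String simpleName) {
        try {
            Class<?> cls = Class.forName(pkg + "." + simpleName);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not load " + simpleName, e);
        }
    }

    public static void main(String[] args) {
        // A source scanner sees "java.util" and "ArrayList" only inside
        // string literals, which it has no reason to treat as class names.
        Object o = load("java.util", "ArrayList");
        System.out.println(o instanceof List); // prints true
    }
}
```

In real code the two strings could just as well come from a properties file or the command line, in which case the source contains no trace of the name at all.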

Bytecode analysis will be more reliable than source analysis. (For one thing,
there are no imports in bytecode.) Even there, the dynamic nature of Java
makes this not a perfectly solvable problem in general, although workable
subsets of the problem can be approached.
 
Lasse Reichstein Nielsen

Aryeh M. Friedman said:
Assuming standard Java naming conventions does the following code (in
the general case) do the following:

I admit I haven't checked.
A good approach would be to make some test cases and see that it
does what you expect.
Try to challenge it, e.g., by putting class-like text in String
literals and comments.
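
One way to build such test cases is a small probe that feeds a snippet to a default-configured StreamTokenizer and reports the word tokens it returns. This is only a sketch: TokenizerProbe and words are hypothetical names, and the sample input is invented to exercise a string literal and a block comment.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper: collects every TT_WORD token that a
// default-configured StreamTokenizer produces for the given source text.
class TokenizerProbe {
    static List<String> words(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        List<String> found = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    found.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Class-like text in a string literal and in a two-line block comment.
        String testCase =
            "import java.io.*;\n" +
            "String s = \"Foo\";\n" +
            "/* Bar\n" +
            "   Baz */\n";
        System.out.println(words(testCase));
    }
}
```

With the default syntax table, quoted strings come back as their own token type rather than as words, and '/' starts a comment that runs only to the end of the line, so text on the later lines of a block comment is still tokenized. Running the probe on inputs like this shows quickly which cases the original approach can and cannot handle.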

Note on final applications: I want to write a tool that will
determine from source only what classes the current source file depends
on. After a little more processing the final output is a DAG
representing the order stuff would need to be compiled in for a
non-JIT compiler.

I would go for a proper parser, not just some simple tokenizing.
I bet there is someone out there who has made a grammar for Java
that goes with one of the free parser generators.

/L
 
Joshua Cranmer

Aryeh said:
Assuming standard Java naming conventions does the following code (in
the general case) do the following:

1. List all imported packages and/or explicitly named dotted classes
2. List all simple class names
3. Not list keywords, literals, instance names
4. For anything matching items 1 and 2, list them only once in the
output

Without even looking at your source code, I can tell you that the answer
is almost definitely no (it happens to still be no even after looking at
the source code).

JLS 3 clearly states that the Unicode-escape processing happens /before/
any other processing, and therefore this processing is needed to handle
any application precisely (however, having written source-level
analyzers myself, I can say that this requirement is mostly esoteric).
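
As a small illustration of why the escape processing matters, the sketch below (class and method names are hypothetical) compiles and runs even though the variable name is partly spelled with a Unicode escape. A scanner reading the raw characters would see a different identifier than the compiler does.

```java
// Hypothetical illustration of Unicode-escape translation: the escape
// below becomes the letter 'a' before the compiler tokenizes the file.
class UnicodeEscapeDemo {
    static int value() {
        int \u0061bc = 42; // declares a local variable named abc
        return abc;
    }

    public static void main(String[] args) {
        System.out.println(value()); // prints 42
    }
}
```

The same trick works inside type names, which is why a strictly correct source analyzer has to translate escapes first, even if real code rarely does this.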

FileReader rd = new FileReader("Main.java");
StreamTokenizer st = new StreamTokenizer(rd);

The needed tokenizer is much more complex than StreamTokenizer. The JLS
provides an explicit description of the entire tokenization process, so
custom-writing a tokenizer is not terribly difficult (modulo Unicode
escapes).

boolean endImports=false;
Set<String> out=new HashSet<String>();

Yes, a Set is probably sufficient.

[ cut parsing method ]

Glaring errors:
1. Your code does not appear to take into account names embedded in strings.
2. Ditto for names embedded in comments.
3. Proper resolution of the types of identifiers can only be done
through semantic analysis of the various expressions. Class names crop
up in a surprising number of places in the Java grammar.
4. I feel that your method for determining the end of imports is
incorrect. If the first statement does not begin with the keyword
`package', then there are no imports; otherwise, the first statement not
beginning with `import' excluding the optional `package' statement is
the end of imports.

In short, this can only be done with real lexers and parsers that are
more tolerant of valid input Java programs.

Note on final applications: I want to write a tool that will
determine from source only what classes the current source file depends
on. After a little more processing the final output is a DAG
representing the order stuff would need to be compiled in for a
non-JIT compiler.

I am willing to bet that there are open-source Java dependency analysis
programs already existing that you could use. It is also likely that
your problem statement is not sufficient for this task: you need to generate
the fully-qualified class name for each used class. Bytecode analysis
handles this much more easily, but it requires the compiled code to work
with.
 
