Question on Java Language Specification section 3.3 (Unicode escape lexing)

  • Thread starter Aryeh M. Friedman

Aryeh M. Friedman

It can be done.



Obviously it can also be made not to work.



Maybe you should master a Java IDE before writing an OS in Java.

Another requirement not satisfied by any IDE we have found is the ability to lay the source tree out in such a way that it can be compiled without the IDE (a requirement for almost all our projects because none of our clients have IDEs and in almost all cases there are minor changes needed to make the code happy on their site that make testing impossible on the development machine).
 

Arne Vajhøj

Another requirement not satisfied by any IDE we have found is the
ability to lay the source tree out in such a way that it can be
compiled without the IDE (a requirement for almost all our projects
because none of our clients have IDEs and in almost all cases there
are minor changes needed to make the code happy on their site that
make testing impossible on the development machine)

The Java IDEs I know put code in a structure that fits
the Java tools, ant and maven.

Arne
 

Aryeh M. Friedman

The Java IDEs I know put code in a structure that fits

the Java tools, ant and maven.



Arne

And in almost any non-trivial case this is completely incorrect... Even though I love Java as a language, I have a serious issue with some of the attitudes/assumptions made by tools: namely, the universe does not revolve around the JVM.
 

Arne Vajhøj

And in almost any non-trivial case this is completely incorrect...

Given that a big part (my estimate: 80-90%!) of all Java applications
are built like this:
- developers use an IDE and check in to a VCS
- the build process checks out from the VCS and uses ant/maven to build
then it has to be correct.

even though I love Java as a lang I have a serious issue with some of
the attitudes/assumptions made by tools... namely the universe does
not revolve around the JVM

I find it natural that tools developed for Java development are the
best for Java development and tools developed for C development are
the best for C development and ... PHP ... Python ... etc..

Arne
 

Aryeh M. Friedman

Given that a big part (my estimate: 80-90%!) of all Java applications

are build:

- developer use IDE and checkin to VCS

- build process checkout from VCS and use ant/maven to build

then it has to be correct.

Correct in what sense? Passing its own tests? If that is the case, aegis is the *ONLY* VCS that actually requires this before a checkin. The idea there is that the baseline (the repo, in most other VCSs' jargon) is guaranteed to be working (as defined above). Namely, every modification is *NEW* [see note], atomic with regard to new functionality, and *MUST* be accompanied by automated tests (it is possible to turn this off, but for obvious reasons that is not recommended unless the change is essentially untestable, like documentation updates).

I find it natural that tools developed for Java development are the

best for Java development and tools developed for C development are

the best for C development and ... PHP ... Python ... etc..

Most real world projects (unless they are part of a larger effort) have several components/languages. For us, for example, it is typical to have an HTML/CSS/JS component and a Java/"JSP" component (I am defining "JSP" a little loosely because we often need to support more than just web front-ends). It is also common for us to have some native code accessed via a JNLP wrapper.

Note:

There is a slight mismatch between aegis's requirements in this regard and how xUnit-like frameworks work. We typically solve this by reusing the same test script but requiring that the total number of passes be at least one larger than for the previous change.
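
The pass-count rule in the note can be sketched as a small check. This is only an illustration: `PassCountGate` and its parameters are hypothetical names, not part of aegis or of any xUnit framework, and rejecting a run that has failures is an added assumption based on the "baseline must keep working" idea above.

```java
final class PassCountGate {
    private PassCountGate() {}

    /**
     * Returns true if a change may be accepted: nothing fails (assumption),
     * and the run has at least one more passing test than the previous change.
     */
    static boolean accepts(int previousPasses, int currentPasses, int currentFailures) {
        return currentFailures == 0 && currentPasses >= previousPasses + 1;
    }

    public static void main(String[] args) {
        System.out.println(accepts(41, 42, 0)); // one new passing test, no failures
        System.out.println(accepts(41, 41, 0)); // no new test accompanies the change
    }
}
```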
 

Arne Vajhøj

...which, being lazy, I would not do from scratch.

Instead, I'd use the Java version of the Coco/R package, which generates
the lexer and parser as Java source within a framework. Unlike some
similar tools, you're almost encouraged to rewrite the framework to suit
your requirements. This is quite short and written in standard Java, so
modifying it is very easy.

Good point.

Arne
 

Aryeh M. Friedman

Good point.



Arne

The only issue is likely a philosophical one, in that I have *NEVER* trusted code generators of any kind: they either produce code that is impossible to follow/debug or have all kinds of fluff in them (the classic example in my mind [HTML, which is not really a programming lang ;-)] is Dreamweaver, which produces 75 lines of HTML for "hello, world").
 

Aryeh M. Friedman

Aryeh M. Friedman wrote:







So you don't care for compilers ?



;-)



-- chris



P.S. Seriously: the point of classic compiler generators (or

"compiler-compilers" as they were often called) are to produce code that works

and that runs fast in little space. It is not /AT ALL/ a design principle that

the code should be comprehensible to humans -- in fact for the kinds of

algorithms they use, there is no way the resulting code and tables could be

remotely comprehensible (to an ordinary programmer), that is /why/ we use code

generators.

Machine code was never meant to be readable, but high level languages can and should be ;-)... On the serious side of the debate, there are reasons for shying away from code generators in my case that are currently proprietary (some of the lesser results will likely be FOSS'ed though). The main reason is that we need to (in some cases) deal with multiple languages in the same compilation unit, and have developed a fairly good way of doing so (at least in theory; my "fun work" is really nothing more than a proof of concept, without the pressure of deadlines and such, with Java as a typical non-trivial language to work with from the compiler POV). Because of this, using a parser generator would make it very inefficient to create the needed parsers, since generators are (by their very nature) very non-OO in how they deal with more than one grammar at once: they are designed to deal with a single language at a time and not with "families" of them.
 

Aryeh M. Friedman

Patricia Shanahan wrote:









I'm not so sure about that. IIRC the rules about interpreting Unicode escapes

have some seriously weird convolutions. Something to do with protecting against

multiply-encoded files, I think. It badly fails the Principle of Least WTF.



It's in the spec, but I'm too lazy to go find the exact reference :-(



-- chris

Agreed. For example, the following is just ugly but perfectly valid Java code:

Foo.java:
\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A

% javac Foo.java
% java Foo
hello, world
 

Aryeh M. Friedman

agreed for example the following is just ugly but perfectly valid Java code:



Foo.java:

\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A



% javac Foo.java

% java Foo

hello, world

Just a quick note: I did end up implementing Unicode escapes the way JLSv3 says to, and the above is one of our test inputs...
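
For readers following along, here is a minimal sketch of what such a translation step could look like. It is an illustration only, not the implementation mentioned above. It assumes the rules as I read them in JLS 3.3: a backslash can start an escape only if an even number of raw backslashes contiguously precede it, any number of `u` characters may follow it, exactly four hex digits are required, and the resulting character is never rescanned. The names `UnicodeEscapes`, `translate` and `hexValue` are hypothetical.

```java
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns the value of an ASCII hex digit, or -1 if c is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Translates Unicode escapes in raw source text into UTF-16 code units. */
    static String translate(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int precedingBackslashes = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '\\') {
                out.append(c);
                precedingBackslashes = 0;
                i++;
                continue;
            }
            boolean eligible = precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // any number of 'u' characters may follow the backslash
                }
                if (j + 4 > raw.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(raw.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("malformed Unicode escape at index " + i);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value); // the result is never rescanned for further escapes
                precedingBackslashes = 0;
                i = j + 4;
            } else {
                out.append('\\');
                precedingBackslashes++;
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes below are ordinary string escapes, so translate()
        // receives a single backslash followed by 'u' and four hex digits each time.
        System.out.println(translate("\\u0068\\u0069")); // prints "hi"
    }
}
```

Feeding the whole of Foo.java through a step like this before tokenizing should yield the plain "hello, world" class, which is what makes it a handy test input.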
 

Arne Vajhøj

The only issue is likely a philosophical one in that I have *NEVER*
trusted code generators of any kind they either produce impossible to
follow/debug code or have all kinds of fluff in them (the classic
example in my mind [html which is not really a programming lang ;-)]
is Dreamweaver that produces 75 lines of HTML for "hello, world").

Sounds like NIH.

The generated code may be hard to follow, but it will be
better tested.

Arne
 

Arne Vajhøj

Machine code was never meant to be readable but high level languages
can and should be ;-).... on the serious side of the debate there are
reasons for shying away from code generators in my case that are
currently proprietary (some of the lesser results will likely be
FOSS'ed though)... the main reason is we need to (in some cases) deal
with multiple languages in the same compilation unit and have
developed fairly good (at least in theory and my "fun work" is really
nothing more then a proof of concept, without the pressure of
deadlines and such, with Java as a typical non-trivial language to
work with from the compiler POV)... due to the above using a parse
generator would make it very inefficient to create the needed parsers
since they are (by there very nature) very non-OO in how they deal
with more then one grammar at once... namely they are designed to
deal with single languages at a time and not "families" of them

????

You have:

1 handwritten lexer + 1 handwritten parser vs 1 generated lexer + 1
generated parser

and:

N handwritten lexers + N handwritten parsers vs N generated lexers + N
generated parsers

If it is cheaper to generate for 1 then I would expect it to be cheaper
to generate for N as well.

That the generated lexers and parsers may be more procedural than
object oriented should not be a show stopper.

Common languages like C++ and Java have no trouble calling different
functions from different classes.
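
As a rough sketch of that point: several parsers, generated or handwritten, can sit behind one small interface and be picked per embedded language. Everything here is hypothetical (`LanguageParser`, `MultiLanguageFrontEnd`, the language tags); it does not reflect the API of yacc, ANTLR, Coco/R or any other generator, and a real generated parser would need a thin adapter to fit the interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common entry point; a generated parser would be wrapped to fit it.
interface LanguageParser {
    String parse(String source);
}

final class MultiLanguageFrontEnd {
    private final Map<String, LanguageParser> parsers = new LinkedHashMap<>();

    /** Registers the parser to use for regions written in the given language. */
    void register(String language, LanguageParser parser) {
        parsers.put(language, parser);
    }

    /** Hands one region of a compilation unit to the parser for its language. */
    String parseRegion(String language, String source) {
        LanguageParser parser = parsers.get(language);
        if (parser == null) {
            throw new IllegalArgumentException("no parser registered for " + language);
        }
        return parser.parse(source);
    }

    public static void main(String[] args) {
        MultiLanguageFrontEnd frontEnd = new MultiLanguageFrontEnd();
        // Stand-ins for real parsers: each just tags the text it was given.
        frontEnd.register("java", src -> "java-tree(" + src + ")");
        frontEnd.register("html", src -> "html-tree(" + src + ")");
        System.out.println(frontEnd.parseRegion("html", "<p>hi</p>"));
    }
}
```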

Arne
 

Arne Vajhøj

Correct in what sense?

In the same sense in which you used "incorrect"!

I find it natural that tools developed for Java development are the
best for Java development and tools developed for C development are
the best for C development and ... PHP ... Python ... etc..

Most real world projects (unless they a part of a larger effort) have
several components/languages (for us for example it is typical to
have a HTML/CSS/JS component and a Java/"JSP" component [I am
defining "JSP" a little loosely because we often need to support more
then just web front-ends]... it is also common for us to have some
native code accessed via a JNLP wrapper)...

(JNI wrapper??)

Eclipse and NetBeans can support all those languages.

But if you have enough work in each language, then
a different IDE for the HTML/CSS/JS and another for the
C/C++ could make sense.

You may want to use ant for the Java stuff and make for the
C/C++ stuff.

But ant can call make and make can call ant, so they can be integrated.

Arne
 

Arne Vajhøj

Foo.java:
\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u000A\u007B\u000A\u0009\u0070\u0075\u0062\u006C\u0069\u0063\u0020\u0073\u0074\u0061\u0074\u0069\u0063\u0020\u0076\u006F\u0069\u0064\u0020\u006D\u0061\u0069\u006E\u0028\u0053\u0074\u0072\u0069\u006E\u0067\u005B\u005D\u0020\u0061\u0072\u0067\u0073\u0029\u000A\u0009\u007B\u000A\u0009\u0009\u0053\u0079\u0073\u0074\u0065\u006D\u002E\u006F\u0075\u0074\u002E\u0070\u0072\u0069\u006E\u0074\u006C\u006E\u0028\u0022\u0068\u0065\u006C\u006C\u006F\u002C\u0020\u0077\u006F\u0072\u006C\u0064\u0022\u0029\u003B\u000A\u0009\u007D\u000A\u007D\u000A

% javac Foo.java
% java Foo
hello, world

:)

It is one of those features that can certainly be misused.
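
A small illustration of such misuse (not taken from the thread; `HiddenStatement` is a made-up name): because escapes are translated before comments are recognized, an escaped line feed inside a `//` comment ends the comment, and whatever follows on that line is compiled as code.

```java
final class HiddenStatement {
    private HiddenStatement() {}

    static int value() {
        int x = 1;
        // this looks like a harmless comment \u000a x = 2;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(value()); // prints 2, not 1
    }
}
```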

Arne
 

Aryeh M. Friedman

????



You have:



1 handwritten lexer + 1 handwritten parser vs 1 generated lexer + 1

generated parser



and:



N handwritten lexers + N handwritten parsers vs N generated lexers + N

generated parsers



If it is cheaper to generate for 1 then I would expect it to be cheaper

to generate for N as well.



That the generated lexers and parsers may be more procedural than

object oriented should not be a show stopper.



Common languages like C++ and Java can fine call different

functions from different classes.



Arne

Don't forget domain-specific langs, some of which may rewrite the actual content of the other embedded langs... Bottom line: a well designed version of this is cheaper in the long run if one of the goals is to quickly add new langs to each family.

Besides which, I compared my hand-written code to that produced by yacc/lex (and antlr, to make sure I was not seeing things) and mine 1) is a fraction of the line count [about 90% smaller], 2) has a much lower big-O (O(n) vs. O(n^2)), 3) is trivial to hand trace (why I would want to is another point ;-)), and 4) is easier to test with unit testing, because you can actually get under the hood, unlike the generated code, which is totally opaque.
 
