And who is to determine what "semantically significant" whitespace is? It's
hardly the job of an SCM system.
Given that all the usual suspects (C, C++, and Java) treat whitespace
pretty much the same, and source files for these make up the vast
majority of everything that ever gets checked into these repositories,
it doesn't seem unreasonable for "C/C++/Java awareness" to be in such
tools, certainly as a plugin or option. Basically, it can just
recognize these files by extension (.c, .cc, .cpp, .h, .c++, .java ...)
and ignore whitespace except between keywords/
identifiers (collapse to a single space), inside string literals
(leave it alone), and the first linefeed after a // comment (collapse
the contiguous whitespace there to just the linefeed).
I'm also obviously not suggesting it genuinely modify the file in the
manner described; just treat it as so modified for diff purposes.
(This can of course be implemented using a temporary copy that really
is modified that way.)
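
A rough sketch of that normalization, in Python for brevity. The names
TOKEN and normalize are made up for illustration. Keeping block
comments and character literals verbatim is my own assumption, since
the rule above doesn't mention them. It also ignores preprocessor
directives, which are line-sensitive in C and C++, and dropping the
space in something like "a + +b" changes how it tokenizes, so take it
as a starting point rather than a finished tool.

```python
import re

# Token classes for C-like source. Order matters: literals and comments
# must be tried before the catch-all alternatives.
TOKEN = re.compile(
    r'''
    (?P<string>"(?:\\.|[^"\\\n])*")      # string literal, kept verbatim
  | (?P<char>'(?:\\.|[^'\\\n])*')        # character literal, kept verbatim
  | (?P<line_comment>//[^\n]*)           # // comment, up to the linefeed
  | (?P<block_comment>/\*.*?\*/)         # block comment, kept verbatim
  | (?P<space>\s+)                       # run of whitespace
  | (?P<word>\w+)                        # keyword, identifier, or number
  | (?P<other>.)                         # any other single character
    ''',
    re.VERBOSE | re.DOTALL,
)


def normalize(source: str) -> str:
    """Return a copy of `source` with insignificant whitespace removed."""
    out = []
    prev_kind = None  # kind of the last token emitted
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "space":
            continue  # dropped; re-inserted below only where it matters
        if prev_kind == "line_comment":
            out.append("\n")  # the linefeed that ends a // comment
        elif prev_kind == "word" and kind == "word":
            out.append(" ")   # keep keywords/identifiers apart
        if kind == "line_comment":
            text = text.rstrip()  # trailing blanks before the linefeed
        out.append(text)
        prev_kind = kind
    if prev_kind == "line_comment":
        out.append("\n")
    return "".join(out)


def same_modulo_whitespace(old: str, new: str) -> bool:
    """True if the two versions differ only in insignificant whitespace."""
    return normalize(old) == normalize(new)
```

The SCM would run both revisions through normalize() and hand the
results to its ordinary diff; the files on disk are never touched.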
Actually what I'd like to see is pluggable diff engines (binding to
particular file types) so various language-aware diffs could be
included. A smart diff engine for a specific language could do a lot
more than ignore semantically-insignificant whitespace differences.
For example, if it borrowed a little from Eclipse, a Java-specialized
diff could tell whether two source files were semantically identical
modulo bigger changes, such as if (foo) { bar(); } vs. if (foo) bar();
or reordered imports. Plug this into a future super-SVN and it gets
used to decide deltas on all .java files checked in. Presto: no more
complaints about false deltas.
As for the GUI builders, smarter parsing might be nice, so those can
read any code that instantiates and invokes setup-type methods on
Swing components and parse some sensible representation out of it.
Alternatively, the GUI builder should generate an intermediate format
that a standalone tool can convert to a Java source file; the
intermediate format is treated as a project source file and the Java
file as an object file, much the way bison and yacc sources and output
get treated by the C community.
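
A minimal sketch of what such a standalone converter might look like,
in Python. The JSON layout format and the generate_java name are
invented for illustration; no real GUI builder is assumed to use
them. Only the layout description would be checked in, and the Java
file would be regenerated at build time.

```python
import json


def generate_java(layout_json: str) -> str:
    """Turn a (hypothetical) JSON layout description into Java source."""
    layout = json.loads(layout_json)
    name = layout["class"]
    lines = [
        "// GENERATED FILE: edit the layout description, not this file.",
        "import javax.swing.*;",
        "",
        f"public class {name} extends JFrame {{",
        f"    public {name}() {{",
        # json.dumps is used as a rough stand-in for Java string quoting.
        f"        setTitle({json.dumps(layout['title'])});",
        "        setLayout(new java.awt.FlowLayout());",
    ]
    for widget in layout["widgets"]:
        text = json.dumps(widget["text"])
        lines.append(f"        add(new {widget['type']}({text}));")
    lines += ["        pack();", "    }", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = """
    {"class": "LoginForm", "title": "Log in",
     "widgets": [{"type": "JLabel", "text": "User"},
                 {"type": "JButton", "text": "OK"}]}
    """
    print(generate_java(example))
```

Since the output is a pure function of the layout description, the
SCM never sees spurious deltas from the builder rewriting its own
generated code.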