Parsing file paths with regular expressions


jayharris

I'm trying to write a method that strips excess file separator
characters from a file path, so that given input like
C:\\\Documents and Settings\\My Documents\\\\\Work\\\ it returns
something more like C:\Documents and Settings\My Documents\Work\

My code looks like this:

private String getCorrectPath(String input) {
    char fs = File.separatorChar;
    String newInput;
    try {
        String regex = (Character.isLetter(fs) ? String.valueOf(fs) : "\\"
                + fs) + "{2,}";
        newInput = input.replaceAll(regex, String.valueOf(fs));
    }
    catch(Exception e) {
        return null;
    }
    return newInput;
}

This seems to work fine for something like Unix, but whenever I run it
on Windows, where the file separator is the same as the quoting
character, I get the following exception:

java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(Unknown Source)
at java.util.regex.Matcher.appendReplacement(Unknown Source)
at java.util.regex.Matcher.replaceAll(Unknown Source)
at java.lang.String.replaceAll(Unknown Source)

Can anybody help me figure out what's going on?

Thanks,

Jay
 

jan V

> My code looks like this:
> catch(Exception e) {
>     return null;
> }

This style will hurt you big time if you continue using it. You should
always catch exception types that are as specific (tight) as possible for
the logic contained by the try block.
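For the method in the question, String.replaceAll documents only PatternSyntaxException, so a tighter version might catch just that. A minimal sketch; the class and method names (TightCatch, collapseRuns) are made up for illustration:

```java
import java.util.regex.PatternSyntaxException;

final class TightCatch {
    // Hypothetical helper: collapses runs of two or more matches of
    // sepRegex into `replacement`. Only PatternSyntaxException, the
    // exception String.replaceAll documents, is caught here.
    static String collapseRuns(String input, String sepRegex, String replacement) {
        try {
            return input.replaceAll("(?:" + sepRegex + "){2,}", replacement);
        } catch (PatternSyntaxException e) {
            // A malformed pattern is reported with context instead of being
            // swallowed. Any other RuntimeException (for example one caused
            // by a bad replacement string) is not caught and propagates as-is.
            throw new IllegalArgumentException("invalid separator regex: " + sepRegex, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("a///b//c", "/", "/")); // a/b/c
    }
}
```

Because nothing broader is caught, the exception from the original post would surface with its stack trace instead of turning into a silent null.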
 

Roedy Green

> This seems to work fine for something like Unix, but whenever I run it
> on Windows, where the file separator is the same as the quoting
> character, I get the following exception:

There (on Windows), your splitter looks like this: "\\\\", to represent the
literal \. You have to double it once for Java and once for regex. See
http://mindprod.com/jgloss/regex.html
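To make the two levels of escaping concrete, here is a small sketch (the class and method names are hypothetical). The replacement string is interpreted too, so a single backslash there also has to be written as "\\\\" in Java source:

```java
final class EscapeLevels {
    // Java literal "\\\\{2,}" is the regex text \\{2,}: two or more literal backslashes.
    private static final String RUN_OF_BACKSLASHES = "\\\\{2,}";
    // Java literal "\\\\" is the replacement text \\, which inserts one literal backslash.
    private static final String ONE_BACKSLASH = "\\\\";

    static String collapseBackslashes(String path) {
        return path.replaceAll(RUN_OF_BACKSLASHES, ONE_BACKSLASH);
    }

    public static void main(String[] args) {
        String messy = "C:\\\\\\Docs\\\\Work";           // the text C:\\\Docs\\Work
        System.out.println("\\\\".length());            // 2
        System.out.println(collapseBackslashes(messy)); // C:\Docs\Work
    }
}
```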
 

Joan

> I'm trying to write a method that strips excess file separator
> characters from a file path, so it would get something like this as
> input: C:\\\Documents and Settings\\My Documents\\\\\Work\\\ and return
> something more like this: C:\Documents and Settings\My Documents\Work\

I use this and it works great.
String pp = (new File(filename)).getCanonicalPath();
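A sketch of the getCanonicalPath approach, assuming a Unix-like system for the sample path (the class name and path are made up). According to the java.io.File documentation, a canonical path is absolute and unique, which typically means removing "." and "..", resolving symbolic links on UNIX platforms and converting drive letters to a standard case on Windows. It also drops the trailing separator, and because it may consult the file system it can throw IOException:

```java
import java.io.File;
import java.io.IOException;

final class CanonicalPathDemo {
    // getCanonicalPath() may consult the file system, so it declares IOException.
    static String canonical(String filename) throws IOException {
        return new File(filename).getCanonicalPath();
    }

    public static void main(String[] args) throws IOException {
        // Unix-style sample; on Windows the same idea applies to runs of '\'.
        String messy = "/no_such_dir//a///b/";
        // java.io.File already collapses repeated separators when it is built.
        System.out.println(new File(messy).getPath()); // /no_such_dir/a/b on Unix-like systems
        System.out.println(canonical(messy));
    }
}
```

That makes it a different operation from the regex clean-up in the question: it can turn a relative path into an absolute one and follow links, so it fits best when you actually want the file's real location.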
 

Thomas Hawtin

> private String getCorrectPath(String input) {
>     char fs = File.separatorChar;
>     String newInput;
>     try {
>         String regex = (Character.isLetter(fs) ? String.valueOf(fs) : "\\"
>                 + fs) + "{2,}";
>         newInput = input.replaceAll(regex, String.valueOf(fs));
>     }
> This seems to work fine for something like Unix, but whenever I run it
> on Windows, where the file separator is the same as the quoting
> character, I get the following exception:
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: 1
> at java.lang.String.charAt(Unknown Source)
> at java.util.regex.Matcher.appendReplacement(Unknown Source)
> at java.util.regex.Matcher.replaceAll(Unknown Source)
> at java.lang.String.replaceAll(Unknown Source)

Matcher.quoteReplacement is your friend. Use it like you would
PreparedStatement in JDBC (assuming JRE 5.0 or later). The backslash is
significant in replacement strings as well as in the regex itself, so it
needs to be escaped.

Perhaps a better API design would be to throw an
IllegalArgumentException whenever the replacement text is illegal.

Tom Hawtin
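Putting that advice into code: a minimal sketch (the class and method names are illustrative) that uses Pattern.quote for the regex and Matcher.quoteReplacement for the replacement, both available since Java 5. In the original method the regex was already escaped correctly on Windows; the problem is the replacement String.valueOf(fs), which is a lone \ and so starts an escape sequence with nothing after it. As far as I know, newer JDKs report that case as an IllegalArgumentException rather than a StringIndexOutOfBoundsException.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathCleaner {
    // Collapses runs of two or more `sep` characters into a single `sep`.
    static String collapseSeparators(String input, char sep) {
        String s = String.valueOf(sep);
        // Pattern.quote makes the separator literal in the regex, whatever it is;
        // the non-capturing group makes {2,} apply to the whole quoted unit.
        String regex = "(?:" + Pattern.quote(s) + "){2,}";
        // Matcher.quoteReplacement escapes '\' and '$', so a lone backslash
        // no longer ends the replacement with an unfinished escape.
        return input.replaceAll(regex, Matcher.quoteReplacement(s));
    }

    static String getCorrectPath(String input) {
        return collapseSeparators(input, File.separatorChar);
    }

    public static void main(String[] args) {
        String messy = "C:\\\\\\Documents and Settings\\\\My Documents\\\\\\\\\\Work\\\\\\";
        System.out.println(collapseSeparators(messy, '\\'));
        // prints C:\Documents and Settings\My Documents\Work\
    }
}
```

Any version that collapses every run, this one included, also turns the leading \\ of a Windows UNC path such as \\server\share into a single \, so such paths would need special handling.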
 

David Segall

jan V said:
> This style will hurt you big time if you continue using it.
How? I can see that he should produce some diagnostic output, and
possibly terminate the program, but why is it important to cater for
each exception individually?
 

jan V

> How? ... Why is it important to cater for each exception individually?

[The following text is taken from "Mastering Javabeans", Copyright (c) 1997
Sybex]


When a method throws a lot of different exceptions, it is tempting to
succumb to laziness and simply catch the root Exception type once
instead of laboriously specifying a catch clause for every possible
Exception subclass declared in the method's throws clause. As usual
when programming, the lazy "trick" can come back to haunt you when
you hit a bug. The problem with using a blanket catchall is that you will
stop the JVM from throwing those all-important subclasses of
RuntimeException at the spot where they occur. These include, among others,
the following tell-tale bug detectors par excellence:

· ArithmeticException
· ArrayIndexOutOfBoundsException
· ClassCastException
· ClassNotFoundException
· CloneNotSupportedException
· IllegalArgumentException
· IllegalMonitorStateException
· IndexOutOfBoundsException
· NullPointerException
· NumberFormatException
· SecurityException

Any Java programmer with a modicum of Java experience knows these
exceptions well, as they can be thrown by code involving daily
bread-and-butter things like math, arrays, casting, cloning objects, invoking
methods, parameter passing, and threads. The problem with blanket
catchalls is that these usually have the side effect of throwing away
important information. Here is an example of a problematic catchall:

try {
    // lots of code here
    // more code here, can throw a whole mix of Exceptions
} catch (Exception allOfThem) { // LAZY !!
    System.out.println("Oh dear, our XYZ step failed!");
}

If a common, and often bug-related, exception like a NullPointerException
happened anywhere within the try block, then you would
not even have a clue that it happened because you would probably
think that some method threw a checked exception instead, and not that
an even more important unchecked exception occurred (unchecked
exceptions are all instances of classes RuntimeException and Error,
and their subclasses).

So, the lesson should be clear. Always explicitly include a catch clause
for every checked exception, so that run-time exceptions will halt your
program during development. For example, when you need to invoke
Constructor.newInstance(), you should use the following code template:

try {
    object = someConstructor.newInstance(...);
} catch (InstantiationException x) {
    // suitable code
} catch (IllegalAccessException x) {
    // suitable code
} catch (IllegalArgumentException x) {
    // suitable code
} catch (InvocationTargetException x) {
    // suitable code
}

This verbosity will repay itself hundredfold in debugging hours saved.
So, in effect, this is the true lazy approach! (A good programmer is lazy,
but uses defensive coding techniques to be able to afford this laziness.)
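A runnable version of that template, as a sketch: it constructs a StringBuilder reflectively (an arbitrary choice for illustration) and gives each checked exception its own catch clause. It adds NoSuchMethodException, which getConstructor declares, and, unlike the quoted template, leaves IllegalArgumentException uncaught because it is unchecked:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

final class ReflectiveFactory {
    // Builds new StringBuilder(initial) via reflection, one catch per checked exception.
    static StringBuilder newBuilder(String initial) {
        try {
            Constructor<StringBuilder> ctor =
                    StringBuilder.class.getConstructor(String.class);
            return ctor.newInstance(initial);
        } catch (NoSuchMethodException x) {
            throw new IllegalStateException("no StringBuilder(String) constructor", x);
        } catch (InstantiationException x) {
            throw new IllegalStateException("class cannot be instantiated", x);
        } catch (IllegalAccessException x) {
            throw new IllegalStateException("constructor is not accessible", x);
        } catch (InvocationTargetException x) {
            // The constructor itself threw; keep its exception as the cause.
            throw new IllegalStateException("constructor threw an exception", x.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(newBuilder("abc").append("def")); // abcdef
    }
}
```

Passing null makes the StringBuilder constructor throw a NullPointerException, which arrives wrapped in InvocationTargetException, so the bug stays visible as the cause instead of disappearing into a catchall.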
 

kempshall

No kidding, but I'm not all that familiar with the regex API so I'm not
really sure what exceptions the String.replaceAll method can throw. The
StringOutOfBoundsException, for example, isn't even listed in the API
-- only the PatternSyntaxException is. I can always make the code
tighter once I have some idea of what's going on.
 

Oliver Wong

kempshall said:
> No kidding, but I'm not all that familiar with the regex API so I'm not
> really sure what exceptions the String.replaceAll method can throw. The
> StringOutOfBoundsException, for example, isn't even listed in the API
> -- only the PatternSyntaxException is. I can always make the code
> tighter once I have some idea of what's going on.

Assuming you mean StringIndexOutOfBoundsException, this is a
RuntimeException, and RuntimeExceptions frequently aren't documented.

- Oliver
 

jan V

kempshall said:
> No kidding, but I'm not all that familiar with the regex API so I'm not
> really sure what exceptions the String.replaceAll method can throw. The
> StringOutOfBoundsException, for example, isn't even listed in the API
> -- only the PatternSyntaxException is. I can always make the code
> tighter once I have some idea of what's going on.

StringIndexOutOfBoundsException is an unchecked exception, and you need
these kinds of exceptions to halt your program (at the *very* least during
development) so that you can plug the cause of the exception. Your program
ending due to a StringIndexOutOfBoundsException is normally a sign that
you've got a bug somewhere.

"I can always make the code tighter once I have some idea of what's going
on." This is a very dangerous technique, because it's so easy to forget
to tighten things later. Mind you, there are a number of code quality
analysis tools which will flag catch Exception in your code.
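A small sketch of why the compiler never complains about the exception in this thread: StringIndexOutOfBoundsException is a RuntimeException, so code that can throw it needs neither a throws clause nor a try/catch. The helper names are hypothetical, and lastCharBuggy contains a deliberate off-by-one bug:

```java
final class UncheckedDemo {
    // Deliberately buggy (hypothetical): charAt(length()) is one past the end.
    // It compiles without any throws clause because the exception is unchecked.
    static char lastCharBuggy(String s) {
        return s.charAt(s.length());
    }

    // Unchecked exceptions are RuntimeException, Error, and their subclasses.
    static boolean isUnchecked(Class<? extends Throwable> type) {
        return RuntimeException.class.isAssignableFrom(type)
                || Error.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        System.out.println(isUnchecked(StringIndexOutOfBoundsException.class)); // true
        System.out.println(isUnchecked(java.io.IOException.class));             // false
        // Calling lastCharBuggy("abc") here, with no catch anywhere, would end
        // the program with a stack trace pointing straight at the bug.
    }
}
```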
 

Thomas Hawtin

I looked up the "not a bug" for this:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4689750
> Assuming you mean StringIndexOutOfBoundsException, this is a
> RuntimeException, and RuntimeExceptions frequently aren't documented.

The advice in Effective Java, IIRC, is to always document runtime
exceptions.

Often NPEs aren't documented, nor do the docs hint as to whether null is
acceptable. In fact you'll quite often get an NPE from a subsequent
method. That indicates to me that nulls and RuntimeExceptions should
have attention paid to them.
 

Oliver Wong

When I said this, I meant it as "Here's how it is in practice; learn to
deal with it." not "Here's the best practice that I recommend everyone
follows."
> The advice in Effective Java, IIRC, is to always document runtime
> exceptions.

I disagree with "always". More details below.
> Often NPEs aren't documented, nor do the docs hint as to whether null is
> acceptable. In fact you'll quite often get an NPE from a subsequent
> method. That indicates to me that nulls and RuntimeExceptions should have
> attention paid to them.

I think the majority of NullPointerExceptions come from unintentional
bugs in the code. Given that these bugs are unintentional, you can hardly
expect them to be documented. HOWEVER, in my opinion, if you're writing a
method, and you require that the parameters not be null, and you check them
and find out that they are indeed null, I recommend throwing
IllegalArgumentException instead of NullPointerException. I think the former
is much more descriptive of what the problem was, when seen in a stack
trace. Similarly, if you're writing a method which reads instance fields
which you require to not be null, but they do turn out to be null, I
recommend throwing IllegalStateException.

In both these situations, I think it's a bit overboard to put @throws
javadoc tags that explicitly say that IllegalArgumentException may be
thrown. I think it would suffice to put a comment stating your requirements
for the parameters, either in the main body of the JavaDoc comment, or in
the @param tags.

- Oliver
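The suggestions above, sketched on a hypothetical Greeter class: a null argument is reported as IllegalArgumentException, an object used before it is ready as IllegalStateException, and the non-null requirement is stated in the @param tags rather than in @throws:

```java
/** Hypothetical class illustrating argument and state checks. */
final class Greeter {
    private String greeting; // must be set via setGreeting before greet()

    /**
     * @param greeting the greeting text; must not be null
     */
    void setGreeting(String greeting) {
        if (greeting == null) {
            throw new IllegalArgumentException("greeting must not be null");
        }
        this.greeting = greeting;
    }

    /**
     * @param name the name to greet; must not be null
     * @return the greeting followed by the name
     */
    String greet(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        if (greeting == null) {
            // The object is not ready yet: a state problem, not an argument problem.
            throw new IllegalStateException("setGreeting() has not been called");
        }
        return greeting + ", " + name;
    }

    public static void main(String[] args) {
        Greeter g = new Greeter();
        g.setGreeting("Hello");
        System.out.println(g.greet("Jay")); // Hello, Jay
    }
}
```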
 

jan V

> I think the majority of NullPointerExceptions come from unintentional
> bugs in the code.

"Unintentional bugs"...?! For all those years I thought that was the only
type of bug... ;-)
> in my opinion, if you're writing a
> method, and you require that the parameters not be null, and you check them
> and find out that they are indeed null, I recommend throwing
> IllegalArgumentException instead of NullPointerException. I think the former
> is much more descriptive of what the problem was, when seen in a stack
> trace.

Totally with you on this one. When I'm bored, I browse my library methods to
try and find places where I could insert

if (arg == null) throw new IllegalArgumentException("...........");

> In both these situations, I think it's a bit overboard to put @throws
> javadoc tags that explicitly say that IllegalArgumentException may be
> thrown.

I don't think so... I like my docs to be as explicit as possible.
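For comparison, the more explicit style documents the exception with an @throws tag. As an illustration, this sketch applies it to a loop-based separator clean-up, a hypothetical alternative to the regex that sidesteps escaping altogether (the class and method names are made up):

```java
final class Separators {
    /**
     * Collapses each run of {@code sep} in {@code path} into a single {@code sep}.
     *
     * @param path the path to clean
     * @param sep  the separator character, for example a backslash or '/'
     * @return the path with repeated separators collapsed
     * @throws IllegalArgumentException if {@code path} is null
     */
    static String collapse(String path, char sep) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
        StringBuilder out = new StringBuilder(path.length());
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            // Skip a separator that directly follows another separator.
            boolean repeat = c == sep && out.length() > 0
                    && out.charAt(out.length() - 1) == sep;
            if (!repeat) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("C:\\\\\\Docs\\\\Work\\\\\\", '\\')); // C:\Docs\Work\
    }
}
```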
 
