Confusion about String.matches method

laredotornado

Hi,

I'm using Java 1.6. How would I modify my regular expression below

^Starting at \$32,000*$

so that it will match the string, "Starting at $32,000*". In other
words, I have

String regEx = "^Starting at \\$32,000*$";
String text = "Starting at $32,000*";
if (!text.matches(regEx)) {
    throw new RuntimeException("does not match");
}

but the exception is always thrown. Please let me know how I can
modify my regex to match. Thanks, - Dave
 
Andreas Leitgeb

laredotornado said:
Hi,
I'm using Java 1.6. How would I modify my regular expression below
^Starting at \$32,000*$

^Starting at \$32,000\*$

laredotornado said:
so that it will match the string, "Starting at $32,000*". In other
words, I have
String regEx = "^Starting at \\$32,000*$";

String regEx = "^Starting at \\$32,000\\*$";
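
A minimal sketch of why the original pattern fails, using only String.matches from the question. An unescaped * is a quantifier on the preceding 0, so the pattern accepts "Starting at $32,00", "Starting at $32,000", "Starting at $32,0000" and so on, but never a literal asterisk. The class and method names are made up for illustration.

```java
class AsteriskEscapeDemo {
    // Pattern from the question: the trailing * quantifies the last 0.
    static final String UNESCAPED = "^Starting at \\$32,000*$";
    // Corrected pattern: \\* in the Java literal becomes \* in the regex,
    // which stands for a literal asterisk.
    static final String ESCAPED = "^Starting at \\$32,000\\*$";

    static boolean matches(String text, String regex) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String text = "Starting at $32,000*";
        System.out.println(matches(text, UNESCAPED));                   // false
        System.out.println(matches("Starting at $32,0000", UNESCAPED)); // true
        System.out.println(matches(text, ESCAPED));                     // true
    }
}
```

String.matches already requires the whole input to match, so the ^ and $ anchors are harmless but redundant here.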
 
laredotornado

^Starting at \$32,000\*$


String regEx = "^Starting at \\$32,000\\*$";

Is there a way I can match an arbitrary string without having to
escape everything? I tried putting my token in quotes ...

String regEx = "^Starting at (\"$32,000*\")$";

but that failed to match. - Dave
 
Daniele Futtorovic

Is there a way I can match an arbitrary string without having to
escape everything? I tried putting my token in quotes ...

String regEx = "^Starting at (\"$32,000*\")$";

but that failed to match. - Dave

"^Starting at (\\Q" + (almost) arbitrary string + "\\E)$"
 
Andreas Leitgeb

laredotornado said:
Is there a way I can match an arbitrary string without having to
escape everything?

Yes:
String text = "Starting at $32,000*";
...
if ("Starting at $32,000*".equals(text)) {
// ^^^^^^^^^^^^^^^^^^^^-no special escapes needed here ;-)
}

If all you want is an exact match, then regexes are nothing but a
nuisance for *that* task.
 
Ian Shef

Is there a way I can match an arbitrary string without having to
escape everything? I tried putting my token in quotes ...

String regEx = "^Starting at (\"$32,000*\")$";

but that failed to match. - Dave

Have you looked at Pattern.quote(String s) in java.util.regex.Pattern ? It
may be what you want. Here is an example:

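import java.util.regex.Pattern;

public class PatternQuote {
    public static void main(String[] args) {
        String orig = "Starting at $32,000*";
        // Pattern.quote wraps the text so every character is matched literally.
        String quoted = "^" + Pattern.quote(orig) + "$";
        // Prints: Starting at $32,000* became ^\QStarting at $32,000*\E$
        System.out.println(orig + " became " + quoted);
    }
}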
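
To check that a quoted pattern really matches, and that it can be embedded in a larger expression, here is a small sketch. Pattern.quote is the real java.util.regex API; the class and method names below are hypothetical. For input without a \E sequence, Pattern.quote produces the same \Q...\E wrapper suggested earlier in the thread, and it also copes with input that itself contains \E, which would otherwise end the quoted region early.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Builds a regex that matches "Starting at " followed by the literal token.
    static String startingAt(String literalToken) {
        return "Starting at " + Pattern.quote(literalToken);
    }

    public static void main(String[] args) {
        String text = "Starting at $32,000*";
        // The $ , and * in the token are treated as plain characters.
        System.out.println(text.matches(startingAt("$32,000*")));        // true
        // The hand-written \Q...\E form behaves the same for this token.
        System.out.println(text.matches("Starting at \\Q$32,000*\\E"));  // true
        // A token containing \E is still matched literally by Pattern.quote.
        String tricky = "a\\Eb*";
        System.out.println(("Starting at " + tricky).matches(startingAt(tricky))); // true
    }
}
```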
 
Roedy Green

Is there a way I can match an arbitrary string without having to
escape everything? I tried putting my token in quotes ...

yes. See http://mindprod.com/jgloss/regex.html#AWKWARD
for various techniques of quoting.
--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.
 
laredotornado

yes. See http://mindprod.com/jgloss/regex.html#AWKWARD
for various techniques of quoting.

K, thought I had this all figured out thanks to everyone's
suggestions, but I still have this one RE that's failing and I can't
figure out why. I have

"G37 Convertible\n$45,750*".matches("^.*\\Q$45,750\\E.*$")

which returns false. If I remove the new line ("\n"), it matches, but
I can't guarantee my input won't contain new lines. How can I modify
my regular expression to match? Thanks, - Dave
 
Jim Janney

laredotornado said:
Is there a way I can match an arbitrary string without having to
escape everything? I tried putting my token in quotes ...

String regEx = "^Starting at (\"$32,000*\")$";

but that failed to match. - Dave

String regEx= "^Starting at " + Pattern.quote("$32,000*") + "$";
 
Nigel Wade

K, thought I had this all figured out thanks to everyone's
suggestions, but I still have this one RE that's failing and I can't
figure out why. I have

"G37 Convertible\n$45,750*".matches("^.*\\Q$45,750\\E.*$")

which returns false. If I remove the new line ("\n"), it matches, but
I can't guarantee my input won't contain new lines. How can I modify
my regular expression to match? Thanks, - Dave

Welcome to the wonderful world of RE, and strings.

To match '\n' inside an RE you need to escape the '\', because it's a
special RE character. To escape it you precede it with '\', the RE
escape character. So what you actually need in the RE is '\\n'. But '\'
is also a special character in a string, so you need to escape each '\'
in the string - with a '\'. So, to get your '\\n' in the RE you need to
have '\\\\n' in the string.

Simple?

[Anyone who claims they understand RE is just someone who hasn't yet
realized they don't].
 
Joshua Cranmer

which returns false. If I remove the new line ("\n"), it matches, but
I can't guarantee my input won't contain new lines. How can I modify
my regular expression to match? Thanks, - Dave

There is a flag that you can set to treat newlines as regular characters.
 
Daniele Futtorovic

There is a flag that you can set to treat newlines as regular characters.

Pattern.DOTALL, to be precise. You can't use that in combination with
String#matches() however, as it is an argument to Pattern#compile. So
either go that way, or use the embedded flag, "(?s)" (put it at the
start of your regex).
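
A sketch of both options, applied to the failing example from the question. Pattern.DOTALL and the (?s) embedded flag are standard java.util.regex features; the class and method names are illustrative only. Without DOTALL, . does not match line terminators, so .* cannot get past the \n.

```java
import java.util.regex.Pattern;

class DotallDemo {
    static final String TEXT = "G37 Convertible\n$45,750*";
    static final String REGEX = "^.*\\Q$45,750\\E.*$";

    // Original attempt: fails because . stops at the newline.
    static boolean plain() {
        return TEXT.matches(REGEX);
    }

    // Embedded flag: works with String.matches().
    static boolean embeddedFlag() {
        return TEXT.matches("(?s)" + REGEX);
    }

    // Flag passed to Pattern.compile().
    static boolean compiledFlag() {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(TEXT).matches();
    }

    // Alternative: search for the literal with find() instead of wrapping it in .*
    static boolean findLiteral() {
        return Pattern.compile(Pattern.quote("$45,750")).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println(plain());        // false
        System.out.println(embeddedFlag()); // true
        System.out.println(compiledFlag()); // true
        System.out.println(findLiteral());  // true
    }
}
```

If only containment matters, text.contains("$45,750") avoids regexes altogether.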
 
Ian Shef

laredotornado said:
K, thought I had this all figured out thanks to everyone's
suggestions, but I still have this one RE that's failing and I can't
figure out why. I have

"G37 Convertible\n$45,750*".matches("^.*\\Q$45,750\\E.*$")

which returns false. If I remove the new line ("\n"), it matches, but
I can't guarantee my input won't contain new lines. How can I modify
my regular expression to match? Thanks, - Dave

You have not provided sufficient information. Could the new line be located
anywhere, or only adjacent to and in front of the dollar sign?

If the answer is "anywhere", it may be easier to discard all newlines first.
e.g.

String s ;
..
..
..
s = s.replace("\n", "") ;
..
..
..


Another way if the line terminator could be anywhere is to enable dotall
mode. This causes period to also match line terminators. See the
documentation for Pattern for how to enable this mode.
 
Roedy Green

So, to get your '\\n' in the RE you need to
have '\\\\n' in the string.

Oops.

If you are trying to match an eol char in a regex the two chars in ram
will be \ n

If you are creating a string literal it will be "\\n"

The extra \ is to tell Java this is not a Java literal.

The easy way to create these strings is to use Quoter.
See http://mindprod.com/applet/quoter.html

Once you get the hang of it, you can write them off the top of your
head.


 
Esmond Pitt


No 'oops' about it. The poster is correct.

Roedy Green said:
If you are trying to match an eol char in a regex the two chars in ram
will be \ n
If you are creating a string literal it will be "\\n"

And if you are creating a regex it will be "\\\\n".

Roedy Green said:
The extra \ is to tell Java this is not a Java literal.

And the doubled \\ are there to tell the regex this is a backslashed
backslash, i.e. a real backslash, not a regex escape.

Roedy Green said:
The easy way to create these strings is to use Quoter.

I don't see how any piece of software can understand whether a quoted
string is for use in a regular expression or not.
 
Nigel Wade


Oops yourself.

Roedy Green said:
If you are trying to match an eol char in a regex the two chars in ram
will be \ n

If they are, you won't match a newline. The '\' needs to be escaped in
the RE. The string in the RE needs to be \\n.

Roedy Green said:
If you are creating a string literal it will be "\\n"
The extra \ is to tell Java this is not a Java literal.
The easy way to create these strings is to use Quoter.
See http://mindprod.com/applet/quoter.html
Once you get the hang of it, you can write them off the top of your
head.

and, apparently, get them wrong.

I repeat what I said in my previous post:

Anyone who claims they understand RE is just someone who hasn't yet
realized they don't.
 
Roedy Green

No 'oops' about it. The poster is correct.


I presumed you were trying to match a single 0x0a, the usual case.

If you were going to look for it without regexes you would look for
"\n".

If you wanted a regex you want the two chars \ and n in the regex
string.

However, if you do that, Java will think \ is an escape char, so you
need to escape the escape:

"\\n"

If you were trying to scan for the pair of characters \ n
then it gets weird, since \ is an escape character both in Java and in
regex. You scan for
"\\\\n"

Someday we will stop using "in-band" controls and the goofy quoting
problems will go away. You could use two "colours", one for commands
and one for data. We are hamstringing ourselves by imagining our
programming tools are limited to TTYs.
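
The escaping levels described in this post can be checked directly. This is a small sketch with hypothetical class and method names; it only uses String.matches. It suggests that a real newline in the pattern and the regex escape \n both match a newline character, while four backslashes in the Java literal match a backslash followed by the letter n.

```java
class NewlineEscapeDemo {
    static final String WITH_NEWLINE = "a\nb";      // 3 chars: a, 0x0a, b
    static final String WITH_BACKSLASH_N = "a\\nb"; // 4 chars: a, \, n, b

    static boolean matches(String text, String regex) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        // Java turns "\n" into a newline; the regex sees a literal newline.
        System.out.println(matches(WITH_NEWLINE, "a\nb"));        // true
        // Java turns "\\n" into \n; the regex reads that as its own newline escape.
        System.out.println(matches(WITH_NEWLINE, "a\\nb"));       // true
        // Java turns "\\\\n" into \\n; the regex reads an escaped backslash, then n.
        System.out.println(matches(WITH_NEWLINE, "a\\\\nb"));     // false
        System.out.println(matches(WITH_BACKSLASH_N, "a\\\\nb")); // true
    }
}
```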
 
Roedy Green

I don't see how any piece of software can understand whether a quoted
string is for use in a regular expression or not.

It can't. It presumes everything is data, and quotes minimally. You
then apply your commands on top of the prequoted sample string.

Try it out. It can save you quite a bit of time going cross-eyed
proofreading.

The problem with regexes is all it takes is one char off and the whole
thing does not work. You have no clue where the problem is. You
rarely find errors with syntax checking. There is no trace.
The other problem is a regex will work 90% of the time. It may be
quietly rejecting a small percentage of the strings, and you might not
notice.
 
Gene Wirchenko

On Tue, 07 Jun 2011 21:18:54 -0700, Roedy Green wrote:

[snip]
The problem with regexes is all it takes is one char off and the whole
thing does not work. You have no clue where the problem is. You
rarely find errors with syntax checking. There is no trace.
The other problem is a regex will work 90% of the time. It may be
quietly rejecting a small percentage of the strings, and you might not
notice.

There are more problems than that.

I assume that you are familiar with this quote:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

I find regexes to be less than totally useful. I sometimes have
to define a format string with substitution parameters. Here is an
example:
Per client's instruction, the total of all invoices for the current
month will be charged against the supplied credit card number on %D
unless we hear otherwise prior to that date.

The date gets substituted for the %D. There are a few rules.
There must be one and only one "%D" string. "%" is an escape character
and is doubled for the literal "%".

I could write a regex for this, BUT I also have to have a routine
for executing the string substitution, and regexes do not help with
this. I do not want two rather different versions of the code. (As
it is, I have two versions of code that are somewhat similar.) More
importantly, if one routine gets changed, so should the other, and it
should be obvious how to do it.
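
One way to keep validation and substitution from drifting apart is to do both in a single pass over the template. The sketch below is not the poster's code; the class name, method name, sample date, and error handling are assumptions made for illustration. It applies the rules as stated: %% yields a literal %, %D yields the date, and %D must appear exactly once.

```java
class DateTemplate {
    // Expands %D to the given date and %% to a literal %.
    // Throws IllegalArgumentException if %D does not occur exactly once
    // or if % is followed by anything other than D or %.
    static String expand(String template, String date) {
        StringBuilder out = new StringBuilder();
        int dateCount = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c != '%') {
                out.append(c);
                continue;
            }
            if (i + 1 >= template.length()) {
                throw new IllegalArgumentException("dangling % at end of template");
            }
            char next = template.charAt(++i);
            if (next == '%') {
                out.append('%');
            } else if (next == 'D') {
                out.append(date);
                dateCount++;
            } else {
                throw new IllegalArgumentException("unknown escape %" + next);
            }
        }
        if (dateCount != 1) {
            throw new IllegalArgumentException("expected exactly one %D, found " + dateCount);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("Charged on %D (100%% of total).", "2011-06-30"));
        // prints: Charged on 2011-06-30 (100% of total).
    }
}
```

Adding a second variable, such as a contact name, would mean one more branch and one more counter in the same loop, so the validation rule and the substitution stay side by side.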

If I wanted to add a second variable to the example above, say a
contact name, and wanted the constraint of appearing once and only
once, using a regex would get even uglier.

I could use regexes for such things as validating with no
interpretation, but such data that I have to validate usually has
trivial formatting. For example, a Canadian Postal Code is "A9A 9A9"
with some limitations on the alphabetic characters. A regex would be
overkill.

Sincerely,

Gene Wirchenko
 
Michael Wojcik

Roedy said:
Someday we will stop using "in-band" controls and the goofy quoting
problems will go away. You could use two "colours", one for commands
and one for data. We are hamstringing ourselves by imagining our
programming tools are limited to TTYs.

While in-band signaling in strings is a problem (in fact a number of
problems, leading to many of the most common software vulnerabilities,
such as C buffer overflows and formatting errors), color-coding causes
as many problems as it solves. Not all programmers have "normal" color
vision, for one thing.

Color-coding in programming languages has been tried, notably in the
original Smalltalk. It didn't catch on, and for good reason.

Of course, many people find color-coded views of program source
useful. But since they're optional, they don't penalize programmers
who can't use them, or don't want to.
 
