regex reserved chars

Roedy Green

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].
Is it?
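A conservative workaround while the exact list is in doubt: the `java.util.regex.Pattern` Javadoc says a backslash may be used before any non-alphabetic character, so escaping every character that is not a letter or digit is safe inside `[...]`. A minimal sketch; the `CharClassEscaper` class and `literalClass` method are hypothetical names, and it assumes BMP characters only (no surrogate pairs).

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the JDK): builds a character class that
// matches exactly the given characters. Every character that is not a letter
// or digit gets a backslash in front of it; the Pattern Javadoc permits a
// backslash before any non-alphabetic character, so over-escaping is harmless.
// Assumptions: BMP characters only (no surrogate pairs), non-empty input
// (an empty input yields "[]", which Pattern rejects).
class CharClassEscaper {

    static String literalClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : chars.toCharArray()) {
            // Letters and digits are left alone: a backslash before them would
            // form an escape construct (e.g. \d) or be rejected.
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String reserved = "$()*+-.?[\\]^{|}";
        Pattern p = Pattern.compile(literalClass(reserved));
        for (char c : reserved.toCharArray()) {
            System.out.println(c + " -> " + p.matcher(String.valueOf(c)).matches());
        }
        System.out.println("x -> " + p.matcher("x").matches());
    }
}
```

Over-escaping makes the pattern harder to read, but it sidesteps the question of which characters are special in which position.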
--
Roedy Green Canadian Mind Products http://mindprod.com
The first 90% of the code accounts for the first 90% of the development time.
The remaining 10% of the code accounts for the other 90% of the development
time.
~ Tom Cargill Ninety-ninety Law
 
markspace

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].
Is it?

<http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>

"Note that a different set of metacharacters are in effect inside a
character class than outside a character class. For instance, the
regular expression . loses its special meaning inside a character class,
while the expression - becomes a range forming metacharacter. "
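The two behaviours named in that note can be checked directly. A small sketch; the `DotAndHyphenDemo` class and its `matches` wrapper are illustrative names, and `Pattern.matches` requires the whole input to match.

```java
import java.util.regex.Pattern;

// Illustrates the Javadoc note: inside [...] the dot is literal,
// while '-' between two characters forms a range.
class DotAndHyphenDemo {

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Outside a class, '.' matches any character (except line terminators by default).
        System.out.println(matches(".", "x"));     // true
        // Inside a class, '.' is just a literal dot.
        System.out.println(matches("[.]", "x"));   // false
        System.out.println(matches("[.]", "."));   // true
        // Between two characters, '-' forms a range.
        System.out.println(matches("[a-c]", "b")); // true
        System.out.println(matches("[a-c]", "-")); // false
        // At the very start or end of the class, '-' is treated as a literal.
        System.out.println(matches("[-ac]", "-")); // true
        System.out.println(matches("[ac-]", "-")); // true
    }
}
```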

Learn to STFW.

I really hate to use language like that, but Jesus, Roedy, are you kidding me?
 
Arne Vajhøj

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].
Is it?

Typically it is.

Regex syntax varies a bit between implementations.

So one should study the documentation.

java.util.regex.Pattern has an excellent JavaDoc.

Read it!

Arne
 
Lew

markspace said:
Roedy said:
I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].
Is it?

<http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>

"Note that a different set of metacharacters are in effect inside a
character class than outside a character class. For instance, the
regular expression . loses its special meaning inside a character class,
while the expression - becomes a range forming metacharacter. "

Learn to STFW.

I really hate to use language like that, but Jesus, Roedy, are you kidding me?

http://lmgtfy.com/?q=Regular+expression+metacharacters+character+class

HTH
 
markspace



My point was that Roedy is a Java programmer posting in a Java newsgroup,
and he didn't even look at the existing Java docs for regex patterns. I
just can't even conceive why Roedy would post such a question. I wasn't
even going to post the lmgtfy link because the Java docs are such an
obvious solution.

That said, the first link when I Google it is a really excellent
discussion of exactly how character classes and metacharacters work, with
even different flavors of regex discussed (POSIX is a bit different, the
others seem the same.)
 
Roedy Green

My point was that Roedy is a Java programmer posting in a Java newsgroup,
and he didn't even look at the existing Java docs for regex patterns

I spent 15 minutes looking and did not find it. You can argue that I
should have, but the fact remains I did not. I did you no harm by
asking a question. You are not obligated to answer it. The answer
may be of general interest to people who never even thought to ask the
question. If I asked you face to face you would not dream of
answering that way.

The problem is too MUCH irrelevant crap you have to wade through when
you search.

The answer to many such questions is simply the magic vocabulary that
evokes the desired info. But you have to find the information to know
the magic vocabulary. Catch 22.

Roedy Green

http://lmgtfy.com/?q=Regular+expression+metacharacters+character+class

Did you check to see if the question is actually answered in there
somewhere, or just that in any sane universe it should be?


Some of that material I previously waded through without success.

This question may be easier to answer with a set of experiments.
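One way to run those experiments, sketched under the assumption that placing each character between two ordinary letters (for example `[x$y]`) is a fair test of how it behaves in the middle of a class. The class and method names are hypothetical; a character counts as literal if the resulting class compiles and matches that one character.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Experiment: put each candidate character between 'x' and 'y' inside a
// class and see whether the class then matches that character literally.
class CharClassExperiment {

    // Returns true if c behaves as a plain literal in the middle of a class.
    static boolean isLiteralInMiddleOfClass(char c) {
        String regex = "[x" + c + "y]";
        try {
            return Pattern.compile(regex).matcher(String.valueOf(c)).matches();
        } catch (PatternSyntaxException e) {
            return false; // compilation failed, so c is not a plain literal here
        }
    }

    public static void main(String[] args) {
        String candidates = "$()*+-.?[\\]^{|}";
        for (char c : candidates.toCharArray()) {
            System.out.println(c + "  " + (isLiteralInMiddleOfClass(c) ? "literal" : "special"));
        }
    }
}
```

This should flag only `-`, `[`, `\` and `]` in this position. `^` comes out literal here because it only negates the class when it directly follows the opening `[`, and `-` is literal at the very start or end of a class, so position matters. `&&` (intersection) is also special, though `&` was not on the original list.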

Roedy Green

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].

I have not found an official source; however, it is claimed that only [ - ^]
are reserved in character classes, i.e. inside [...]

I don't think that can be right. Surely $ is reserved too, and of
course \.

Robert Klemme

On Wed, 06 Feb 2013 16:28:29 -0800, Roedy Green

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].

I have not found an official source; however, it is claimed that only [ - ^]
are reserved in character classes, i.e. inside [...]

Is http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html not official enough? Or are you missing a more explicit explanation on that page?
I don't think that can be right.

Roedy, why???
Surely $ is reserved too, and of course \.

Inside a character class, dot has no special meaning and neither does $. By the way, you can easily test that. Apart from that, a dot with special meaning (any character) does not make sense in a character class if you think about it for a moment.
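A quick test along the lines Robert suggests, contrasting `$` outside and inside a class and confirming that `\` is still an escape inside `[...]`. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Contrasts '$' outside and inside a character class, and shows that
// backslash keeps its escaping role inside [...].
class DollarAndBackslashDemo {

    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Outside a class, '$' is the end-of-input anchor, not a literal.
        System.out.println(fullMatch("cost$", "cost"));    // true
        System.out.println(fullMatch("cost$", "cost$"));   // false
        // Inside a class, '$' is an ordinary character.
        System.out.println(fullMatch("cost[$]", "cost$")); // true
        // Backslash still escapes inside a class: [\]] is a class containing ']'.
        System.out.println(fullMatch("[\\]]", "]"));       // true
        // A literal backslash in a class needs a doubled backslash in the regex.
        System.out.println(fullMatch("[\\\\]", "\\"));     // true
    }
}
```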

Cheers

robert
 
markspace

I spent 15 minutes looking and did not find it. You can argue that I
should have, but the fact remains I did not.

I really am not trying to be an ass (there's enough of that here
already). But seriously I'm not kidding when I say the answer is
literally the first link when I search.

<https://www.google.com/search?q=regex+character+class>

What is the FIRST link when you use that search?
I did you no harm by
asking a question. You are not obligated to answer it. The answer
may be of general interest to people who never even thought to ask the
question.

"STFW" is the answer, and it's of general interest. Cf. Eric Raymond's
"How to Ask Questions the Smart Way," which says the same thing.
If I asked you face to face you would not dream of
answering that way.

I certainly might. Learning how to answer your own questions is part of
your professional development; don't waste your colleagues' time with
silly questions. There's a Dilbert cartoon about "time wasting morons."
It's funny because "time wasting morons" are common enough to have a
Dilbert about it. Don't be the time wasting moron.
The problem is too MUCH irrelevant crap you have to wade through when
you search.

Not when I search, and so I have to conclude the same is true for you.
The answer to many such questions is simply the magic vocabulary that
evokes the desired info. But you have to find the information to know
the magic vocabulary. Catch 22.

You used all the words you needed in your question. What "magic
vocabulary" are you referring to? I get on your case, Roedy, because I
know you've got enough experience that you should know these things already.
 
Gene Wirchenko

I really am not trying to be an ass (there's enough of that here
already). But seriously I'm not kidding when I say the answer is
literally the first link when I search.

There is no guarantee that search results will be given in a
particular order.
<https://www.google.com/search?q=regex+character+class>

What is the FIRST link when you use that search?

<http://www.regular-expressions.info/charclass.html>
which is not

<http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>
(the link you first posted).

Roedy is right about there being a lot to wade through. I got
About 3,790,000 results (0.22 seconds)
"STFW" is the answer, and it's of general interest. Cf. Eric Raymond's
"How to Ask Questions the Smart Way," which says the same thing.

I have often done so, been unable to find something of use, asked
people for links on what I needed and gotten very useful links.
I certainly might. Learning how to answer your own questions is part of
your professional development; don't waste your colleagues' time with
silly questions. There's a Dilbert cartoon about "time wasting morons."
It's funny because "time wasting morons" are common enough to have a
Dilbert about it. Don't be the time wasting moron.

There are also Dilbert cartoons about people throwing hissy fits.
Don't do it.
Not when I search, and so I have to conclude the same is true for you.

You did not get anywhere near 3,790,000 results?
You used all the words you needed in your question. What "magic
vocabulary" are you referring to? I get on your case, Roedy, because I
know you've got enough experience that you should know these things already.

Roedy was looking for "reserved regex chars". Your search uses
"regex character class". I would not have thought of that as it does
not seem to be what I would be looking for.

After one finds something, then it might be obvious what to
search for. Until then, it is not. I have done searches for things
that I would have thought would be on the Net, but after trying many
different things, I have had to give up on it.

Sincerely,

Gene Wirchenko
 
markspace


Yup, that's the first link for me too. And it has the answer Roedy wants.
Roedy was looking for "reserved regex chars". Your search uses
"regex character class". I would not have thought of that as it does
not seem to be what I would be looking for.

If you're looking for information about meta-characters in regex
character classes, shouldn't that be obvious?

At first I might try something more specific, but after 12+ years of
Google use, it's kind of obvious to also try a more general search word
list. Trying out five or ten different searches does not take 15
minutes total, as Roedy claimed.

Yes, I realize that some things are hard to search for, but this is
certainly not one of them. And for me personally that was the first
search I tried after reviewing the Java Pattern page.

Also, I'm surprised no one has pointed this out. From the Java Pattern
page:

"For a more precise description of the behavior of regular expression
constructs, please see Mastering Regular Expressions, 3nd Edition,
Jeffrey E. F. Friedl, O'Reilly and Associates, 2006."

Christ, folks, it even points you at the very authoritative source you
claim to be looking for. Did anyone ever try reading that book?

SOME things on the web or in computer science are esoteric and hard to
find more information on. This is not one of those things.
 
Jim Janney

Martin Gregorie said:
I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].
Is it?

Typically it is.

Regex syntax varies a bit between implementations.

So one should study the documentation.

java.util.regex.Pattern has an excellent JavaDoc.
That's normally the first place I look, but it doesn't answer Roedy's
question - apart, that is, from referring to the dead tree O'Reilly book.
Expanding the 'Character Classes' description in the Pattern class-level
documentation or linking to an online source would be more useful than
the implied suggestion of buying the book from Amazon and then waiting
for delivery.

I have a copy of that book, in an earlier edition, but I usually find it
more convenient to consult regular-expressions.info, e.g.

http://www.regular-expressions.info/charclass.html

which answers the question very nicely. It also explains things like
lookbehind that the JavaDoc only hints at.
 
Arne Vajhøj

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].

I have not found an official source; however, it is claimed that only [ - ^]
are reserved in character classes, i.e. inside [...]

I don't think that can be right. Surely $ is reserved too, and of
course \.

I don't think "surely" can overrule documentation and
experimentation.

There is a reason that it is called software engineering
and not software feelings.

Arne
 
Arne Vajhøj

I have always treated $ ( ) * + - . ? [ \ ] ^ { | }
as reserved regex chars.
I can't find any docs that say the list is different inside [ ].
Is it?

Typically it is.

Regex syntax varies a bit between implementations.

So one should study the documentation.

java.util.regex.Pattern has an excellent JavaDoc.
That's normally the first place I look, but it doesn't answer Roedy's
question - apart, that is, from referring to the dead tree O'Reilly book.
Expanding the 'Character Classes' description in the Pattern class-level
documentation or linking to an online source would be more useful than
the implied suggestion of buying the book from Amazon and then waiting
for delivery.

Actually it does explain that the special characters are different
inside and outside.

"Note that a different set of metacharacters are in effect inside a
character class than outside a character class. For instance, the
regular expression . loses its special meaning inside a character class,
while the expression - becomes a range forming metacharacter."

And that was Roedy's question.

"I can't find any docs that say the list is different inside [ ]. Is it?"

It did not answer the next question: what are the special
characters inside?

Arne
 
Arne Vajhøj

Did you check to see if the question is actually answered in there
somewhere, or just that in any sane universe it should be?

You asked the question.

So how about you check!!!!

The first link I get has the info.

Arne
 
Arne Vajhøj

I spent 15 minutes looking and did not find it. You can argue that I
should have, but the fact remains I did not.

Maybe you should work on reading more carefully.
The problem is too MUCH irrelevant crap you have to wade through when
you search.

That type of sorting is a basic skill for programmers.

Arne
 
Arne Vajhøj

There is no guarantee that search results will be given in a
particular order.


<http://www.regular-expressions.info/charclass.html>
which is not

<http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html>
(the link you first posted).

But both explain what Roedy asked.

The first link in Google search also answers the next question.
Roedy is right about there being a lot to wade through. I got
About 3,790,000 results (0.22 seconds)

But since he did not have to wade through it, then it is not so relevant.
You did not get anywhere near 3,790,000 results?

He probably did. But when the info was in the first result, then ...

(it may not always turn up first, but so what if it was #3?)
Roedy was looking for "reserved regex chars". Your search uses
"regex character class". I would not have thought of that as it does
not seem to be what I would be looking for.

Actually Roedy was not looking for reserved regex characters. He was
looking for reserved characters in a regex character class.

[] is called a character class.

And that is in almost any regex documentation.

At http://mindprod.com/jgloss/regex.html#MULTIPLE you have chosen
to call them multiples, but ...

Arne
 
Arne Vajhøj

I have often done so, been unable to find something of use, asked
people for links on what I needed and gotten very useful links.


There are also Dilbert cartoons about people throwing hissy fits.
Don't do it.

It is perfectly fine for anyone new to Java to ask even relatively
simple questions that could be easily answered from the standard
documentation.

And they should get links and explanations without any negative
remarks.

But that is not really the case here.

Here we have a person (Roedy) who about 50 times per month
claims to be knowledgeable about Java by posting links to his web site
as answers to questions.

There is simply a conflict between asking questions that are
explained in Java Docs and providing a web site with answers
to Java questions.

That conflict can be called out more or less elegantly.

Arne
 
markspace

Actually it does explain that the special characters are different
inside and outside.
....
It did not answer the next question: what are the special
characters inside?

I had to double-check this myself, but it does indeed answer the
question. The section on character classes lists all the special
character classes; they even order them by precedence for you.

They are:

1. Literal escape: \
2. Grouping: []
3. Range: - (as in a-z)
4. Union (implicit): [a-e][i-o]
5. Intersection: &&

That's it. The "note" is there just to remind you that this list is in
fact distinct from the previous metacharacters. (It appears to me that
^ actually forms a separate token with [, i.e. [^, which is different from
the non-negated character class. That's why ^ is an ordinary character
anywhere within the character class except the first position.)
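The operators from that list, plus the `^` rule, can be tried in a few one-line patterns. A sketch with illustrative names. Note that written on its own, `[a-e][i-o]` is two consecutive classes (a two-character sequence), so the union below nests one class inside another, as in the Javadoc's `[a-d[m-p]]` example.

```java
import java.util.regex.Pattern;

// One small pattern per character-class operator, plus the rule for '^'.
class ClassOperatorsDemo {

    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // 1. Literal escape: \- is a plain hyphen, so no range is formed.
        System.out.println(fullMatch("[a\\-z]", "-"));        // true
        System.out.println(fullMatch("[a\\-z]", "m"));        // false
        // 2. Grouping: a class nested inside a class.
        System.out.println(fullMatch("[a[xyz]]", "y"));       // true
        // 3. Range.
        System.out.println(fullMatch("[a-z]", "m"));          // true
        // 4. Union: the nested class is merged with the outer one.
        System.out.println(fullMatch("[a-e[i-o]]", "k"));     // true
        // 5. Intersection: lowercase letters that are also vowels.
        System.out.println(fullMatch("[a-z&&[aeiou]]", "e")); // true
        System.out.println(fullMatch("[a-z&&[aeiou]]", "b")); // false
        // '^' negates only directly after the opening '['.
        System.out.println(fullMatch("[^a]", "b"));           // true
        System.out.println(fullMatch("[^a]", "a"));           // false
        System.out.println(fullMatch("[a^]", "^"));           // true
        System.out.println(fullMatch("[a^]", "b"));           // false
    }
}
```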

And BTW, by "double check," I mean I read the Pattern Java docs page.
That's also a skill any programmer should have: reading. All the docs,
to the end. It's just something you have to do.
 
