Regular expression problem

Sky Wong

I would like to match this pattern
<title>(any character including chinese except '<' and '>')</title>

e.g.
1) <title>一二三</title> --> should be matched
2) <title>一二<三</title> --> should not be matched

I can match both (1) and (2) with this regular expression:
<title>[\P{Cn}]*</title>
but after I change the RE to
<title>[\P{Cn}-[<>]]*</title>
(minus the '<' and '>'), (2) is still matched, and I don't know why.

Could someone help me change the RE so that it can distinguish (1)
from (2)?
 
pietdejong

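As I see it, your adjusted RE says to match an expression between
<title> and </title> that contains any of:
any assigned character (\P{Cn}), which includes the Chinese characters;
the character '-';
the characters '<' and '>'.

Java reads the nested [<>] as a union with the rest of the class, not
as a subtraction from it.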

Try using <title>[^<]*</title>.

Piet
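
The reading above can be checked with a small sketch. It assumes the RE is compiled with java.util.regex.Pattern and applied to the whole string with matches(); the class name, the helper methods, and the third test string (a title containing a stray '>') are hypothetical and not from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative, not from the thread.
final class TitleClassUnionDemo {
    // The revised RE from the question. Java treats the class as a union of
    // \P{Cn}, a literal '-', and the nested class [<>].
    static final Pattern REVISED =
            Pattern.compile("<title>[\\P{Cn}-[<>]]*</title>");

    // The RE suggested in the reply: any run of characters other than '<'.
    static final Pattern NO_LT =
            Pattern.compile("<title>[^<]*</title>");

    static boolean revisedMatches(String s) {
        return REVISED.matcher(s).matches();
    }

    static boolean noLtMatches(String s) {
        return NO_LT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only: U+4E00, U+4E8C, U+4E09.
        String case1 = "<title>\u4E00\u4E8C\u4E09</title>";   // example (1)
        String case2 = "<title>\u4E00\u4E8C<\u4E09</title>";  // example (2)
        String stray = "<title>\u4E00>\u4E8C</title>";        // assumed extra case

        System.out.println(revisedMatches(case1)); // true
        System.out.println(revisedMatches(case2)); // true, '<' is still allowed
        System.out.println(noLtMatches(case1));    // true
        System.out.println(noLtMatches(case2));    // false
        System.out.println(noLtMatches(stray));    // true, '>' is not excluded
    }
}
```

The last line shows that excluding only '<' still lets a '>' through inside the title.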
 
John C. Bollinger

pietdejong wrote:
> As I see it, your adjusted RE says to match an expression between
> <title> and </title> that contains any of:
> any assigned character (\P{Cn}), which includes the Chinese characters;
> the character '-';
> the characters '<' and '>'.
>
> Try using <title>[^<]*</title>.

Right. As far as I can tell, the part about matching Chinese characters
is irrelevant: Java doesn't treat them any differently from any other
character. That revised RE should do much better, but it's not quite
there yet. Given that this looks a lot like homework, I'll just leave
it there.


John Bollinger
(e-mail address removed)
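
One possible reading of "not quite there yet" is that [^<] still permits a '>' inside the title; that interpretation is an assumption, since the reply does not spell it out. Under that assumption, the sketch below tries two candidate REs: a negated class that excludes both angle brackets, and Java's intersection syntax (&&), which is how java.util.regex expresses class subtraction. The class name, helper methods, and the stray-'>' test string are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative, not from the thread.
final class TitleNoAngleDemo {
    // Candidate A: any run of characters other than '<' and '>'.
    static final Pattern NEGATED =
            Pattern.compile("<title>[^<>]*</title>");

    // Candidate B: assigned characters intersected with "not '<' or '>'".
    // In Java, subtraction is written as intersection with a negated class.
    static final Pattern INTERSECTED =
            Pattern.compile("<title>[\\P{Cn}&&[^<>]]*</title>");

    static boolean negatedMatches(String s) {
        return NEGATED.matcher(s).matches();
    }

    static boolean intersectedMatches(String s) {
        return INTERSECTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only: U+4E00, U+4E8C, U+4E09.
        String case1 = "<title>\u4E00\u4E8C\u4E09</title>";   // example (1)
        String case2 = "<title>\u4E00\u4E8C<\u4E09</title>";  // example (2)
        String stray = "<title>\u4E00>\u4E8C</title>";        // assumed extra case

        System.out.println(negatedMatches(case1));     // true
        System.out.println(negatedMatches(case2));     // false
        System.out.println(negatedMatches(stray));     // false
        System.out.println(intersectedMatches(case1)); // true
        System.out.println(intersectedMatches(case2)); // false
        System.out.println(intersectedMatches(stray)); // false
    }
}
```

The two candidates differ only for unassigned code points: the negated class accepts them, while the intersected class keeps the original \P{Cn} restriction.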
 
