stuck on a REGEX (\S[^\s/>]*)

darrel

I'm trying to find the opening < and the text of a tag (without the
attributes or closing tags)

This is what I'm using:

(\S[^\s/>]*)

Which, I think, reads as:

(any number of non-whitespace characters [up to a space, /, or >])

Is that correct? I can't get it to work.

If my text is:

<tag

then it returns "<tag" which is what I want.

However, if I have:

<tag/ or <tag>

it instead matches "/" or ">" respectively.

Why?
 
mikeb

darrel said:
I'm trying to find the opening < and the text of a tag (without the
attributes or closing tags)

This is what I'm using:

(\S[^\s/>]*)

Which, I think, reads as:

(any number of non-whitespace characters [up to a space, /, or >])

Is that correct? I can't get it to work.

If my text is:

<tag

then it returns "<tag" which is what I want.

However, if I have:

<tag/ or <tag>

it instead matches "/" or ">" respectively.

Why?

In my brief testing, when run against "<tag/" it first matches "<tag" -
then the next match is "/". The second match matches "/" because it
matches the \S character class.
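
A quick way to see this is to list every match the engine finds. The sketch below uses Python's re module as a stand-in for the .NET Regex class (an assumption on my part; \S and negated character classes behave the same way for these inputs), and the helper name all_matches is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(\S[^\s/>]*)")

def all_matches(text):
    """Return every match the engine finds, scanning left to right."""
    return [m.group(1) for m in pattern.finditer(text)]

print(all_matches("<tag"))   # ['<tag']
print(all_matches("<tag/"))  # ['<tag', '/']
print(all_matches("<tag>"))  # ['<tag', '>']
```

The leading \S accepts any single non-whitespace character, including "/" and ">". The negated class [^\s/>] only restricts the characters that follow it, and the * lets that part match nothing at all.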

Post some examples of how you want the regex to behave, and maybe
someone can help put one together.
 
darrel

mikeb said:
In my brief testing, when run against "<tag/" it first matches "<tag" -
then the next match is "/". The second match matches "/" because it
matches the \S character class.

But shouldn't this: [^/] stop it from doing that?

Here's how I want the regex to behave:

I want to find the first 'word' in the string. This would be any number of
characters in a row up to (but not including) a space, a new line, a /, or
a >. So in this:

"hello there, how are you"

it should match 'hello'

in this:

"<blockquote>hello there, how are you"

it should match '<blockquote'

Thanks!

-Darrel
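
One possible way to get the behaviour described above is to anchor the pattern at the start of the string and use only the negated class. This is a sketch in Python's re module, not the pattern the thread eventually settled on, and first_token is a hypothetical helper name.

```python
import re

# Anchored at the start: one or more characters that are not
# whitespace, "/", or ">".
first_word = re.compile(r"^[^\s/>]+")

def first_token(text):
    """Return the leading 'word' of text, or None if there is none."""
    m = first_word.search(text)
    return m.group(0) if m else None

print(first_token("hello there, how are you"))              # hello
print(first_token("<blockquote>hello there, how are you"))  # <blockquote
```

Unlike the original \S[^\s/>]*, this version will not match a lone "/" or ">" at the start of the input, and it returns None when the string begins with whitespace. Whether that is acceptable depends on the inputs you expect.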
 
darrel

darrel said:
But shouldn't this: [^/] stop it from doing that?

Aha. Mike, you are correct!

Here's what's happening. If this is my text:

<blockquote>monkey</blockquote>

and this is my Regex:

\S[^>]*

It returns these matches:

<blockquote
monkey</blockquote

So, it's returning the last match, I suppose. This is where I get lost. How
do I get it to ONLY return the first match?
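
For reference, here is what a scan for all matches produces with that pattern, sketched in Python's re module (assumed here to behave like .NET for this pattern). After each match the engine resumes where the previous one ended, so one input can yield several matches.

```python
import re

text = "<blockquote>monkey</blockquote>"

# Collect every non-overlapping match, left to right.
matches = re.findall(r"\S[^>]*", text)

print(matches)
# ['<blockquote', '>monkey</blockquote', '>']
# In Python, the later matches begin with the ">" that \S consumes.

print(matches[0])  # <blockquote
```

The first element of the list is the match wanted here; nothing about the pattern itself limits the result to one match.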
 
darrel

Got it!

The problem was the very next group I was using.

I had this:

(\S[^\s/>]*)

but had to add another group:

((\s|\n[^\S>]*)|(>))

which checks for whitespace/new lines OR a closing tag.

-Darrel
 
Guest

Use the Match method of the Regex object, which returns a Match object for
the first match only:
Dim m As Match = yourRegEx.Match(input)
m holds the first match; m.Value is the matched text.
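
The same idea, sketched in Python for comparison: re.search plays roughly the role that Regex.Match plays in .NET, stopping at the first match instead of collecting all of them. The variable names are illustrative only.

```python
import re

tag_start = re.compile(r"\S[^>]*")
text = "<blockquote>monkey</blockquote>"

# search() returns a match object for the first match only (or None).
first = tag_start.search(text)

if first is not None:
    print(first.group(0))  # <blockquote
```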

darrel said:
But shouldn't this: [^/] stop it from doing that?

Aha. Mike, you are correct!

Here's what's happening. If this is my text:

<blockquote>monkey</blockquote>

and this is my Regex:

\S[^>]*

It returns these matches:

<blockquote
monkey</blockquote

So, it's returning the last match, I suppose. This is where I get lost. How
do I get it to ONLY return the first match?
 
