My regexp stupidity needs assistance before I lose all my hair!


trans. (T. Onoma)

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it. It shouldn't have to be this
hair-pulling! Anyway... Can someone please give me the regular expression to
match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

Much obliged,
T.
 

Zach Dennis

The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/
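A quick way to see why the [^\]]* part matters, as a small sketch: the second string is made up for illustration, and String#[] with a capture index is used here as shorthand for =~ followed by $1.

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# [^\]]* can never consume a "]", so the capture stops at the first closing bracket.
puts s[/\[([^\]]*)\]/, 1]      # prints Hello

# With two bracket pairs on one line, the difference from a greedy .* shows up:
line = "This is[b.] a test. [c]"
puts line[/\[(.*)\]/, 1]       # greedy: prints b.] a test. [c
puts line[/\[([^\]]*)\]/, 1]   # negated class: prints b.
```

The greedy .* runs to the last "]" it can reach and then backs off just enough to match, whereas the negated class simply cannot step over a "]".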

Zach
 

Zach Dennis

Almost forgot, $1 is the match you are looking for.
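A minimal sketch of what that looks like in practice, using the sample from the original post. The s.match line is an alternative (not from the thread) that returns a MatchData object instead of relying on the $1 global.

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# =~ returns the index where the match starts (or nil) and sets $~, $1, $2, ...
pos = s =~ /\[([^\]]*)\]/
puts pos   # 0: the match starts at the very first character
puts $1    # Hello

# The same capture without the special globals:
m = s.match(/\[([^\]]*)\]/)
puts m[1] if m   # Hello
```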
 

Glenn Parker

s =~ /\[([^\]]*)\]/
puts $1
 

trans. (T. Onoma)

| > Let me painfully honest: I hate parsing, especially w/ regexp, and I
| > don't care if it's because I stupid and suck at it. It shouldn't have to
| > be this hair pulling! Anyway... Can some one please give the regular
| > expression to match the first square bracket's contents. In this case it
| > would be "Hello".
| >
| > s = <<-EOS
| > [Hello]
| > This is[b.] a test.
| > [Hello.]
| > EOS
|
| The trick here is to make sure you are non-greedy.
|
| s =~ /\[([^\]]*)\]/

Thanks. I _see_ now why mine wasn't working, though I don't _understand_ why
it wasn't working. I was using the / /x extension, because I generally like
to space out the parts of my regexps so they read more easily, but for some
reason that causes the above to match instead. Oh well, I just won't do that.

Thanks All for your responses!
T.
 

trans. (T. Onoma)

26 pm, Zach Dennis wrote:
| | trans. (T. Onoma) wrote:
| | > Let me painfully honest: I hate parsing, especially w/ regexp, and I
| | > don't care if it's because I stupid and suck at it. It shouldn't have
| | > to be this hair pulling! Anyway... Can some one please give the regular
| | > expression to match the first square bracket's contents. In this case
| | > it would be "Hello".
| | >
| | > s = <<-EOS
| | > [Hello]
| | > This is[b.] a test.
| | > [Hello.]
| | > EOS
| |
| | The trick here is to make sure you are non-greedy.
| |
| | s =~ /\[([^\]]*)\]/
|
| Thanks. I _see_ now why mine wasn't working, though I don't _understand_
| why it wasn't working. I was using the / /x extension, because I generally
| like to space the parts my regexps out to read easier, but for some reason
| that causes the above to match instead. Oh well, I just won't do that.

Oops scratch that. That's not the reason either (sigh). But I got it working
now anyway. Thanks.

T.
 

Douglas Livingstone

Ah, this is nicer and shorter than mine... I think I will use this one
too. =)

And I was thinking "ooh, Zach's looks like a better way to do it" :)

Douglas
 

Assaph Mehr

Thanks. I _see_ now why mine wasn't working, though I don't
_understand_ why it wasn't working. I was using the / /x extension,
because I generally like to space the parts my regexps out to
read easier, but for some reason that causes the above to
match instead. Oh well, I just won't do that.


It has to do with the pattern matching being greedy, not the /x flag.
Your pattern will match a '[' and then as many characters as possible,
including ']', up to a final closing ']'.
There are two solutions:
1. As shown, match any non-']' character.
2. Make the match non-greedy: %r{ \[(.+?)\] }x

HTH,
Assaph
P.S. If you want all occurrences in the string, use String#scan instead
of String#match.
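Both of Assaph's options, plus String#scan for every occurrence, sketched against the original sample (the variable names are only for illustration):

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

negated = %r{ \[ ([^\]]*) \] }x   # 1. match any run of non-']' characters
lazy    = %r{ \[ (.+?) \] }x      # 2. non-greedy: stop at the first ']' that lets the match succeed

puts s.match(negated)[1]   # Hello
puts s.match(lazy)[1]      # Hello

# String#scan returns every match; with one group, each element is a 1-element array.
p s.scan(negated).flatten  # ["Hello", "b.", "Hello."]

# One difference: .+? insists on at least one character, so an empty [] lets it
# run on to a later "]", while [^\]]* happily captures the empty string.
p "[] x]"[lazy, 1]         # "] x"
p "[] x]"[negated, 1]      # ""
```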
 

Mark Hubbart

Zach Dennis said:
The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/


Or:

s =~ /\[.*?\]/

which uses the ? non-greedy modifier to ensure that only the very next
"]" is matched. For example:


str = <<EOT
[this] [is a test]
here are[some]brackets
[brackets ]
[] no words
no brackets
EOT
# => "[this] [is a test]\nhere are[some]brackets\n[brackets ]\n[] no words\nno brackets\n"

# each_line works on every Ruby version (String#each was removed in 1.9)
str.each_line { |line| p line.scan(/\[.*?\]/) }
# prints:
["[this]", "[is a test]"]
["[some]"]
["[brackets ]"]
["[]"]
[]

cheers,
Mark
 

trans. (T. Onoma)

On Monday 17 January 2005 04:51 pm, Assaph Mehr wrote:
| > Thanks. I _see_ now why mine wasn't working, though I don't
| > _understand_ why it wasn't working. I was using the / /x extension,
| > because I generally like to space the parts my regexps out to
| > read easier, but for some reason that causes the above to
| > match instead. Oh well, I just won't do that.
|
| It has todo with the pattern matching being greedy, not the /x flag.
| your pattern will match a '[' then as many characters as possible -
| including ']' - until a final closing ']'.
| There are two solutions:
| 1. As shown, match any non ']'.
| 2. Make the match non greedy: %r{ \[(.+?)\] }x
|
| HTH,
| Assaph
| ps. If you want all occurences in the string, use string#scan instead
| of String#match.

Thanks Assaph,

I had an escape character match in the regexp:

/ [^`] \[(.+?)\] /x

That was messing it up (Don't really know why) but I just "zeroed" it:

/ (?=[^`]) \[(.+?)\] /x

And that did the trick.

Just one of those things where you overlook what you think you know to
the point of seizure ;)

T.
 

Assaph Mehr

I had an escape character match in the regexp:
/ [^`] \[(.+?)\] /x

That was messing it up (Don't really know why) but I just "zeroed" it:

/ (?=[^`]) \[(.+?)\] /x

And that did the trick.

That's because [^`] will match 'a single character that is not `'.
When you did the zero-width lookahead, you made it into 'possibly a
character, so long as it's not `'.

Hope this makes sense :)
 

Mark Hubbart

I had an escape character match in the regexp:

/ [^`] \[(.+?)\] /x

That was messing it up (Don't really know why) but I just "zeroed" it:

/ (?=[^`]) \[(.+?)\] /x

And that did the trick.

Thats because [^`] will match 'a single character that is not `'.
When you did the zero-width lookahead, you made into 'possibly a
character, so long as it's not ` '.

I may be reading this wrong, but I think that with the zero-width
lookahead, it is now ensuring that the first character of the match is
not a backtick. Which, since it's always going to be a square bracket,
makes the lookahead superfluous.
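Running both of T.'s patterns against the original sample is a quick way to check Mark's reading (an irb-style sketch, not code from the thread):

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# [^`] must CONSUME one character before the "[". "[Hello]" is at the very
# start of the string, so nothing precedes it; the first match is "s[b.]".
p s.match(/ [^`] \[(.+?)\] /x)[1]        # "b."

# (?=[^`]) consumes nothing; it only checks that the next character (the "["
# itself) is not a backtick, which is always true, so it changes nothing.
p s.match(/ (?=[^`]) \[(.+?)\] /x)[1]    # "Hello"
p s.match(/ \[(.+?)\] /x)[1]             # "Hello"
```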

If you need escaping, try:
/(?:          # escape sequence match
   ^ | [^`]   # alternate: match either "start of line" or a non-backtick.
 )
 (            # non-greedy [foo] match
   \[.*?\]
 )/x

... then use $1. This one won't match any paired square brackets
immediately preceded by a backtick.
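Mark's pattern, with the x flag so the comments are ignored, tried on a made-up string containing one backtick-escaped pair (the sample text and variable names are illustrative only):

```ruby
escape_aware = /(?:      # what must come before the bracket:
                   ^     #   either the start of a line
                 | [^`]  #   or any character that is not a backtick
                )
                (        # capture the bracketed text, brackets included
                  \[.*?\]
                )/x

text = "[Hello] `[skip me] and x[b.]"
p text.scan(escape_aware).flatten   # ["[Hello]", "[b.]"]
p $1 if text =~ escape_aware        # "[Hello]"
```

In Ruby, ^ means start of a line rather than only start of the string, so a bracket at the beginning of any line is still found.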

cheers,
Mark
 

John Carter

Let me painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I stupid and suck at it.

Given your other posts in this forum I cannot believe that you are stupid.

So here are some meta-hints on how to "suck less" at Regexes...

Always use the %r{}x form of regexes.

This neatly avoids the leaning toothpick syndrome when\/matching\/paths

The x modifier allows you to use white space and even comments within the
regex to make it readable. (Larry Wall of Perl fame regrets that he didn't
make it the default...)
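For example (a small sketch; the path is made up), the same pattern written with slashes and with %r{}x:

```ruby
path = "/usr/local/lib/ruby"

# Slash-delimited: every "/" in the path has to be escaped.
toothpicks = /\/usr\/local\/(\w+)/

# %r{...}x: "/" needs no escaping, and whitespace and comments are ignored.
readable = %r{
  /usr/local/   # fixed prefix
  (\w+)         # the next path component
}x

p path[toothpicks, 1]   # "lib"
p path[readable, 1]     # "lib"
```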

My .emacs has a key-binding that will produce "=~ %r{ }x" and leave the
cursor in the middle.
(global-set-key [(control %)]
                `(lambda ()
                   (interactive)
                   (insert "=~ %r{ }x")
                   ;; step back over "x", "}" and the space: point ends up just after "{"
                   (backward-char 3)))

Pull the development of the regex out of the development of your app.
Unit tests are good for that, or even just a wee small script, the
command line, or irb.
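A minimal sketch of that idea without any test framework: a table of inputs and expected captures, checked in a loop. The cases are taken from examples elsewhere in this thread; BRACKET and CASES are just illustrative names.

```ruby
BRACKET = /\[([^\]]*)\]/

# input => expected contents of the first [...] (nil means "no match")
CASES = {
  "[Hello]\nThis is[b.] a test." => "Hello",
  "This is a [fix]"              => "fix",
  "This is a [ fix ]"            => " fix ",
  "[] no words"                  => "",
  "no brackets"                  => nil,
}

failures = CASES.reject { |input, expected| input[BRACKET, 1] == expected }
failures.each { |input, expected| puts "FAIL: #{input.inspect} expected #{expected.inspect}" }
puts failures.empty? ? "all #{CASES.size} cases pass" : "#{failures.size} failure(s)"
```

Editing the regex and rerunning the script takes a second, which is the point of keeping it separate from the application.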

If you are doing it on the command line beware of nasty interactions
between the string and quoting conventions of the shell and ruby.

(Speaking Unix now...)
e.g. ruby -e "blah" is A Very Bad Idea. The shell will peek inside the
"blah" and do things (expanding $variables, `backquotes` and some
backslash escapes) that you really definitely don't want happening in a
regex. Solution: use single quotes; bash never looks inside them.
Downside: it means you must _never_ use single quotes in the ruby
fragment blah, but you can use double quotes.

ruby -e 'blah'

Grow the regex slowly. Start with the smallest thing, make it match.

If you immediately write down a large regex, odds on it will match
nothing.

Sheer murderous frustration lies that way.

Start small, or strip away stuff on the right-hand side of the regex until
it matches something. Then slowly start adding it back.
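One way to apply that to the bracket regex from this thread, as a sketch: each step adds a single piece, and you stop and look as soon as a step prints NO MATCH.

```ruby
s = "This is[b.] a test."

steps = [
  /\[/,             # 1. a literal "["
  /\[[^\]]*/,       # 2. ... followed by non-"]" characters
  /\[[^\]]*\]/,     # 3. ... and the closing "]"
  /\[([^\]]*)\]/,   # 4. ... with the contents captured
]

steps.each_with_index do |re, i|
  m = s.match(re)
  puts "step #{i + 1}: #{re.inspect} -> #{m ? m[0].inspect : 'NO MATCH'}"
end
```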

File.read(fileName) is cute. It allows you to pull the whole file in at
once as one string and then you can match across lines.
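A sketch of matching across lines after File.read. The temporary file exists only to make the example self-contained, and the m flag is an addition not mentioned in the post: in Ruby, "." does not match a newline unless m is given, while a negated class such as [^\]] matches newlines either way.

```ruby
require 'tempfile'

# Hypothetical input file, written here only so the example runs on its own.
file = Tempfile.new('brackets')
file.write("[Hello\nacross two lines]\nplain line\n")
file.close

text = File.read(file.path)   # the whole file as one String, newlines included

p text[/\[(.*?)\]/, 1]        # nil: "." stops at the newline
p text[/\[(.*?)\]/m, 1]       # "Hello\nacross two lines"
p text[/\[([^\]]*)\]/, 1]     # "Hello\nacross two lines"

file.unlink
```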

Be aware that since standards are such good things, everyone has their
own. e.g. POSIX (grep) regexes are different to Emacs regexes, which are
different to Ruby regexes. grep even provides two different regex
languages! Ruby and Perl regexes are very similar.

It shouldn't have to be this hair pulling!

It isn't. Really. Do what I suggest and you will slowly find regexes are
really a very fun and powerful way of doing things.



John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

"The notes I handle no better than many pianists. But the pauses
between the notes - ah, that is where the art resides!" - Artur Schnabel
 

trans. (T. Onoma)

On Monday 17 January 2005 06:33 pm, John Carter wrote:
| Grow the regex slowly. Start with the smallest thing, make it match.
|
| If you immediately write down a large regex, odds on it will match
| nothing.

Ah, this is my major problem. I tend to write whole chunks of code at once and
then go back and tweak them to perfection. Not always the best way to go. And
regexp is a perfect example of when not to do this.

Thanks. That lesson will surely help a great deal.

T.
 

Bertram Scharpf

Hi,

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:
I think that this is what you need: /\[[\w]+\]/

What are the square brackets for? As far as I can see, /\[\w+\]/
does the same.

Bertram
 

Zach Dennis

In a regular expression, square brackets represent a character class. A
character class matches a single character that is any one of the
characters listed in the class. Say you are looking for the words
"fix" or "fox" in a sentence.

You could write:

/f(i|o)x/

or you could write:

/f[io]x/

You can also negate a character class, and match anything that is NOT in
the character class. You do this by starting your character class with a
caret ^

Say you wanted to find anything f-x, but not "fox":

/f[^o]x/

This will find "fix", "fex", "fux", "fgx", etc., but not "fox".
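The three patterns side by side, as a quick sketch. The word list is made up, and \A ... \z anchors are added here so each pattern has to match a whole word rather than part of a longer string.

```ruby
words = %w[fix fox fex fux fgx fax]

alternation = /\Af(i|o)x\z/
char_class  = /\Af[io]x\z/
negated     = /\Af[^o]x\z/

p words.grep(alternation)  # ["fix", "fox"]
p words.grep(char_class)   # ["fix", "fox"]
p words.grep(negated)      # ["fix", "fex", "fux", "fgx", "fax"]
```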

In the regular expression: /\[[\w]+\]/

\[ = you are looking for a literal left square bracket
[\w]+ = you are looking for a character class with any word character
one or more times
\] = you are looking for a closing right square bracket

This will find the "fix" in the sentence "This is a [fix]", but this
regular expression will fail on "This is a [ fix ]", because the
spaces before the "f" and after the "x" are not word characters. A
better regular expression is (sorry Doug, I'm taking it back, I like
mine better now):

/\[([^\]]*)\]/

which will match anything inside square brackets, up to the first closing
bracket. This will match:

"This is a [fix]"  $1 will equal "fix"
"This is a [ fix ]"  $1 will equal " fix "
"This is a [ *sentence inside of a fix* ]"  $1 will equal " *sentence inside of a fix* "
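The same three sentences run through both patterns, as a sketch (the variable names are illustrative):

```ruby
word_only = /\[(\w+)\]/      # Douglas's pattern, minus the redundant brackets
anything  = /\[([^\]]*)\]/   # any run of non-"]" characters

samples = ["This is a [fix]", "This is a [ fix ]", "This is a [ *sentence inside of a fix* ]"]

samples.each do |str|
  p [str, str[word_only, 1], str[anything, 1]]
end
# ["This is a [fix]", "fix", "fix"]
# ["This is a [ fix ]", nil, " fix "]
# ["This is a [ *sentence inside of a fix* ]", nil, " *sentence inside of a fix* "]
```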

I hope this was helpful.

Zach
 

James Edward Gray II

[\w]+ = you are looking for a character class with any word character
one or more times

Shortcuts like \w define character classes, so the brackets are not
needed, as the other poster hinted at. ;)

\w+ and [\w]+ are identical

You can put them in classes if you want, mainly to add to them:

[\w']+ match word and ' characters
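A small sketch showing both points on a made-up string: \w+ and [\w]+ give identical results, and adding ' to the class keeps the contraction together.

```ruby
text = "don't stop"

p text.scan(/\w+/)     # ["don", "t", "stop"]
p text.scan(/[\w]+/)   # ["don", "t", "stop"]  (same as \w+)
p text.scan(/[\w']+/)  # ["don't", "stop"]
```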

Hope that helps.

James Edward Gray II
 

Zach Dennis

Almost forgot to hit up your question...

/\[[\w]+\]/

and

/\[\w+\]/

are basically the same since \w covers a whole character class of word
characters.

Zach
 
