My regexp stupidity needs assistance before I lose all my hair!

trans. (T. Onoma) · Jan 17, 2005

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it. It shouldn't have to be this
hair-pulling! Anyway... Can someone please give me the regular expression to
match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

Much obliged,
T.

Zach Dennis · Jan 17, 2005

trans. (T. Onoma) said:
Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it. It shouldn't have to be this
hair-pulling! Anyway... Can someone please give me the regular expression to
match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/
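A minimal sketch (not from the original post) of the same pattern without relying on the $1 global: String#[] with a regexp and a capture index returns just that group. The variable s is the heredoc from the question.

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# [^\]]* matches any run of characters that are not ']', so the
# match can never run past the first closing bracket.
first = s[/\[([^\]]*)\]/, 1]
puts first # prints Hello
```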

Zach

Douglas Livingstone · Jan 17, 2005

I think that this is what you need: /\[[\w]+\]/

This little application, called TestRexp, might help you (I'm not sure it is
100% Ruby compatible, but it may be a start). You can get it
here: http://regexpstudio.com/RegExpStudio.html

hth,
Douglas

Zach Dennis · Jan 17, 2005

Zach said:
trans. (T. Onoma) said:

Let me be painfully honest: I hate parsing, especially w/ regexp, and I
don't care if it's because I'm stupid and suck at it. It shouldn't have
to be this hair-pulling! Anyway... Can someone please give me the
regular expression to match the first square bracket's contents? In
this case it would be "Hello".

s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS


The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/

Almost forgot, $1 is the match you are looking for.

Zach Dennis · Jan 17, 2005

Douglas said:
I think that this is what you need: /\[[\w]+\]/

Ah, this is nicer and shorter than mine... I think I will use this one
too. =)

Zach

Glenn Parker · Jan 17, 2005

trans. (T. Onoma) said:
Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it. It shouldn't have to be this
hair-pulling! Anyway... Can someone please give me the regular expression to
match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

s =~ /\[([^\]]*)\]/
puts $1
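A variation on the same idea (a sketch, not something posted in the thread): String#match returns a MatchData object, which holds both the whole match and the capture groups, so nothing has to be read back from globals.

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# String#match returns a MatchData object, or nil when nothing matches.
m = s.match(/\[([^\]]*)\]/)
if m
  puts m[0] # the whole match, brackets included: [Hello]
  puts m[1] # just the first capture group: Hello
end
```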

trans. (T. Onoma) · Jan 17, 2005

| > Let me be painfully honest: I hate parsing, especially w/ regexp, and I
| > don't care if it's because I'm stupid and suck at it. It shouldn't have to
| > be this hair-pulling! Anyway... Can someone please give me the regular
| > expression to match the first square bracket's contents? In this case it
| > would be "Hello".
| >
| > s = <<-EOS
| > [Hello]
| > This is[b.] a test.
| > [Hello.]
| > EOS
|
| The trick here is to make sure you are non-greedy.
|
| s =~ /\[([^\]]*)\]/

Thanks. I _see_ now why mine wasn't working, though I don't _understand_ why
it wasn't working. I was using the /x modifier, because I generally like
to space out the parts of my regexps so they're easier to read, but for some
reason that causes the above to match instead. Oh well, I just won't do that.

Thanks All for your responses!
T.

trans. (T. Onoma) · Jan 17, 2005

26 pm, Zach Dennis wrote:
| | trans. (T. Onoma) wrote:
| | > Let me be painfully honest: I hate parsing, especially w/ regexp, and I
| | > don't care if it's because I'm stupid and suck at it. It shouldn't have
| | > to be this hair-pulling! Anyway... Can someone please give me the regular
| | > expression to match the first square bracket's contents? In this case
| | > it would be "Hello".
| | >
| | > s = <<-EOS
| | > [Hello]
| | > This is[b.] a test.
| | > [Hello.]
| | > EOS
| |
| | The trick here is to make sure you are non-greedy.
| |
| | s =~ /\[([^\]]*)\]/
|
| Thanks. I _see_ now why mine wasn't working, though I don't _understand_
| why it wasn't working. I was using the /x modifier, because I generally
| like to space out the parts of my regexps so they're easier to read, but for
| some reason that causes the above to match instead. Oh well, I just won't do that.

Oops, scratch that. That's not the reason either (sigh). But I got it working
now anyway. Thanks.

T.

Douglas Livingstone · Jan 17, 2005

Ah, this is nicer and shorter than mine... I think I will use this one
too. =)

And I was thinking "ooh, Zach's looks like a better way to do it"

Douglas

Assaph Mehr · Jan 17, 2005

Thanks. I _see_ now why mine wasn't working, though I don't
_understand_ why it wasn't working. I was using the /x modifier,
because I generally like to space out the parts of my regexps so they're
easier to read, but for some reason that causes the above to
match instead. Oh well, I just won't do that.

It has to do with the pattern matching being greedy, not the /x flag.
Your pattern will match a '[', then as many characters as possible
(including ']'), up to the final closing ']'.
There are two solutions:
1. As shown, match any non ']'.
2. Make the match non greedy: %r{ \[(.+?)\] }x
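To make the greedy/non-greedy difference concrete, here is a small sketch on a made-up one-line string with two bracket pairs (the string is hypothetical, chosen only for illustration). The /x flag on the lazy pattern does not change what it matches; it only lets you space the pattern out.

```ruby
line = "see [a] and [b] here"

greedy  = line[/\[(.+)\]/, 1]       # .+ runs on, then backs up to the LAST ']'
lazy    = line[%r{ \[(.+?)\] }x, 1] # .+? stops at the FIRST ']' that completes a match
negated = line[/\[([^\]]+)\]/, 1]   # [^\]]+ can never step over a ']'

p greedy  # "a] and [b"
p lazy    # "a"
p negated # "a"
```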

HTH,
Assaph
P.S. If you want all occurrences in the string, use String#scan instead
of String#match.
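A short sketch of the String#scan suggestion, applied to the string from the question. When the pattern has one capture group, scan returns an array of one-element arrays, so flatten is used here to get a plain list.

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# One capture group => scan yields [["Hello"], ["b."], ["Hello."]];
# flatten turns that into a plain list of bracket contents.
all = s.scan(/\[([^\]]*)\]/).flatten
p all # ["Hello", "b.", "Hello."]
```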

Mark Hubbart · Jan 17, 2005

Zach Dennis said:
trans. (T. Onoma) said:

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it. It shouldn't have to be this
hair-pulling! Anyway... Can someone please give me the regular expression to
match the first square bracket's contents? In this case it would be "Hello".

s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS


The trick here is to make sure you are non-greedy.

s =~ /\[([^\]]*)\]/

Or:

s =~ /\[.*?\]/

which uses the ? non-greedy modifier to ensure that only the very next
"]" is matched. For example:

str = <<EOT
[this] [is a test]
here are[some]brackets
[brackets ]
[] no words
no brackets
EOT
==>"[this] [is a test]\nhere are[some]brackets\n[brackets ]\n[] no words\nno brackets\n"

str.each_line { |line| p line.scan(/\[.*?\]/) }
["[this]", "[is a test]"]
["[some]"]
["[brackets ]"]
["[]"]
[]

cheers,
Mark

trans. (T. Onoma) · Jan 17, 2005

On Monday 17 January 2005 04:51 pm, Assaph Mehr wrote:
| > Thanks. I _see_ now why mine wasn't working, though I don't
| > _understand_ why it wasn't working. I was using the / /x extension,
| > because I generally like to space the parts my regexps out to
| > read easier, but for some reason that causes the above to
| > match instead. Oh well, I just won't do that.
|
| It has to do with the pattern matching being greedy, not the /x flag.
| Your pattern will match a '[', then as many characters as possible
| (including ']'), up to the final closing ']'.
| There are two solutions:
| 1. As shown, match any non ']'.
| 2. Make the match non greedy: %r{ \[(.+?)\] }x
|
| HTH,
| Assaph
| P.S. If you want all occurrences in the string, use String#scan instead
| of String#match.

Thanks Assaph,

I had an escape character match in the regexp:

/ [^`] \[(.+?)\] /x

That was messing it up (I don't really know why), but I just "zeroed" it:

/ (?=[^`]) \[(.+?)\] /x

And that did the trick.

Just one of those things where you overlook what you think you know to
the point of seizure.

T.

Assaph Mehr · Jan 17, 2005

I had an escape character match in the regexp:

/ [^`] \[(.+?)\] /x

That was messing it up (I don't really know why), but I just "zeroed" it:

/ (?=[^`]) \[(.+?)\] /x

And that did the trick.

That's because [^`] will match 'a single character that is not `'.
When you did the zero-width lookahead, you made it into 'possibly a
character, so long as it's not ` '.

Hope this makes sense

Mark Hubbart · Jan 17, 2005

I had an escape character match in the regexp:

/ [^`] \[(.+?)\] /x

That was messing it up (I don't really know why), but I just "zeroed" it:

/ (?=[^`]) \[(.+?)\] /x

And that did the trick.


That's because [^`] will match 'a single character that is not `'.
When you did the zero-width lookahead, you made it into 'possibly a
character, so long as it's not ` '.

I may be reading this wrong, but I think that with the zero-width
lookahead, it is now ensuring that the first character of the match is
not a backtick. Which, since it's always going to be a square bracket,
makes the lookahead superfluous.

If you need escaping, try:
/(?:          # escape sequence match
   ^ | [^`]   # alternate: match either "start of line" or a non-backtick.
 )
 (            # non-greedy [foo] match
   \[.*?\]
 )/x

... then use $1. This one won't match any paired square brackets
immediately preceded by a backtick.
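Because Mark's pattern spans several lines and carries comments, it needs the /x flag; without it, the spaces and comments become part of what must be matched. A sketch with the flag applied (the constant name UNESCAPED and the sample strings are made up for illustration):

```ruby
# Mark's escape-aware pattern, with /x so whitespace and comments are ignored.
UNESCAPED = /(?:          # escape sequence match
               ^ | [^`]   # either start of line or a non-backtick
             )
             (            # non-greedy bracket match
               \[.*?\]
             )/x

skipped = "`[skip] [keep]"[UNESCAPED, 1]
first   = "[first] then `[skipped]"[UNESCAPED, 1]
p skipped # "[keep]"  (the backtick-escaped pair is passed over)
p first   # "[first]" (^ lets a pair at the start of the line match)
```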

cheers,
Mark

John Carter · Jan 17, 2005

Let me be painfully honest: I hate parsing, especially w/ regexp, and I don't
care if it's because I'm stupid and suck at it.

Given your other posts in this forum I cannot believe that you are stupid.

So here are some meta-hints on how to "suck less" at Regexes...

Always use the %r{}x form of regexes.

This neatly avoids the leaning toothpick syndrome when\/matching\/paths
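A minimal illustration of the leaning-toothpick point, using a made-up path: inside /.../ every '/' has to be escaped, while %r{...} uses braces as delimiters, so the slashes can be written as-is. Both patterns below match the same thing.

```ruby
path = "/usr/local/bin/ruby"

# Every '/' must be escaped inside /.../ ...
slashes = path[/\A\/usr\/local\/bin\/(\w+)\z/, 1]
# ... but not inside %r{...}.
braces  = path[%r{\A/usr/local/bin/(\w+)\z}, 1]

p slashes # "ruby"
p braces  # "ruby"
```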

The x modifier allows you to use white space and even comments within the
regex to make it readable. (Larry Wall of Perl fame regrets he didn't make
it the default...)
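As a sketch of what that looks like in practice, here is the bracket pattern from earlier in the thread written with %r{}x and a comment per piece (the constant name FIRST_BRACKET is hypothetical). Note that /x only ignores whitespace in the pattern; spaces in the string being matched still count.

```ruby
# A commented version of the bracket pattern from earlier in the thread.
FIRST_BRACKET = %r{
  \[          # a literal opening bracket
  ( [^\]]* )  # capture any run of characters other than a closing bracket
  \]          # the closing bracket
}x

hello  = "[Hello]\nThis is[b.] a test."[FIRST_BRACKET, 1]
spaced = "[ spaced out ]"[FIRST_BRACKET, 1]
p hello  # "Hello"
p spaced # " spaced out " (spaces in the subject are still matched)
```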

My .emacs has a key-binding that will produce "=~ %r{ }x" and leave the
cursor in the middle:
(global-set-key [(control %)]
  `(lambda ()
     (interactive)
     (insert "=~ %r{ }x")
     (backward-char 3)))

Pull the development of the regex outside the development of your app.
Unit tests are good for that, but even a wee small script, the command
line, or irb will do.
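A sketch of the "wee small script" approach: a hypothetical table of inputs and the capture expected from each, checked in one go. The pattern and cases are illustrative, not from the original post; swap in whatever regex you are developing.

```ruby
# Hypothetical table of inputs and the capture we expect from each.
PATTERN = /\[([^\]]*)\]/
CASES = {
  "[Hello]"             => "Hello",
  "This is[b.] a test." => "b.",
  "[ fix ]"             => " fix ",
  "[]"                  => "",
  "no brackets"         => nil,
}

# Keep only the cases whose actual capture differs from the expected one.
failures = CASES.reject { |input, expected| input[PATTERN, 1] == expected }
failures.each do |input, expected|
  puts "FAIL: #{input.inspect} gave #{input[PATTERN, 1].inspect}, expected #{expected.inspect}"
end
puts "#{CASES.size - failures.size}/#{CASES.size} cases passed"
```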

If you are doing it on the command line beware of nasty interactions
between the string and quoting conventions of the shell and ruby.

(Speaking Unix now...)
e.g. ruby -e "blah" is A Very Bad Idea. The shell will peek inside the
"blah" and expand things like $, backticks and backslashes, which you
really definitely don't want happening in a regex. Solution: use single
quotes; bash never looks inside them. Downside: it means you must _never_
use single quotes in the ruby fragment blah, but you can use double quotes.

ruby -e 'blah'

Grow the regex slowly. Start with the smallest thing, make it match.

If you immediately write down a large regex, odds on it will match
nothing.

Sheer murderous frustration lies that way.

Start small, or strip away stuff on the right-hand side of the regex until
it matches something. Then slowly start adding it back.
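A sketch of growing a regex one piece at a time, using the bracket problem from the start of the thread. The list of steps is hypothetical; the point is to confirm each step matches what you expect before adding the next piece.

```ruby
s = <<-EOS
[Hello]
This is[b.] a test.
[Hello.]
EOS

# Ever-larger attempts; check each one before adding more.
steps = [
  /\[/,           # 1. just the opening bracket
  /\[[^\]]*/,     # 2. plus everything up to (not including) a ']'
  /\[[^\]]*\]/,   # 3. plus the closing bracket
  /\[([^\]]*)\]/, # 4. finally, capture the part we actually want
]

steps.each { |re| puts "#{re.inspect} => #{s[re].inspect}" }
puts "capture: #{s[steps.last, 1].inspect}"
```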

File.read(fileName) is cute. It allows you to pull the whole file in at
once as one string and then you can match across lines.
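A sketch of matching across lines after reading a whole file. A Tempfile stands in for fileName here (an assumption for the sake of a self-contained example). In Ruby, '.' does not match a newline unless the /m flag is given, whereas a negated class like [^\]] already does.

```ruby
require "tempfile"

text = nil
# Write a small sample file (a stand-in for fileName), then slurp it whole.
Tempfile.create("brackets") do |f|
  f.write("intro [spans\ntwo lines] outro\n")
  f.flush
  text = File.read(f.path)
end

p text[/\[(.*?)\]/, 1]    # nil: without /m, '.' does not match "\n"
p text[/\[(.*?)\]/m, 1]   # "spans\ntwo lines"
p text[/\[([^\]]*)\]/, 1] # "spans\ntwo lines": a negated class already matches "\n"
```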

Be aware that since standards are such good things, everyone has their
own one. I.e. POSIX (grep) regexes are different from Emacs regexes, which
are different from Ruby regexes. grep even provides two different regex
languages! Ruby and Perl regexes are very similar.
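One concrete flavor difference worth knowing, shown as a small sketch with a made-up string: in Ruby, ^ and $ always match at line boundaries (Perl needs its /m flag for that), and Ruby's /m instead means "let . match newlines" (Perl's /s). To anchor at the very start of the string in Ruby, use \A.

```ruby
text = "first line\n[Hello] on the second line"

at_line_start   = text[/^\[(\w+)\]/, 1]  # ^ matches after any "\n" in Ruby
at_string_start = text[/\A\[(\w+)\]/, 1] # \A only matches at position 0

p at_line_start   # "Hello"
p at_string_start # nil
```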

It shouldn't have to be this hair pulling!

It isn't. Really. Do what I suggest and you will slowly find regexes are
really a very fun and powerful way of doing things.

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

"The notes I handle no better than many pianists. But the pauses
between the notes - ah, that is where the art resides!" - Artur Schnabel

trans. (T. Onoma) · Jan 17, 2005

On Monday 17 January 2005 06:33 pm, John Carter wrote:
| Grow the regex slowly. Start with the smallest thing, make it match.
|
| If you immediately write down a large regex, odds on it will match
| nothing.

Ah, this is my major problem. I tend to write whole chunks of code at once and
then go back and tweak to perfection. Not always the best way to go. And
regexp is a perfect example of when not to do this.

Thanks. That lesson will surely help a great deal.

T.

Bertram Scharpf · Jan 18, 2005

Hi,

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:

I think that this is what you need: /\[[\w]+\]/

What are the square brackets for? As far as I can see, /\[\w+\]/
does the same.

Bertram

Zach Dennis · Jan 18, 2005

Bertram said:
Hi,

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:

I think that this is what you need: /\[[\w]+\]/


What are the square brackets for? As far as I can see, /\[\w+\]/
does the same.

In a regular expression, square brackets represent a character class. A
character class matches exactly one character, which can be any of the
characters that make up that class. Say you are looking for the words
"fix" or "fox" in a sentence.

You could write:

/f(i|o)x/

or you could write:

/f[io]x/

You can also negate a character class, and match anything that is NOT in
the character class. You do this by starting your character class with a
caret (^).

Say you wanted to find anything f-x, but not "fox"

/f[^o]x/

this will find "fix", "fex", "fux", "fgx", etc.. but not "fox".
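A quick sketch of the three patterns above run against a small made-up word list. The \A and \z anchors are added here so each pattern has to match the whole word.

```ruby
words = %w[fix fox fex fux fgx]

alternation = words.grep(/\Af(i|o)x\z/) # "i" or "o" via alternation
char_class  = words.grep(/\Af[io]x\z/)  # same thing as a character class
negated     = words.grep(/\Af[^o]x\z/)  # any single character except "o"

p alternation # ["fix", "fox"]
p char_class  # ["fix", "fox"]
p negated     # ["fix", "fex", "fux", "fgx"]
```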

In the regular expression: /\[[\w]+\]/

\[ = you are looking for a literal left square bracket
[\w]+ = you are looking for a character class with any word character
one or more times
\] = you are looking for a closing right square bracket

This will find the "fix" in the sentence "This is a [fix]", but this
regular expression will fail if you do "This is a [ fix ]", because the
spaces before the "f" and after the "x" are not considered word
characters. A better regular expression is (sorry Doug, I'm taking it
back, I like mine better now):

/\[([^\]]*)\]/

which will match anything inside of square brackets. This will match:

"This is a [fix]" $1 will equal "fix"
"This is a [ fix ]" $1 will equal " fix "
"This is a [ *sentence inside of a fix* ]" $1 will equal " *sentence
inside of a fix* "
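A sketch comparing the two approaches on Zach's three example sentences. A capture group is added to Douglas's pattern here (an adjustment, not his original) so both return just the contents.

```ruby
word_only = /\[(\w+)\]/    # Douglas's idea, with a capture group added
anything  = /\[([^\]]*)\]/ # Zach's negated character class

samples = ["This is a [fix]", "This is a [ fix ]", "This is a [ *sentence inside of a fix* ]"]

results = samples.map { |str| [str[word_only, 1], str[anything, 1]] }
results.each { |pair| p pair }
# ["fix", "fix"]
# [nil, " fix "]
# [nil, " *sentence inside of a fix* "]
```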

I hope this was helpful.

Zach

James Edward Gray II · Jan 18, 2005

[\w]+ = you are looking for a character class with any word character
one or more times

Shortcuts like \w define character classes, so the brackets are not
needed, as the other poster hinted at.

\w+ and [\w]+ are identical

You can put them in classes if you want, mainly to add to them:

[\w']+ match word and ' characters
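A small sketch showing that \w+ and [\w]+ behave the same, while adding ' to the class lets a contraction through. The sample strings are made up for illustration.

```ruby
samples = ["[Hello]", "[don't]", "[b.]"]

results = samples.map do |str|
  [str[/\[\w+\]/], str[/\[[\w]+\]/], str[/\[[\w']+\]/]]
end
results.each { |row| p row }
# ["[Hello]", "[Hello]", "[Hello]"]
# [nil, nil, "[don't]"]
# [nil, nil, nil]
```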

Hope that helps.

James Edward Gray II

Zach Dennis · Jan 18, 2005

Bertram said:
Hi,

On Tuesday, 18 Jan 2005, 06:26:35 +0900, Douglas Livingstone wrote:

I think that this is what you need: /\[[\w]+\]/


What are the square brackets for? As far as I can see, /\[\w+\]/
does the same.

Almost forgot to hit up your question...

/\[[\w]+\]/

and

/\[\w+\]/

are basically the same since \w covers a whole character class of word
characters.

Zach

[ANN] ruby_parser 2.0.0 Released	8	Oct 23, 2008
Why C Is Not My Favourite Programming Language	132	Feb 5, 2005
Ruby Weekly News 17th - 23rd January 2005	3	Jan 23, 2005
No-syntax Web-programming-IDE (was: Does turtle graphics have the wrong associations?)	0	Nov 22, 2009
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 12, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Mar 15, 2008
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	May 1, 2007
