Small regexp question


francisrammeloo

Hi all,

I am writing some refactoring code for a C++ project.

I need to change:

class MyClass
{
...
}

to:

class IMP_EXP MyClass
{
...
}

The pattern I used to find a class definition line is:

line =~ /^\s*class\s+(\w+)/

But I want to exclude forward class declarations ( class MyClass; )

So I changed my pattern to:

line =~ /^\s*class\s+(\w+)\s*[^;]/ --> don't match if line ends
with ";"

But it doesn't work... Why?

I ended up using: if line !~ /;/ and line =~ /^\s*class\s+(\w+)/

Hints?

Any help will be appreciated,
Best regards,

Francis
 

William James


/^\s*class\s+(\w+)\s*$/
 

Antonin AMAND


puts "ok" if "class MyClass;".match(/^\s*class\s+(\w+)\s*[^;]/)

=> "ok"

it works for me.

The problem may be somewhere else.

Antonin.
 

Xavier Noria


But it doesn't work... Why?

I don't know exactly in what sense it does not work, but negations in
regexps are tricky.

A regexp engine *always* tries to match. If in a first attempt \w+
matches the whole class name and then the rest does not match, then
the regexp engine backtracks and happens to find a "shorter class
name" whose remaining characters are not semicolons, so it still
matches.

class Foo;    (\w+ -> "Foo", fails, backtrack)
         ^
class Foo;    (\w+ -> "Fo", no whitespace, "o" is not a semicolon, matched)
        ^
A solution is to add an anchor for end of string. Another one is to
prevent \w+ from backtracking, that is known as "atomic grouping":

(?>\w+) # grab word characters and do not backtrack

In addition, the idiomatic way to say "and at this point I don't want
this to happen" is to use a negative look-ahead assertion. All in all
we get this:

/^\s*class\s+(?>\w+)(?!\s*;)/
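The two behaviours can be compared side by side with a small sketch. It is only an illustration: the names NAIVE, STRICT and captured_name and the sample lines are made up for this example, and the class name is wrapped in an extra capture group (not part of the pattern above) so that it is still available as the first capture.

```ruby
# Illustrative sketch; NAIVE, STRICT, captured_name and the sample lines
# are hypothetical names chosen for this example.

# Original pattern: \w+ may give back a character so that [^;] can match it.
NAIVE = /^\s*class\s+(\w+)\s*[^;]/

# The atomic group keeps the whole name; the negative look-ahead rejects a
# following semicolon, optionally preceded by whitespace.
STRICT = /^\s*class\s+((?>\w+))(?!\s*;)/

# Returns the captured class name, or nil when the pattern does not match.
def captured_name(pattern, line)
  match = pattern.match(line)
  match && match[1]
end

["class Foo;", "class Foo ;", "class Foo", "class Foo {"].each do |line|
  puts "#{line.inspect}: naive=#{captured_name(NAIVE, line).inspect} " \
       "strict=#{captured_name(STRICT, line).inspect}"
end
```

With these inputs the naive pattern matches every line (capturing "Fo" for "class Foo;"), while the strict one returns nil for both forward declarations.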

-- fxn
 

Robert Klemme


But it doesn't work... Why?

Because the match simply stops before the ";".
line = 'class Foo;'
=> "class Foo;"
line[/^\s*class\s+(\w+)\s*[^;]/]
=> "class Foo"

If you want to make sure there is no ";" between the class name and the
end of the line you need to anchor the RX at the end:
line = 'class Foo;'
=> "class Foo;"
line[/^\s*class\s+(\w+)[^;]*$/]
=> nil
line = 'class Foo'
=> "class Foo"
line[/^\s*class\s+(\w+)[^;]*$/]
=> "class Foo"
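For the original refactoring goal, the end-anchored pattern can be combined with String#sub to insert the macro. This is a minimal sketch under assumptions: the method name add_export_macro and its macro parameter are made up here, and extra capture groups are added around the pattern so the surrounding text is preserved.

```ruby
# Hypothetical helper, not from the original post: inserts an export macro
# between "class" and the class name, but only when no ";" follows on the
# line (so forward declarations are left alone).
def add_export_macro(line, macro = "IMP_EXP")
  line.sub(/^(\s*class\s+)(\w+)([^;]*)$/) { "#{$1}#{macro} #{$2}#{$3}" }
end

puts add_export_macro("class MyClass")    # definition: rewritten
puts add_export_macro("class MyClass {")  # definition: rewritten
puts add_export_macro("class MyClass;")   # forward declaration: unchanged
```

Two limitations are worth keeping in mind: a one-line definition such as "class Foo { int x; };" contains a semicolon and is therefore skipped, and running the helper twice on the same line inserts the macro twice.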

Kind regards

robert
 

benjohn

Don't you want to be looking _for_

class xxxx {

Where you can have at least one whitespace character between class and
xxxx, and any amount of whitespace between xxxx and { (I think none is
allowed too)? Whitespace includes newlines too, so the following are all
valid class declarations:

class
AClass
{

class AClass {

class
AClass {

and I think even...
class
AClass{

? :) Or do you want to get the job done, rather than getting a perfect
solution? :)

I was playing with regexps yesterday and wanted a pattern to match
over multiple lines, but couldn't see how that is done. (A friend wanted
a simple way of stripping out C comments, and they can span multiple
lines, of course.) Could someone give me a hint on that?
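One possible hint, as a sketch rather than a complete answer: in Ruby the /m flag lets "." match newlines, so a pattern applied to the whole file contents (rather than line by line) can span lines. The method name strip_c_comments is made up for this example, and the approach is naive: it does not know about string literals that happen to contain "/*".

```ruby
# Hypothetical helper: removes /* ... */ comments, including ones that span
# several lines. The /m flag makes "." match newlines, and the non-greedy
# ".*?" stops at the first closing "*/" instead of the last one.
def strip_c_comments(source)
  source.gsub(%r{/\*.*?\*/}m, "")
end

code = "int a; /* first line\n   second line */ int b; /* another */\n"
puts strip_c_comments(code)
```

The same idea applies to the class declarations above: read the file into one string and let \s match the newlines between "class", the name and "{".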

Cheers,
Benjohn
 

Ross Bamford


But it doesn't work... Why?

Your regexp is trying to match:

+ zero or more spaces
+ the word 'class'
+ one or more spaces
+ one or more word characters (captured)
+ zero or more spaces
+ any single character except ';'

The final [^;] only demands one character that is not ';' somewhere after
the name, and backtracking lets the last letter of the name fill that role,
so the trailing ';' is never actually checked. You could do it with
lookahead, but it's probably easier to do:

"class MyClass;" =~ /class\s+(\w+)[^;]*$/
# => nil

"class MyClass" =~ /class\s+(\w+)[^;]*$/
# => 0

"class MyClass {" =~ /class\s+(\w+)[^;]*$/
# => 0

"class MyClass { /* etc */ }" =~ /class\s+(\w+)[^;]*$/
# => 0

There are probably still things this will miss though. For example,
strange class names could well result in a failure to match...
 

francisrammeloo

irb(main):001:0> line = "class MyClass;"
=> "class MyClass;"
irb(main):002:0> line =~ /^\s*class\s+(\w+)\s*[^;]/
=> 0
irb(main):003:0> puts $1
MyClas
=> nil
 

Logan Capaldo


/^\s*class\s+(\w+)\s*[^;]/

This would match:
"class A ;" for instance

\s* can match the empty string which is followed by a space which is
not a semi-colon, so hey it matches!



 
