Regular Expressions

Justin To

Hello! I'm working on a problem where I must match versions in a CSV
file. A version could be anything like:

v.6.0.3-3
aajd4-43_3
ABCD 5.0
ABCDv.5.0
A 3.40
...

With the other fields in mind, I thought "heck, looks like versions are
the only ones that contain a series of letters, digits, periods,
underscores and dashes..."

I'm pretty new to Ruby so I don't have very much experience with regular
expressions. Is it possible to solve my problem with just one regular
expression? I need a regular expression that will return:

v.6.0.3-3: true because there's a v followed by a series of '.' and
digits
aajd4-43_3: true because there's a series of digits, '-', and '_'
ABCD 5.0: true because there's a series of digits and '.'
ABCDv.5.0: true...
A 3.40: true...

Thanks for the help!
 
Jesús Gabriel y Galán

I think there's some information missing here: how many of
these characters form a "series"? More than one? Do they
have to be interleaved in some order (for example, digits
followed by a ., a - or a _, followed by more digits), or does the order not matter?

The simplest case: two or more of those characters in a row:

irb(main):023:0> versions = ["v.6.0.3-3", "aajd4-43_3","ABCD 5.0",
"ABCDv.5.0", "A 3.40"]
=> ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
irb(main):024:0> r = /[-._\d]{2,}/
=> /[-._\d]{2,}/
irb(main):025:0> versions.each {|x| puts "#{x}: #{(x =~ r) != nil}"}
v.6.0.3-3: true
aajd4-43_3: true
ABCD 5.0: true
ABCDv.5.0: true
A 3.40: true
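
One thing to watch in a character class like the one above: a `-` placed between two characters is read as a range, so `.-_` means every character from `.` to `_` in ASCII order, which takes in the digits and the uppercase letters but not `-` itself. A small sketch of the difference (the names `ranged` and `literal` are only for illustration):

```ruby
# Inside [...], a hyphen between two characters denotes a range.
ranged  = /[.-_]/   # every character from "." (0x2E) to "_" (0x5F)
literal = /[-._]/   # a leading hyphen is literal: only "-", "." and "_"

%w[A 7 - . _].each do |ch|
  in_ranged  = !(ch =~ ranged).nil?
  in_literal = !(ch =~ literal).nil?
  puts "#{ch.inspect}: ranged=#{in_ranged} literal=#{in_literal}"
end
```

Putting the hyphen first or last in the class (or escaping it) keeps it literal.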

One or more digits, followed by ., _ or -, followed by one or more digits:

irb(main):030:0> r = /\d+[-._]\d+/
=> /\d+[-._]\d+/
irb(main):031:0> versions.each {|x| puts "#{x}: #{(x =~ r) != nil}"}
v.6.0.3-3: true
aajd4-43_3: true
ABCD 5.0: true
ABCDv.5.0: true
A 3.40: true

You will have to refine your requirements a little bit, in order to choose among
these (and any variations on this).

Jesus.
 
Robert Klemme

Yes, that's easy, just /./ as an expression.

Seriously, what the expression must *not* match is just as crucial.
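
To make that concrete, a candidate pattern can be run against fields that should be rejected as well as against the version strings. A rough sketch, assuming some made-up non-version fields (the `non_versions` values are hypothetical and not taken from the actual CSV file):

```ruby
versions = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
# Hypothetical values from other CSV columns; not from the real data.
non_versions = ["Acme Corp", "2008-06-18", "n/a"]

candidates = {
  "any character"     => /./,
  "digit, sep, digit" => /\d+[-._]\d+/
}

candidates.each do |label, re|
  missed     = versions.reject { |s| s =~ re }      # versions not matched
  false_hits = non_versions.select { |s| s =~ re }  # other fields matched
  puts "#{label}: missed=#{missed.inspect} false_hits=#{false_hits.inspect}"
end
```

Both candidates accept every sample version, so only the rejected list tells them apart; with these invented fields, a date-like value still slips through the second pattern.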

The easiest (but not the most efficient) approach would be to create one
alternative for each variant you have, like

%r{
^(?:
v(?:\.\d+)+-\d+
| \w+\d+-[\d_]+
| ...
)$
}x

etc.

But given the number of alternatives you present it might be difficult
to avoid also matching other stuff. At least, you'll face a pretty
complex regular expression.
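
For illustration, here is a runnable sketch that uses only the two alternatives spelled out above; the elided ones are deliberately left out, so three of the five samples are expected to fail. It swaps `^` and `$` for `\A` and `\z`, because in Ruby `^` and `$` match at line boundaries rather than at the ends of the string. The name `version_rx` is an assumption.

```ruby
# Only the two alternatives given in the post; the remaining ones are
# intentionally not filled in here.
version_rx = %r{
  \A(?:
      v(?:\.\d+)+-\d+     # v.6.0.3-3
    | \w+\d+-[\d_]+       # aajd4-43_3
  )\z
}x

samples = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
samples.each { |s| puts "#{s}: #{!(s =~ version_rx).nil?}" }
```

Each further variant would need its own alternative, which is where the expression starts to grow.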

Kind regards

robert
 
Dave Bass

Robert said:
The easiest (but not the most efficient) approach would be to create one
alternative for each variant you have, like

%r{
^(?:
v(?:\.\d+)+-\d+
| \w+\d+-[\d_]+
| ...
)$
}x

etc.

The problem with regular expressions is that they can easily get out of
hand and become incomprehensible, as the above code shows (though
presumably to RK it's totally transparent).

Better to write a number of small regexps, each testing for a specific
pattern. Then combine the results with a logical OR. This can be done
using a flag variable, or an if-elsif tree, a case statement, etc.,
whatever you feel happiest with. This approach will be a lot easier to
test and debug.
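
A rough sketch of this approach follows. The pattern names and the individual patterns are illustrative guesses at the five sample formats, not a vetted specification, and `version_kind` is a hypothetical helper.

```ruby
# Each small pattern targets one (assumed) version format.
VERSION_PATTERNS = {
  v_dotted:     /\Av(?:\.\d+)+(?:-\d+)?\z/,       # v.6.0.3-3
  alnum_dashed: /\A[a-z]+\d+-\d+(?:_\d+)*\z/,     # aajd4-43_3
  name_number:  /\A[A-Z]+v?[ .]\d+(?:\.\d+)+\z/   # ABCD 5.0, ABCDv.5.0, A 3.40
}

# Returns the name of the first pattern that matches, or nil if none does.
def version_kind(field)
  found = VERSION_PATTERNS.find { |_name, re| field =~ re }
  found && found.first
end

["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40", "Acme Corp"].each do |f|
  puts "#{f}: #{version_kind(f).inspect}"
end
```

Because each pattern has a name, a failing test points directly at the format that needs attention.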
 
Robert Klemme

2008/6/18 Dave Bass said:
Robert said:
The easiest (but not the most efficient) approach would be to create one
alternative for each variant you have, like

%r{
^(?:
v(?:\.\d+)+-\d+
| \w+\d+-[\d_]+
| ...
)$
}x

etc.

The problem with regular expressions is that they can easily get out of
hand and become incomprehensible, as the above code shows (though
presumably to RK it's totally transparent).

Actually the RX I presented was not complete and was intended to
convey your point. :)

Better to write a number of small regexps, each testing for a specific
pattern. Then combine the results with a logical OR. This can be done
using a flag variable, or an if-elsif tree, a case statement, etc.,
whatever you feel happiest with. This approach will be a lot easier to
test and debug.

Depends. If you build the single RX one alternative at a time and test
during each iteration, I'd say that works pretty well too. And if
the volume of data is high, the performance advantage of a single RX
might pay off.
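
One way to work like that, sketched under the assumption that each alternative is kept as a small unanchored fragment: combine the fragments with `Regexp.union` (part of Ruby core) and check coverage after each one is added. The fragments themselves are illustrative guesses, not a complete specification.

```ruby
# Unanchored fragments, one per (assumed) version format.
fragments = [
  /v(?:\.\d+)+(?:-\d+)?/,        # v.6.0.3-3
  /[a-z]+\d+-\d+(?:_\d+)*/,      # aajd4-43_3
  /[A-Z]+v?[ .]\d+(?:\.\d+)+/    # ABCD 5.0, ABCDv.5.0, A 3.40
]

samples = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]

# Grow the combined expression one alternative at a time and report
# how many samples it covers after each step.
fragments.each_index do |i|
  combined = /\A(?:#{Regexp.union(fragments[0..i]).source})\z/
  covered  = samples.count { |s| s =~ combined }
  puts "#{i + 1} alternative(s): #{covered}/#{samples.size} samples matched"
end
```

Whether the single combined expression is actually faster than several small ones would have to be measured on the real data.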

Kind regards

robert
 
