regular expression help please

Paul

How do I extract the name and value from the following lines:

name=paul value=10 otherstuff=123

but the line may also be:
name='hello paul' value='10' otherstuff='123'

I know it has to do with \0, \1, etc., but I can't figure out how to make
the regex work for both cases.

Thanks
 
Florian Gross

Paul said:
How do I extract the name and value from the following lines:

name=paul value=10 otherstuff=123

but the line may also be:
name='hello paul' value='10' otherstuff='123'

irb(main):001:0> lines.scan(/^name=('?)(.*?)\1\s+value=('?)(\S*?)\3/)
=> [["", "paul", "", ""], ["'", "hello paul", "'", "10"]]
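
Note that in the first result the value comes back as "" rather than "10": with no opening quote, \3 is empty, so the lazy \S*? is already satisfied by matching nothing. One possible fix, sketched here under the assumption that a value is always followed by whitespace or the end of the line, is to add a lookahead. The variable names (lines, pair_re, pairs) and the sample input are chosen for illustration only.

```ruby
# Sample input from the question: one unquoted line, one quoted line.
lines = <<~'EOF'
  name=paul value=10 otherstuff=123
  name='hello paul' value='10' otherstuff='123'
EOF

# \1 and \3 repeat whatever the optional opening quote captured
# (either "'" or the empty string). The lookahead (?=\s|$) forces the
# lazy value group to run up to the next whitespace or end of line,
# so it can no longer stop after matching nothing.
pair_re = /^name=('?)(.*?)\1\s+value=('?)(.*?)\3(?=\s|$)/

# Keep only the name and value groups, dropping the quote captures.
pairs = lines.scan(pair_re).map { |_q1, name, _q2, value| [name, value] }
p pairs
```

This still does not cope with escaped quotes inside a quoted value.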

Things get more interesting, however, when strings can also contain
escaped delimiters (as in 'Don\'t use PHP!').

Regexp::English lets us solve that case relatively easily:
irb(main):035:0> re = Regexp::English.new do
irb(main):036:1* quoted_string = quoted_text("'")
irb(main):037:1> unquoted_string = non_whitespace
irb(main):038:1> name_val = (quoted_string | unquoted_string).capture(:name)
irb(main):039:1> value_val = (quoted_string | unquoted_string).capture(:value)
irb(main):040:1> literal("name=") + name_val + whitespace +
irb(main):041:1* literal("value=") + value_val
irb(main):042:1> end
=> /name=((?x:'((?x:(?!\\).(?:\\{2})?\\'|(?!').)*)'|\S+))\s+value=((?x:'((?x:(?!\\).(?:\\{2})?\\'|(?!').)*)'|\S+))/
irb(main):051:0> lines = %{
irb(main):052:0" name='hello. I\\'m paul' value='don\\'t do that'
irb(main):053:0" name=foobar value=3
irb(main):054:0" name='drei' value='three'
irb(main):055:0" }
irb(main):070:0> lines.scan(re)
=> [["'hello. I\\'m paul'", "hello. I\\'m paul", "'don\\'t do that'", "don\\'t do that"],
["foobar", nil, "3", nil],
["'drei'", "drei", "'three'", "three"]]
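
Without Regexp::English, roughly the same pattern can be assembled from plain Regexp pieces. This is only a sketch, not the library's output: the names quoted, bare, and field_re are made up for illustration, and it assumes single-quoted values in which a quote is escaped as \'.

```ruby
# A single-quoted string: escaped characters (\x) or characters that
# are neither a quote nor a backslash, between two quotes.
quoted = /'(?:\\.|[^'\\])*'/
# An unquoted value: a run of non-whitespace characters.
bare = /\S+/

# Interpolating Regexp objects embeds them as sub-patterns.
# Named groups make the captures readable via MatchData#[].
field_re = /name=(?<name>#{quoted}|#{bare})\s+value=(?<value>#{quoted}|#{bare})/

lines = <<~'EOF'
  name='hello. I\'m paul' value='don\'t do that'
  name=foobar value=3
  name='drei' value='three'
EOF

# Collect [name, value] for every line that matches.
results = []
lines.each_line do |line|
  m = field_re.match(line)
  results << [m[:name], m[:value]] if m
end
p results
```

Here the captured values keep their surrounding quotes and backslashes; stripping them is left to the caller.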

Regards,
Florian Gross
 
Robert Klemme

You could do:

lines = <<'EOF'
name=paul value=10 otherstuff=123
name='hello paul' value='10' otherstuff='123'
name='hello paul, it\'s nice here' value='10' otherstuff='123'
name='hello paul, don't do that' value='10' otherstuff='123'
EOF

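lines.scan( %r{
  (name|value|otherstuff)   # key
  =
  (?: '((?:[^'\\]|\\')*)'   # quoted value, may contain escaped quotes
    | (\S+) )               # or a bare run of non-whitespace
}x ) do |m|
  key = m[0]
  # m[1] is the quoted value, m[2] the bare one; drop the escaping backslashes
  val = (m[1]||m[2]).gsub(/\\(.)/, '\\1')
  puts "key=#{key}"
  puts "value='#{val}'"
end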

$ ./sc.rb
key=name
value='paul'
key=value
value='10'
key=otherstuff
value='123'
key=name
value='hello paul'
key=value
value='10'
key=otherstuff
value='123'
key=name
value='hello paul, it's nice here'
key=value
value='10'
key=otherstuff
value='123'
key=name
value='hello paul, don'
key=value
value='10'
key=otherstuff
value='123'

Of course you can replicate the expression to cover all three x=y pairs.
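
Alternatively, the pairs of each line could be collected into a Hash. A minimal sketch along the lines of the pattern above: the method name parse_pairs is hypothetical, and unlike the original expression it accepts any \w+ key rather than only the three named ones.

```ruby
# Hypothetical helper: turn one input line into a Hash of its key=value pairs.
def parse_pairs(line)
  # The value is either single-quoted (with \' allowed inside)
  # or a bare run of non-whitespace characters.
  pair = /(\w+)=(?:'((?:[^'\\]|\\')*)'|(\S+))/
  line.scan(pair).each_with_object({}) do |(key, quoted, bare), hash|
    # Exactly one of quoted/bare is non-nil; drop the backslash from \x escapes.
    hash[key] = (quoted || bare).gsub(/\\(.)/, '\1')
  end
end

lines = <<~'EOF'
  name=paul value=10 otherstuff=123
  name='hello paul, it\'s nice here' value='10' otherstuff='123'
EOF

lines.each_line { |line| p parse_pairs(line) }
```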

Regards

robert
 
Paul

Thanks guys, this is great.

Where would I find Regexp::English? Is it a module in the RAA?

Thanks

Paul
 
Florian Gross

Paul said:
Where would I find Regexp::English ? is it a module in the RAA?

I'm planning to release it Real Soon Now. I'm still deciding which of
the more advanced features should be in the final release and which
shouldn't.

Until then I'll just keep using it to solve other people's problems when
the situation demands it. :)

Paul said:
Thanks

No problem, glad I could help!

Regards,
Florian Gross
 
