Can't get subgroup of regex to repeat with +... what the ?

Jon

I'm trying to match these kinds of malformed XML tags. I'm beginning
to question my sanity, so I'm posting here.

Example Strings:
=====
A=" <orderMsg biz=0>"
B=" <orderMsg type=7 size=0>"
C=" <orderMsg type=7 size=0 biz=1>"
=====


I've come up with this regex:
=====
/<(\w+?)(?:\s(\w+)=(\w+))+>/
=====


But when matching string B from above:
=====
md=/<(\w+?)(?:\s(\w+)=(\w+))+>/.match(B)
=====


It will do this:
=====
md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0
nil
nil
nil
nil
=====


Why isn't the final + sign making the pattern "(?:\s(\w+)=(\w+))"
repeat?

As an exercise... /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
match String B from above. What the heck???
 
Wolfgang Nádasi-Donner

Jon said:
B=" <orderMsg type=7 size=0>"
...
/<(\w+?)(?:\s(\w+)=(\w+))+>/
...
md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0

That is correct. "(?:\s(\w+)=(\w+))+" matches twice, and the last
iteration matches "size" and "0". Each time the "+" repeats the group,
the captures from the previous iteration are overwritten.
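
To make the overwriting visible, here is a small sketch that compares the repeated group with the written-out version from the question. The variable names (b, repeated, unrolled) are only for illustration.

```ruby
b = " <orderMsg type=7 size=0>"

# Repeated group: groups 2 and 3 are reused on every iteration of "+",
# so only the last attribute is still there after the match.
repeated = /<(\w+?)(?:\s(\w+)=(\w+))+>/.match(b)
p repeated.captures   # => ["orderMsg", "size", "0"]

# Group written out twice: the attributes get their own groups (2..5),
# so nothing is overwritten, but it only fits exactly two attributes.
unrolled = /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/.match(b)
p unrolled.captures   # => ["orderMsg", "type", "7", "size", "0"]
```

The number of groups is fixed by the pattern itself, not by how often a quantifier repeats, which is why md[4] and above are nil in the first case.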

Wolfgang Nádasi-Donner
 
Jon Fi

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?
 
Wolfgang Nádasi-Donner

Jon said:
Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?

I would do it something like this:

========== code ==========
texts = [ "<orderMsg biz=0>",
          "<orderMsg type=7 size=0>",
          "<orderMsg type=7 size=0 biz=1>"]

texts.each do |txt|
  if (md=txt.match(/<(\w+?)((?:\s\w+=\w+)+)>/))
    puts "\nkey '#{md[1]}' found"
    md[2].scan(/\s(\w+)=(\w+)/) do |k, v|
      puts " parameter '#{k}' has value '#{v}'"
    end
  else
    puts "+++ no match for '#{txt}'"
  end
end
========= result =========
key 'orderMsg' found
parameter 'biz' has value '0'

key 'orderMsg' found
parameter 'type' has value '7'
parameter 'size' has value '0'

key 'orderMsg' found
parameter 'type' has value '7'
parameter 'size' has value '0'
parameter 'biz' has value '1'
========== end ===========
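
If the attributes are needed as data rather than printed, the same two-step idea could be wrapped in a method. This is only a sketch: parse_tag is a made-up name, and it assumes tag names, keys and values consist of \w characters, as in the examples above.

```ruby
# Hypothetical helper (not from the thread): returns the tag name and a
# Hash of its attributes, or nil if the text contains no matching tag.
def parse_tag(text)
  # Step 1: capture the tag name and the whole attribute run as one group.
  md = text.match(/<(\w+)((?:\s\w+=\w+)+)>/)
  return nil unless md

  # Step 2: scan the attribute run; each hit is a [key, value] pair.
  attrs = md[2].scan(/\s(\w+)=(\w+)/).to_h
  { name: md[1], attrs: attrs }
end

p parse_tag(" <orderMsg type=7 size=0 biz=1>")
# => {:name=>"orderMsg", :attrs=>{"type"=>"7", "size"=>"0", "biz"=>"1"}}  (exact formatting depends on the Ruby version)
p parse_tag("no tag here")
# => nil
```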

Wolfgang Nádasi-Donner
 
Harry Kakueki

Hi,

Unless you really want to write one regular expression for it all, you
could do something like this.

Split on spaces, then on '='. Then process however you want.

r = B.strip.split(/\s/)
p r
r[1..-1].each {|f| p f.split("=")}

Harry
 
Harry Kakueki

Sorry for the double post.
This is a little cleaner and easier, I think.

C.strip.delete("<>").split(/\s/).each {|f| p f.split("=")}
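
As a rough sketch of the "process however you want" part, the split result could be turned into a tag name plus a Hash. The names c, name, pairs and attrs are just for illustration, and this assumes the values never contain spaces, '<' or '>'.

```ruby
c = " <orderMsg type=7 size=0 biz=1>"

# Drop the angle brackets, split on whitespace; the first field is the
# tag name, the rest are "key=value" strings.
name, *pairs = c.strip.delete("<>").split(/\s+/)

# Split each field once on "=" and collect the pairs into a Hash.
attrs = pairs.map { |pair| pair.split("=", 2) }.to_h

p name    # => "orderMsg"
p attrs   # => {"type"=>"7", "size"=>"0", "biz"=>"1"}
```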

Harry
 
Robert Klemme

Jon said:
Ah ok. So how can I get it to repeat without overwriting the existing
values for the group?

You can't.

Jon said:
Or is there a better way to do this?

Probably. I am not sure what you are up to, but you can use a two-stage
approach like this:

texts = [
  " <orderMsg biz=0>",
  " <orderMsg type=7 size=0>",
  " <orderMsg type=7 size=0 biz=1>",
]

texts.each do |t|
  p t
  md = /<([^\s>]+)((?:\s+\w+=\d+)*)/.match t

  if md
    tag = md[1]
    attrs = md[2]

    puts tag

    attrs.scan(/(\w+)=(\d+)/) do |m|
      print m[0], "=>", m[1], "\n"
    end
  end
end
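
Two things about this pattern are worth checking before relying on it: the "*" also accepts a tag with no attributes at all, and \d+ only accepts numeric values. A small sketch, with inputs made up for illustration:

```ruby
pattern = /<([^\s>]+)((?:\s+\w+=\d+)*)/

# No attributes: the match still succeeds and the attribute group is empty.
no_attrs = pattern.match("<orderMsg>")
p no_attrs[1]                        # => "orderMsg"
p no_attrs[2]                        # => ""
p no_attrs[2].scan(/(\w+)=(\d+)/)    # => []

# A non-numeric value ends the attribute run, because \d+ cannot match "abc".
mixed = pattern.match("<orderMsg type=7 name=abc>")
p mixed[2]                           # => " type=7"
```

If values can be non-numeric, \w+ in place of \d+ (as in the earlier patterns in this thread) would be the obvious change.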

Kind regards

robert
 
Harry Kakueki

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?

If you want to use regular expressions, try 'scan'.

c=" <orderMsg type=7 size=0 biz=1>"
c.scan(/\w+=?\w+/).each {|f| p f.split("=")}

Modify the regular expression as necessary.
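
One possible modification, as a sketch: with the optional "=?" the pattern also picks up the tag name, while making "=" mandatory and adding capture groups lets scan return key/value pairs directly. The variable names below are only for illustration.

```ruby
c = " <orderMsg type=7 size=0 biz=1>"

# With "=?" the tag name is matched as well, since the "=" is optional.
loose = c.scan(/\w+=?\w+/)
p loose   # => ["orderMsg", "type=7", "size=0", "biz=1"]

# With a mandatory "=" and two capture groups, scan returns one
# [key, value] array per attribute and skips the tag name.
pairs = c.scan(/(\w+)=(\w+)/)
p pairs   # => [["type", "7"], ["size", "0"], ["biz", "1"]]
```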

Harry
 
