regexp splitting problem

Brett S Hallett

Hi,
I am trying to split the following line of text:

<button> "btn Exit" "Exit Button"

(Note: the quotes may be " or ', as the line is read from a file.)

in such a way that I can say

txt = line.split(/regex/)

and get back

txt[0] = <button>
txt[1] = btn Exit
txt[2] = Exit Button

My current regexp

ans = tst.split(/[\"|\']/)

does this, except that the last set is missing:

txt[0] = <button>
txt[1] = btn Exit
txt[2] =

So how do I get the expression to continue processing the line?
Thanks
 
Maik Schmidt

Brett said:
ans = tst.split(/[\"|\']/)

Your regex can be simplified: within a character class the pipe
character means "match a pipe character", not "or". You also
do not have to escape the quotes, so the resulting regex would be /["']/.

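
To make the point about the pipe concrete, here is a minimal sketch. The constant and method names (WITH_PIPE, WITHOUT_PIPE, matches_pipe?) and the sample string are made up for illustration and are not from the original post.

```ruby
# Inside [...] the pipe is an ordinary character, not alternation.
WITH_PIPE    = /[\"|\']/   # Brett's class: matches ", | or '
WITHOUT_PIPE = /["']/      # simplified class: matches " or '

# Hypothetical helper: does the regex match a literal pipe?
def matches_pipe?(regex)
  !(regex =~ "|").nil?
end

puts matches_pipe?(WITH_PIPE)     # true
puts matches_pipe?(WITHOUT_PIPE)  # false

# The difference only shows up when the input contains a pipe.
p 'a|b "c"'.split(WITH_PIPE)      # ["a", "b ", "c"]
p 'a|b "c"'.split(WITHOUT_PIPE)   # ["a|b ", "c"]
```

For input without pipe characters, both regexes split identically.
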
Brett said:
does this , except that the last set is missing ! ,

That's not quite correct. The last set isn't missing; the 3rd element
holds only the space between the two quoted strings, and "Exit Button"
ends up in the 4th element. For easier debugging try:

puts text.split(/["']/).join("\n")
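
Printing the array with p instead of puts makes the whitespace visible. A small sketch follows, using the sample line from the question. The helper names split_on_quotes and clean_tokens are hypothetical, and the cleanup step is only one possible workaround, assuming no token is meant to be blank.

```ruby
# Split on either kind of quote, exactly as in the simplified regex.
def split_on_quotes(text)
  text.split(/["']/)
end

# Hypothetical cleanup: trim each piece and drop the blank ones.
def clean_tokens(text)
  split_on_quotes(text).map(&:strip).reject(&:empty?)
end

line = %q{<button> "btn Exit" "Exit Button"}

p split_on_quotes(line)
# → ["<button> ", "btn Exit", " ", "Exit Button"]

p clean_tokens(line)
# → ["<button>", "btn Exit", "Exit Button"]
```

Note that split drops the trailing empty string after the final quote, which is why the array has four elements rather than five.
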
Brett said:
so how do I get the expression to continue processing the line ??

As mentioned before, that isn't the problem. You are searching for a
regex that splits a line into tokens. Some of the tokens are enclosed in
quotes and some are not, and both kinds can contain whitespace. I am not
sure whether your problem can easily be solved with a single regex. If
you can, you should change your input format.

Is the first token always enclosed in <> characters? Are the following
tokens always enclosed in quotes? Then it would be easier to split the
line, but you would still need more than one split call. Maybe then it
would fit in a single call to scan?

Cheers,

<maik/>
 
Robert Klemme

Brett S Hallett said:
Hi,
I am trying to split the following line of text:

<button> "btn Exit" "Exit Button" ( note the quotes may be
" or ' , read from a file)

in such a way that I can say

txt = line.split(/regrex/)

and get back

txt[0] = <button>
txt[1] = btn Exit
txt[2] = Exit Button

my current regexp

ans = tst.split(/[\"|\']/)

does this , except that the last set is missing ! ,


txt[0] = <button>
txt[1] = btn Exit
txt[2] =

so how do I get the expression to continue processing the line ??

# Note: the quoted tokens keep their surrounding quotes.
txt = line.scan(/"[^"]*" | '[^']*' | \S+/x)
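
If the quotes themselves should be dropped, as in the expected output from the question, one possible variation is to add capture groups to the same pattern. This is only a sketch: the method name tokenize is hypothetical, and it assumes quotes are always balanced and never nested.

```ruby
# Each alternative captures the token text without its quotes.
# With groups, scan returns one [g1, g2, g3] array per match,
# where the groups that did not take part in the match are nil.
def tokenize(line)
  line.scan(/"([^"]*)" | '([^']*)' | (\S+)/x).map do |groups|
    groups.compact.first
  end
end

p tokenize(%q{<button> "btn Exit" 'Exit Button'})
# → ["<button>", "btn Exit", "Exit Button"]
```

The /x flag makes the regex ignore the literal spaces around the alternatives, so they are there only for readability.
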

robert
 
Alan Chen

Brett S Hallett said:
Hi,
I am trying to split the following line of text:

<button> "btn Exit" "Exit Button" ( note the quotes may be
" or ' , read from a file)

in such a way that I can say

txt = line.split(/regrex/)

and get back

txt[0] = <button>
txt[1] = btn Exit
txt[2] = Exit Button

This works for your example, but may be somewhat fragile when you go
to expand its use over a wider range of inputs...

require 'test/unit'

class TC_one < Test::Unit::TestCase
  def test_01
    str = %Q/<button> "btn Exit" "Exit Button"/
    # Split on a quote plus surrounding spaces and an optional second double quote.
    ans = str.split(/ *["'] *"?/)

    assert_equal(["<button>", "btn Exit", "Exit Button"], ans)
  end
end
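
As an illustration of that fragility, here is a short sketch that runs the same pattern against a single-quoted line. The constant and method names are made up for this example, and EITHER_QUOTE_PATTERN is a hypothetical variant, not something proposed in the thread.

```ruby
# Alan's pattern (unneeded escapes dropped, same behavior).
ALAN_PATTERN = / *["'] *"?/
# Hypothetical variant: the optional second quote may be " or '.
EITHER_QUOTE_PATTERN = / *["'] *["']?/

def split_tokens(line, pattern)
  line.split(pattern)
end

double = %q{<button> "btn Exit" "Exit Button"}
single = %q{<button> 'btn Exit' 'Exit Button'}

p split_tokens(double, ALAN_PATTERN)
# → ["<button>", "btn Exit", "Exit Button"]

p split_tokens(single, ALAN_PATTERN)
# → ["<button>", "btn Exit", "", "Exit Button"]

p split_tokens(single, EITHER_QUOTE_PATTERN)
# → ["<button>", "btn Exit", "Exit Button"]
```

With single quotes, the closing and opening quotes are matched as two separate delimiters, which leaves an empty string between them. The variant handles this case but is still only as robust as the assumption that every token after the first is quoted.
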

Cheers,
- alan
 
