Regexp question

Mark Probert · Sep 30, 2004

Hi, Rubyists.

What is the best way of attacking a field split on ';' when the string looks
like:

s = 'a;b;c\;;d;'
s.split(/???;/)
=> ["a", "b", "c\;", "d"]

Or is it best to use s.each_byte and do it by hand?

Simon Strandgaard · Sep 30, 2004

How about something like this:

irb(main):015:0> "aa;bbb\\;;abc;;d\\\\;e;".scan(/(?:\\[^.]|[^;])*;/)
=> ["aa;", "bbb\\;;", "abc;", ";", "d\\\\;", "e;"]

Brian Schröder · Sep 30, 2004

Normally this would call for a fixed-width negative lookbehind,

/(?<!\\);/

but as far as I know it's not supported by the Ruby regexp engine.
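For what it's worth, Ruby 1.9 and later use the Oniguruma/Onigmo regexp engine, which does support lookbehind, so the split Mark asked for can be written directly there. A minimal sketch (the sample strings are only illustrations); note that it still mis-splits when the backslash in front of ';' is itself escaped, which is exactly the case Brian asks about next:

```ruby
# Negative lookbehind: split on ';' only when it is not preceded by '\'.
# Needs Ruby 1.9+ (Oniguruma/Onigmo); Ruby 1.8's engine lacked lookbehind.
SEP = /(?<!\\);/

s = 'a;b;c\;;d;'   # single quotes keep the backslash literally
p s.split(SEP)     # => ["a", "b", "c\\;", "d"]  (split drops the trailing empty field)

# Limitation: with an escaped backslash before ';' (the text x\\;y),
# the ';' is still "preceded by a backslash", so no split happens.
t = 'x\\\\;y'
p t.split(SEP)     # => ["x\\\\;y"]
```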

But for further clarification:
How should 'a;b\\;;c' be split?
If backslashes can be escaped too (and you'd want that, because otherwise you
can't have a field "b\"), it's more difficult.

And maybe the CSV library can help you here.
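On the CSV suggestion: Ruby's standard csv library lets you choose the column separator, but as far as I know it has no option for backslash-escaped separators; a field containing ';' has to be quoted instead, and a backslash is just an ordinary character. So this only helps if the data can be written (or converted) with quotes. A sketch under that assumption:

```ruby
require "csv"

# CSV protects separators by quoting, not by backslash escapes,
# so the "c;" field has to be wrapped in double quotes here.
p CSV.parse_line('a;b;"c;";d', col_sep: ";")  # => ["a", "b", "c;", "d"]

# A backslash is not special to CSV: the ';' after it still separates.
p CSV.parse_line('a;c\;d', col_sep: ";")      # => ["a", "c\\", "d"]
```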

regards,

Brian

Simon Strandgaard · Sep 30, 2004

Maybe this one is better?

irb(main):001:0> "aa;bbb\\;;abc;;d\\\\;e;f".scan(/(?:\A|;)((?:\\[^.]|[^;])*)/) { p $1 }
"aa"
"bbb\\;"
"abc"
""
"d\\\\"
"e"
"f"
=> "aa;bbb\\;;abc;;d\\\\;e;f"

Mark Probert · Sep 30, 2004

Hi ..

Thanks, Simon! That is close enough:

irb(main):019:0> s.scan(/(?:\\[^.]|[^;])*/).each do |it|
irb(main):020:1* next if it.empty?
irb(main):021:1> puts " --> #{it}"
irb(main):022:1> end
--> a is a word
--> b is too
--> c\; for fun
--> d -- forget it
=> ["a is a word", "", "b is too", "", "c\\; for fun", "", "d -- forget it", "", ""]
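The empty strings come from the `*`: the pattern can match zero characters, so `scan` records an empty match at each ';' (and at the end of the string) before stepping past it. Skipping them with `next if it.empty?` works here, but it also throws away fields that really are empty, as in 'a;;b'. A small sketch showing both effects, using `\\.` for the escape pair (the sample strings are made up):

```ruby
# Same idea as the pattern above: an escape pair or any non-';' character.
FIELD = /(?:\\.|[^;])*/

s = 'a;b\;c;d'
p s.scan(FIELD)                        # => ["a", "", "b\\;c", "", "d", ""]

# Dropping the empty matches removes the noise...
p s.scan(FIELD).reject(&:empty?)       # => ["a", "b\\;c", "d"]

# ...but also loses a genuinely empty field between ';;'.
p 'a;;b'.scan(FIELD).reject(&:empty?)  # => ["a", "b"]
```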

Dany Cayouette · Sep 30, 2004

As for how 'a;b\\;;c' should be split, my guess is
["a", "b\", nil, "c"]

Semicolon, colon and backslash are escaped with a backslash, i.e.

; => \;
: => \:
\ => \\
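With that escaping scheme, one way to get the fields is two passes: first split on every ';' that is not part of an escape pair, then undo the three escapes. A sketch; the method name `parse_fields` is made up for illustration, and it returns an empty string rather than nil for an empty field:

```ruby
# Hypothetical helper for the scheme above:  \; -> ;   \: -> :   \\ -> \
def parse_fields(line)
  # Pass 1: each field starts at the beginning of the string or just after
  # a ';', and runs over escape pairs (\x) and ordinary non-';' characters.
  raw = line.scan(/(?:\A|;)((?:\\.|[^;])*)/).flatten
  # Pass 2: remove the escaping backslash from \; \: and \\.
  raw.map { |field| field.gsub(/\\([;:\\])/, '\1') }
end

p parse_fields('a;b\\\\;;c')  # input text a;b\\;;c => ["a", "b\\", "", "c"]
p parse_fields('x\:y;z\;w')   # => ["x:y", "z;w"]
```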

thanks,
Dany

Dany Cayouette · Sep 30, 2004

Sorry... I meant
["a", "b\\", nil, "c"], where b\\ would ultimately become b\ when the escape characters are processed in the data portion.

Didn't think about that one... I thought this was simple and the problem was my lack of programming experience...

Dany

Florian Gross · Oct 1, 2004

Hello!

This works (even with escaped escape characters), but you might be
better off doing it by hand to keep the complexity low:

irb(main):025:0> str = "hello;world;foo\\;bar;no escape\\\\;blar"; puts str
hello;world;foo\;bar;no escape\\;blar
=> nil
irb(main):026:0> str.scan(/(?:(?!\\).(?:\\{2})*\\;|[^;])+/).map { |str| str.gsub(/\\(.)/, '\1') }
=> ["hello", "world", "foo;bar", "no escape\\", "blar"]
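For comparison, here is roughly what "doing it by hand" could look like: a single pass over the characters that remembers whether the previous character was an escaping backslash. It is a sketch, not code from this thread; the name `split_by_hand` is hypothetical, it uses `each_char` rather than the `each_byte` Mark mentioned so that multibyte characters stay intact, and, like Florian's version, it removes the escaping backslashes as it goes. Empty fields, including a trailing one, are kept.

```ruby
# Hypothetical by-hand splitter: ';' separates fields, '\' escapes the next char.
def split_by_hand(str, sep = ";", esc = "\\")
  fields = []
  current = +""      # unfrozen buffer for the field being built
  escaped = false
  str.each_char do |ch|
    if escaped
      current << ch  # keep the escaped character, drop the backslash
      escaped = false
    elsif ch == esc
      escaped = true
    elsif ch == sep
      fields << current
      current = +""
    else
      current << ch
    end
  end
  current << esc if escaped  # a lone trailing backslash is kept literally
  fields << current          # Array#<< returns the array itself
end

p split_by_hand("hello;world;foo\\;bar;no escape\\\\;blar")
# => ["hello", "world", "foo;bar", "no escape\\", "blar"]
```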

Regards,
Florian Gross

Robert Klemme · Oct 1, 2004

s = "aa;bbb\\;;abc;;d\\\\;e;"
=> "aa;bbb\\;;abc;;d\\\\;e;"
s.scan(/(?:\\.|[^\\;])+/)
=> ["aa", "bbb\\;", "abc", "d\\\\", "e"]

Regards

robert

Simon Strandgaard · Oct 1, 2004

Robert, if it's a CSV file, shouldn't the output then be:

["aa", "bbb\\;", "abc", "", "d\\\\", "e", ""]

Robert Klemme · Oct 1, 2004

Darn! You're right. Unfortunately, using "*" instead of "+" is not
sufficient: far too many empty strings are found that way.
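A sketch combining two ideas already in this thread: anchor every field at the start of the string or at a separator, as in Simon's `(?:\A|;)` version further up, and capture the field body with Robert's character class. Because each match consumes its leading ';' (or sits at the very start), an empty field produces exactly one empty capture instead of the extra empty matches that `*` gives with a plain `scan`:

```ruby
s = "aa;bbb\\;;abc;;d\\\\;e;"

# (?:\A|;)  : a field begins at the start of the string or right after ';'
# (...)     : its body is escape pairs (\x) or characters other than '\' and ';'
fields = s.scan(/(?:\A|;)((?:\\.|[^\\;])*)/).flatten
p fields  # => ["aa", "bbb\\;", "abc", "", "d\\\\", "e", ""]
```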

robert

Similar Threads

Nuby - help on string spliting	3	Sep 30, 2004
Padding strings for a clean visual print out...	5	Dec 23, 2023
Question about my projects	3	Jul 23, 2021
question about regexp	1	Jan 26, 2012
short regexp question	18	Sep 18, 2008
Noob question about mathematical addition vs. "string addition" in C#	1	Mar 6, 2022
Regexp simple question	5	May 11, 2009
Can D simulated by H terminate normally?	4	Jun 12, 2023
