String#each_*slice* methods (like Enumerable#each_slice)

  • Thread starter Aaron D. Gifford

Aaron D. Gifford

Hi,

I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don't
see an equivalent for the String class.

So I've monkey-patched String a bit like this:



## Monkeypatch String to add some each_*slice* methods:
class String
  ## Like Enumerable#each_slice() only it yields a string
  ## of chars characters (the slice):
  def each_slice(chars)
    self.scan(/.{1,#{chars}}/m).each do |s|
      yield s
    end
  end

  ## Like Enumerable#each_slice() only it yields an array
  ## of Fixnum bytes from the string (the slice):
  def each_byteslice(bytes)
    self.bytes.to_a.each_slice(bytes) do |s|
      yield s
    end
  end

  ## Like Enumerable#each_slice() only it yields a binary
  ## string of specified bytes (the slice):
  def each_bslice(bytes)
    if encoding == Encoding::BINARY
      str = self
    else
      str = self.dup.force_encoding(Encoding::BINARY)
    end
    str.scan(/.{1,#{bytes}}/m).each do |s|
      yield s
    end
  end

end



So now for the question. Is there a better way to accomplish
something similar? I'm not debating whether to do it as a monkey
patch or not--that's irrelevant to me. But is there a more efficient
way to slice up strings and iterate over fixed sized chunks?

One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.
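
A comparison like the one described could be set up roughly as follows. This is only a sketch: the helper names, the input data, and the slice width are assumptions made for illustration, and timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical helper names, used only for this comparison.

# Slices via a regex scan, as in each_bslice above.
def slices_via_scan(data, size)
  data.scan(/.{1,#{size}}/m)
end

# Slices via bytes -> one-byte strings -> groups joined back together.
def slices_via_chr(data, size)
  data.bytes.to_a.map(&:chr).each_slice(size).map(&:join)
end

# Assumed input: 200,000 bytes of ASCII text tagged as binary.
data = ("abcdefghij" * 20_000).force_encoding(Encoding::BINARY)
size = 16 # assumed slice width

# Sanity check before timing: both approaches should agree.
raise "slices differ" unless slices_via_scan(data, size) == slices_via_chr(data, size)

Benchmark.bm(8) do |bm|
  bm.report("scan:")    { 10.times { slices_via_scan(data, size) } }
  bm.report("map/chr:") { 10.times { slices_via_chr(data, size) } }
end
```

The equality check runs first so that the timings compare two approaches that produce the same slices.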

Aaron out.
 
Quintus

On 06.04.2011 19:52, Aaron D. Gifford wrote:
So now for the question. Is there a better way to accomplish
something similar? I'm not debating whether to do it as a monkey
patch or not--that's irrelevant to me. But is there a more efficient
way to slice up strings and iterate over fixed sized chunks?

One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.

Aaron out.

Use Enumerators:
================================
irb(main):001:0> str = "ÄÄÄÖÖÖÜÜÜ"
=> "ÄÄÄÖÖÖÜÜÜ"
irb(main):002:0> str.chars.each_slice(3){|x| p x}
["Ä", "Ä", "Ä"]
["Ö", "Ö", "Ö"]
["Ü", "Ü", "Ü"]
=> nil
irb(main):003:0> str.bytes.each_slice(3){|x| p x}
[195, 132, 195]
[132, 195, 132]
[195, 150, 195]
[150, 195, 150]
[195, 156, 195]
[156, 195, 156]
=> nil
irb(main):004:0>
================================
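
The session above yields arrays. If substrings are wanted instead, the slices can be joined back together, as in the sketch below. The helper names are hypothetical, and the sample string is written with \u escapes so the snippet does not depend on the source file's encoding. Note that the byte-wise version cuts through the two-byte UTF-8 characters, just as the byte arrays above do.

```ruby
# Hypothetical helpers built on the enumerator approach.

# Groups characters n at a time and joins each group into a string.
def char_slices(str, n)
  str.chars.each_slice(n).map(&:join)
end

# Groups bytes n at a time and packs each group into a binary string.
def byte_slices(str, n)
  str.bytes.each_slice(n).map { |group| group.pack("C*") }
end

# The same nine characters as in the irb session (3 x A-umlaut,
# 3 x O-umlaut, 3 x U-umlaut), 18 bytes in UTF-8.
str = "\u00C4\u00C4\u00C4\u00D6\u00D6\u00D6\u00DC\u00DC\u00DC"

p char_slices(str, 3)        # three strings of three characters each
p byte_slices(str, 3).length # six binary strings of three bytes each

# A three-byte slice ends in the middle of a character, so it is not
# valid UTF-8 on its own.
p byte_slices(str, 3).first.force_encoding("UTF-8").valid_encoding?
```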

Vale,
Marvin
 
Aaron D. Gifford

Quintus said:
Use Enumerators:
================================
irb(main):001:0> str = "ÄÄÄÖÖÖÜÜÜ"
=> "ÄÄÄÖÖÖÜÜÜ"
irb(main):002:0> str.chars.each_slice(3){|x| p x}
["Ä", "Ä", "Ä"]
["Ö", "Ö", "Ö"]
["Ü", "Ü", "Ü"]
=> nil
irb(main):003:0> str.bytes.each_slice(3){|x| p x}
[195, 132, 195]
[132, 195, 132]
[195, 150, 195]
[150, 195, 150]
[195, 156, 195]
[156, 195, 156]
=> nil
irb(main):004:0>
================================

Vale,
Marvin

Yes, I agree, that can work.

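As I said in my original post, one alternative each_bslice
implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join}.
That implementation did use enumerators, but it was slower than
str.scan. Hence my asking if there was a better (faster/more
efficient) way.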

I didn't try benchmarking str.chars.each_slice vs str.scan. I'll have
to check that out. Thanks for pointing that out to me!

Aaron out.
 
Aaron D. Gifford

Looking more closely at str.scan vs. str.chars.each_slice for
string slicing, it appears that the best one to use depends on what
form of slice one needs.

If I need a string yielded that is a substring (a slice) vs. an array
of characters or array of bytes, then the scan method is consistently
faster on my machine. However, if I want an array of characters or
bytes, then str.chars.each_slice or str.bytes.each_slice is faster.

Most of the time for me, however, I need a substring slice.
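
For the substring case, one further option that avoids both the regex and the intermediate arrays is to step through the string by index. The sketch below is untested for speed, so whether it beats scan would need measuring. The module and method names are hypothetical, and the byte version assumes Ruby 1.9.3 or later, where String#byteslice is available.

```ruby
# Hypothetical module; none of these methods are part of Ruby's String class.
module StringSlices
  module_function

  # Yields successive substrings of at most n characters (n must be > 0).
  # Returns an enumerator when called without a block.
  def each_char_slice(str, n)
    return to_enum(:each_char_slice, str, n) unless block_given?
    (0...str.length).step(n) { |i| yield str[i, n] }
  end

  # Yields successive binary substrings of at most n bytes (n must be > 0).
  # Assumes Ruby 1.9.3+ for String#byteslice. Each slice is tagged as
  # binary, mirroring each_bslice, because a cut may fall inside a
  # multibyte character.
  def each_byte_slice(str, n)
    return to_enum(:each_byte_slice, str, n) unless block_given?
    (0...str.bytesize).step(n) do |i|
      yield str.byteslice(i, n).force_encoding(Encoding::BINARY)
    end
  end
end

StringSlices.each_char_slice("hello world", 3) { |s| puts "-->#{s}<--" }
p StringSlices.each_byte_slice("hello world", 4).to_a
```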

Aaron out.
 
7stud --

Aaron D. Gifford wrote in post #991274:
Hi,

I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don't
see an equivalent for the String class.

How about:

str = "hello world"

while str.size > 0
  substr = str.slice!(0, 3) #(offset, length)
  puts "-->#{substr}<--"
end

--output:--
-->hel<--
-->lo <--
-->wor<--
-->ld<--
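
One thing to keep in mind with the loop above is that slice! modifies its receiver, so str is empty once the loop finishes. If the original string is still needed afterwards, the same idea can be applied to a copy. A minimal sketch, with a hypothetical helper name and assuming n is greater than zero:

```ruby
# Hypothetical helper: consumes a copy so the caller's string is untouched.
# Assumes n > 0; with n == 0 the loop would never terminate.
def consume_in_slices(str, n)
  work = str.dup
  slices = []
  slices << work.slice!(0, n) until work.empty?
  slices
end

str = "hello world"
consume_in_slices(str, 3).each { |s| puts "-->#{s}<--" }
puts str # the original string is unchanged
```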
 
