duplicating characters in a string

R

Robert Klemme

Robert said:
2008/3/6 said:
It's only about a factor of two faster, according to the following, but
that matters sometimes.
Rehearsal ---------------------------------------------------
s.gsub(re){...} 5.970000 0.000000 5.970000 ( 6.009089)
s.gsub(re,...) 2.900000 0.060000 2.960000 ( 3.148006)
------------------------------------------ total: 8.930000sec

user system total real
s.gsub(re){...} 5.760000 0.010000 5.770000 ( 5.925049)
s.gsub(re,...) 2.800000 0.060000 2.860000 ( 2.914432)
What version did you test with? I get much more dramatic differences:

15:56:11 ~
$ ruby /c/Temp/gs.rb
Rehearsal -----------------------------------------------------------
s.gsub(re){|x| x * 2} 25.313000 0.031000 25.344000 ( 25.354000)
s.gsub(re){|x| x << x} 22.812000 0.000000 22.812000 ( 22.845000)
s.gsub(re, '\1\1') 6.516000 0.015000 6.531000 ( 6.539000)
s.gsub(re, '\&\&') 6.578000 0.016000 6.594000 ( 6.595000)
s.gsub(/./){|x| x * 2} 25.172000 0.016000 25.188000 ( 25.182000)
s.gsub(/./){|x| x << x} 22.843000 0.000000 22.843000 ( 22.857000)
s.gsub(/(.)/, '\1\1') 6.344000 0.015000 6.359000 ( 6.355000)
s.gsub(/./, '\&\&') 6.188000 0.032000 6.220000 ( 6.217000)
------------------------------------------------ total: 121.891000sec

user system total real
s.gsub(re){|x| x * 2} 25.484000 0.015000 25.499000 ( 25.502000)
s.gsub(re){|x| x << x} 22.813000 0.031000 22.844000 ( 22.856000)
s.gsub(re, '\1\1') 6.312000 0.000000 6.312000 ( 6.337000)
s.gsub(re, '\&\&') 6.359000 0.000000 6.359000 ( 6.359000)
s.gsub(/./){|x| x * 2} 25.922000 0.000000 25.922000 ( 25.994000)
s.gsub(/./){|x| x << x} 22.672000 0.015000 22.687000 ( 22.707000)
s.gsub(/(.)/, '\1\1') 6.375000 0.016000 6.391000 ( 6.389000)
s.gsub(/./, '\&\&') 6.235000 0.000000 6.235000 ( 6.239000)
16:00:22 ~
$ ruby -v
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]
16:00:26 ~
$ cat /c/Temp/gs.rb
require 'benchmark'

s = ("abc" * 1_000_000).freeze
re = /(.)/
re2 = /./

Benchmark.bmbm do |b|
b.report("s.gsub(re){|x| x * 2}") do
s.gsub(re) { |x| x * 2 }
end

b.report("s.gsub(re){|x| x << x}") do
s.gsub(re) { |x| x << x }
end

b.report("s.gsub(re, '\\1\\1')") do
s.gsub(re, "\\1\\1")
end

b.report("s.gsub(re, '\\&\\&')") do
s.gsub(re, "\\&\\&")
end


b.report("s.gsub(/./){|x| x * 2}") do
s.gsub(/./) { |x| x * 2 }
end

b.report("s.gsub(/./){|x| x << x}") do
s.gsub(/./) { |x| x << x }
end

b.report("s.gsub(/(.)/, '\\1\\1')") do
s.gsub(/(.)/, "\\1\\1")
end

b.report("s.gsub(/./, '\\&\\&')") do
s.gsub(/./, "\\&\\&")
end
end
16:00:31 ~
$

And taking your non-perl champion:

s.gsub(/(.)/, '\1\1')

(I refuse to consider any code that uses perl syntax), and pitting it
against:

??? What exactly did you mean by the above text?

Cheers

robert
 
R

Robert Klemme

Robert said:
What version did you test with? I get much more dramatic differences:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]

Surprising. The slower cases in your results are all the block cases.
Does cygwin ruby perform poorly with blocks in general? Or is it because
my ruby is compiled for i686 instead of i386?

Probably. But note that blocks must be slower because they are Ruby
code while the replacement without block is done in compiled C code.

Kind regards

robert
 
W

William James

Rehearsal ---------------------------------------------------
s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
7stud 4.110000 0.000000 4.110000 ( 4.142912)
7stud 2.050000 0.000000 2.050000 ( 2.072398)
----------------------------------------- total: 15.470000sec

user system total real
s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
7stud 4.100000 0.020000 4.120000 ( 4.300772)
7stud 2.030000 0.010000 2.040000 ( 2.068190)

Pascal (FreePascal):

uses sysutils; // for timing

function dup_chars( s: ansistring ): ansistring;
var
out: ansistring;
i,j: longint;
c: char;
begin
setlength( out, length(s) * 2 );
for i := 1 to length(s) do
begin
c := s; j := (i shl 1);
out[j-1] := c;
out[j] := c;
end;
exit( out )
end;

var
s : ansistring;
i : longint;
when : tDateTime;
begin
s := '';
for i:=1 to 1000000 do s += 'abc';
when := Time;
dup_chars( s );
writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.

--->0.031 seconds
 
R

Robert Klemme

2008/3/11 said:
Rehearsal ---------------------------------------------------
s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
7stud 4.110000 0.000000 4.110000 ( 4.142912)
7stud 2.050000 0.000000 2.050000 ( 2.072398)
----------------------------------------- total: 15.470000sec

user system total real
s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
7stud 4.100000 0.020000 4.120000 ( 4.300772)
7stud 2.030000 0.010000 2.040000 ( 2.068190)


Pascal (FreePascal):

uses sysutils; // for timing

function dup_chars( s: ansistring ): ansistring;
var
out: ansistring;
i,j: longint;
c: char;
begin
setlength( out, length(s) * 2 );
for i := 1 to length(s) do
begin
c := s; j := (i shl 1);
out[j-1] := c;
out[j] := c;
end;
exit( out )
end;

var
s : ansistring;
i : longint;
when : tDateTime;
begin
s := '';
for i:=1 to 1000000 do s += 'abc';
when := Time;
dup_chars( s );
writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.

--->0.031 seconds


Your point being?

robert
 
W

William James

Pascal (FreePascal):
uses sysutils; // for timing
function dup_chars( s: ansistring ): ansistring;
var
out: ansistring;
i,j: longint;
c: char;
begin
setlength( out, length(s) * 2 );
for i := 1 to length(s) do
begin
c := s; j := (i shl 1);
out[j-1] := c;
out[j] := c;
end;
exit( out )
end;

var
s : ansistring;
i : longint;
when : tDateTime;
begin
s := '';
for i:=1 to 1000000 do s += 'abc';
when := Time;
dup_chars( s );
writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.
--->0.031 seconds

Your point being?

robert


Pascal (FreePascal):
uses sysutils; // for timing
function dup_chars( s: ansistring ): ansistring;
var
out: ansistring;
i,j: longint;
c: char;
begin
setlength( out, length(s) * 2 );
for i := 1 to length(s) do
begin
c := s; j := (i shl 1);
out[j-1] := c;
out[j] := c;
end;
exit( out )
end;

var
s : ansistring;
i : longint;
when : tDateTime;
begin
s := '';
for i:=1 to 1000000 do s += 'abc';
when := Time;
dup_chars( s );
writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.
--->0.031 seconds

Your point being?

robert


On my computer:

s.gsub(re, ...) --> 9.641000 seconds
Lua --> 1.078 seconds

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )
 
J

Joel VanderWerf

William said:
Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!

I have not been able to reproduce a benchmark in which each_byte was
faster than gsub (at least, without resorting to the loop unrolling trick).
 
R

Robert Klemme

On 11.03.2008 19:53, William James wrote:

Btw, is there any particular reason why you duplicated the mail content?
2008/3/11 said:
Rehearsal ---------------------------------------------------
s.gsub(re){...} 6.220000 0.010000 6.230000 ( 6.325230)
s.gsub(re,...) 3.020000 0.060000 3.080000 ( 3.115286)
7stud 4.110000 0.000000 4.110000 ( 4.142912)
7stud 2.050000 0.000000 2.050000 ( 2.072398)
----------------------------------------- total: 15.470000sec
user system total real
s.gsub(re){...} 5.910000 0.020000 5.930000 ( 5.993456)
s.gsub(re,...) 3.000000 0.000000 3.000000 ( 3.017119)
7stud 4.100000 0.020000 4.120000 ( 4.300772)
7stud 2.030000 0.010000 2.040000 ( 2.068190)
Pascal (FreePascal):
uses sysutils; // for timing
function dup_chars( s: ansistring ): ansistring;
var
out: ansistring;
i,j: longint;
c: char;
begin
setlength( out, length(s) * 2 );
for i := 1 to length(s) do
begin
c := s; j := (i shl 1);
out[j-1] := c;
out[j] := c;
end;
exit( out )
end;
var
s : ansistring;
i : longint;
when : tDateTime;
begin
s := '';
for i:=1 to 1000000 do s += 'abc';
when := Time;
dup_chars( s );
writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.
--->0.031 seconds

Your point being?

robert


On my computer:

s.gsub(re, ...) --> 9.641000 seconds
Lua --> 1.078 seconds

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )


I do not know Lua. Does this return a copied string or does it do it
inplace?

Kind regards

robert
 
D

Drew Olson

Adam said:
If i have a string "abc" and want to make it like this "aabbcc", how do
i go about it?

I thought convert it into an array and then use string.map{|x| x << x}

though i havnt tested that yet. Is there a better way or a string method
that saves me from having to convert it into an array?

Another approach:


"abc".split(//).map{|c| c*2}.join
 
P

Paul Brannan

Some things to keep in mind:
- captures can be slow, so if you can avoid them, you can get better
performance
- looping using "for" is usually faster than using "each { ... }",
because it doesn't have to create a new scope every time the block
is called

Three alternative solutions added given the above (my computer is slower
than yours, so I reduced the number of iterations too). Interesting
that I was able to write an enumerator class in ruby that was faster
than the builtin enumerator class (which is written in C). Also
interesting is how much slower the enumerator solution is on YARV:

require 'benchmark'
require 'enumerator'

class EachByteEnumerator
def initialize(obj)
@obj = obj
end

def each(&block)
@obj.each_byte(&block)
end
end


Benchmark.bmbm do |b|
s = "abc" * 100_000
re = /(.)/

b.report("s.gsub(re) {|x| x*2}") do
s.gsub(re) {|x| x*2}
end

b.report("s.gsub(re, '\\1\\1')") do
s.gsub(re, '\1\1')
end

b.report("s.gsub(/./, '\\0\\0')") do
s.gsub(/./, '\0\0')
end

b.report("7stud1") do
new_str = ""
s.each_byte do |byte|
2.times do
new_str << byte
end
end
end

b.report("7stud2") do
new_str = ""
s.each_byte do |byte|
new_str << byte << byte
end
end

b.report("7stud_enum") do
new_str = ""
o = s.to_enum:)each_byte)
for byte in o do
new_str << byte << byte
end
end

b.report("7stud_enum2") do
new_str = ""
o = EachByteEnumerator.new(s)
for byte in o do
new_str << byte << byte
end
end
end

[pbrannan@zaphod tmp]$ ruby test.rb
Rehearsal --------------------------------------------------------
s.gsub(re) {|x| x*2} 2.410000 0.010000 2.420000 ( 2.431675)
s.gsub(re, '\1\1') 0.900000 0.140000 1.040000 ( 1.035942)
s.gsub(/./, '\0\0') 0.840000 0.070000 0.910000 ( 0.919175)
7stud1 1.430000 0.000000 1.430000 ( 1.430979)
7stud2 0.580000 0.010000 0.590000 ( 0.585925)
7stud_enum 0.820000 0.010000 0.830000 ( 0.837720)
7stud_enum2 0.510000 0.000000 0.510000 ( 0.511776)
----------------------------------------------- total: 7.730000sec

user system total real
s.gsub(re) {|x| x*2} 2.450000 0.000000 2.450000 ( 2.455824)
s.gsub(re, '\1\1') 0.860000 0.010000 0.870000 ( 0.861705)
s.gsub(/./, '\0\0') 0.820000 0.000000 0.820000 ( 0.818279)
7stud1 1.460000 0.010000 1.470000 ( 1.483312)
7stud2 0.590000 0.000000 0.590000 ( 0.589141)
7stud_enum 0.820000 0.000000 0.820000 ( 0.817231)
7stud_enum2 0.520000 0.000000 0.520000 ( 0.514009)

[pbrannan@zaphod tmp]$ ruby1.9 test.rb
Rehearsal --------------------------------------------------------
s.gsub(re) {|x| x*2} 2.880000 0.030000 2.910000 ( 2.961747)
s.gsub(re, '\1\1') 1.850000 0.120000 1.970000 ( 1.972712)
s.gsub(/./, '\0\0') 1.770000 0.040000 1.810000 ( 1.815295)
7stud1 1.040000 0.000000 1.040000 ( 1.056692)
7stud2 0.640000 0.000000 0.640000 ( 0.660844)
7stud_enum 1.710000 0.080000 1.790000 ( 1.786316)
7stud_enum2 1.640000 0.120000 1.760000 ( 1.770983)
---------------------------------------------- total: 11.920000sec

user system total real
s.gsub(re) {|x| x*2} 2.950000 0.030000 2.980000 ( 2.999795)
s.gsub(re, '\1\1') 1.910000 0.030000 1.940000 ( 1.938242)
s.gsub(/./, '\0\0') 1.760000 0.030000 1.790000 ( 1.811339)
7stud1 1.040000 0.000000 1.040000 ( 1.055916)
7stud2 0.650000 0.030000 0.680000 ( 0.685329)
7stud_enum 1.520000 0.120000 1.640000 ( 1.661215)
7stud_enum2 1.450000 0.120000 1.570000 ( 1.590461)

[pbrannan@zaphod tmp]$ ruby1.9 -v
ruby 1.9.0 (2008-01-24 revision 0) [i686-linux]
 
P

Paul Brannan

Ruby's gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )

I don't think your code is doing what you think it is doing:

[pbrannan@zaphod tmp]$ cat test.lua
-- s = string.rep( 'abc', 100000 )
s = 'abcabcabc'

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print(s)
print( os.clock() - time, "seconds" )

[pbrannan@zaphod tmp]$ lua test.lua
abcabcabc
0 seconds
[pbrannan@zaphod tmp]$ lua -v
Lua 5.0.2 Copyright (C) 1994-2004 Tecgraf, PUC-Rio

Paul
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,072
Latest member
trafficcone

Latest Threads

Top