Fast implementation of XML escape

Discussion in 'Ruby' started by Bob Hutchison, Aug 3, 2007.

  1. Hi,

    Does anyone know of a fast implementation of the XML escape method
    (the one that converts '"<>& to &quot; etc.)?

    I did some benchmarking on one of my applications and the
    implementation I have, which I thought was okay -- simple minded for
    sure, but okay -- turns out to be a bottle neck in certain operations.

    I used ruby-prof with a simple test, running over a 400 character
    string 50,000 times or so. Running the profiler on version0 (below)
    took 1.39 seconds.

    def version0(input)
    # all kinds of other processing of input simulated by the input.dup
    result = input.dup

    return result
    end

    The original simple minded way was, under ruby-prof ran in 8.74 seconds:

    def version1(input)
    # all kinds of other processing of input simulated by the input.dup
    result = input.dup

    result.gsub!("&", "&amp;")
    result.gsub!("<", "&lt;")
    result.gsub!(">", "&gt;")
    result.gsub!("'", "&apos;")
    result.gsub!("\"", "&quot;")

    return result
    end

    The best I've come up with so far is, under ruby-prof ran in 3.33:

    def version2(input)
    # all kinds of other processing of input simulated by the input.dup
    result = input.dup

    result.gsub!(/[&<>'"]/) do | match |
    case match
    when '&' then return '&amp;'
    when '<' then return '&lt;'
    when '>' then return '&gt;'
    when "'" then return '&apos;'
    when '"' then return '&quote;'
    end
    end

    return result
    end

    After accounting for overhead, 3.8 times faster is good, I'd like it
    faster still. BTW, gsub is only marginally slower that gsub! I've
    tried using simple iteration, gsub with a hash to avoid the case, and
    variations, all slower to a lot slower than version 1, nothing really
    near version2 (which really was the first variation I tried).

    Any ideas?

    Cheers,
    Bob


    ----
    Bob Hutchison -- tumblelog at http://
    www.recursive.ca/so/
    Recursive Design Inc. -- weblog at http://www.recursive.ca/
    hutch
    http://www.recursive.ca/ -- works on http://www.raconteur.info/
    cms-for-static-content/home/
     
    Bob Hutchison, Aug 3, 2007
    #1
    1. Advertising

  2. 2007/8/3, Bob Hutchison <>:
    > Hi,
    >
    > Does anyone know of a fast implementation of the XML escape method
    > (the one that converts '"<>& to &quot; etc.)?
    >
    > I did some benchmarking on one of my applications and the
    > implementation I have, which I thought was okay -- simple minded for
    > sure, but okay -- turns out to be a bottle neck in certain operations.
    >
    > I used ruby-prof with a simple test, running over a 400 character
    > string 50,000 times or so. Running the profiler on version0 (below)
    > took 1.39 seconds.
    >
    > def version0(input)
    > # all kinds of other processing of input simulated by the input.dup
    > result = input.dup
    >
    > return result
    > end
    >
    > The original simple minded way was, under ruby-prof ran in 8.74 seconds:
    >
    > def version1(input)
    > # all kinds of other processing of input simulated by the input.dup
    > result = input.dup
    >
    > result.gsub!("&", "&amp;")
    > result.gsub!("<", "&lt;")
    > result.gsub!(">", "&gt;")
    > result.gsub!("'", "&apos;")
    > result.gsub!("\"", "&quot;")
    >
    > return result
    > end
    >
    > The best I've come up with so far is, under ruby-prof ran in 3.33:
    >
    > def version2(input)
    > # all kinds of other processing of input simulated by the input.dup
    > result = input.dup
    >
    > result.gsub!(/[&<>'"]/) do | match |
    > case match
    > when '&' then return '&amp;'
    > when '<' then return '&lt;'
    > when '>' then return '&gt;'
    > when "'" then return '&apos;'
    > when '"' then return '&quote;'
    > end
    > end
    >
    > return result
    > end
    >
    > After accounting for overhead, 3.8 times faster is good, I'd like it
    > faster still. BTW, gsub is only marginally slower that gsub! I've
    > tried using simple iteration, gsub with a hash to avoid the case, and
    > variations, all slower to a lot slower than version 1, nothing really
    > near version2 (which really was the first variation I tried).
    >
    > Any ideas?


    You are on the right track. There is just one thing to improve: get
    rid of "case":

    class Converter
    MAP = {
    "&" => "&amp;",
    # ...
    }

    def self.convert(s)
    s.gsub(/[&<>'"]/) do |m|
    MAP[m] || "ERROR"
    end
    end
    end

    Also, I believe x.dup.gsub! is less efficient than doing just a single x.gsub.

    Kind regards

    robert
     
    Robert Klemme, Aug 3, 2007
    #2
    1. Advertising

  3. Sorry, There are some corrections to this post. I made two STUPID
    errors, one a cut and paste error into the message, and another in
    recording my test results.

    I apologise again.

    Here is the real situation:

    ------

    Does anyone know of a fast implementation of the XML escape method
    (the one that converts '"<>& to &quot; etc.)?

    I did some benchmarking on one of my applications and the
    implementation I have, which I thought was okay -- simple minded for
    sure, but okay -- turns out to be a bottle neck in certain operations.

    I used ruby-prof with a simple test, running over a 400 character
    string 50,000 times or so. Running the profiler on version0 (below)
    took 1.39 seconds.

    def version0(input)
    # all kinds of other processing of input simulated by the input.dup
    result = input.dup

    return result
    end

    The original simple minded way was, under ruby-prof ran in 8.74 seconds:

    def version1(input)
    # all kinds of other processing of input simulated by the input.dup
    result = input.dup

    result.gsub!("&", "&amp;")
    result.gsub!("<", "&lt;")
    result.gsub!(">", "&gt;")
    result.gsub!("'", "&apos;")
    result.gsub!("\"", "&quot;")

    return result
    end

    [ the version2 that I originally posted was a cut and paste error
    from an old file (too many windows open), there should not have been
    those returns in the gsub. Then to compound things, I had made a typo
    in the test case and the loop was not run as often, so rather than
    taking 3.33 seconds it in fact took 105 seconds ]

    The simple-minded approach is the fastest I've come up with.

    Any any better ideas?

    Cheers,
    Bob


    ----
    Bob Hutchison -- tumblelog at http://
    www.recursive.ca/so/
    Recursive Design Inc. -- weblog at http://www.recursive.ca/
    hutch
    http://www.recursive.ca/ -- works on http://www.raconteur.info/
    cms-for-static-content/home/





    ----
    Bob Hutchison -- tumblelog at http://
    www.recursive.ca/so/
    Recursive Design Inc. -- weblog at http://www.recursive.ca/
    hutch
    http://www.recursive.ca/ -- works on http://www.raconteur.info/
    cms-for-static-content/home/
     
    Bob Hutchison, Aug 3, 2007
    #3
  4. Hi Robert,

    Thanks for the response.


    On 3-Aug-07, at 8:47 AM, Robert Klemme wrote:
    >
    > You are on the right track. There is just one thing to improve: get
    > rid of "case":
    >
    > class Converter
    > MAP = {
    > "&" => "&amp;",
    > # ...
    > }
    >
    > def self.convert(s)
    > s.gsub(/[&<>'"]/) do |m|
    > MAP[m] || "ERROR"
    > end
    > end
    > end



    My version3 was:

    @@convert = {
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    "'" => '&apos;',
    '"' => '&quot;'
    }

    def version3(input)
    # all kinds of other processing of input simulated by the input.dup
    result = input.dup

    result.gsub!(/[&<>'"]/) do | match |
    @@convert[match]
    end

    return result
    end

    which is pretty much what you suggested, but it takes 29.71 seconds
    (rather than 8.74 seconds).

    >
    > Also, I believe x.dup.gsub! is less efficient than doing just a
    > single x.gsub.


    That dup is just so I could 1) simulate some extra work that I had to
    do; and 2) make sure I didn't permanently change the test string with
    the gsub!.

    Thanks!

    Cheers,
    Bob

    >
    > Kind regards
    >
    > robert
    >


    ----
    Bob Hutchison -- tumblelog at http://
    www.recursive.ca/so/
    Recursive Design Inc. -- weblog at http://www.recursive.ca/
    hutch
    http://www.recursive.ca/ -- works on http://www.raconteur.info/
    cms-for-static-content/home/
     
    Bob Hutchison, Aug 3, 2007
    #4
  5. 2007/8/3, Bob Hutchison <>:
    > Hi Robert,
    >
    > Thanks for the response.
    >
    >
    > On 3-Aug-07, at 8:47 AM, Robert Klemme wrote:
    > >
    > > You are on the right track. There is just one thing to improve: get
    > > rid of "case":
    > >
    > > class Converter
    > > MAP = {
    > > "&" => "&amp;",
    > > # ...
    > > }
    > >
    > > def self.convert(s)
    > > s.gsub(/[&<>'"]/) do |m|
    > > MAP[m] || "ERROR"
    > > end
    > > end
    > > end

    >
    >
    > My version3 was:
    >
    > @@convert = {
    > '&' => '&amp;',
    > '<' => '&lt;',
    > '>' => '&gt;',
    > "'" => '&apos;',
    > '"' => '&quot;'
    > }
    >
    > def version3(input)
    > # all kinds of other processing of input simulated by the input.dup
    > result = input.dup
    >
    > result.gsub!(/[&<>'"]/) do | match |
    > @@convert[match]
    > end
    >
    > return result
    > end
    >
    > which is pretty much what you suggested, but it takes 29.71 seconds
    > (rather than 8.74 seconds).


    The overhead of the block form is significant. It probably pays off
    when you have longer strings. That depends.

    Kind regards

    robert
     
    Robert Klemme, Aug 3, 2007
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jens Mander
    Replies:
    0
    Views:
    506
    Jens Mander
    Jun 10, 2005
  2. Replies:
    0
    Views:
    672
  3. Michele Simionato

    Python is darn fast (was: How fast is Python)

    Michele Simionato, Aug 23, 2003, in forum: Python
    Replies:
    13
    Views:
    569
  4. Juha Nieminen
    Replies:
    22
    Views:
    1,036
    Kai-Uwe Bux
    Oct 12, 2007
  5. slomo
    Replies:
    5
    Views:
    1,546
    Duncan Booth
    Dec 2, 2007
Loading...

Share This Page